Computational Methods for Haplotype Inference with Application to Haplotype Block Characterization in Cattle
Date
2009-07-02T17:10:47Z
Authors
Angulo, Rafael Villa
Abstract
Genetic haplotype analysis is important in the identification of DNA variations relevant to several common and complex human diseases, and for the identification of Quantitative Trait Loci genes in animal models. Haplotype analysis is now considered one of the most promising methods for studying gene-disease and gene-phenotype associations. In this dissertation, we address the problem of haplotype inference from cattle genotypes, which differ significantly from human genotype data. Using data from the International Bovine HapMap Consortium, we provide the first high-resolution haplotype block characterization in the cattle genome. In addition, a new genetic algorithm method for haplotype inference in large and complex pedigrees was developed. Novel results indicate that cattle and humans share high similarity in linkage disequilibrium and haplotype block structure at the scale of 1-100 kb. Effective population size estimated from linkage disequilibrium reflects the period of domestication ~12,000 years ago and the current bottleneck in breeds during the last ~700 years. Analyses of haplotype block density correlation, block boundary discordances, and haplotype sharing show clear differentiation between indicus, African, and composite breed subgroups, but not between dairy and beef subgroups. Our results support the hypothesis that historic geographic ancestry plays a stronger role in explaining genotypic variation and haplotype block structure in cattle than does the more recent selection into breeds with specific agricultural functions. Another significant contribution of this dissertation is the development of a new method for haplotype inference in large and complex cattle pedigrees. A new representation of the search space for valid haplotype configurations was developed, and a genetic algorithm was used to optimize features of the haplotype assignments. The genetic algorithm includes a novel population initialization method, new crossover and mutation operators, and a fitness function that minimizes the inferred recombinations in the pedigree. The new method outperformed the currently available methods capable of handling large and complex pedigrees, and has the advantage of being scalable to larger datasets.
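The abstract does not spell out how the fitness function counts recombinations, so the following is only a minimal sketch of one way such a count could be computed, not the dissertation's implementation. It assumes biallelic markers coded 0/1, fully phased haplotypes, and no genotyping errors; the names `min_recombinations` and `pedigree_fitness` are hypothetical.

```python
from typing import Optional, Sequence, Tuple

Haplotype = Sequence[int]


def min_recombinations(parent_a: Haplotype, parent_b: Haplotype,
                       transmitted: Haplotype) -> Optional[int]:
    """Fewest crossovers needed to build `transmitted` as a mosaic of a
    parent's two haplotypes. Returns None if no mosaic is possible."""
    if not (len(parent_a) == len(parent_b) == len(transmitted)):
        raise ValueError("haplotypes must cover the same markers")
    inf = float("inf")
    # cost_a / cost_b: minimum switches so far when the current marker
    # is copied from parent_a / parent_b respectively.
    cost_a, cost_b = 0, 0
    for a, b, t in zip(parent_a, parent_b, transmitted):
        new_a = min(cost_a, cost_b + 1) if a == t else inf
        new_b = min(cost_b, cost_a + 1) if b == t else inf
        cost_a, cost_b = new_a, new_b
    best = min(cost_a, cost_b)
    return None if best == inf else int(best)


def pedigree_fitness(
    transmissions: Sequence[Tuple[Haplotype, Haplotype, Haplotype]]
) -> Optional[int]:
    """Total inferred recombinations over all parent-to-child
    transmissions (lower is better). None marks an invalid assignment."""
    total = 0
    for parent_a, parent_b, transmitted in transmissions:
        count = min_recombinations(parent_a, parent_b, transmitted)
        if count is None:
            return None
        total += count
    return total


if __name__ == "__main__":
    sire = ([0, 0, 1, 1, 0], [1, 1, 0, 0, 1])
    print(min_recombinations(sire[0], sire[1], [0, 0, 0, 0, 0]))
```

In a genetic algorithm, a candidate haplotype configuration with a lower total would be preferred; markers at which the parent is homozygous are uninformative and never force a switch in this count.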
Keywords
Haplotype inference, Cattle genome, Linkage disequilibrium, Genetic algorithms, Genotype analysis, Haplotype blocks