Jennifer Wessel, Nicholas J. Schork
Polymorphism Research Laboratory, Department of Psychiatry, Divisions of Epidemiology, Center for Human Genetics and Genomics, University of California at San Diego, La Jolla, CA 92093-0603, USA.
Am J Hum Genet. 2006 Nov;79(5):792-806. doi: 10.1086/508346. Epub 2006 Sep 21.
Large-scale, multilocus genetic association studies require powerful and appropriate statistical-analysis tools that are designed to relate genotype and haplotype information to phenotypes of interest. Many analysis approaches consider relating allelic, haplotypic, or genotypic information to a trait through use of extensions of traditional analysis techniques, such as contingency-table analysis, regression methods, and analysis-of-variance techniques. In this work, we consider a complementary approach that involves the characterization and measurement of the similarity and dissimilarity of the allelic composition of a set of individuals' diploid genomes at multiple loci in the regions of interest. We describe a regression method that can be used to relate variation in the measure of genomic dissimilarity (or "distance") among a set of individuals to variation in their trait values. Weighting factors associated with functional or evolutionary conservation information of the loci can be used in the assessment of similarity. The proposed method is very flexible and is easily extended to complex multilocus-analysis settings involving covariates. In addition, the proposed method actually encompasses both single-locus and haplotype-phylogeny analysis methods, which are two of the most widely used approaches in genetic association analysis. We showcase the method with data described in the literature. Ultimately, our method is appropriate for high-dimensional genomic data and anticipates an era when cost-effective exhaustive DNA sequence data can be obtained for a large number of individuals, over and above genotype information focused on a few well-chosen loci.
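The regression of genomic dissimilarity on trait values described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the distance measure or the test statistic, so everything here is an assumption made for illustration: the dissimilarity is a weighted per-locus difference in allele counts, the statistic is a pseudo-F of the McArdle-Anderson type computed on a Gower-centered matrix, and significance is assessed by permutation. The function names (`allele_sharing_distance`, `gower_center`, `pseudo_f`, `permutation_p`) are hypothetical.

```python
import numpy as np

def allele_sharing_distance(genotypes, weights=None):
    """Pairwise dissimilarity from an (n, L) array of allele counts (0, 1, 2).

    Assumed measure: weighted mean over loci of |g_i - g_j| / 2. The optional
    per-locus weights stand in for functional or conservation information.
    """
    g = np.asarray(genotypes, dtype=float)
    n_loci = g.shape[1]
    w = np.ones(n_loci) if weights is None else np.asarray(weights, dtype=float)
    diff = np.abs(g[:, None, :] - g[None, :, :]) / 2.0  # shape (n, n, L)
    return (diff * w).sum(axis=2) / w.sum()

def gower_center(d):
    """Gower-centered matrix G = C (-0.5 * d**2) C with C = I - 11'/n."""
    n = d.shape[0]
    c = np.eye(n) - np.ones((n, n)) / n
    return c @ (-0.5 * d ** 2) @ c

def pseudo_f(g, x):
    """Pseudo-F for predictors x (trait and any covariates), intercept added."""
    n = g.shape[0]
    design = np.column_stack([np.ones(n), x])
    m = design.shape[1]
    h = design @ np.linalg.pinv(design)  # projection (hat) matrix
    r = np.eye(n) - h
    explained = np.trace(h @ g @ h) / (m - 1)
    residual = np.trace(r @ g @ r) / (n - m)
    return explained / residual

def permutation_p(d, x, n_perm=999, seed=0):
    """Observed pseudo-F and a permutation p-value (rows of x are shuffled)."""
    rng = np.random.default_rng(seed)
    g = gower_center(d)
    x = np.asarray(x, dtype=float).reshape(d.shape[0], -1)
    observed = pseudo_f(g, x)
    hits = sum(
        pseudo_f(g, x[rng.permutation(d.shape[0])]) >= observed
        for _ in range(n_perm)
    )
    return observed, (hits + 1) / (n_perm + 1)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    geno = rng.integers(0, 3, size=(40, 6))   # simulated genotypes
    trait = rng.normal(size=40)               # simulated quantitative trait
    d = allele_sharing_distance(geno, weights=[1, 1, 2, 1, 1, 3])
    print(permutation_p(d, trait, n_perm=199))
```

With a single locus and this distance, the pseudo-F reduces to the ordinary F statistic from simple linear regression between genotype and trait, which is one way to see how a distance-based regression can contain single-locus analysis as a special case.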