Nelson M R, Kardia S L, Ferrell R E, Sing C F
Department of Human Genetics, University of Michigan, Ann Arbor, Michigan 48109-0618, USA.
Genome Res. 2001 Mar;11(3):458-70. doi: 10.1101/gr.172901.
Recent advances in genome research have accelerated the process of locating candidate genes and the variable sites within them and have simplified the task of genotype measurement. The development of statistical and computational strategies to utilize information on hundreds (soon thousands) of variable loci to investigate the relationships between genome variation and phenotypic variation has not kept pace, particularly for quantitative traits that do not follow simple Mendelian patterns of inheritance. We present here the combinatorial partitioning method (CPM) that examines multiple genes, each containing multiple variable loci, to identify partitions of multilocus genotypes that predict interindividual variation in quantitative trait levels. We illustrate this method with an application to plasma triglyceride levels collected on 188 males, ages 20-60 yr, ascertained without regard to health status, from Rochester, Minnesota. Genotype information included measurements at 18 diallelic loci in six coronary heart disease candidate susceptibility gene regions: APOA1-C3-A4, APOB, APOE, LDLR, LPL, and PON1. To illustrate the CPM, we evaluated all possible partitions of two-locus genotypes into two to nine partitions (approximately 10^6 evaluations). We found that many combinations of loci are involved in sets of genotypic partitions that predict triglyceride variability and that the most predictive sets show nonadditivity. These results suggest that traditional methods of building multilocus models that rely on statistically significant marginal, single-locus effects may fail to identify combinations of loci that best predict trait variability. The CPM offers a strategy for exploring the high-dimensional genotype state space so as to predict the quantitative trait variation in the population at large that does not require the conditioning of the analysis on a prespecified genetic model.
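The scale of the two-locus search can be checked with a short count. The sketch below is illustrative only and is not the authors' implementation: it assumes that every pair of loci shows all nine two-locus genotypes in the sample (pairs with unobserved genotypes would yield fewer partitions), and the names `stirling2`, `N_LOCI`, and `GENOTYPES_PER_LOCUS` are hypothetical. Under that assumption it gives an upper bound on the number of partitions to evaluate, which is of the same order as the approximately 10^6 evaluations reported.

```python
from math import comb


def stirling2(n, k):
    """Count the ways to split n labelled items into k non-empty, unlabelled groups."""
    if n == k:
        return 1
    if k == 0 or k > n:
        return 0
    # Item n either joins one of the k existing groups or forms a new group.
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)


N_LOCI = 18               # diallelic loci in the study
GENOTYPES_PER_LOCUS = 3   # a diallelic locus has three genotypes

n_pairs = comb(N_LOCI, 2)              # unordered pairs of loci
n_cells = GENOTYPES_PER_LOCUS ** 2     # two-locus genotypes per pair (assumed all observed)

# Partitions of the nine two-locus genotypes into 2..9 groups, for one pair of loci.
partitions_per_pair = sum(stirling2(n_cells, k) for k in range(2, 10))

# Upper bound on the number of partitions across all pairs.
total = n_pairs * partitions_per_pair

print(n_pairs, partitions_per_pair, total)
```

With 153 locus pairs and 21,146 partitions per pair, the bound is 3,235,338 candidate partitions, which shows why an exhaustive search is feasible for two loci but grows quickly as more loci are combined.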