Department of Computer Science, Columbia University, 505 Computer Science Building, 1214 Amsterdam Ave.: Mailcode 0401, New York, NY 10027-7003, USA.
Hum Mol Genet. 2011 Feb 15;20(4):827-39. doi: 10.1093/hmg/ddq510. Epub 2010 Nov 30.
The potential benefits of using population isolates in genetic mapping, such as reduced genetic, phenotypic and environmental heterogeneity, are offset by the challenges posed by the large amounts of direct and cryptic relatedness in these populations, which confound basic assumptions of independence. We have evaluated four representative specialized methods for association testing in the presence of relatedness: (i) within-family, (ii) within- and between-family and (iii) mixed-model methods, using simulated traits for 2906 subjects with known genome-wide genotype data from an extremely isolated population, the Island of Kosrae, Federated States of Micronesia. We report that mixed models optimally extract association information from such samples, demonstrating 88% power to rank the true variant among the top 10 genome-wide, with 56% achieving genome-wide significance, a >80% improvement over the other methods, and demonstrate that population isolates have similar power to non-isolate populations for observing variants of known effects. We then used the mixed-model method to reanalyze data for 17 published phenotypes relating to metabolic traits and electrocardiographic measures, along with another 8 previously unreported phenotypes. We replicate nine genome-wide significant associations with known loci for plasma cholesterol, high-density lipoprotein, low-density lipoprotein, triglycerides, thyroid stimulating hormone, homocysteine, C-reactive protein and uric acid, with only one detected in the previous analysis of the same traits. Further, we leveraged shared identity-by-descent genetic segments in the region of the uric acid locus to fine-map the signal, refining the known locus by a factor of 4. Finally, we report a novel association for height (rs17629022, P < 2.1 × 10⁻⁸).
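As an illustration of the mixed-model idea referred to above, the following is a minimal sketch of a single-variant association test that accounts for relatedness through a kinship matrix. It is not the authors' implementation or any specific published tool. The function name `mixed_model_assoc`, its arguments, and the assumption that the heritability `h2` is known in advance are all hypothetical simplifications; practical methods estimate the variance components from the data. The p-value uses a normal approximation to the test statistic.

```python
import math
import numpy as np

def mixed_model_assoc(y, g, K, h2):
    """Generalized least squares test of one variant.

    Assumed model (illustrative only):
        y ~ N(X b, sigma^2 * (h2 * K + (1 - h2) * I))
    y  : phenotype vector, length n
    g  : genotype dosage vector, length n
    K  : n x n kinship (relatedness) matrix
    h2 : assumed heritability in [0, 1) (hypothetical known value)
    """
    y = np.asarray(y, dtype=float)
    g = np.asarray(g, dtype=float)
    n = y.shape[0]

    # Phenotypic covariance implied by relatedness plus independent noise.
    V = h2 * np.asarray(K, dtype=float) + (1.0 - h2) * np.eye(n)

    # Whiten with the Cholesky factor so that ordinary least squares applies.
    L = np.linalg.cholesky(V)
    X = np.column_stack([np.ones(n), g])  # intercept + genotype
    Xw = np.linalg.solve(L, X)
    yw = np.linalg.solve(L, y)

    # Ordinary least squares on the whitened data.
    XtX_inv = np.linalg.inv(Xw.T @ Xw)
    beta = XtX_inv @ (Xw.T @ yw)
    resid = yw - Xw @ beta
    sigma2 = float(resid @ resid) / (n - X.shape[1])

    # Standard error and two-sided p-value (normal approximation).
    se = math.sqrt(sigma2 * XtX_inv[1, 1])
    z = beta[1] / se
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return float(beta[1]), se, p
```

When `K` is the identity matrix (no relatedness), the covariance reduces to the identity and the estimate coincides with ordinary linear regression of `y` on `g`; with a non-trivial `K`, related individuals are down-weighted according to their shared ancestry.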