Selecting SNPs informative for African, American Indian and European Ancestry: application to the Family Investigation of Nephropathy and Diabetes (FIND).

Author information

Williams Robert C, Elston Robert C, Kumar Pankaj, Knowler William C, Abboud Hanna E, Adler Sharon, Bowden Donald W, Divers Jasmin, Freedman Barry I, Igo Robert P, Ipp Eli, Iyengar Sudha K, Kimmel Paul L, Klag Michael J, Kohn Orly, Langefeld Carl D, Leehey David J, Nelson Robert G, Nicholas Susanne B, Pahl Madeleine V, Parekh Rulan S, Rotter Jerome I, Schelling Jeffrey R, Sedor John R, Shah Vallabh O, Smith Michael W, Taylor Kent D, Thameem Farook, Thornley-Brown Denyse, Winkler Cheryl A, Guo Xiuqing, Zager Phillip, Hanson Robert L

Affiliations

Phoenix Epidemiology and Clinical Research Branch, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Phoenix, AZ, 85014, USA.

Genetic Analysis and Data Coordinating Center, Case Western Reserve University, Cleveland, OH, 44104, USA.

Publication information

BMC Genomics. 2016 May 4;17:325. doi: 10.1186/s12864-016-2654-x.

Abstract

BACKGROUND

The presence of population structure in a sample may confound the search for important genetic loci associated with disease. Our four samples in the Family Investigation of Nephropathy and Diabetes (FIND), comprising European Americans, Mexican Americans, African Americans, and American Indians, are part of a genome-wide association study in which population structure might be particularly important. We therefore decided to study one component of this in detail: individual genetic ancestry (IGA). From SNPs present on the Affymetrix 6.0 Human SNP array, we identified 3 sets of ancestry informative markers (AIMs), each maximized for the information in one of the three contrasts among the ancestral populations: Europeans (HapMap CEU), Africans (HapMap YRI and LWK), and Native Americans (full-heritage Pima Indians). We estimate IGA and present an algorithm for its standard errors, compare IGA to principal components, emphasize the importance of balancing information in the AIMs, and test the association of IGA with diabetic nephropathy in the combined sample.
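
The abstract does not state the selection criterion used for the AIM sets. A common proxy for how well a SNP separates two ancestral populations is the absolute allele-frequency difference δ = |p_A - p_B|. The minimal sketch below assumes δ as the criterion (which may differ from the information measure the authors maximized), ranks SNPs separately for each of the three pairwise contrasts, and sums δ per contrast as a crude check that the panels carry comparable information. The population labels AFR, EUR and NAM, the function names, and the toy frequencies are all hypothetical.

```python
from itertools import combinations

# Toy reference-allele frequencies per ancestral panel (hypothetical values,
# not taken from the paper).
TOY_FREQS = {
    "rs1": {"AFR": 0.9, "EUR": 0.1, "NAM": 0.5},
    "rs2": {"AFR": 0.2, "EUR": 0.3, "NAM": 0.95},
    "rs3": {"AFR": 0.5, "EUR": 0.55, "NAM": 0.5},
    "rs4": {"AFR": 0.1, "EUR": 0.8, "NAM": 0.7},
}


def delta(p_a, p_b):
    """Absolute allele-frequency difference between two ancestral populations."""
    return abs(p_a - p_b)


def select_aims(freqs, n_per_contrast):
    """Pick the n SNPs with the largest delta for each pairwise contrast.

    freqs: dict mapping SNP id -> dict mapping population -> allele frequency.
    Returns a dict mapping (pop_a, pop_b) -> list of SNP ids, best first.
    """
    pops = sorted(next(iter(freqs.values())))
    chosen = {}
    for a, b in combinations(pops, 2):
        ranked = sorted(
            freqs,
            key=lambda snp: delta(freqs[snp][a], freqs[snp][b]),
            reverse=True,
        )
        chosen[(a, b)] = ranked[:n_per_contrast]
    return chosen


def contrast_information(freqs, snps, a, b):
    """Summed delta of a SNP set for one contrast: a rough balance indicator."""
    return sum(delta(freqs[s][a], freqs[s][b]) for s in snps)
```

With one SNP per contrast, `select_aims(TOY_FREQS, 1)` picks rs1 for AFR/EUR and rs2 for both AFR/NAM and EUR/NAM; a real panel would probably deduplicate such overlaps and compare `contrast_information` across the three sets before use.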

RESULTS

A fixed parental allele maximum likelihood algorithm was applied to FIND to estimate IGA in four samples: 869 American Indians, 1385 African Americans, 1451 Mexican Americans, and 826 European Americans. When the information in the AIMs is unbalanced, the estimates are biased and have large errors. Individual genetic admixture is highly correlated with the principal components used to capture population structure. It takes ~700 SNPs to reduce the average standard error of individual admixture below 0.01. When the samples are combined, the resulting population structure creates associations between IGA and diabetic nephropathy.
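
The abstract names a fixed parental allele maximum likelihood estimator without spelling it out. As a minimal sketch, assuming genotypes coded as reference-allele counts and ancestral allele frequencies held fixed and strictly between 0 and 1, the likelihood can be maximized with the standard EM update for admixture proportions, and standard errors approximated from the inverse expected Fisher information. This is a textbook construction, not necessarily the authors' algorithm; the function names `admixture_em` and `admixture_se` are hypothetical, and the SE function assumes exactly three ancestral populations.

```python
import math


def admixture_em(genotypes, freqs, n_iter=1000, tol=1e-10):
    """Fixed-parental-allele ML estimate of admixture proportions via EM.

    genotypes: reference-allele counts (0, 1 or 2), one per SNP.
    freqs: per-SNP lists of reference-allele frequencies, one entry per
           ancestral population; held fixed and assumed strictly in (0, 1).
    Returns the estimated proportions q, which sum to 1.
    """
    k = len(freqs[0])
    q = [1.0 / k] * k                      # start from equal ancestry
    for _ in range(n_iter):
        totals = [0.0] * k
        for g, f in zip(genotypes, freqs):
            p_ref = sum(qi * fi for qi, fi in zip(q, f))
            for i in range(k):
                # E-step: expected number of this SNP's allele copies
                # that descend from population i
                if g > 0:
                    totals[i] += g * q[i] * f[i] / p_ref
                if g < 2:
                    totals[i] += (2 - g) * q[i] * (1.0 - f[i]) / (1.0 - p_ref)
        # M-step: share of all 2 * n_snps allele copies assigned to each population
        new_q = [t / (2.0 * len(genotypes)) for t in totals]
        converged = max(abs(a - b) for a, b in zip(new_q, q)) < tol
        q = new_q
        if converged:
            break
    return q


def admixture_se(q, freqs):
    """Approximate standard errors of (q1, q2, q3) for three populations.

    Uses the inverse expected Fisher information, parameterizing
    q3 = 1 - q1 - q2 so the information matrix is 2 x 2.
    """
    i11 = i12 = i22 = 0.0
    for f in freqs:
        p = q[0] * f[0] + q[1] * f[1] + q[2] * f[2]
        w = 2.0 / (p * (1.0 - p))          # information about p from one genotype
        d1, d2 = f[0] - f[2], f[1] - f[2]  # dp/dq1 and dp/dq2
        i11 += w * d1 * d1
        i12 += w * d1 * d2
        i22 += w * d2 * d2
    det = i11 * i22 - i12 * i12
    v11, v22, v12 = i22 / det, i11 / det, -i12 / det
    v33 = v11 + v22 + 2.0 * v12            # Var(q3) = Var(q1 + q2)
    return [math.sqrt(v11), math.sqrt(v22), math.sqrt(v33)]
```

Because every SNP adds to the information matrix, in this sketch the standard error shrinks roughly as one over the square root of the number of comparably informative SNPs: repeating a panel four times halves it. A contrast covered by few informative SNPs contributes little information in its direction, so the variance along that direction stays large. Near a boundary (a proportion close to 0 or 1) the information-based SE is unreliable.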

CONCLUSIONS

The identified set of AIMs, which includes American Indian parental allele frequencies, may be particularly useful for estimating genetic admixture in populations from the Americas. Failure to balance information in maximum likelihood poly-ancestry models creates biased estimates of individual admixture with large error. This also occurs when estimating IGA with the Bayesian clustering method implemented in the program STRUCTURE. Odds ratios for the associations of IGA with disease are consistent with what is known about the incidence and prevalence of diabetic nephropathy in these populations.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/56df/4855449/e95f1127ca5c/12864_2016_2654_Fig1_HTML.jpg