Peterson Roseann E, Edwards Alexis C, Bacanu Silviu-Alin, Dick Danielle M, Kendler Kenneth S, Webb Bradley T
Department of Psychiatry, Virginia Institute for Psychiatric and Behavioral Genetics, Virginia Commonwealth University, Richmond, Virginia.
Departments of Psychology, African American Studies, and Human and Molecular Genetics, Virginia Commonwealth University, Richmond, Virginia.
Am J Addict. 2017 Aug;26(5):494-501. doi: 10.1111/ajad.12586. Epub 2017 Jul 17.
Given moderate heritability and significant heterogeneity among addiction phenotypes, successful genome-wide association studies (GWAS) are expected to need very large samples. As sample sizes grow, so can genetic diversity, which creates challenges in analyzing these data. Methods for empirically assigning individuals to genetically informed ancestry groups are needed.
We describe a strategy for empirically assigning ancestry groups in ethnically diverse GWAS data including extensions of principal component analysis (PCA) and population matching through minimum Mahalanobis distance. We apply these methods to data from Spit for Science (S4S): the University Student Survey, a study following college students longitudinally that includes genetic and environmental data on substance use and mental health (n = 7,603).
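The population-matching step can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes ancestry PCs have already been computed for both the study sample and a labeled reference panel, and the names `assign_ancestry`, `sample_pcs`, `ref_pcs`, and `ref_labels` are hypothetical. Each individual is assigned to the reference group whose centroid is closest in Mahalanobis distance, using that group's own PC covariance.

```python
import numpy as np

def assign_ancestry(sample_pcs, ref_pcs, ref_labels):
    """Assign each sample to the reference group with minimum Mahalanobis distance.

    sample_pcs : (n_samples, n_pcs) array of PC scores for study individuals
    ref_pcs    : (n_ref, n_pcs) array of PC scores for reference individuals
    ref_labels : length n_ref sequence of group labels for the reference panel
    (All three inputs are assumptions of this sketch, not the study's data format.)
    """
    sample_pcs = np.asarray(sample_pcs, dtype=float)
    ref_pcs = np.asarray(ref_pcs, dtype=float)
    ref_labels = np.asarray(ref_labels)
    groups = np.unique(ref_labels)

    sq_dists = np.empty((sample_pcs.shape[0], groups.size))
    for j, group in enumerate(groups):
        pcs = ref_pcs[ref_labels == group]
        centroid = pcs.mean(axis=0)
        # Pseudo-inverse guards against a singular covariance matrix.
        cov_inv = np.linalg.pinv(np.cov(pcs, rowvar=False))
        diff = sample_pcs - centroid
        # Squared Mahalanobis distance: diff^T * cov_inv * diff, per sample.
        sq_dists[:, j] = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)

    # Clip tiny negative values caused by floating-point error before the root.
    dists = np.sqrt(np.maximum(sq_dists, 0.0))
    return groups[dists.argmin(axis=1)], dists
```

Because assignment depends only on the distance in PC space, individuals whose self-reported category has no direct counterpart in the reference panel can still be placed in the nearest genetic group. In practice a maximum-distance threshold might be added so that poorly matching individuals are flagged rather than forced into a group; that choice is not specified in the abstract.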
The genetic-based population assignments for S4S were 48.7% European, 22.5% African, 10.4% Americas, 9.2% East Asian, and 9.2% South Asian descent. Self-reported census categories "More than one race" and "Unknown" as well as "Hawaiian/Pacific Islander" and "American-Indian/Native Alaskan" were empirically assigned, representing a +9% sample retention over conventional methods. Although there was high concordance between self-reported race and empirical population-match (+.924), there was a reduction in variance for most ancestry PCs for genetic-based population assignments.
We were able to create more genetically homogeneous groups and reduce sample and marker loss through cross-ancestry meta-analysis, potentially increasing power to detect etiologically relevant variation. Our approach provides a framework for empirically assigning genetic ancestry groups which can be applied to other ethnically diverse genetic studies.
Given the important public health impact and demonstrable gains in statistical power from studying diverse populations, empirically sound practices for genetic studies are needed. (Am J Addict 2017;26:494-501).