Conomos Matthew P, Laurie Cecelia A, Stilp Adrienne M, Gogarten Stephanie M, McHugh Caitlin P, Nelson Sarah C, Sofer Tamar, Fernández-Rhodes Lindsay, Justice Anne E, Graff Mariaelisa, Young Kristin L, Seyerle Amanda A, Avery Christy L, Taylor Kent D, Rotter Jerome I, Talavera Gregory A, Daviglus Martha L, Wassertheil-Smoller Sylvia, Schneiderman Neil, Heiss Gerardo, Kaplan Robert C, Franceschini Nora, Reiner Alex P, Shaffer John R, Barr R Graham, Kerr Kathleen F, Browning Sharon R, Browning Brian L, Weir Bruce S, Avilés-Santa M Larissa, Papanicolaou George J, Lumley Thomas, Szpiro Adam A, North Kari E, Rice Ken, Thornton Timothy A, Laurie Cathy C
Department of Biostatistics, University of Washington, Seattle, WA 98195, USA.
Am J Hum Genet. 2016 Jan 7;98(1):165-84. doi: 10.1016/j.ajhg.2015.12.001.
US Hispanic/Latino individuals are diverse in genetic ancestry, culture, and environmental exposures. Here, we characterized and controlled for this diversity in genome-wide association studies (GWASs) for the Hispanic Community Health Study/Study of Latinos (HCHS/SOL). We simultaneously estimated population-structure principal components (PCs) robust to familial relatedness and pairwise kinship coefficients (KCs) robust to population structure, admixture, and Hardy-Weinberg departures. The PCs revealed substantial genetic differentiation within and among six self-identified background groups (Cuban, Dominican, Puerto Rican, Mexican, and Central and South American). To control for variation among groups, we developed a multi-dimensional clustering method to define a "genetic-analysis group" variable that retains many properties of self-identified background while achieving substantially greater genetic homogeneity within groups and including participants with non-specific self-identification. In GWASs of 22 biomedical traits, we used a linear mixed model (LMM) including pairwise empirical KCs to account for familial relatedness, PCs for ancestry, and genetic-analysis groups for additional group-associated effects. Including the genetic-analysis group as a covariate accounted for significant trait variation in 8 of 22 traits, even after we fit 20 PCs. Additionally, genetic-analysis groups had significant heterogeneity of residual variance for 20 of 22 traits, and modeling this heteroscedasticity within the LMM reduced genomic inflation for 19 traits. Furthermore, fitting an LMM that utilized a genetic-analysis group rather than a self-identified background group achieved higher power to detect previously reported associations. We expect that the methods applied here will be useful in other studies with multiple ethnic groups, admixture, and relatedness.
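The abstract reports that modeling group-specific residual variance reduced genomic inflation for 19 traits but does not say how inflation was computed. As a minimal sketch, assuming the conventional genomic-control definition (median observed 1-df chi-square statistic divided by its expected median under the null), the quantity could be computed from GWAS p-values as below. The function name `genomic_inflation` and the constant name are hypothetical, chosen for illustration only, and are not taken from the study.

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom (about 0.455).
# If Z is standard normal, P(Z^2 <= m) = 0.5 exactly when Phi(sqrt(m)) = 0.75.
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def genomic_inflation(p_values):
    """Return lambda_GC for a collection of two-sided p-values in (0, 1].

    Assumption: each p-value comes from a 1-df test, so it maps back to a
    chi-square statistic via the squared standard-normal quantile at p / 2.
    A p-value of exactly 0 is not supported (the quantile is undefined).
    """
    nd = NormalDist()
    # Use the lower tail (p / 2) to keep precision for very small p-values.
    chi2_stats = [nd.inv_cdf(p / 2.0) ** 2 for p in p_values]
    return median(chi2_stats) / CHI2_1DF_MEDIAN


if __name__ == "__main__":
    # Evenly spaced p-values mimic a well-calibrated null distribution.
    null_p = [(i + 0.5) / 1000 for i in range(1000)]
    print(round(genomic_inflation(null_p), 3))
```

A value near 1 indicates well-calibrated test statistics; values noticeably above 1 suggest inflation, for example from unmodeled population structure, relatedness, or heteroscedasticity.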