Center for Data Sciences, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA.
Division of Rheumatology, Immunology, and Immunity, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA.
Hum Mol Genet. 2021 Jul 28;30(16):1521-1534. doi: 10.1093/hmg/ddab130.
It is important to study the genetics of complex traits in diverse populations. Here, we introduce covariate-adjusted linkage disequilibrium (LD) score regression (cov-LDSC), a method to estimate SNP-heritability ($h_g^2$) and its enrichment in homogeneous and admixed populations with summary statistics and in-sample LD estimates. In-sample LD can be estimated from a subset of the genome-wide association study (GWAS) samples, allowing our method to be applied efficiently to very large cohorts. In simulations, we show that unadjusted LDSC underestimates $h_g^2$ by 10-60% in admixed populations; in contrast, cov-LDSC is robustly accurate. We apply cov-LDSC to genotyping data from 8124 individuals, mostly of admixed ancestry, from the Slim Initiative in Genomic Medicine for the Americas study, and to approximately 161 000 Latino-ancestry individuals, 47 000 African American-ancestry individuals and 135 000 European-ancestry individuals, as classified by 23andMe. We estimate $h_g^2$ and detect heritability enrichment in three quantitative and five dichotomous phenotypes, making this, to our knowledge, the most comprehensive heritability-based analysis of admixed individuals to date. Most traits have high concordance of $h_g^2$ and consistent tissue-specific heritability enrichment among different populations. However, for age at menarche, we observe population-specific estimates of $h_g^2$. We observe consistent patterns of tissue-specific heritability enrichment across populations; for example, in the limbic system for BMI, the per-standardized-annotation effect size $\tau^*$ is 0.16 ± 0.04, 0.28 ± 0.11 and 0.18 ± 0.03 in the Latino-, African American- and European-ancestry populations, respectively.
Our approach is a powerful way to analyze genetic data for complex traits from admixed populations.
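The idea of covariate-adjusted LD scores can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that the adjustment amounts to regressing ancestry covariates out of each genotype column before computing squared correlations, it sums over all SNPs rather than a genomic window, it omits the bias correction and regression weights used by production LDSC software, and it uses the standard LDSC relation $E[\chi^2_j] = N h_g^2 \ell_j / M + \text{intercept}$. All function names and the simulated data are hypothetical.

```python
import numpy as np

def residualize(genotypes, covariates):
    """Remove the linear effect of covariates (plus an intercept) from each SNP column."""
    design = np.column_stack([np.ones(genotypes.shape[0]), covariates])
    coef, *_ = np.linalg.lstsq(design, genotypes, rcond=None)
    return genotypes - design @ coef

def ld_scores(genotypes):
    """LD score of SNP j: sum over all SNPs k of the squared sample correlation r_jk^2.

    Simplification: no genomic window and no small-sample bias correction.
    """
    r = np.corrcoef(genotypes, rowvar=False)
    return (r ** 2).sum(axis=1)

def cov_adjusted_ld_scores(genotypes, covariates):
    """LD scores computed on covariate-residualized genotypes (assumed form of the adjustment)."""
    return ld_scores(residualize(genotypes, covariates))

def ldsc_h2(chi2, ld, n_samples, n_snps):
    """Unweighted LDSC sketch: regress chi2 on LD score, then rescale the slope by M / N."""
    slope, intercept = np.polyfit(ld, chi2, 1)
    return slope * n_snps / n_samples, intercept

# Toy admixed sample: SNPs are drawn independently given ancestry, so any
# correlation between them comes from admixture alone.
rng = np.random.default_rng(0)
n, m = 500, 50
ancestry = rng.uniform(0, 1, n)   # hypothetical per-individual admixture proportion
p1 = rng.uniform(0.1, 0.3, m)     # allele frequencies in source population 1
p2 = rng.uniform(0.7, 0.9, m)     # allele frequencies in source population 2
freq = np.outer(ancestry, p1) + np.outer(1 - ancestry, p2)
G = rng.binomial(2, freq).astype(float)

print("mean unadjusted LD score:", ld_scores(G).mean())
print("mean adjusted LD score:  ", cov_adjusted_ld_scores(G, ancestry).mean())
```

In this toy setting the adjusted LD scores are smaller than the unadjusted ones, because the ancestry-driven correlation between otherwise unlinked SNPs has been removed. In practice the covariates would typically be genetic principal components rather than a known admixture proportion.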