Department of Mathematics and Statistics, Loyola University Chicago, Chicago, IL, United States of America.
Department of Biology, Loyola University Chicago, Chicago, IL, United States of America.
PLoS One. 2022 Feb 24;17(2):e0264341. doi: 10.1371/journal.pone.0264341. eCollection 2022.
Genetically regulated gene expression has helped elucidate the biological mechanisms underlying complex traits. Improved high-throughput technology allows similar interrogation of the genetically regulated proteome for understanding complex trait mechanisms. Here, we used the Trans-omics for Precision Medicine (TOPMed) Multi-omics pilot study, which comprises data from the Multi-Ethnic Study of Atherosclerosis (MESA), to optimize genetic predictors of the plasma proteome for genetically regulated proteome-wide association studies (PWAS) in diverse populations. We built predictive models for protein abundances using data collected in TOPMed MESA, in which 1,305 proteins were measured by a SOMAscan assay. We compared predictive models built via elastic net regression to models integrating posterior inclusion probabilities estimated by fine-mapping SNPs prior to elastic net. To investigate the transferability of predictive models across ancestries, we built protein prediction models in each of the four TOPMed MESA populations, African American (n = 183), Chinese (n = 71), European (n = 416), and Hispanic/Latino (n = 301), as well as in all populations combined. As expected, fine-mapping yielded a larger number of significant protein prediction models, especially in African ancestries populations, potentially increasing opportunity for discovery. When we tested our TOPMed MESA models in the independent European INTERVAL study, fine-mapping improved cross-ancestries prediction for some proteins. Using GWAS summary statistics from the Population Architecture using Genomics and Epidemiology (PAGE) study, which comprises ∼50,000 Hispanic/Latinos, African Americans, Asians, Native Hawaiians, and Native Americans, we applied S-PrediXcan to perform PWAS for 28 complex traits. The largest number of protein-trait associations was discovered, colocalized, and replicated in large independent GWAS when the training populations for the proteome prediction models had ancestries similar to those in PAGE. At current training population sample sizes, performance between baseline and fine-mapped protein prediction models in PWAS was similar, highlighting the utility of elastic net. Our predictive models in diverse populations are publicly available for use in proteome mapping methods at https://doi.org/10.5281/zenodo.4837327.
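For readers unfamiliar with how S-PrediXcan turns a protein prediction model plus GWAS summary statistics into an association statistic, the following is a minimal sketch of the summary-statistic z-score as described in the S-PrediXcan literature: Z_g ≈ Σ_l w_lg (σ_l / σ_g) (β_l / se(β_l)), where σ_g² = wᵀΓw and Γ is the SNP covariance in a reference panel. It is not the authors' pipeline or the S-PrediXcan software itself; the function names and all numbers are hypothetical and chosen only for illustration.

```python
import math


def predicted_variance(weights, cov):
    """Variance of the predicted protein level, w' * Gamma * w."""
    n = len(weights)
    return sum(
        weights[i] * cov[i][j] * weights[j]
        for i in range(n)
        for j in range(n)
    )


def spredixcan_z(weights, cov, gwas_beta, gwas_se):
    """Summary-statistic association z-score for one protein.

    weights   : model weight for each SNP in the prediction model
    cov       : SNP-by-SNP genotype covariance from a reference panel
    gwas_beta : GWAS effect size for each SNP
    gwas_se   : standard error of each GWAS effect size
    """
    var_g = predicted_variance(weights, cov)
    if var_g <= 0:
        raise ValueError("predicted protein level has no variance")
    sigma_g = math.sqrt(var_g)

    z = 0.0
    for l, w in enumerate(weights):
        sigma_l = math.sqrt(cov[l][l])  # SNP standard deviation
        z += w * (sigma_l / sigma_g) * (gwas_beta[l] / gwas_se[l])
    return z


def two_sided_p(z):
    """Two-sided p-value for a standard normal z-score."""
    return math.erfc(abs(z) / math.sqrt(2.0))


if __name__ == "__main__":
    # Hypothetical two-SNP model; values are illustrative, not real data.
    weights = [0.5, -0.2]
    cov = [[0.4, 0.1], [0.1, 0.3]]
    gwas_beta = [0.03, -0.01]
    gwas_se = [0.01, 0.01]

    z = spredixcan_z(weights, cov, gwas_beta, gwas_se)
    print(f"z = {z:.3f}, p = {two_sided_p(z):.3g}")
```

With a single-SNP model the statistic reduces to the GWAS z-score of that SNP, signed by the model weight, which is a convenient sanity check on any implementation.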