All of Us diversity and scale improve polygenic prediction contextually with greatest improvements for under-represented populations.

Authors

Tsuo Kristin, Shi Zhuozheng, Ge Tian, Mandla Ravi, Hou Kangcheng, Ding Yi, Pasaniuc Bogdan, Wang Ying, Martin Alicia R

Affiliations

Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA.

Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA.

Publication information

bioRxiv. 2024 Aug 6:2024.08.06.606846. doi: 10.1101/2024.08.06.606846.

Abstract

Recent studies have demonstrated that polygenic risk scores (PRS) trained on multi-ancestry data can improve prediction accuracy in groups historically underrepresented in genomic studies, but the availability of linked health and genetic data from large-scale cohorts that represent a wide spectrum of human diversity remains limited. To address this need, the All of Us Research Program (AoU) generated whole-genome sequences of 245,388 individuals who collectively reflect the diversity of the USA. Leveraging this resource and another widely used population-scale biobank, the UK Biobank (UKB), with half a million participants, we developed PRS trained on multi-ancestry and multi-biobank data with up to ~750,000 participants for 32 common, complex traits and diseases across a range of genetic architectures. We then compared the effects of ancestry, PRS methodology, and genetic architecture on PRS accuracy across a held-out subset of ancestrally diverse AoU participants. Due to the more heterogeneous study design of AoU, we found lower heritability on average compared to UKB (0.075 vs. 0.165), which limited the maximal achievable PRS accuracy in AoU. Overall, we found that the increased diversity of AoU significantly improved PRS performance in some participants in AoU, especially underrepresented individuals, across multiple phenotypes. Notably, maximizing sample size by combining discovery data across AoU and UKB is not the optimal approach for predicting some phenotypes in African ancestry populations; rather, using data from only AoU for these traits resulted in the greatest accuracy. This was especially true for less polygenic traits with large ancestry-enriched effects, such as neutrophil count ( : 0.055 vs. 0.035 using AoU vs. cross-biobank meta-analysis, respectively, because of e.g. ). Lastly, we calculated individual-level PRS accuracies rather than grouping by continental ancestry, a critical step towards interpretability in precision medicine. Individualized PRS accuracy decayed linearly as a function of ancestry divergence, but the slope was smaller with multi-ancestry GWAS than with European GWAS. Our results highlight the potential of biobanks with more balanced representations of human diversity to facilitate more accurate PRS for the individuals least represented in genomic studies.
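For readers unfamiliar with the quantities the abstract refers to, the following is a minimal illustrative sketch, not the authors' pipeline. It assumes that a PRS is computed as a weighted sum of allele dosages, that accuracy is summarized as a squared Pearson correlation between score and phenotype, and that the "slope" of accuracy against ancestry divergence is an ordinary least-squares slope. The function names and all numeric values are hypothetical toy inputs, not data from the study.

```python
from statistics import mean


def polygenic_score(dosages, weights):
    """Weighted sum of allele dosages (0, 1, or 2) with per-variant effect sizes."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)


def ols_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


# Hypothetical toy inputs: three variants, four individuals.
toy_weights = [0.2, -0.1, 0.05]
toy_genotypes = [[0, 1, 2], [2, 0, 1], [1, 1, 1], [2, 2, 0]]
toy_scores = [polygenic_score(g, toy_weights) for g in toy_genotypes]

# Hypothetical accuracies at increasing ancestry divergence from the training data.
toy_divergence = [0.0, 1.0, 2.0, 3.0]
toy_accuracy = [0.10, 0.08, 0.06, 0.04]
toy_slope = ols_slope(toy_divergence, toy_accuracy)
```

Under these assumptions, a slope closer to zero means accuracy falls off more slowly as ancestry divergence increases, which is the comparison the abstract draws between multi-ancestry and European GWAS.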

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7f48/12218722/6754c4e0412c/nihpp-2024.08.06.606846v2-f0001.jpg
