Xu Leqi, Zhou Geyu, Jiang Wei, Zhang Haoyu, Dong Yikai, Guan Leying, Zhao Hongyu
Department of Biostatistics, Yale School of Public Health, New Haven, CT, USA.
Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD, USA.
bioRxiv. 2024 Sep 12:2023.10.29.564615. doi: 10.1101/2023.10.29.564615.
Genetic prediction accuracy for non-European populations is hindered by the limited sample sizes of genome-wide association study (GWAS) data available for these populations. In addition, methods that require tuning data are difficult to tune when only a small tuning dataset is available, which is often the case for non-European samples. To address these challenges, we propose JointPRS, a novel, data-adaptive framework that jointly models multiple populations using GWAS summary statistics. JointPRS incorporates genetic correlation structures into the prediction framework, enabling accurate prediction even without individual-level tuning data. Additionally, it uniquely employs a data-adaptive approach that provides a robust solution when only a small tuning dataset is available. We conducted extensive simulations and real data applications to 22 quantitative traits and four binary traits in five continental populations (European (EUR), East Asian (EAS), African (AFR), South Asian (SAS), and Admixed American (AMR)), evaluated using the UK Biobank (UKBB) and All of Us (AoU). These analyses demonstrate that JointPRS outperforms six other state-of-the-art methods for most traits in non-European populations across three data scenarios (no tuning data, tuning and testing data from the same cohort, and tuning and testing data from different cohorts), while maintaining model simplicity and computational efficiency.