Tian Peixin, Chan Tsai Hor, Wang Yong-Fei, Yang Wanling, Yin Guosheng, Zhang Yan Dora
Department of Statistics and Actuarial Science, The University of Hong Kong, Hong Kong SAR, China.
Department of Paediatrics and Adolescent Medicine, The University of Hong Kong, Hong Kong SAR, China.
Front Genet. 2022 Aug 19;13:906965. doi: 10.3389/fgene.2022.906965. eCollection 2022.
Polygenic risk scores (PRS) summarize the genetic contribution of an individual's genotype to a complex trait and are used to estimate disease risk. Traditional PRS prediction methods have been developed predominantly for the European population. The accuracy of PRS prediction in non-European populations is diminished because the sample sizes of genome-wide association studies (GWAS) in these populations are much smaller. In this article, we introduced a novel method, abbreviated as TL-Multi, to construct PRS for non-European populations; it uses a transfer learning framework to learn useful knowledge from the European population and correct the bias for non-European populations. We considered non-European GWAS data as the target data and European GWAS data as the informative auxiliary data. TL-Multi borrows useful information from the auxiliary data to improve learning accuracy on the target data while preserving efficiency. To demonstrate the practical applicability of the proposed method, we applied TL-Multi to predict the risk of systemic lupus erythematosus (SLE) in the Asian population and the risk of asthma in the Indian population by borrowing information from the European population. TL-Multi achieved better prediction accuracy than the competing methods, including Lassosum and meta-analysis, in both simulations and real applications.
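The general idea of borrowing from an auxiliary population can be illustrated with a minimal sketch. This is not the TL-Multi algorithm, whose details are not given in the abstract. It is a generic two-step transfer scheme under stated assumptions: individual-level genotypes are available, a PRS is a weighted sum of allele counts, effects are estimated by ridge regression, and the target population only needs to learn a correction to the auxiliary effects. All function names, the simulated data, and the penalty values are hypothetical.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Closed-form ridge estimate: solve (X'X + lam*I) beta = X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)


def polygenic_risk_score(genotypes, effects):
    """PRS as a weighted sum of allele counts (rows: individuals, columns: SNPs)."""
    return genotypes @ effects


def transfer_effects(X_target, y_target, beta_aux, lam):
    """Hypothetical transfer step: keep the auxiliary effects and fit only a
    ridge-penalized correction on the target residuals."""
    residual = y_target - X_target @ beta_aux
    delta = ridge_fit(X_target, residual, lam)
    return beta_aux + delta


def demo(seed=0):
    """Simulated comparison; all sizes and parameters are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    p = 50
    maf = rng.uniform(0.1, 0.5, p)                    # minor allele frequencies
    beta_aux_true = rng.normal(0.0, 0.1, p)           # auxiliary-population effects
    beta_tgt_true = beta_aux_true + rng.normal(0.0, 0.03, p)  # similar, not identical

    def sample(n, beta):
        X = rng.binomial(2, maf, size=(n, p)).astype(float)
        return X, X @ beta + rng.normal(0.0, 1.0, n)

    X_aux, y_aux = sample(5000, beta_aux_true)        # large auxiliary sample
    X_tgt, y_tgt = sample(200, beta_tgt_true)         # small target sample
    X_new, y_new = sample(1000, beta_tgt_true)        # held-out target individuals

    beta_aux = ridge_fit(X_aux, y_aux, lam=1.0)
    estimates = {
        "auxiliary only": beta_aux,
        "target only": ridge_fit(X_tgt, y_tgt, lam=10.0),
        "transfer": transfer_effects(X_tgt, y_tgt, beta_aux, lam=10.0),
    }
    for name, beta in estimates.items():
        r = np.corrcoef(polygenic_risk_score(X_new, beta), y_new)[0, 1]
        print(f"{name}: correlation with held-out phenotype = {r:.3f}")


if __name__ == "__main__":
    demo()
```

Penalizing the correction rather than the full effect vector shrinks the target estimate toward the auxiliary effects instead of toward zero, which is one simple way to express the assumption that the two populations share most of their genetic architecture.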