Wang Chen, Markus Havell, Diwadkar Avantika R, Khunsriraksakul Chachrit, Carrel Laura, Li Bingshan, Zhong Xue, Wang Xingyan, Zhan Xiaowei, Foulke Galen T, Olsen Nancy J, Liu Dajiang J, Jiang Bibo
Bioinformatics and Genomics Graduate Program, College of Medicine, Penn State University, Hershey, PA, USA.
Department of Public Health Sciences, College of Medicine, Penn State University, Hershey, PA, USA.
Nat Commun. 2025 Jan 2;16(1):180. doi: 10.1038/s41467-024-55636-6.
Autoimmune diseases often exhibit a preclinical stage before diagnosis. Electronic health record (EHR)-based biobanks contain genetic data and diagnostic information, which can be used to identify preclinical individuals at risk for progression. Biobanks typically have small numbers of cases, which are not sufficient to construct accurate polygenic risk scores (PRS). Importantly, progression and case-control phenotypes may share a genetic basis, which we can exploit to improve prediction accuracy. We propose a novel method, Genetic Progression Score (GPS), that integrates biobank and case-control study data to predict disease progression risk. Via penalized regression, GPS incorporates PRS weights from case-control studies as a prior and forces model parameters to be similar to the prior if the prior improves prediction accuracy. In simulations, GPS consistently yields better prediction accuracy than alternative strategies that rely on biobank or case-control samples alone, as well as those that combine biobank and case-control samples. The improvement is particularly evident when the biobank sample is smaller or the genetic correlation is lower. We derive PRS for progression from preclinical rheumatoid arthritis and systemic lupus erythematosus in the BioVU biobank and validate them in All of Us. For both diseases, GPS achieves the highest prediction accuracy, and the resulting PRS yields the strongest correlation with progression prevalence.
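The idea of shrinking model parameters toward prior PRS weights can be illustrated with a small sketch. This is not the GPS implementation: the abstract does not specify the penalty or the outcome model, so the code below assumes a ridge-type penalty, a continuous outcome, and penalty selection on a held-out split. It minimises ||y - Xb||^2 + lam * ||b - beta_prior||^2, where a large `lam` pulls the fit toward the prior and `lam = 0` ignores it. All names (`fit_toward_prior`, `select_lambda`, `beta_prior`) and the simulated data are hypothetical.

```python
import numpy as np


def fit_toward_prior(X, y, beta_prior, lam):
    """Closed-form minimiser of ||y - X b||^2 + lam * ||b - beta_prior||^2.

    Assumption: a ridge-type penalty centred on the prior weights,
    used here only to illustrate the idea of shrinking toward a prior.
    """
    p = X.shape[1]
    A = X.T @ X + lam * np.eye(p)
    return np.linalg.solve(A, X.T @ y + lam * beta_prior)


def select_lambda(X_tr, y_tr, X_val, y_val, beta_prior, lams):
    """Return (lam, validation MSE, weights) with the lowest validation MSE.

    A small selected lam means the prior did not help on held-out data;
    a large one means the fit stays close to the prior.
    """
    best = None
    for lam in lams:
        b = fit_toward_prior(X_tr, y_tr, beta_prior, lam)
        mse = float(np.mean((y_val - X_val @ b) ** 2))
        if best is None or mse < best[1]:
            best = (lam, mse, b)
    return best


# Simulated stand-in for a small biobank sample (hypothetical data).
rng = np.random.default_rng(0)
n, p = 120, 30
X = rng.normal(size=(n, p))
beta_true = rng.normal(scale=0.3, size=p)
# Prior weights: a noisy copy of the true effects, mimicking a
# genetically correlated case-control phenotype.
beta_prior = beta_true + rng.normal(scale=0.1, size=p)
y = X @ beta_true + rng.normal(size=n)

X_tr, y_tr = X[:80], y[:80]
X_val, y_val = X[80:], y[80:]

lam, mse, beta_hat = select_lambda(
    X_tr, y_tr, X_val, y_val, beta_prior, lams=[0.0, 1.0, 10.0, 100.0]
)
print(f"selected lam={lam}, validation MSE={mse:.3f}")
```

Because `lam = 0.0` is among the candidates, the selection step can fall back to a fit that ignores the prior when the prior does not improve held-out accuracy, which mirrors the behaviour described in the abstract only in spirit.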