Integrating electronic health records and GWAS summary statistics to predict the progression of autoimmune diseases from preclinical stages.

Author information

Wang Chen, Markus Havell, Diwadkar Avantika R, Khunsriraksakul Chachrit, Carrel Laura, Li Bingshan, Zhong Xue, Wang Xingyan, Zhan Xiaowei, Foulke Galen T, Olsen Nancy J, Liu Dajiang J, Jiang Bibo

Affiliations

Bioinformatics and Genomics Graduate Program, College of Medicine, Penn State University, Hershey, PA, USA.

Department of Public Health Sciences, College of Medicine, Penn State University, Hershey, PA, USA.

Publication information

Nat Commun. 2025 Jan 2;16(1):180. doi: 10.1038/s41467-024-55636-6.

Abstract

Autoimmune diseases often exhibit a preclinical stage before diagnosis. Electronic health record (EHR)-based biobanks contain genetic data and diagnostic information, which can be used to identify preclinical individuals at risk of progression. Biobanks typically have small numbers of cases, which are not sufficient to construct accurate polygenic risk scores (PRS). Importantly, progression and case-control phenotypes may share a genetic basis, which we can exploit to improve prediction accuracy. We propose a novel method, Genetic Progression Score (GPS), that integrates biobank and case-control studies to predict the risk of disease progression. Via penalized regression, GPS incorporates PRS weights from case-control studies as a prior and forces the model parameters to be similar to the prior if the prior improves prediction accuracy. In simulations, GPS consistently yields better prediction accuracy than alternative strategies that rely on biobank or case-control samples only and than those that combine biobank and case-control samples. The improvement is particularly evident when the biobank sample is smaller or the genetic correlation is lower. We derive PRS for the progression from preclinical rheumatoid arthritis and systemic lupus erythematosus in the BioVU biobank and validate them in All of Us. For both diseases, GPS achieves the highest prediction accuracy, and the resulting PRS yields the strongest correlation with progression prevalence.
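The idea of using case-control PRS weights as a prior inside a penalized regression can be illustrated with a minimal sketch. This is not the GPS implementation: the abstract does not specify the penalty, so the sketch assumes a simple ridge-type penalty that shrinks the fitted weights toward the prior, a squared-error loss instead of a model for a binary progression outcome, and a validation split to choose the penalty strength. The names `fit_shrink_to_prior`, `select_lambda`, `beta_prior`, and `lam` are hypothetical.

```python
import numpy as np


def fit_shrink_to_prior(X, y, beta_prior, lam):
    """Minimize ||y - X b||^2 + lam * ||b - beta_prior||^2.

    Assumed ridge-type penalty (not the authors' exact penalty).
    Closed form: b = (X'X + lam I)^{-1} (X'y + lam * beta_prior).
    """
    p = X.shape[1]
    A = X.T @ X + lam * np.eye(p)
    rhs = X.T @ y + lam * beta_prior
    return np.linalg.solve(A, rhs)


def select_lambda(X_tr, y_tr, X_val, y_val, beta_prior, lams):
    """Return (lam, validation MSE, weights) for the best candidate penalty.

    A large selected lam means the prior helps prediction, so the weights
    stay close to it; a small lam means the biobank data dominate.
    """
    best = None
    for lam in lams:
        b = fit_shrink_to_prior(X_tr, y_tr, beta_prior, lam)
        mse = float(np.mean((y_val - X_val @ b) ** 2))
        if best is None or mse < best[1]:
            best = (lam, mse, b)
    return best
```

With `lam = 0` the fit reduces to ordinary least squares on the biobank sample alone, and as `lam` grows the weights converge to the case-control prior. Selecting `lam` on held-out data is one simple way to let the data decide how much to trust the prior.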

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/de36/11695684/984daee5d36b/41467_2024_55636_Fig1_HTML.jpg
