Lape Michael, Schnell Daniel, Parameswaran Sreeja, Ernst Kevin, O'Connor Shannon, Salomonis Nathan, Martin Lisa J, Harnett Brett M, Kottyan Leah C, Weirauch Matthew T
Department of Biomedical Informatics, University of Cincinnati College of Medicine, Cincinnati, OH, USA.
Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA.
Commun Med (Lond). 2025 Jun 20;5(1):242. doi: 10.1038/s43856-025-00956-x.
Many relationships between pathogens and human disease are well established. However, only a small fraction involve non-communicable diseases (NCDs). In this study, we sought to leverage the vast amount of newly available electronic health record data to identify potentially novel pathogen-NCD associations and to find additional evidence supporting known associations.
We leverage data from the UK Biobank and TriNetX to perform a systematic survey across 20 pathogens and 426 diseases, primarily NCDs. To this end, we assess the association between disease status and proxies for infection history using a logistic regression-based statistical approach.
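As a rough illustration of what a logistic regression-based association test for a single pathogen-disease pair can look like, the following minimal sketch fits a model with one binary exposure (an infection history proxy) and one binary outcome (disease status). It is not the authors' pipeline: the function name `fit_logistic`, the toy counts, and the absence of covariates are all assumptions made for illustration, and the abstract does not specify which covariates or corrections the study used.

```python
import math

def fit_logistic(exposed, diseased, n_iter=25):
    """Fit logit P(disease) = b0 + b1 * exposed by Newton-Raphson.

    Hypothetical helper for illustration only. Returns (b0, b1, se_b1),
    where se_b1 is the standard error of the log odds ratio b1.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        g0 = g1 = 0.0              # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0      # entries of the information matrix
        for x, y in zip(exposed, diseased):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information matrix) * d = gradient
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < 1e-10:
            break
    # Variance of b1 is the matching diagonal entry of the inverse information
    # matrix, evaluated just before the final (negligibly small) step.
    se_b1 = math.sqrt(h00 / det)
    return b0, b1, se_b1

# Toy data, invented for illustration (NOT from the study):
# 100 people with a positive infection proxy, 30 of whom have the disease;
# 100 people with a negative proxy, 10 of whom have the disease.
exposed = [1] * 100 + [0] * 100
diseased = [1] * 30 + [0] * 70 + [1] * 10 + [0] * 90

b0, b1, se = fit_logistic(exposed, diseased)
odds_ratio = math.exp(b1)
z_score = b1 / se  # Wald statistic for the exposure coefficient
print(f"odds ratio = {odds_ratio:.2f}, z = {z_score:.2f}")
```

With a single binary predictor, the fitted odds ratio coincides with the cross-product ratio of the 2x2 table, which makes the sketch easy to check by hand. A survey across many pathogen-disease pairs would repeat a test of this kind per pair and would typically need covariate adjustment and multiple-testing control, neither of which is shown here.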
Our approach identifies 206 pathogen-disease pairs that replicate in both cohorts. We replicate many established relationships, including those between Helicobacter pylori and several gastroenterological diseases and between Epstein-Barr virus and both multiple sclerosis and lupus. Overall, our approach identifies evidence of association for 15 pathogens and 96 distinct diseases, including a currently controversial link between human cytomegalovirus (CMV) and ulcerative colitis (UC). We validate the CMV-UC connection through two orthogonal analyses, revealing increased CMV gene expression in UC patients and enrichment for UC genetic risk signal near human genes that have altered expression upon CMV infection.
Collectively, these results form a foundation for future investigations into mechanistic roles played by pathogens in the processes underlying NCDs. All results are easily accessible on our website, https://tf.cchmc.org/pathogen-disease .