Embedding electronic health records onto a knowledge network recognizes prodromal features of multiple sclerosis and predicts diagnosis.

Affiliations

Integrated Program in Quantitative Biology, University of California San Francisco, San Francisco, California, USA.

Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, California, USA.

Publication information

J Am Med Inform Assoc. 2022 Jan 29;29(3):424-434. doi: 10.1093/jamia/ocab270.

DOI:10.1093/jamia/ocab270

PMID:34915552

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC8800523/

Abstract

OBJECTIVE

Early identification of chronic diseases is a pillar of precision medicine as it can lead to improved outcomes, reduction of disease burden, and lower healthcare costs. Predictions of a patient's health trajectory have been improved through the application of machine learning approaches to electronic health records (EHRs). However, these methods have traditionally relied on "black box" algorithms that can process large amounts of data but are unable to incorporate domain knowledge, thus limiting their predictive and explanatory power. Here, we present a method for incorporating domain knowledge into clinical classifications by embedding individual patient data into a biomedical knowledge graph.

MATERIALS AND METHODS

A modified version of the PageRank algorithm was implemented to embed millions of deidentified EHRs into a biomedical knowledge graph (SPOKE). This resulted in high-dimensional, knowledge-guided patient health signatures (ie, SPOKEsigs) that were subsequently used as features in a random forest classifier to identify patients at risk of developing a chronic disease.
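The abstract does not spell out how PageRank was modified, so the following is only a minimal sketch of the general idea, assuming standard personalized PageRank (random walk with restart) on a tiny undirected graph. The node names (`code_A`, `gene_1`, `disease_1`, and so on), the edge list, and the damping factor of 0.85 are all hypothetical and are not taken from SPOKE or from the paper. The walk restarts at the nodes observed in a patient's record, and the resulting score for every node in the graph plays the role of a patient signature.

```python
def build_undirected(edges):
    """Turn a list of (a, b) pairs into a neighbor-list dictionary."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, []).append(b)
        neighbors.setdefault(b, []).append(a)
    return neighbors


def personalized_pagerank(neighbors, restart, damping=0.85, tol=1e-12, max_iter=1000):
    """Power iteration for PageRank with a restart (personalization) vector.

    neighbors: dict mapping each node to the list of nodes it links to.
    restart:   dict mapping seed nodes (e.g. codes seen in a record) to weights.
    """
    total = sum(restart.values())
    restart = {node: restart.get(node, 0.0) / total for node in neighbors}
    rank = dict(restart)
    for _ in range(max_iter):
        # Every step, a (1 - damping) share of the walk jumps back to the seeds.
        new_rank = {node: (1.0 - damping) * restart[node] for node in neighbors}
        for node, score in rank.items():
            targets = neighbors[node]
            if targets:
                share = damping * score / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A node without out-links returns its mass to the seeds.
                for other in neighbors:
                    new_rank[other] += damping * score * restart[other]
        change = sum(abs(new_rank[n] - rank[n]) for n in neighbors)
        rank = new_rank
        if change < tol:
            break
    return rank


# Hypothetical toy graph: two clinical codes linked to a disease through genes.
TOY_EDGES = [
    ("code_A", "gene_1"),
    ("code_B", "gene_2"),
    ("gene_1", "disease_1"),
    ("gene_2", "disease_1"),
    ("gene_1", "gene_2"),
]
graph = build_undirected(TOY_EDGES)

# One signature per hypothetical patient, seeded on the code in their record.
signature_a = personalized_pagerank(graph, {"code_A": 1.0})
signature_b = personalized_pagerank(graph, {"code_B": 1.0})
```

Each signature has one value per graph node, so two patients with different records get different vectors over the same set of nodes. That shared node space is what allows such vectors to be used as classifier features.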

RESULTS

Our model predicted disease status of 5752 subjects 3 years before being diagnosed with multiple sclerosis (MS) (AUC = 0.83). SPOKEsigs outperformed predictions using EHRs alone, and the biological drivers of the classifiers provided insight into the underpinnings of prodromal MS.
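For readers unfamiliar with the reported metric, the sketch below shows one way an AUC can be computed from classifier scores: as the fraction of case/control pairs in which the case receives the higher score, with ties counted as half. The labels and scores are invented for illustration and are unrelated to the study data; the function name `roc_auc` is likewise hypothetical.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of cases and controls.

    labels: iterable of 1 (case) or 0 (control).
    scores: iterable of classifier scores, higher meaning more likely a case.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one case and one control")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0      # case ranked above control
            elif p == n:
                wins += 0.5      # tie counts as half
    return wins / (len(positives) * len(negatives))


# Invented example: three cases and three controls.
example_labels = [1, 1, 1, 0, 0, 0]
example_scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
example_auc = roc_auc(example_labels, example_scores)  # 8 of 9 pairs ordered correctly
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, so the reported 0.83 means a randomly chosen case was scored above a randomly chosen control about 83% of the time.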

CONCLUSION

Using data from EHRs as input, SPOKEsigs describe patients at both the clinical and biological levels. We provide a clinical use case for detecting MS up to 5 years before a patient's documented diagnosis in the clinic and illustrate the biological features that distinguish the prodromal MS state.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/101b/8800523/416902f5037c/ocab270f1.jpg

Similar articles

HPO2Vec+: Leveraging heterogeneous knowledge resources to enrich node embeddings for the Human Phenotype Ontology.

J Biomed Inform. 2019 Aug;96:103246. doi: 10.1016/j.jbi.2019.103246. Epub 2019 Jun 27.

Early detection of Parkinson's disease through enriching the electronic health record using a biomedical knowledge graph.

Front Med (Lausanne). 2023 May 12;10:1081087. doi: 10.3389/fmed.2023.1081087. eCollection 2023.

Integrating biomedical research and electronic health records to create knowledge-based biologically meaningful machine-readable embeddings.

Nat Commun. 2019 Jul 10;10(1):3045. doi: 10.1038/s41467-019-11069-0.

Leveraging electronic health records data to predict multiple sclerosis disease activity.

Ann Clin Transl Neurol. 2021 Apr;8(4):800-810. doi: 10.1002/acn3.51324. Epub 2021 Feb 24.

Rule-based and machine learning algorithms identify patients with systemic sclerosis accurately in the electronic health record.

Arthritis Res Ther. 2019 Dec 30;21(1):305. doi: 10.1186/s13075-019-2092-7.

Time-aware Embeddings of Clinical Data using a Knowledge Graph.

Pac Symp Biocomput. 2023;28:97-108.

Generative transfer learning for measuring plausibility of EHR diagnosis records.

J Am Med Inform Assoc. 2021 Mar 1;28(3):559-568. doi: 10.1093/jamia/ocaa215.

Unlocking the Power of EHRs: Harnessing Unstructured Data for Machine Learning-based Outcome Predictions.

Annu Int Conf IEEE Eng Med Biol Soc. 2023 Jul;2023:1-4. doi: 10.1109/EMBC40787.2023.10340232.

A machine learning-based framework to identify type 2 diabetes through electronic health records.

Int J Med Inform. 2017 Jan;97:120-127. doi: 10.1016/j.ijmedinf.2016.09.014. Epub 2016 Oct 1.

Cited by

Creating an Interactive Web Interface for Networks Stored in Knowledge Graph Databases.

Curr Protoc. 2025 Sep;5(9):e70200. doi: 10.1002/cpz1.70200.

A simple guide to the use of Student's t-test, Mann-Whitney U test, Chi-squared test, and Kruskal-Wallis test in biostatistics.

BioData Min. 2025 Aug 20;18(1):56. doi: 10.1186/s13040-025-00465-6.

KGG: a fully automated workflow for creating disease-specific knowledge graphs.

Bioinformatics. 2025 Jul 1;41(7). doi: 10.1093/bioinformatics/btaf383.

Phenotyping Healthcare Use 2-3 Decades Before the First Multiple Sclerosis Demyelinating Event.

Ann Clin Transl Neurol. 2025 Aug;12(8):1585-1594. doi: 10.1002/acn3.70092. Epub 2025 Jun 12.

Development and validation of a distributed representation model of Japanese high-dimensional administrative claims data for clinical epidemiology studies.

BMC Med Res Methodol. 2025 Apr 11;25(1):95. doi: 10.1186/s12874-025-02549-7.

The Barancik award lecture: Multi-disciplinary research will be the key to stop, restore, and end MS.

Mult Scler. 2025 Apr;31(4):384-391. doi: 10.1177/13524585251314756. Epub 2025 Jan 28.

DOME: Directional medical embedding vectors from Electronic Health Records.

J Biomed Inform. 2025 Feb;162:104768. doi: 10.1016/j.jbi.2024.104768. Epub 2025 Jan 2.

Machine learning based algorithms for virtual early detection and screening of neurodegenerative and neurocognitive disorders: a systematic-review.

Front Neurol. 2024 Dec 9;15:1413071. doi: 10.3389/fneur.2024.1413071. eCollection 2024.

Unified Clinical Vocabulary Embeddings for Advancing Precision Medicine.

medRxiv. 2024 Dec 10:2024.12.03.24318322. doi: 10.1101/2024.12.03.24318322.

Biomedical knowledge graph-optimized prompt generation for large language models.

Bioinformatics. 2024 Sep 2;40(9). doi: 10.1093/bioinformatics/btae560.
