PheWP2V: a phenome-wide prediction framework with weighted patient representations using electronic health records.

Author information

Guo Jia, Kiryluk Krzysztof, Wang Shuang

Affiliations

Department of Biostatistics, Columbia University, New York, NY 10032, United States.

Department of Medicine, Columbia University, New York, NY 10032, United States.

Publication information

JAMIA Open. 2024 Sep 14;7(3):ooae084. doi: 10.1093/jamiaopen/ooae084. eCollection 2024 Oct.

DOI: 10.1093/jamiaopen/ooae084

PMID: 39282083

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11401611/

Abstract

OBJECTIVE

Electronic health records (EHRs) provide opportunities for the development of computable predictive tools. Conventional machine learning methods and deep learning methods have been widely used for this task, with the approach of usually designing one tool for one clinical outcome. Here we developed PheWP2V, a Phenome-Wide prediction framework using Weighted Patient Vectors. PheWP2V conducts tailored predictions for phenome-wide phenotypes using numeric representations of patients' past medical records weighted based on their similarities with individual phenotypes.

MATERIALS AND METHODS

PheWP2V defines clinical disease phenotypes using Phecode mapping based on International Classification of Disease codes, which reduces redundancy and case-control misclassification in real-life EHR datasets. Through upweighting medical records of patients that are more relevant to a phenotype of interest in calculating patient vectors, PheWP2V achieves tailored incidence risk prediction of a phenotype. The calculation of weighted patient vectors is computationally efficient, and the weighting mechanism ensures tailored predictions across the phenome. We evaluated prediction performance of PheWP2V and baseline methods with simulation studies and clinical applications using the MIMIC-III database.
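The weighting idea described above can be illustrated with a minimal sketch. The abstract does not give the exact weighting function, so everything specific here is an assumption made for illustration: the weights are a softmax over cosine similarities between each past code's embedding and the target phenotype's embedding, and the names `weighted_patient_vector`, `temperature`, and `toy_embeddings` (with its three-dimensional vectors) are hypothetical, not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def weighted_patient_vector(history, target, embeddings, temperature=1.0):
    """Combine the embeddings of a patient's past codes into one vector.

    ASSUMPTION: weights are a softmax over cosine similarity to the target
    phenotype; the paper's actual weighting scheme may differ.
    Returns (patient_vector, weights), with weights aligned to the kept codes.
    """
    codes = [c for c in history if c in embeddings]
    if not codes:
        raise ValueError("no code in the history has an embedding")
    target_vec = embeddings[target]
    sims = [cosine(embeddings[c], target_vec) / temperature for c in codes]
    # Numerically stable softmax: subtract the max before exponentiating.
    top = max(sims)
    exps = [math.exp(s - top) for s in sims]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(target_vec)
    vector = [
        sum(w * embeddings[c][i] for w, c in zip(weights, codes))
        for i in range(dim)
    ]
    return vector, weights


# Toy, hand-made embeddings (hypothetical values, not learned from any EHR data).
toy_embeddings = {
    "type_2_diabetes": [0.9, 0.1, 0.0],
    "obesity": [0.8, 0.3, 0.1],
    "ankle_fracture": [0.0, 0.2, 0.9],
    "diabetic_retinopathy": [1.0, 0.2, 0.0],
}
history = ["type_2_diabetes", "obesity", "ankle_fracture"]
vector, weights = weighted_patient_vector(
    history, "diabetic_retinopathy", toy_embeddings
)
```

In this toy setup the diabetes and obesity records receive larger weights than the ankle fracture when the target is diabetic retinopathy, so the same history would yield a different patient vector for a different target phenotype. Lowering `temperature` would concentrate the weights further on the most similar records.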

RESULTS

Across 942 phenome-wide predictions using the MIMIC-III database, PheWP2V has median area under the receiver operating characteristic curve (AUC-ROC) 0.74 (baseline methods have values ≤0.72), median max F-score 0.20 (baseline methods have values ≤0.19), and median area under the precision-recall curve (AUC-PR) 0.10 (baseline methods have values ≤0.10).
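For readers unfamiliar with the three reported metrics, the following is a small self-contained sketch of how each can be computed from binary labels and predicted scores. It is not the authors' evaluation code: AUC-PR is approximated here by average precision (a step-wise sum), which is one common convention and is an assumption, and the function names and the six-patient example are hypothetical.

```python
def auc_roc(labels, scores):
    """Probability that a random case scores above a random control (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


def precision_recall_points(labels, scores):
    """(precision, recall) at each distinct threshold, from highest score down."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one case")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points, tp, fp, i = [], 0, 0, 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every sample tied at this threshold before recording a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            tp += ranked[i][1]
            fp += 1 - ranked[i][1]
            i += 1
        points.append((tp / (tp + fp), tp / n_pos))
    return points


def max_f_score(labels, scores):
    """Largest F1 over all thresholds."""
    return max(
        (2 * p * r / (p + r)) if (p + r) else 0.0
        for p, r in precision_recall_points(labels, scores)
    )


def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve (ASSUMED AUC-PR convention)."""
    area, prev_recall = 0.0, 0.0
    for p, r in precision_recall_points(labels, scores):
        area += (r - prev_recall) * p
        prev_recall = r
    return area


# Hypothetical example: 2 cases among 6 patients.
labels = [1, 0, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
roc = auc_roc(labels, scores)            # 0.875
best_f = max_f_score(labels, scores)     # 0.8
ap = average_precision(labels, scores)   # 5/6, about 0.833
```

Unlike AUC-ROC, the precision-based metrics depend strongly on prevalence: a classifier that ranks patients at random has an expected average precision close to the fraction of cases, so small absolute AUC-PR and F-score values are expected when most phenotypes are rare.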

DISCUSSION

PheWP2V can predict phenotypes efficiently by using medical concept embeddings and upweighting relevant past medical histories. By leveraging both labeled and unlabeled data, PheWP2V reduces overfitting and improves predictions for rare phenotypes, making it a useful screening tool for early diagnosis of high-risk conditions, though further research is needed to assess the transferability of embeddings across different databases.

CONCLUSIONS

PheWP2V is fast, flexible, and has superior prediction performance for many clinical disease phenotypes across the phenome of the MIMIC-III database compared to that of several popular baseline methods.

