School of Biomedical Engineering, Capital Medical University, Beijing, China.
Beijing Key Laboratory of Fundamental Research on Biomechanics in Clinical Application, Capital Medical University, Beijing, China.
J Med Internet Res. 2022 Aug 3;24(8):e37486. doi: 10.2196/37486.
The widespread secondary use of electronic medical records (EMRs) promotes health care quality improvement. Representation learning that can automatically extract hidden information from EMR data has gained increasing attention.
We aimed to propose a patient representation that incorporates more feature associations and task-specific feature importance in order to improve outcome prediction for inpatients with acute myocardial infarction (AMI).
Medical concepts, including patients' age, gender, disease diagnoses, laboratory tests, structured radiological features, procedures, and medications, were first embedded into real-valued vectors using an improved skip-gram algorithm, in which the concepts in each context window were selected by feature association strength, measured as association rule confidence. Each patient was then represented as the sum of the feature embeddings weighted by task-specific feature importance, which was intended to support the predictive model from both global and local perspectives. Finally, we applied the proposed patient representation to mortality risk prediction for 3010 AMI inpatients from a public data set and 1671 AMI inpatients from a private data set, and compared it with several reference representation methods in terms of the area under the receiver operating characteristic curve (AUROC), area under the precision-recall curve (AUPRC), and F1-score.
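The two steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the toy records, the 2-dimensional embeddings, the importance weights, and the function names (`rule_confidence`, `select_context`, `patient_vector`) are all hypothetical, and the skip-gram training itself is omitted. The sketch assumes that confidence is computed for single-antecedent rules as confidence(a -> b) = count(a and b) / count(a).

```python
from collections import Counter
from itertools import permutations


def rule_confidence(records):
    """Return confidence(a -> b) = count(a and b) / count(a) for all concept pairs."""
    item_counts = Counter()
    pair_counts = Counter()
    for rec in records:
        items = set(rec)
        item_counts.update(items)
        # Ordered pairs, because confidence is not symmetric.
        pair_counts.update(permutations(sorted(items), 2))
    return {(a, b): n / item_counts[a] for (a, b), n in pair_counts.items()}


def select_context(target, record, confidence, window_size):
    """Pick the co-occurring concepts most strongly associated with the target."""
    others = [c for c in set(record) if c != target]
    # Highest confidence first; ties broken alphabetically for determinism.
    others.sort(key=lambda c: (-confidence.get((target, c), 0.0), c))
    return others[:window_size]


def patient_vector(record, embeddings, importance):
    """Sum the concept embeddings of one patient, weighted by feature importance."""
    dim = len(next(iter(embeddings.values())))
    vec = [0.0] * dim
    for concept in set(record):
        weight = importance.get(concept, 0.0)
        for i, x in enumerate(embeddings[concept]):
            vec[i] += weight * x
    return vec


# Hypothetical toy data: each record is the set of concepts for one patient.
records = [
    ["AMI", "troponin_high", "aspirin"],
    ["AMI", "troponin_high", "heart_failure"],
    ["AMI", "aspirin"],
    ["heart_failure", "furosemide"],
]
conf = rule_confidence(records)
context = select_context("AMI", records[1], conf, window_size=1)

# Hypothetical embeddings and importance weights (in practice these would be learned).
embeddings = {
    "AMI": [1.0, 0.0],
    "troponin_high": [0.0, 1.0],
    "aspirin": [1.0, 1.0],
    "heart_failure": [0.5, 0.5],
    "furosemide": [0.0, 0.0],
}
importance = {"AMI": 0.5, "troponin_high": 2.0, "aspirin": 1.0}
vector = patient_vector(records[0], embeddings, importance)
```

In this toy example, the context chosen for "AMI" in the second record is "troponin_high" rather than "heart_failure", because the former co-occurs with "AMI" more often. The resulting (target, context) pairs would then feed a standard skip-gram trainer.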
Compared with the reference methods, the proposed embedding-based representation showed consistently superior predictive performance on the 2 data sets, achieving mean AUROCs of 0.878 and 0.973, AUPRCs of 0.220 and 0.505, and F1-scores of 0.376 and 0.674 for the public and private data sets, respectively. The best values among the reference methods were AUROCs of 0.847 and 0.939, AUPRCs of 0.196 and 0.283, and F1-scores of 0.344 and 0.361 for the public and private data sets, respectively. The feature importance integrated into the patient representation highlighted features that were also critical in the prediction tasks and in clinical practice.
The introduction of feature associations and feature importance facilitated an effective patient representation and contributed to prediction performance improvement and model interpretation.