Study on the semi-supervised learning-based patient similarity from heterogeneous electronic medical records.

Affiliations

School of Biomedical Engineering, Capital Medical University, No.10, Xitoutiao, You An Men, Fengtai District, Beijing, 100069, People's Republic of China.

Beijing Key Laboratory of Fundamental Research on Biomechanics in Clinical Application, Capital Medical University, Beijing, 100069, People's Republic of China.

Publication information

BMC Med Inform Decis Mak. 2021 Jul 30;21(Suppl 2):58. doi: 10.1186/s12911-021-01432-x.

DOI:10.1186/s12911-021-01432-x
PMID:34330261
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8323210/
Abstract

BACKGROUND

A new learning-based measure was proposed to quantify patient similarity in heterogeneous electronic medical record (EMR) data.

METHODS

We first calculated feature-level similarities according to the features' attributes. A domain expert provided patient similarity scores of 30 randomly selected patients. These similarity scores and feature-level similarities for 30 patients comprised the labeled sample set, which was used for the semi-supervised learning algorithm to learn the patient-level similarities for all patients. Then we used the k-nearest neighbor (kNN) classifier to predict four liver conditions. The predictive performances were compared in four different situations. We also compared the performances between personalized kNN models and other machine learning models. We assessed the predictive performances by the area under the receiver operating characteristic curve (AUC), F1-score, and cross-entropy (CE) loss.
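The pipeline described above (feature-level similarities, a patient-level similarity, then kNN prediction) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the feature-level formulas, so a Gower-style rule is swapped in (scaled absolute difference for numeric features, exact match for categorical ones). The feature names, value ranges, class labels, and fixed weights are all hypothetical; in the paper the patient-level similarity is learned by a semi-supervised algorithm from expert-labeled pairs rather than set by hand.

```python
from collections import Counter

# Hypothetical feature schema: name -> (kind, assumed value range for scaling).
SCHEMA = {
    "age": ("numeric", 60.0),
    "alt": ("numeric", 200.0),
    "sex": ("categorical", None),
}

def feature_similarities(a, b, schema=SCHEMA):
    """One similarity in [0, 1] per feature, chosen by feature type.

    Gower-style rule (an assumption, not taken from the paper).
    """
    sims = {}
    for name, (kind, value_range) in schema.items():
        if kind == "numeric":
            sims[name] = max(0.0, 1.0 - abs(a[name] - b[name]) / value_range)
        else:
            sims[name] = 1.0 if a[name] == b[name] else 0.0
    return sims

def patient_similarity(a, b, weights):
    """Weighted average of the feature-level similarities.

    Fixed weights stand in for the learned patient-level similarity.
    """
    sims = feature_similarities(a, b)
    total = sum(weights.values())
    return sum(weights[name] * s for name, s in sims.items()) / total

def knn_predict(query, train, labels, weights, k=3):
    """Majority vote among the k training patients most similar to the query."""
    ranked = sorted(
        range(len(train)),
        key=lambda i: patient_similarity(query, train[i], weights),
        reverse=True,
    )
    votes = Counter(labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]

# Toy data; values and labels are made up for illustration only.
train = [
    {"age": 40, "alt": 30, "sex": "F"},
    {"age": 42, "alt": 35, "sex": "F"},
    {"age": 65, "alt": 150, "sex": "M"},
    {"age": 70, "alt": 160, "sex": "M"},
]
labels = ["condition_A", "condition_A", "condition_B", "condition_B"]
weights = {"age": 1.0, "alt": 2.0, "sex": 0.5}

query = {"age": 68, "alt": 155, "sex": "M"}
prediction = knn_predict(query, train, labels, weights, k=3)
print(prediction)
```

Replacing `patient_similarity` with a negated Euclidean distance over numerically encoded features would give the baseline neighbor selection that the study compares against.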

RESULTS

As the size of the random training sample increased, kNN models that used the learned patient similarity to select nearest neighbors consistently outperformed those that used the Euclidean distance (all P values < 0.001). When the top k nearest neighbors were drawn from the random training samples, the models using the learned patient similarity also achieved better peak performance than those using the Euclidean distance (AUC: 0.95 vs. 0.89; F1-score: 0.84 vs. 0.67; CE loss: 1.22 vs. 1.82). When the training set instead consisted of the most similar samples as determined by the learned patient similarity, the performance of kNN models that selected neighbors by simple Euclidean distance degraded gradually as the training set grew. When the roles were exchanged, so that the Euclidean distance determined the similar training samples and the learned patient similarity selected the neighbors, the performance of the kNN models gradually increased. These two kinds of kNN models reached the same best performance: AUC 0.95, F1-score 0.84, and CE loss 1.22. Among the four reference models, the highest AUC and F1-score were 0.94 and 0.80, respectively, both lower than those of the simple and similarity-based kNN models.
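For readers unfamiliar with the three reported metrics, the sketch below shows one common way to compute them. It is an illustration only: the abstract does not say how AUC and F1-score were averaged across the four liver conditions, so macro-averaged F1 and a plain binary AUC are assumptions here, and all function names are hypothetical.

```python
import math

def cross_entropy(y_true, probs):
    """Mean negative log-probability assigned to the true class.

    y_true holds class indices; probs holds one probability list per sample.
    """
    eps = 1e-12  # guards against log(0)
    total = sum(math.log(max(p[t], eps)) for t, p in zip(y_true, probs))
    return -total / len(y_true)

def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores (macro averaging assumed)."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

def binary_auc(y_true, scores):
    """Probability that a random positive outranks a random negative.

    Ties count as half a win; labels are 1 (positive) and 0 (negative).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))

# Toy inputs, made up for illustration.
ce = cross_entropy([0, 1], [[0.5, 0.5], [0.5, 0.5]])
f1 = macro_f1([0, 1, 1, 0], [0, 1, 0, 0])
auc = binary_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(ce, f1, auc)
```

Higher AUC and F1-score indicate better performance, whereas a lower CE loss is better, which is why 1.22 beats 1.82 in the comparison above.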

CONCLUSIONS

This learning-based method opened an opportunity for similarity measurement based on heterogeneous EMR data and supported the secondary use of EMR data.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3de7/8323210/4c9a78e35cfe/12911_2021_1432_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3de7/8323210/640b2db4fac5/12911_2021_1432_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3de7/8323210/d54e93e828f9/12911_2021_1432_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3de7/8323210/c25a37bdbac8/12911_2021_1432_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3de7/8323210/7d5320875472/12911_2021_1432_Fig4_HTML.jpg

Similar articles

1
Study on the semi-supervised learning-based patient similarity from heterogeneous electronic medical records.
BMC Med Inform Decis Mak. 2021 Jul 30;21(Suppl 2):58. doi: 10.1186/s12911-021-01432-x.
2
Joint modeling strategy for using electronic medical records data to build machine learning models: an example of intracerebral hemorrhage.
BMC Med Inform Decis Mak. 2022 Oct 25;22(1):278. doi: 10.1186/s12911-022-02018-x.
3
Machine learning algorithms for predicting COVID-19 mortality in Ethiopia.
BMC Public Health. 2024 Jun 28;24(1):1728. doi: 10.1186/s12889-024-19196-0.
4
Implementation and evaluation of a multivariate abstraction-based, interval-based dynamic time-warping method as a similarity measure for longitudinal medical records.
J Biomed Inform. 2021 Nov;123:103919. doi: 10.1016/j.jbi.2021.103919. Epub 2021 Oct 8.
5
Effects of Distance Measure Choice on K-Nearest Neighbor Classifier Performance: A Review.
Big Data. 2019 Dec;7(4):221-248. doi: 10.1089/big.2018.0175. Epub 2019 Aug 14.
6
Sequential Data-Based Patient Similarity Framework for Patient Outcome Prediction: Algorithm Development.
J Med Internet Res. 2022 Jan 6;24(1):e30720. doi: 10.2196/30720.
7
Machine learning-based prediction models for home discharge in patients with COVID-19: Development and evaluation using electronic health records.
PLoS One. 2023 Oct 20;18(10):e0292888. doi: 10.1371/journal.pone.0292888. eCollection 2023.
8
Supervised learning applied to classifying fallers versus non-fallers among older adults with cancer.
J Geriatr Oncol. 2023 May;14(4):101498. doi: 10.1016/j.jgo.2023.101498. Epub 2023 Apr 19.
9
Landslide susceptibility prediction improvements based on a semi-integrated supervised machine learning model.
Environ Sci Pollut Res Int. 2023 Apr;30(17):50280-50294. doi: 10.1007/s11356-023-25650-0. Epub 2023 Feb 15.
10
Developing nonlinear k-nearest neighbors classification algorithms to identify patients at high risk of increased length of hospital stay following spine surgery.
Neurosurg Focus. 2023 Jun;54(6):E7. doi: 10.3171/2023.3.FOCUS22651.

Cited by

1
Uncovering the Understanding of the Concept of Patient Similarity in Cancer Research and Treatment: Scoping Review.
J Med Internet Res. 2025 Aug 18;27:e71906. doi: 10.2196/71906.
2
A Personalized Predictive Model That Jointly Optimizes Discrimination and Calibration.
Stat Med. 2025 May;44(10-12):e70077. doi: 10.1002/sim.70077.
3
Sequential Data-Based Patient Similarity Framework for Patient Outcome Prediction: Algorithm Development.
J Med Internet Res. 2022 Jan 6;24(1):e30720. doi: 10.2196/30720.
