The Application of Projection Word Embeddings on Medical Records Scoring System.

Author information

Lin Chin, Lee Yung-Tsai, Wu Feng-Jen, Lin Shing-An, Hsu Chia-Jung, Lee Chia-Cheng, Tsai Dung-Jang, Fang Wen-Hui

Affiliations

School of Medicine, National Defense Medical Center, Taipei 114, Taiwan.

School of Public Health, National Defense Medical Center, Taipei 114, Taiwan.

Publication information

Healthcare (Basel). 2021 Sep 29;9(10):1298. doi: 10.3390/healthcare9101298.

DOI: 10.3390/healthcare9101298

PMID: 34682978

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8544381/

Abstract

Medical records scoring is important in a health care system. Artificial intelligence (AI) with projection word embeddings has been validated on disease coding tasks, an approach that retains both the vocabulary diversity of open internet databases and the understanding of medical terminology found in electronic health records (EHRs). We considered that an AI-enhanced system might also be applied to score medical records automatically. This study aimed to develop a series of deep learning models (DLMs) and validate their performance on a medical records scoring task. We also analyzed the practical value of the best model. We used admission medical records from the Tri-Services General Hospital from January 2016 to May 2020, which were scored by visiting staff of different levels from different departments. Scores ranged from 0 to 10. All samples were divided by time into a training set (n = 74,959) and a testing set (n = 152,730), which were used to train and validate the DLMs, respectively. The mean absolute error (MAE) was used to evaluate the performance of each DLM. In the original (unadjusted) AI medical record scoring, the scores predicted by the BERT architecture were closer to the actual reviewer scores than those predicted by the projection word embedding with LSTM architecture: the original MAE was 0.84 ± 0.27 with the BERT model and 1.00 ± 0.32 with the LSTM model. A linear mixed model can be used to improve model performance, and the adjusted predicted scores were closer to the actual scores than the original predictions. After linear mixed model enhancement, however, the projection word embedding with the LSTM model (MAE 0.66 ± 0.39) performed better than BERT (MAE 0.70 ± 0.33) (p < 0.001). In addition to comparing different architectures for scoring medical records, this study used a linear mixed model to adjust the AI medical record scores so that they more closely match the actual physicians' scores.
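The evaluation protocol summarized above (a chronological train/test split, with each model scored by mean absolute error against the reviewers' 0 to 10 scores) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the record layout, the function names `time_split` and `mean_absolute_error`, the cutoff date, and the toy scores are all hypothetical, and neither the deep learning models nor the linear mixed model adjustment is reproduced.

```python
from datetime import date
from typing import List, Tuple

# Hypothetical record layout: (admission_date, reviewer_score, model_score).
# Scores follow the 0-10 scale described in the abstract.
Record = Tuple[date, float, float]


def time_split(records: List[Record], cutoff: date) -> Tuple[List[Record], List[Record]]:
    """Chronological split: records dated before `cutoff` go to training,
    records on or after `cutoff` are held out for testing."""
    train = [r for r in records if r[0] < cutoff]
    test = [r for r in records if r[0] >= cutoff]
    return train, test


def mean_absolute_error(actual: List[float], predicted: List[float]) -> float:
    """Average of |actual - predicted| over all paired scores."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


if __name__ == "__main__":
    # Toy data for illustration only; these are not values from the study.
    records = [
        (date(2017, 3, 1), 8.0, 7.5),
        (date(2018, 6, 15), 7.0, 7.0),
        (date(2019, 9, 30), 9.0, 10.0),
    ]
    # Hypothetical cutoff; the abstract does not state the split date.
    train, test = time_split(records, cutoff=date(2019, 1, 1))
    print(len(train), len(test))  # 2 1
    actual = [r[1] for r in test]
    predicted = [r[2] for r in test]
    print(mean_absolute_error(actual, predicted))  # 1.0
```

Splitting by date rather than at random mirrors the time-based split described in the abstract, so every held-out record comes after the training period. The ± values reported alongside each MAE are not reproduced here, because the abstract does not state what they are computed over.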

Similar articles

Projection Word Embedding Model With Hybrid Sampling Training for Classifying ICD-10-CM Codes: Longitudinal Observational Study.

JMIR Med Inform. 2019 Jul 23;7(3):e14499. doi: 10.2196/14499.

A comparative study on deep learning models for text classification of unstructured medical notes with various levels of class imbalance.

BMC Med Res Methodol. 2022 Jul 2;22(1):181. doi: 10.1186/s12874-022-01665-y.

Medical Specialty Recommendations by an Artificial Intelligence Chatbot on a Smartphone: Development and Deployment.

J Med Internet Res. 2021 May 6;23(5):e27460. doi: 10.2196/27460.

Developing Artificial Intelligence Models for Extracting Oncologic Outcomes from Japanese Electronic Health Records.

Adv Ther. 2023 Mar;40(3):934-950. doi: 10.1007/s12325-022-02397-7. Epub 2022 Dec 22.

An Automated Toxicity Classification on Social Media Using LSTM and Word Embedding.

Comput Intell Neurosci. 2022 Feb 15;2022:8467349. doi: 10.1155/2022/8467349. eCollection 2022.

Model-based clinical note entity recognition for rheumatoid arthritis using bidirectional encoder representation from transformers.

Quant Imaging Med Surg. 2022 Jan;12(1):184-195. doi: 10.21037/qims-21-90.

Identifying the Perceived Severity of Patient-Generated Telemedical Queries Regarding COVID: Developing and Evaluating a Transfer Learning-Based Solution.

JMIR Med Inform. 2022 Sep 2;10(9):e37770. doi: 10.2196/37770.

Chinese-Named Entity Recognition From Adverse Drug Event Records: Radical Embedding-Combined Dynamic Embedding-Based BERT in a Bidirectional Long Short-term Conditional Random Field (Bi-LSTM-CRF) Model.

JMIR Med Inform. 2021 Dec 1;9(12):e26407. doi: 10.2196/26407.

A comparison of word embeddings for the biomedical natural language processing.

J Biomed Inform. 2018 Nov;87:12-20. doi: 10.1016/j.jbi.2018.09.008. Epub 2018 Sep 12.

Cited by

Development and validation of a dynamic deep learning algorithm using electrocardiogram to predict dyskalaemias in patients with multiple visits.

Eur Heart J Digit Health. 2022 Nov 22;4(1):22-32. doi: 10.1093/ehjdh/ztac072. eCollection 2023 Jan.
