Lin Chin, Lee Yung-Tsai, Wu Feng-Jen, Lin Shing-An, Hsu Chia-Jung, Lee Chia-Cheng, Tsai Dung-Jang, Fang Wen-Hui
School of Medicine, National Defense Medical Center, Taipei 114, Taiwan.
School of Public Health, National Defense Medical Center, Taipei 114, Taiwan.
Healthcare (Basel). 2021 Sep 29;9(10):1298. doi: 10.3390/healthcare9101298.
Medical record scoring is important in a health care system. Artificial intelligence (AI) with projection word embeddings has been validated in disease coding tasks; the approach maintains both the vocabulary diversity of open internet databases and the medical terminology understanding of electronic health records (EHRs). We considered that an AI-enhanced system might also be applied to score medical records automatically. This study aimed to develop a series of deep learning models (DLMs) and validate their performance in the medical record scoring task. We also analyzed the practical value of the best model. We used admission medical records from the Tri-Services General Hospital from January 2016 to May 2020, which had been scored by visiting staff of different levels from different departments. Scores ranged from 0 to 10. All samples were divided by time into a training set (n = 74,959) and a testing set (n = 152,730), which were used to train and validate the DLMs, respectively. The mean absolute error (MAE) was used to evaluate the performance of each DLM. In the original AI medical record scoring, the score predicted by the BERT architecture was closer to the actual reviewer score than the score predicted by the projection word embedding and LSTM architecture: the original MAE was 0.84 ± 0.27 with the BERT model and 1.00 ± 0.32 with the LSTM model. A linear mixed model can be used to improve model performance, and the adjusted predicted score was closer to the actual reviewer score than the original predicted score was. However, after linear mixed model enhancement, the projection word embedding with the LSTM model (0.66 ± 0.39) performed better than BERT (0.70 ± 0.33) (p < 0.001). In addition to comparing different architectures for scoring medical records, this study used a linear mixed model to adjust the AI medical record score so that it is closer to the actual physician's score.
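As a rough illustration of the evaluation described above, the sketch below computes the MAE between reviewer scores and model predictions, then applies a simple per-reviewer offset to the predictions. The offset step is only a stand-in for the idea of a group-level adjustment: it is not the authors' linear mixed model, whose structure the abstract does not describe. The grouping by reviewer, the `shrinkage` parameter, the function names, and the toy scores are all assumptions made for this example.

```python
from collections import defaultdict


def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted| over all records."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def reviewer_offsets(actual, predicted, reviewers, shrinkage=2.0):
    """Shrunken mean residual per reviewer (hypothetical grouping factor).

    Each offset is sum(residuals) / (n + shrinkage), so reviewers with few
    records are pulled toward zero. This loosely mimics a random intercept;
    it is an assumption of this sketch, not the paper's model.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for a, p, r in zip(actual, predicted, reviewers):
        sums[r] += a - p
        counts[r] += 1
    return {r: sums[r] / (counts[r] + shrinkage) for r in sums}


def adjust(predicted, reviewers, offsets):
    """Add each reviewer's offset and clip to the 0-10 scoring range."""
    return [
        min(10.0, max(0.0, p + offsets.get(r, 0.0)))
        for p, r in zip(predicted, reviewers)
    ]


# Toy data (made up): reviewer A scores higher than the model, B lower.
actual = [8.0, 9.0, 6.0, 5.0]
predicted = [7.0, 8.0, 7.0, 6.0]
reviewers = ["A", "A", "B", "B"]

offsets = reviewer_offsets(actual, predicted, reviewers)
adjusted = adjust(predicted, reviewers, offsets)

print(mean_absolute_error(actual, predicted))  # MAE before adjustment
print(mean_absolute_error(actual, adjusted))   # MAE after adjustment
```

In a realistic setting the offsets would be estimated on the training period and applied to the later testing period; the toy example fits and evaluates on the same four records only to keep it short.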