Han Benedict Choonghyun, Kim Jimin, Choi Jinwook
Interdisciplinary Program in Bioengineering, Seoul National University, 1 Gwanak-ro Gwanak-gu, Seoul, 08826 Republic of Korea.
English Language and Literature, Seoul National University, 1 Gwanak-ro Gwanak-gu, Seoul, 08826 Republic of Korea.
Biomed Eng Lett. 2023 Oct 6;14(1):163-171. doi: 10.1007/s13534-023-00322-7. eCollection 2024 Jan.
This study aims to predict the progression of Diabetes Mellitus (DM) from clinical notes through machine learning based on latent Dirichlet allocation (LDA) topic modeling. Specifically, 174,427 clinical notes of DM patients were collected from the electronic medical record (EMR) system of the Seoul National University Hospital outpatient clinic.

We developed a model to predict the development of DM complications. Topics generated by the topic model were used as the key features of our machine learning model, which captures the correlation between topic structures and complications.

The model provided acceptable predictive performance for all four types of complications (diabetic retinopathy, diabetic nephropathy, nonalcoholic fatty liver disease, and cerebrovascular accident). With extreme gradient boosting (XGBoost), the F1 scores for the four complication types were 0.844, 0.921, 0.831, and 0.762, respectively.

This study shows that a machine learning approach based on topic modeling can effectively predict the progression of a disease. Furthermore, it presents a novel method of topic model transplanting, which matches the dimensions of the topic structures of two data sets.
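The general pattern described above (LDA topic proportions used as features for a gradient-boosted classifier) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the toy notes and labels are hypothetical, the topic count K=4 is arbitrary, scikit-learn's `GradientBoostingClassifier` is swapped in for XGBoost to avoid an extra dependency, and reusing one fitted vectorizer and LDA model on a second corpus is only one possible reading of "topic model transplanting" (it guarantees both corpora share the same K-dimensional topic space).

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import f1_score

# Hypothetical toy notes (NOT real clinical data). Label 1 marks notes from
# patients who later developed a complication, e.g. retinopathy.
train_notes = [
    "blurred vision fundus exam microaneurysm hba1c high",
    "retinal hemorrhage fundus photo laser referral",
    "vision decreased macular edema ophthalmology consult",
    "hba1c stable metformin continue diet exercise",
    "glucose controlled metformin refill follow up",
    "weight loss diet counseling glucose stable",
] * 5
train_labels = np.array([1, 1, 1, 0, 0, 0] * 5)

# A second, hypothetical corpus scored with the same topic model.
test_notes = [
    "fundus exam shows microaneurysm vision blurred",
    "metformin continue glucose controlled diet",
]
test_labels = np.array([1, 0])

# Bag-of-words vocabulary is fitted on the training corpus only;
# unseen tokens in the second corpus are ignored by transform().
vectorizer = CountVectorizer()
X_train_counts = vectorizer.fit_transform(train_notes)

# LDA with a hypothetical K=4 topics: each note becomes a K-dim topic mixture.
lda = LatentDirichletAllocation(n_components=4, random_state=0)
theta_train = lda.fit_transform(X_train_counts)

# Applying the already-fitted vectorizer and LDA to the second corpus yields
# topic features of the same dimension K (assumed reading of "transplanting").
theta_test = lda.transform(vectorizer.transform(test_notes))

# Gradient boosting on topic proportions (stand-in for XGBoost).
clf = GradientBoostingClassifier(random_state=0)
clf.fit(theta_train, train_labels)
pred = clf.predict(theta_test)

print(theta_train.shape, theta_test.shape)
print("F1:", f1_score(test_labels, pred))
```

Fitting the vocabulary and topic model once and only calling `transform` on new notes keeps every document in the same topic space, so a classifier trained on one data set can be applied to the other without any change to its input dimension.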
The online version contains supplementary material available at 10.1007/s13534-023-00322-7.