Predicting Risk of Hypoglycemia in Patients With Type 2 Diabetes by Electronic Health Record-Based Machine Learning: Development and Validation
Author information
Yang Hao, Li Jiaxi, Liu Siru, Yang Xiaoling, Liu Jialin
Affiliations
Information Center, West China Hospital, Sichuan University, Chengdu, China.
Department of Clinical Laboratory Medicine, Jinniu Maternity and Child Health Hospital of Chengdu, Chengdu, China.
Publication information
JMIR Med Inform. 2022 Jun 16;10(6):e36958. doi: 10.2196/36958.
BACKGROUND
Hypoglycemia is a common adverse event in the treatment of diabetes. Managing it effectively requires accurate hypoglycemia prediction models.
OBJECTIVE
The aim of this study was to develop and validate machine learning models to predict the risk of hypoglycemia in adult patients with type 2 diabetes.
METHODS
We used the electronic health records of all adult patients with type 2 diabetes admitted to West China Hospital between November 2019 and December 2021. The prediction model was developed based on XGBoost and natural language processing. F1 score, area under the receiver operating characteristic curve (AUC), and decision curve analysis (DCA) were used as the main criteria to evaluate model performance.
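The three evaluation criteria named above can be illustrated with a minimal sketch. It is not the authors' evaluation code: the function names, the toy labels and scores, and the 0.5 threshold are all assumptions made for illustration, and the decision curve analysis part uses the standard net benefit definition, TP/n - (FP/n) * pt/(1 - pt), evaluated at a single threshold probability pt.

```python
from typing import Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 for the positive class: 2*TP / (2*TP + FP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def net_benefit(y_true: Sequence[int], scores: Sequence[float], threshold: float) -> float:
    """Net benefit at one threshold probability, as plotted in a decision curve."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 1)
    fp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 0)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


# Toy data (hypothetical): 1 = hypoglycemia, 0 = no hypoglycemia.
toy_labels = [1, 0, 1, 0, 0, 1, 0, 0]
toy_scores = [0.9, 0.2, 0.6, 0.4, 0.1, 0.3, 0.7, 0.05]
toy_preds = [1 if s >= 0.5 else 0 for s in toy_scores]

toy_f1 = f1_score(toy_labels, toy_preds)
toy_auc = roc_auc(toy_labels, toy_scores)
toy_nb = net_benefit(toy_labels, toy_scores, threshold=0.5)
```

A full decision curve would repeat `net_benefit` over a range of thresholds and compare the model against the "treat all" and "treat none" strategies.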
RESULTS
We included 29,843 patients with type 2 diabetes, of whom 2804 (9.4%) developed hypoglycemia. The embedding machine learning model (XGBoost3) showed the best performance among all the models. The AUC and the accuracy of XGBoost3 were 0.82 and 0.93, respectively. XGBoost3 was also superior to the other models in DCA.
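As a worked check of the reported counts, the short sketch below recomputes the event rate and, as context for the accuracy figure, the accuracy that a trivial rule predicting "no hypoglycemia" for every patient would reach on a cohort with this class balance. The variable names are illustrative, and the baseline is plain arithmetic on the reported counts, not a result from the study.

```python
n_patients = 29_843  # patients included, as reported
n_hypo = 2_804       # patients who developed hypoglycemia, as reported

# Share of patients with the outcome.
prevalence = n_hypo / n_patients

# Accuracy of always predicting the majority class (no hypoglycemia).
majority_accuracy = (n_patients - n_hypo) / n_patients

print(f"prevalence: {prevalence:.1%}")                # 9.4%
print(f"majority baseline: {majority_accuracy:.1%}")  # 90.6%
```

With roughly nine in ten patients in the negative class, accuracy alone says little about how well positives are detected, which is consistent with the study's use of F1 score, AUC, and DCA as the main criteria.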
CONCLUSIONS
The Paragraph Vector-Distributed Memory model can effectively extract features and improve the performance of the XGBoost model, which can then effectively predict hypoglycemia in patients with type 2 diabetes.