Zhou Yuan, Swoboda Thomas K, Ye Zehao, Barbaro Michael, Blalock Jake, Zheng Danny, Wang Hao
Department of Industrial, Manufacturing, and Systems Engineering, The University of Texas at Arlington, Arlington, TX 76109, USA.
Department of Emergency Medicine, The Valley Health System, Touro University Nevada School of Osteopathic Medicine, Las Vegas, NV 89144, USA.
J Clin Med Res. 2023 Mar;15(3):133-138. doi: 10.14740/jocmr4862. Epub 2023 Mar 28.
Machine learning (ML) technologies have been applied in healthcare systems for diverse purposes. We aimed to determine the feasibility and accuracy of predicting patient portal use among diabetic patients using six different ML algorithms. We also compared model performance when only essential variables were used.
This was a single-center retrospective observational study. From March 1, 2019 to February 28, 2020, we included all diabetic patients from the study emergency department (ED). The primary outcome was the status of patient portal use. A total of 18 variables, consisting of patient sociodemographic characteristics, ED and clinic information, and patient medical conditions, were included to predict patient portal use. Six ML algorithms (logistic regression, random forest (RF), deep forest, decision tree, multilayer perceptron, and support vector machine) were used for these predictions. In the initial step, ML predictions were performed with all variables. The essential variables were then chosen via feature selection, and the patient portal use predictions were repeated with only those variables. The performance measures (overall accuracy, sensitivity, specificity, and area under the receiver operating characteristic curve (AUC)) of the patient portal predictions were compared.
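The four performance measures named above can be illustrated with a minimal Python sketch. This is not the study's code: the function names, the 0.5 decision threshold, and the eight toy labels and scores are assumptions made purely for illustration, and the sketch assumes both classes are present in the data.

```python
from typing import Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for binary labels, where 1 = portal user."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> dict[str, float]:
    """Overall accuracy, sensitivity (true positive rate), specificity (true negative rate)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win. This pairwise form is O(n_pos * n_neg), which is
    fine for a toy example but not for large datasets.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 3 portal users, 5 non-users, with model scores.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

metrics = classification_metrics(y_true, y_pred)
auc_value = auc(y_true, scores)
print(metrics, round(auc_value, 4))
```

Accuracy, sensitivity, and specificity depend on the chosen threshold, whereas AUC is computed from the raw scores and summarizes ranking quality across all thresholds.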
A total of 77,977 unique patients were included in our final analysis. Among them, 23.4% (18,223) had diabetes mellitus (DM). Patient portal use was found in 26.9% of DM patients. Overall, the accuracy of predicting patient portal use was above 80% for five of the six ML algorithms. RF outperformed the others when all variables were used for patient portal predictions (accuracy 0.9876, sensitivity 0.9454, specificity 0.9969, and AUC 0.9712). When only eight essential variables were chosen, RF still outperformed the others (accuracy 0.9876, sensitivity 0.9374, specificity 0.9932, and AUC 0.9769).
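As a point of reference for the accuracy figures, the short sketch below works out what a trivial classifier that always predicts "no portal use" would score at the reported 26.9% prevalence. The user count is an approximation derived from the rounded percentage, and the sketch assumes the evaluation data had the same prevalence as the full DM cohort, which the abstract does not state.

```python
# Figures taken from the abstract.
n_dm = 18_223        # patients with diabetes mellitus
portal_rate = 0.269  # reported share of DM patients using the portal

# Approximate counts (the abstract gives only the rounded percentage).
portal_users = round(n_dm * portal_rate)
non_users = n_dm - portal_users

# A majority-class baseline labels every patient as a non-user,
# so its accuracy equals the share of non-users.
baseline_accuracy = non_users / n_dm
print(portal_users, non_users, round(baseline_accuracy, 3))
```

Under these assumptions the baseline is about 73% accurate while its sensitivity is zero, which is why sensitivity, specificity, and AUC are reported alongside overall accuracy.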
It is possible to predict patient portal use with fair accuracy using different ML algorithms. Moreover, at similar prediction accuracy, feature selection techniques can improve the interpretability of the model by focusing on the most relevant features.