Pan Cheng, Luo Hao, Cheung Gary, Zhou Huiquan, Cheng Reynold, Cullum Sarah, Wu Chuan
Department of Computer Science, The University of Hong Kong, Hong Kong, China (Hong Kong).
Department of Social Work and Social Administration, The University of Hong Kong, Hong Kong, China (Hong Kong).
JMIR AI. 2024 Jan 31;3:e44185. doi: 10.2196/44185.
Machine learning techniques are starting to be used in various health care data sets to identify frail persons who may benefit from interventions. However, evidence about the performance of machine learning techniques compared to conventional regression is mixed. It is also unclear what methodological and database factors are associated with performance.
This study aimed to compare the mortality prediction accuracy of various machine learning classifiers for identifying frail older adults in different scenarios.
We used deidentified data collected from older adults (65 years of age and older) assessed with the interRAI Home Care instrument in New Zealand between January 1, 2012, and December 31, 2016. A total of 138 interRAI assessment items were used to predict 6-month and 12-month mortality, using 3 machine learning classifiers (random forest [RF], extreme gradient boosting [XGBoost], and multilayer perceptron [MLP]) and regularized logistic regression. We conducted a simulation study comparing the performance of the machine learning models with that of logistic regression and the interRAI Home Care Frailty Scale, and examined the effects of sample size, number of features, and train-test split ratio.
A total of 95,042 older adults (median age 82.66 years, IQR 77.92-88.76; n=37,462, 39.42% male) receiving home care were analyzed. The average area under the curve (AUC) and sensitivities for 6-month mortality prediction showed that the machine learning classifiers did not outperform regularized logistic regression. In terms of AUC, regularized logistic regression performed better than XGBoost, MLP, and RF when the number of features was ≤80 and the sample size was ≤16,000. In terms of sensitivity, MLP outperformed regularized logistic regression when the number of features was ≥40 and the sample size was ≥4000. In contrast, RF and XGBoost demonstrated higher specificities than regularized logistic regression in all scenarios.
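The three metrics reported above rank models differently because AUC is computed from the predicted scores alone, whereas sensitivity and specificity depend on a decision threshold. The following minimal Python sketch illustrates the definitions. It is not the study's code: the labels, scores, and the 0.5 threshold are made-up illustrative values, and the function names are hypothetical.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for binary labels coded 1 (died) and 0 (survived)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def sensitivity(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of true positives that the model flags: tp / (tp + fn)."""
    tp, _, _, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn)


def specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of true negatives that the model leaves unflagged: tn / (tn + fp)."""
    _, fp, tn, _ = confusion_counts(y_true, y_pred)
    return tn / (tn + fp)


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as one half (the Mann-Whitney formulation). This is O(n^2),
    which is fine for a small illustration.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up example: 3 deaths and 5 survivors with hypothetical predicted risks.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1, 0.05]
threshold = 0.5  # assumed cutoff, not taken from the study
y_pred = [1 if s >= threshold else 0 for s in scores]

print("AUC:", auc(y_true, scores))
print("Sensitivity:", sensitivity(y_true, y_pred))
print("Specificity:", specificity(y_true, y_pred))
```

Lowering the threshold in this sketch raises sensitivity at the cost of specificity while leaving AUC unchanged, which is one reason a model can lead on one metric and trail on another.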
The study revealed that machine learning models exhibited significant variation in prediction performance when evaluated using different metrics. Regularized logistic regression was an effective model for identifying frail older adults receiving home care, as indicated by the AUC, particularly when the number of features and sample sizes were not excessively large. Conversely, MLP displayed superior sensitivity, while RF exhibited superior specificity when the number of features and sample sizes were large.