Hughes Charmayne Mary Lee, Zhang Yan, Pourhossein Ali, Jurasova Terezia
Age-Appropriate Human-Machine Systems, Institute of Psychology and Ergonomics, Technische Universität Berlin, Berlin, Germany.
Front Aging. 2025 Apr 22;6:1501168. doi: 10.3389/fragi.2025.1501168. eCollection 2025.
Physical frailty is a pressing public health issue that significantly increases the risk of disability, hospitalization, and mortality. Early and accurate detection of frailty is essential for timely intervention, reducing its widespread impact on healthcare systems, social support networks, and economic stability.
This study aimed to detect and classify frailty status at a single point in time, in both a binary setting (frail vs. non-frail) and a multi-class setting (frail vs. pre-frail vs. non-frail). Models were developed and internally validated on data from wave 8 of the English Longitudinal Study of Ageing (ELSA), and externally validated on wave 6 data to assess generalizability.
Nine classification algorithms were evaluated and their performance compared: Logistic Regression, Random Forest, K-Nearest Neighbors, Gradient Boosting, AdaBoost, XGBoost, LightGBM, CatBoost, and Multi-Layer Perceptron.
CatBoost demonstrated the best overall performance in binary classification. On the internal validation set it achieved high recall (0.951), high balanced accuracy (0.928), and the lowest Brier score (0.049); on the external validation set it maintained a recall of 0.950, a balanced accuracy of 0.913, and an F1-score of 0.951. Multi-class classification was more challenging. Gradient Boosting emerged as the top model, achieving the highest recall (0.666) and precision (0.663) on the external validation set, with an F1-score of 0.664 and reasonable calibration (Brier score = 0.223).
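For readers less familiar with the reported metrics, the following is a minimal sketch of how recall, balanced accuracy, and the binary Brier score are computed. It is not the authors' code: the function names, the toy labels, the predicted probabilities, and the 0.5 decision threshold are all illustrative assumptions, and the multi-class Brier score reported above would need a multi-class formulation that is not shown here.

```python
# Illustrative metric definitions; all names and data here are hypothetical.

def recall(y_true, y_pred, positive=1):
    """Fraction of actual `positive` cases that were predicted as `positive`."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn)

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, which is robust to class imbalance."""
    classes = sorted(set(y_true))
    return sum(recall(y_true, y_pred, c) for c in classes) / len(classes)

def brier_score(y_true, p_positive):
    """Mean squared gap between predicted probability and the 0/1 outcome.
    Lower is better; 0 means perfectly confident and correct predictions."""
    return sum((p - t) ** 2 for t, p in zip(y_true, p_positive)) / len(y_true)

# Toy example: 1 = frail, 0 = non-frail (made-up values, not ELSA data).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
p_frail = [0.9, 0.8, 0.7, 0.4, 0.1, 0.2, 0.3, 0.6, 0.1, 0.2]
y_pred = [1 if p >= 0.5 else 0 for p in p_frail]  # assumed 0.5 threshold

print("recall (frail):   ", recall(y_true, y_pred))
print("balanced accuracy:", balanced_accuracy(y_true, y_pred))
print("Brier score:      ", brier_score(y_true, p_frail))
```

Recall and balanced accuracy depend on the chosen threshold, whereas the Brier score is computed directly from the predicted probabilities, which is why it is read as a measure of calibration rather than of discrimination alone.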
Machine learning algorithms show promise for the detection of current frailty status, particularly in binary classification. However, distinguishing between frailty subcategories remains challenging, highlighting the need for improved models and feature selection strategies to enhance multi-class classification accuracy.