Koo Dongjun, Lee Ah Ra, Lee Eunjoo, Kim Il Kon
School of Computer Science & Engineering, College of IT Engineering, Kyungpook National University, Daegu, Korea.
College of Nursing, Research Institute of Nursing Science, Kyungpook National University, Daegu, Korea.
Healthc Inform Res. 2022 Jul;28(3):231-239. doi: 10.4258/hir.2022.28.3.231. Epub 2022 Jul 31.
This study aimed to use machine learning, with the existing frailty criteria as a basis, to identify a new set of factors that predict frailty in the elderly population, and to validate the results.
This study used data from the Korean Frailty and Aging Cohort Study (KFACS). KFACS participants were classified as robust or frail based on Fried's frailty phenotype, and those who did not properly answer the questions were excluded, leaving 1,066 robust and 165 frail participants. We then applied feature selection to choose influential features and trained models on the prepared dataset using support vector machine, random forest, and gradient boosting algorithms. Because the dataset was small and its class distribution imbalanced, model performance was estimated using a holdout split combined with stratified 10-fold cross-validation. The reliability of the constructed model was validated on an unseen test set. Finally, the model was trained with hyperparameter optimization.
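The stratified holdout and 10-fold cross-validation described above can be sketched as follows. The abstract does not state the holdout ratio or the tooling used, so this is a minimal plain-Python sketch assuming a 20% test split; the helper names `stratified_holdout` and `stratified_kfold` are hypothetical. It shows how stratification preserves the 1,066:165 robust-to-frail ratio in the unseen test set and in every validation fold, so that no fold is left with too few frail cases.

```python
import random
from collections import defaultdict

def stratified_holdout(labels, test_fraction=0.2, seed=0):
    """Split sample indices into (train, test), keeping each class's proportion.

    test_fraction=0.2 is an assumed value; the study's ratio is not given.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    train, test = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_fraction)  # per-class test quota
        test.extend(idxs[:n_test])
        train.extend(idxs[n_test:])
    return sorted(train), sorted(test)

def stratified_kfold(indices, labels, k=10, seed=0):
    """Yield (train, validation) index lists; each fold gets ~1/k of every class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx in indices:
        by_class[labels[idx]].append(idx)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        # Deal each class round-robin so fold sizes per class differ by at most 1
        for pos, idx in enumerate(idxs):
            folds[pos % k].append(idx)
    for i in range(k):
        val = sorted(folds[i])
        train = sorted(idx for j, fold in enumerate(folds) if j != i for idx in fold)
        yield train, val

# Class counts taken from the abstract; the features themselves are omitted.
labels = ["robust"] * 1066 + ["frail"] * 165
train, test = stratified_holdout(labels)
print(len(train), len(test))  # 985 246
frail_per_fold = [sum(labels[i] == "frail" for i in val)
                  for _, val in stratified_kfold(train, labels)]
print(frail_per_fold)  # [14, 14, 13, 13, 13, 13, 13, 13, 13, 13]
```

Each model (SVM, random forest, gradient boosting) would be fitted on the training indices of every fold and scored on the matching validation indices, with the held-out test indices touched only once at the end.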
Feature selection identified 27 features as meaningful factors for frailty. With the model trained on these selected features, the random forest algorithm reached a weighted-average F1-score of 95.30%.
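A weighted-average F1-score is the mean of the per-class F1-scores, each weighted by that class's support (its number of true samples). Because the robust class has far more samples here, it contributes most of the weight. The sketch below, with a hypothetical `weighted_f1` helper and made-up toy labels rather than the study's data, shows the computation.

```python
from collections import Counter

def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1-scores."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for cls, n_true in support.items():
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        pred_pos = sum(p == cls for p in y_pred)
        precision = tp / pred_pos if pred_pos else 0.0
        recall = tp / n_true
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        score += f1 * n_true / total  # weight by this class's share of samples
    return score

# Toy example (made-up labels): robust F1 = 0.875, frail F1 = 0.5
y_true = ["robust"] * 8 + ["frail"] * 2
y_pred = ["robust"] * 7 + ["frail", "frail", "robust"]
print(round(weighted_f1(y_true, y_pred), 4))  # 0.8
```

In the toy case, the weighted score is 0.8 × 0.875 + 0.2 × 0.5 = 0.8, noticeably higher than the frail-class F1 alone, which illustrates how the majority class dominates this metric.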
The results demonstrate that machine learning can be used to strengthen existing frailty criteria. Because the method analyzes questionnaire responses quickly, it can handle larger volumes of data on participants' health conditions and alert them to potential risks in advance.