Comparative study of imputation strategies to improve the sarcopenia prediction task.

Authors

Karimov Shakhzod, Turimov Dilmurod, Kim Wooseong, Kim Jiyoun

Affiliations

Department of Computer Engineering, Gachon University, Seongnam-si, Republic of Korea.

Department of Exercise Rehabilitation & Welfare, Gachon University, Incheon, Republic of Korea.

Publication information

Digit Health. 2025 Jan 17;11:20552076241301960. doi: 10.1177/20552076241301960. eCollection 2025 Jan-Dec.

Abstract

OBJECTIVE

Sarcopenia is a condition characterized by the progressive loss of skeletal muscle mass and strength, and research on it is frequently hampered by missing data. Incomplete datasets undermine the accuracy and reliability of studies, necessitating effective imputation techniques. This study compares three imputation methods for addressing data completeness issues in sarcopenia research: multiple imputation by chained equations (MICE), support vector regression (SVR), and K-nearest neighbors (KNN).
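To make the three imputation strategies concrete, the following is a minimal sketch using scikit-learn on synthetic data. It is not the authors' code. Everything in it is an assumption for illustration: the synthetic dataset, the helper names `make_data`, `svr_impute`, and `impute_all`, and all hyperparameters. Note that scikit-learn's `IterativeImputer` is MICE-style chained-equation imputation but returns a single imputed dataset rather than multiple ones, and that the SVR imputer here is one simple per-column formulation, since the abstract does not specify how SVR was applied.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (enables IterativeImputer)
from sklearn.impute import IterativeImputer, KNNImputer
from sklearn.svm import SVR


def make_data(n_samples=200, n_features=4, missing_rate=0.1, seed=0):
    """Hypothetical correlated features with values removed completely at random."""
    rng = np.random.default_rng(seed)
    latent = rng.normal(size=(n_samples, 1))
    X_full = latent + 0.3 * rng.normal(size=(n_samples, n_features))
    mask = rng.random(X_full.shape) < missing_rate
    X_missing = X_full.copy()
    X_missing[mask] = np.nan
    return X_full, X_missing, mask


def svr_impute(X):
    """Assumed SVR scheme: predict each column's gaps from the other, mean-filled columns."""
    X = np.asarray(X, dtype=float)
    X_filled = np.where(np.isnan(X), np.nanmean(X, axis=0), X)
    X_out = X_filled.copy()
    for j in range(X.shape[1]):
        missing = np.isnan(X[:, j])
        if not missing.any() or missing.all():
            continue  # nothing to impute, or nothing to train on
        predictors = np.delete(X_filled, j, axis=1)
        model = SVR(kernel="rbf", C=1.0, epsilon=0.1)
        model.fit(predictors[~missing], X[~missing, j])
        X_out[missing, j] = model.predict(predictors[missing])
    return X_out


def impute_all(X_missing):
    """Apply the three strategies to the same incomplete matrix."""
    return {
        "MICE": IterativeImputer(max_iter=10, random_state=0).fit_transform(X_missing),
        "SVR": svr_impute(X_missing),
        "KNN": KNNImputer(n_neighbors=5).fit_transform(X_missing),
    }


X_full, X_missing, mask = make_data()
for name, X_imp in impute_all(X_missing).items():
    # Error is measured only on entries that were artificially removed.
    rmse = np.sqrt(np.mean((X_imp[mask] - X_full[mask]) ** 2))
    print(f"{name}: RMSE on masked entries = {rmse:.3f}")
```

Masking known values and scoring the reconstruction, as done here, is one common way to compare imputers when the true values are unavailable in real data; the abstract does not state which comparison criterion the study used beyond downstream classification accuracy and preservation of the data distribution.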

METHODS

Following imputation, we utilized machine learning models, including logistic regression, gradient boosting, support vector machine, and random forest, to classify sarcopenia. The methodology encompassed rigorous data preprocessing, normalization, and the synthetic minority oversampling technique to address class imbalance and ensure unbiased model performance.
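The pipeline described above (normalization, oversampling of the minority class, then four classifiers) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the study's implementation: the data come from scikit-learn's `make_classification` as a stand-in for an imputed sarcopenia dataset, `StandardScaler` is assumed as the normalization step, hyperparameters are library defaults, and `smote_oversample` is a simplified, hand-written version of the SMOTE idea for binary labels (in practice a maintained implementation such as the one in the imbalanced-learn package would normally be used).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def smote_oversample(X, y, k=5, seed=0):
    """Simplified SMOTE: synthesize minority points between nearest minority neighbours."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    n_new = counts.max() - counts.min()
    X_min = X[y == minority]
    if n_new == 0 or len(X_min) < 2:
        return X, y
    k = min(k, len(X_min) - 1)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    neighbours = nn.kneighbors(X_min, return_distance=False)[:, 1:]  # drop the point itself
    base = rng.integers(0, len(X_min), size=n_new)
    picked = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # interpolation weight in [0, 1)
    X_new = X_min[base] + gap * (X_min[picked] - X_min[base])
    return np.vstack([X, X_new]), np.concatenate([y, np.full(n_new, minority)])


def evaluate_models(X, y, seed=0):
    """Scale, oversample the training split only, then score four classifiers."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )
    scaler = StandardScaler().fit(X_train)  # fit on training data to avoid leakage
    X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)
    X_train, y_train = smote_oversample(X_train, y_train, seed=seed)
    models = {
        "logistic_regression": LogisticRegression(max_iter=1000),
        "gradient_boosting": GradientBoostingClassifier(random_state=seed),
        "svm": SVC(),
        "random_forest": RandomForestClassifier(random_state=seed),
    }
    return {
        name: accuracy_score(y_test, model.fit(X_train, y_train).predict(X_test))
        for name, model in models.items()
    }


# Hypothetical imbalanced dataset (about 15% positive class).
X, y = make_classification(
    n_samples=400, n_features=8, weights=[0.85, 0.15], random_state=0
)
for name, acc in evaluate_models(X, y).items():
    print(f"{name}: accuracy = {acc:.3f}")
```

Oversampling is applied only to the training split in this sketch, so that synthetic points never leak into the evaluation data. Whether the study followed the same ordering is not stated in the abstract.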

RESULTS

The results revealed substantial variations in model accuracy based on the imputation method employed. The gradient boosting model consistently exhibited superior performance across all imputation strategies, demonstrating its robustness with imputed datasets. Additionally, KNN and MICE emerged as effective imputation techniques, preserving the original data distribution and enabling more accurate classification outcomes.

CONCLUSION

This study underscores the pivotal role of imputation methods in maintaining data integrity and enhancing predictive accuracy in sarcopenia research. The gradient boosting model's reliability across all strategies highlights its potential as a robust classifier, while the suitability of KNN and MICE for preserving data distribution supports their application in similar research contexts. These findings contribute to more reliable and valid insights in sarcopenia studies, ultimately supporting improved clinical outcomes.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2527/11748086/84987696844b/10.1177_20552076241301960-fig1.jpg
