Chen Ming-Shu, Liu Tzu-Chi, Jhou Mao-Jhen, Yang Chih-Te, Lu Chi-Jie
Department of Healthcare Administration, College of Healthcare & Management, Asia Eastern University of Science and Technology, New Taipei City 220, Taiwan.
Graduate Institute of Business Administration, Fu Jen Catholic University, New Taipei City 242, Taiwan.
Diagnostics (Basel). 2024 Apr 17;14(8):825. doi: 10.3390/diagnostics14080825.
Longitudinal data, while often limited, contain valuable insights into the features that affect clinical outcomes. When predicting the progression of chronic kidney disease (CKD) in patients with metabolic syndrome, particularly those transitioning from stage 3a to stage 3b, for whom data are scarce, feature ensemble techniques can be advantageous: they can identify the crucial risk factors influencing CKD progression and thereby improve model performance. Machine learning (ML) methods have gained popularity because they can perform feature selection and handle complex feature interactions more effectively than traditional approaches. However, different ML methods yield different feature importance information. This study proposes a multiphase hybrid risk factor evaluation scheme that takes into account the diverse feature information generated by ML methods. The scheme incorporates variable ensemble rules (VERs) to combine feature importance information, which aids in identifying the important features influencing CKD progression and supports clinical decision making. The proposed scheme employs six ML models: Lasso, random forest (RF), multivariate adaptive regression splines (MARS), LightGBM, XGBoost, and CatBoost. Each is known for its distinct feature selection mechanism and its wide use in clinical studies. Applying the scheme identifies thirteen features affecting CKD progression, and a model built on these features achieves a promising area under the curve (AUC) of 0.883.
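The abstract does not spell out the variable ensemble rules, so the following is only a minimal sketch of one plausible way to combine feature importance information from several models. It assumes a simple rank-and-vote rule: each model ranks the features by the magnitude of its importance score, and a feature is kept when enough models place it in their top ranks. The function names, the `top_k` and `min_votes` parameters, the model and feature labels, and the toy scores are all hypothetical and are not taken from the study.

```python
from statistics import mean


def rank_features(importances):
    """Rank features within one model (1 = most important).

    Scores are compared by absolute value so that signed coefficients
    (for example from Lasso) and non-negative tree-based importances
    can be ranked the same way. Ties are broken by feature name.
    """
    ordered = sorted(importances, key=lambda f: (-abs(importances[f]), f))
    return {feature: position for position, feature in enumerate(ordered, start=1)}


def ensemble_select(model_importances, top_k, min_votes):
    """Hypothetical ensemble rule, not the authors' VERs.

    Keeps every feature that at least `min_votes` models rank within
    their top `top_k`, and returns (feature, mean rank, votes) tuples
    sorted by mean rank across all models.
    """
    ranks = {model: rank_features(imp) for model, imp in model_importances.items()}
    features = sorted(next(iter(model_importances.values())))
    summary = []
    for feature in features:
        feature_ranks = [ranks[model][feature] for model in ranks]
        votes = sum(r <= top_k for r in feature_ranks)
        if votes >= min_votes:
            summary.append((feature, mean(feature_ranks), votes))
    return sorted(summary, key=lambda row: (row[1], row[0]))


# Made-up importance scores for illustration only; in practice these would
# come from fitted models such as Lasso, RF, MARS, LightGBM, XGBoost, CatBoost.
toy_importances = {
    "model_1": {"feature_a": 0.50, "feature_b": 0.30, "feature_c": 0.15, "feature_d": 0.05},
    "model_2": {"feature_a": 0.40, "feature_b": 0.10, "feature_c": 0.35, "feature_d": 0.15},
    "model_3": {"feature_a": -0.80, "feature_b": 0.60, "feature_c": 0.00, "feature_d": 0.20},
}

selected = ensemble_select(toy_importances, top_k=2, min_votes=2)
for feature, avg_rank, votes in selected:
    print(f"{feature}: mean rank {avg_rank:.2f}, top-2 votes {votes}")
```

Ranking before combining is a deliberate choice in this sketch: raw importance scores from different model families are not on a common scale, whereas ranks are directly comparable. Other rules, such as averaging normalized scores, would be equally reasonable under different assumptions.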