Big Data Decision Institute (BDDI), Jinan University, Guangzhou, 510632, China.
Guangdong Engineering Technology Research Center for Big Data Precision Healthcare, Guangzhou, 510632, China.
Sci Rep. 2018 Nov 23;8(1):17298. doi: 10.1038/s41598-018-35487-0.
Acute Kidney Injury (AKI) is a common complication among hospitalized patients and is associated with significantly increased cost, morbidity, and mortality. Early prediction of AKI has profound clinical implications because no treatment currently exists for AKI once it develops. Feature selection (FS) is an essential process for building accurate and interpretable prediction models, but to the best of our knowledge no study has investigated the robustness and applicability of such a selection process for AKI. In this study, we compared eight widely applied FS methods for AKI prediction using nine years of electronic medical records (EMR) and examined heterogeneity in the feature rankings produced by the methods. FS methods were compared in terms of stability with respect to data sampling variation, similarity between selection results, and AKI prediction performance. Prediction accuracy did not by itself guarantee stable feature rankings. Across different FS methods, prediction performance did not change significantly, while the importance rankings of features differed considerably. A positive correlation was observed between the complexity of the suitable FS method and sample size. This study provides several practical implications, including recognizing the importance of feature stability, which is desirable for model reproducibility, identifying important AKI risk factors for further investigation, and facilitating early prediction of AKI.
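The notion of stability with respect to data sampling variation can be illustrated with a minimal sketch. The abstract does not say which stability metric or which FS methods were used, so everything below is an assumption for illustration only: a simple univariate filter (absolute Pearson correlation with the label), bootstrap resampling, and the mean pairwise Jaccard index of the selected top-k feature sets. The toy data and all function names are hypothetical and unrelated to the study's EMR data.

```python
import random
from itertools import combinations
from statistics import mean


def abs_correlation(xs, ys):
    """Absolute Pearson correlation; returns 0.0 if either input is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return abs(cov) / (vx * vy) ** 0.5


def top_k_features(X, y, k):
    """Score each feature with the filter and return the top-k indices as a set."""
    n_features = len(X[0])
    scores = [abs_correlation([row[j] for row in X], y) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return set(ranked[:k])


def jaccard(a, b):
    """Jaccard index of two non-empty sets: |a & b| / |a | b|."""
    return len(a & b) / len(a | b)


def selection_stability(X, y, k, n_resamples=20, seed=0):
    """Mean pairwise Jaccard index of top-k sets selected on bootstrap resamples."""
    rng = random.Random(seed)
    n = len(X)
    subsets = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        subsets.append(top_k_features([X[i] for i in idx], [y[i] for i in idx], k))
    return mean(jaccard(a, b) for a, b in combinations(subsets, 2))


def make_toy_data(n=300, n_features=10, seed=1):
    """Hypothetical data: only features 0 and 1 influence the binary outcome."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        row = [rng.gauss(0, 1) for _ in range(n_features)]
        y.append(1 if row[0] + row[1] + rng.gauss(0, 0.5) > 0 else 0)
        X.append(row)
    return X, y


X, y = make_toy_data()
stability = selection_stability(X, y, k=2)
print(f"mean pairwise Jaccard stability of top-2 features: {stability:.2f}")
```

A value near 1 means the same features are selected regardless of which resample is drawn; a value near 0 means the selection depends heavily on the sample. The same `jaccard` helper could also be applied to the top-k sets of two different FS methods to quantify similarity between their selection results.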