Center for Intelligent Decision-Making and Machine Learning, School of Management, Xi'an Jiaotong University, No.28, Xianning West Road, Xi'an, 710049, People's Republic of China.
Department of Industrial and Systems Engineering, University of Washington, Seattle, USA.
BMC Med Inform Decis Mak. 2021 Nov 28;21(1):334. doi: 10.1186/s12911-021-01690-9.
Sepsis, defined as life-threatening organ dysfunction caused by a dysregulated host response to infection, has become one of the major causes of death in Intensive Care Units (ICUs). Because the syndrome is heterogeneous and complex, there are no gold standards for its diagnosis, treatment, or prognosis. Early prediction of in-hospital mortality for sepsis patients matters not only for medical decision making but, more importantly, for the well-being of patients.
In this paper, a rule discovery and analysis (rule-based) method is used to predict the in-hospital death events of 2021 ICU patients diagnosed with sepsis using the MIMIC-III database. The method comprises two main phases: a rule discovery phase and a rule analysis phase. In the rule discovery phase, the RuleFit method is employed to mine multiple hidden rules that are capable of predicting individual in-hospital death events. In the rule analysis phase, survival analysis and decomposition analysis are carried out to test and justify the risk prediction ability of these rules. Then, by leveraging a subset of these rules, we establish a prediction model that is both more accurate at the in-hospital death prediction task and more interpretable than most comparable methods.
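As a rough illustration of how mined rules can feed a prediction model, the sketch below encodes each rule as a binary feature and fits an L1-penalized logistic regression on those features, which is the general idea behind the sparse linear stage of RuleFit. It is not the authors' implementation. The rules, thresholds, and synthetic patients are invented for illustration and carry no clinical meaning, and in RuleFit proper the candidate rules would be extracted from a tree ensemble rather than written by hand.

```python
import math
import random

# Hypothetical rules (NOT from the paper): each rule is a conjunction of
# (feature, operator, threshold) conditions with made-up thresholds.
RULES = {
    "low_gcs": [("gcs", "<=", 8)],
    "high_k_and_bili": [("potassium", ">", 5.0), ("bilirubin", ">", 2.0)],
    "noise_rule": [("noise", ">", 0.5)],
}


def rule_fires(rule, patient):
    """Return 1 if every condition of the rule holds for the patient, else 0."""
    for feature, op, threshold in rule:
        value = patient[feature]
        holds = value <= threshold if op == "<=" else value > threshold
        if not holds:
            return 0
    return 1


def encode(patients, rules):
    """Turn each patient into a vector of binary rule indicators."""
    return [[rule_fires(rule, p) for rule in rules.values()] for p in patients]


def make_synthetic_patients(n, seed=0):
    """Generate toy patients whose death risk depends on two of the rules."""
    rng = random.Random(seed)
    patients, labels = [], []
    for _ in range(n):
        p = {
            "gcs": rng.randint(3, 15),
            "potassium": rng.uniform(3.0, 6.5),
            "bilirubin": rng.uniform(0.2, 6.0),
            "noise": rng.random(),
        }
        risk = 0.1 + 0.5 * (p["gcs"] <= 8)
        risk += 0.3 * (p["potassium"] > 5.0 and p["bilirubin"] > 2.0)
        patients.append(p)
        labels.append(1 if rng.random() < risk else 0)
    return patients, labels


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def soft_threshold(w, t):
    """Shrink w toward zero by t; this is the proximal step for the L1 penalty."""
    return math.copysign(max(abs(w) - t, 0.0), w)


def fit_l1_logistic(X, y, lam=0.01, lr=0.5, epochs=500):
    """Proximal gradient descent for L1-penalized logistic regression."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j in range(d):
                grad_w[j] += err * xi[j]
        b -= lr * grad_b / n  # the intercept is not penalized
        w = [soft_threshold(wj - lr * gj / n, lr * lam)
             for wj, gj in zip(w, grad_w)]
    return w, b


def fit_rule_model(n=400, seed=0):
    """Fit the sparse rule model on synthetic data; return weights by rule name."""
    patients, labels = make_synthetic_patients(n, seed)
    X = encode(patients, RULES)
    w, b = fit_l1_logistic(X, labels)
    return dict(zip(RULES, w)), b
```

Calling `fit_rule_model()` returns one weight per rule. Rules that drive the synthetic risk should receive clearly positive weights, while the uninformative `noise_rule` should stay close to zero. Selecting a subset of rules by their weights loosely mirrors the way the abstract's final model uses only some of the discovered rules.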
In our experiment, RuleFit generates 77 risk prediction rules, and the average area under the curve (AUC) of the prediction model based on 62 of these rules reaches 0.781 ([Formula: see text]), which is comparable to or even better than the AUC of existing methods (i.e., commonly used medical scoring systems and benchmark machine learning models). External validation of the predictive power of these 62 rules on another 1468 ICU sepsis patients not included in MIMIC-III provides further supporting evidence for the superiority of the rule-based method. In addition, we discuss and explain in detail the rules with better risk prediction ability. Glasgow Coma Scale (GCS), serum potassium, and serum bilirubin are found to be the most important risk factors for predicting patient death.
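For readers unfamiliar with the metric, the following minimal sketch shows one common way to compute an AUC and summarize it across repeated evaluations. It uses the rank-based (Mann-Whitney) formulation. The function names are illustrative, the example inputs are toy values, and the abstract does not state how the authors averaged their AUC, so the fold-based summary is an assumption.

```python
from statistics import mean, stdev


def auc(labels, scores):
    """Rank-based AUC: the probability that a randomly chosen positive case
    scores higher than a randomly chosen negative case (ties count as 0.5)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def summarize(fold_aucs):
    """Return the mean and sample standard deviation of per-fold AUC values."""
    return mean(fold_aucs), stdev(fold_aucs)


# Toy example: 1 = died in hospital, 0 = survived; scores are predicted risks.
example_auc = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])  # 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of deaths from survivors, which gives some context for the reported average of 0.781.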
Our study demonstrates that the rule-based method can not only accurately predict in-hospital death events of sepsis patients, but also reveal the complex relationships among sepsis-related risk factors through the rules themselves, improving our understanding of the complexity of sepsis and of the sepsis patient population.