Department of Biostatistics & Health Informatics, King's College London, London, UK.
Institute of Health Informatics, University College London, London, UK.
J Am Med Inform Assoc. 2020 Mar 1;27(3):437-443. doi: 10.1093/jamia/ocz211.
Current machine learning models aiming to predict sepsis from electronic health records (EHR) do not account for the heterogeneity of the condition despite its emerging importance in prognosis and treatment. This work demonstrates the added value of stratifying the types of organ dysfunction observed in patients who develop sepsis in the intensive care unit (ICU) in improving the ability to recognize patients at risk of sepsis from their EHR data.
Using an ICU dataset of 13 728 records, we identify clinically significant sepsis subpopulations with distinct organ dysfunction patterns. We perform classification experiments with random forests, gradient-boosted trees, and support vector machines, using the identified subpopulations to distinguish patients who develop sepsis in the ICU from those who do not.
The classification results show that features selected using sepsis subpopulations as background knowledge yield superior performance in distinguishing septic from non-septic patients, regardless of the classification model used. The improvement is especially pronounced in specificity, which is a current bottleneck in machine learning models for sepsis prediction.
Our findings can steer machine learning efforts toward more personalized models for complex conditions including sepsis.
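Because the reported gain is most pronounced in specificity, it may help to recall how that metric is derived from a binary confusion matrix. The following is a minimal sketch, not the authors' evaluation code: the function name `sensitivity_specificity`, the label encoding (1 = septic, 0 = non-septic), and the toy labels are all assumptions made for illustration and are not taken from the study.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels.

    Assumed encoding (hypothetical): 1 = septic, 0 = non-septic.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # septic, flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # septic, missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-septic, cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # non-septic, false alarm

    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one septic and one non-septic record")

    sensitivity = tp / (tp + fn)  # fraction of septic patients detected
    specificity = tn / (tn + fp)  # fraction of non-septic patients cleared
    return sensitivity, specificity


# Toy labels invented for illustration only (not data from the study).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 1, 1]

sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

In this toy case two of the five non-septic records are flagged as septic, so specificity is 3/5. A low specificity of this kind corresponds to frequent false alarms, which is the bottleneck the abstract refers to.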