Schvetz Maya, Fuchs Lior, Novack Victor, Moskovitch Robert
Department of Software and Information Systems Engineering, Ben-Gurion University of the Negev, Beer-Sheva, Israel.
Medical Intensive Care Unit and Clinical Research Center, Soroka University Medical Center, Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer-Sheva, Israel.
J Biomed Inform. 2021 May;117:103734. doi: 10.1016/j.jbi.2021.103734. Epub 2021 Mar 9.
Outcome prediction from Electronic Health Records (EHR), and specifically in critical care, is attracting increasing research attention. In this study, we used clinical data from the Intensive Care Unit (ICU), focusing on ICU-acquired sepsis. The current literature reports several evaluation approaches inspired by epidemiological study designs, some of which do not reflect the conditions of real-life application. This problem appears relevant to outcome prediction in longitudinal EHR data, and in longitudinal data more generally, although in this study we focused on ICU data. Unlike most previous studies, which investigated all sepsis admissions, we focused specifically on ICU-acquired sepsis. Because longitudinal data are sparse, we used Temporal Abstraction and the discovery of Time Interval-Related Patterns (TIRPs), which were then used as classification features. Two experiments were designed using three outcome prediction study designs from the literature, each implementing a different level of real-life conditions, to evaluate the prediction models. The first experiment focused on predicting whether, and when during the admission, a patient would develop ICU-acquired sepsis, given a sliding observation time window, and compared the behavior of the three study designs. The second experiment focused only on predicting whether the patient would develop ICU-acquired sepsis, based on data taken relative to the admission start time. Our results show that Temporal Discretization for Classification (TD4C) led to better performance than Equal-Width Discretization, Knowledge-Based discretization, or SAX. Abstraction into two states was also better than into three or four. The default Binary TIRP representation performed better than Mean Duration, Horizontal Support, and horizontally normalized horizontal support. XGBoost as a classifier performed better than Logistic Regression, a Neural Net, or Random Forest. Additionally, we demonstrate why the case-crossover-control design is the most appropriate for evaluating models under real-life application conditions, unlike other, incomplete designs that may even yield "better performance".
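The abstract names several feature-construction steps without showing how they fit together. As a minimal sketch, assuming a single numeric variable sampled at discrete times, the Python below illustrates Equal-Width Discretization into states, temporal abstraction of the resulting states into symbolic time intervals, and three of the compared TIRP representations (Binary, Horizontal Support, Mean Duration) for one hypothetical two-interval "before" pattern. All function names, the instance-counting rule, and the heart-rate data are illustrative assumptions, not the authors' implementation; the sketch omits TD4C, SAX, knowledge-based cut-offs, the other temporal relations, and TIRP discovery itself.

```python
import bisect
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class Interval:
    """A symbolic time interval, e.g. state HR_1 holding from t=2 to t=3."""
    start: float
    end: float
    symbol: str


def equal_width_cutoffs(values: Sequence[float], n_states: int) -> List[float]:
    """Equal-Width Discretization: n_states bins of equal width over [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_states
    return [lo + width * i for i in range(1, n_states)]


def abstract_series(samples: Sequence[Tuple[float, float]], name: str,
                    cutoffs: Sequence[float]) -> List[Interval]:
    """Simplified temporal abstraction: map each (time, value) sample to a state
    and merge consecutive samples that share a state into one interval.
    (Hypothetical helper; real pipelines also handle gaps between samples.)"""
    intervals: List[Interval] = []
    for t, v in samples:
        # State index 0..n_states-1 = number of cut-offs at or below the value.
        symbol = f"{name}_{bisect.bisect_right(cutoffs, v)}"
        if intervals and intervals[-1].symbol == symbol:
            intervals[-1] = Interval(intervals[-1].start, t, symbol)  # extend
        else:
            intervals.append(Interval(t, t, symbol))
    return intervals


def before_tirp_features(intervals: Sequence[Interval], first: str,
                         second: str) -> Dict[str, float]:
    """Features for the hypothetical pattern "first BEFORE second" in one entity.
    In this sketch an instance is any pair (a, b) with a.symbol == first,
    b.symbol == second, and a ending strictly before b starts."""
    durations = [b.end - a.start
                 for a in intervals if a.symbol == first
                 for b in intervals if b.symbol == second and a.end < b.start]
    return {
        "binary": 1.0 if durations else 0.0,           # does the pattern occur?
        "horizontal_support": float(len(durations)),   # how many instances
        "mean_duration": mean(durations) if durations else 0.0,  # avg instance span
    }


if __name__ == "__main__":
    # Hypothetical heart-rate samples (hour, beats/min) for a single admission.
    hr = [(0, 70), (1, 72), (2, 110), (3, 115), (4, 75), (5, 118)]
    cutoffs = equal_width_cutoffs([v for _, v in hr], n_states=2)
    intervals = abstract_series(hr, "HR", cutoffs)
    print(intervals)
    print(before_tirp_features(intervals, "HR_0", "HR_1"))
```

With two states the single cut-off falls at 94 beats/min, and the pattern "HR_0 before HR_1" occurs three times in this toy series. Binary keeps only the fact that the pattern occurred, while Horizontal Support and Mean Duration also encode how often and over what span; the abstract reports that the simpler Binary representation performed best.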