Department of Biomedical Engineering, Amirkabir University of Technology, Iran.
J Biomed Inform. 2018 Mar;79:48-59. doi: 10.1016/j.jbi.2018.02.008. Epub 2018 Feb 19.
Electronic health records (EHRs) contain critical information useful for clinical studies, and early assessment of patient mortality in the intensive care unit (ICU) is of great importance. In this paper, a Deep Rule-Based Fuzzy System (DRBFS) is proposed for accurate in-hospital mortality prediction in ICU patients using a large number of input variables. Our main contribution is a system capable of handling big data with heterogeneous, mixed categorical and numeric attributes. In DRBFS, the hidden layer of each unit is represented by interpretable fuzzy rules. To exploit the strength of soft partitioning, a modified supervised fuzzy k-prototype clustering is employed for fuzzy rule generation. Following the stacked approach, the same input space is kept in every base building unit of DRBFS: the input to the next base building unit is the training set plus random shifts obtained from random projections of the prediction results of the current base building unit. A cohort of 10,972 adult admissions was selected from the Medical Information Mart for Intensive Care (MIMIC-III) data set, in which 9.31% of patients died in the hospital. A heterogeneous feature set was extracted from the first 48 h after ICU admission to predict in-hospital mortality, after the required preprocessing and appropriate feature extraction. To avoid biased assessments, performance indexes were calculated using holdout validation. The proposed method was compared with several common classifiers: naïve Bayes (NB), decision trees (DT), Gradient Boosting (GB), Deep Belief Networks (DBN) and D-TSK-FC. The area under the receiver operating characteristic curve (AUROC) for NB, DT, GB, DBN, D-TSK-FC and the proposed method was 73.51%, 61.81%, 72.98%, 70.07%, 66.74% and 73.90%, respectively. These results show that DRBFS achieved the highest AUROC among the compared methods while maintaining interpretable rule bases. In addition, owing to the clustering method used, DRBFS scales well to large heterogeneous data sets.
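The stacking step described in the abstract (each unit sees the original inputs plus random shifts derived from random projections of the previous unit's predictions) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the base unit is a plain ridge-regularised least-squares model standing in for the fuzzy rule base, the inputs are numeric only, and the names `fit_stack`, `predict_stack`, `alpha` and `n_units` are hypothetical.

```python
import numpy as np


def fit_base_unit(X, y, ridge=1e-3):
    """Stand-in base unit: ridge least squares (NOT the paper's fuzzy rule base)."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # append a bias column
    A = Xb.T @ Xb + ridge * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ y)


def predict_base_unit(w, X):
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return Xb @ w


def fit_stack(X, y, n_units=3, alpha=0.1, seed=0):
    """Train units in sequence; every unit keeps the original input dimension."""
    rng = np.random.default_rng(seed)
    units = []
    X_in = X
    for _ in range(n_units):
        w = fit_base_unit(X_in, y)
        # Random projection from the 1-D prediction to the feature space.
        P = rng.standard_normal((1, X.shape[1]))
        units.append((w, P))
        y_hat = predict_base_unit(w, X_in).reshape(-1, 1)
        # Next input = original inputs + small random shift (assumed form).
        X_in = X + alpha * (y_hat @ P)
    return units


def predict_stack(units, X, alpha=0.1):
    """Replay the same shifts at prediction time; return the last unit's output."""
    X_in, y_hat = X, None
    for w, P in units:
        y_hat = predict_base_unit(w, X_in)
        X_in = X + alpha * (y_hat.reshape(-1, 1) @ P)
    return y_hat
```

Because the shift is added to the original inputs rather than appended as new columns, every unit works in the same feature space, which is what would let rules stay expressed in terms of the original variables. Handling categorical attributes, as DRBFS does, would need a different treatment that this sketch does not attempt.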