Dept. of Electronics & Communication Engineering, National Institute of Technology, Goa, India.
School of Industrial Engineering, Purdue University, USA.
Comput Biol Med. 2021 Jul;134:104430. doi: 10.1016/j.compbiomed.2021.104430. Epub 2021 May 7.
Early detection of sepsis enables earlier clinical intervention with effective treatment and may reduce sepsis mortality. Machine learning-based automated diagnosis of sepsis from easily recorded physiological data may therefore be more promising than the gold-standard rule-based clinical criteria in current practice. This study develops a machine learning framework that quantifies the heterogeneity within tabular electronic health record (EHR) data of clinical covariates, capturing both linear relationships and nonlinear correlations, for the early prediction of sepsis. The statistics of pairwise association for each hour-covariate pair within every 6-hour window of the EHR data, over 24 selected covariates, are described using a pointwise mutual information (PMI) matrix, which represents the heterogeneity of the data as a two-dimensional map. These matrices are stacked along the z-axis, as slices parallel to the xy-plane, to form a 3-way tensor for each record with the corresponding length of stay (L). The fused tensor of every record is factorized using Tucker decomposition; only the core tensor is retained, and the three unitary factor matrices are excluded, to provide the latent feature set for predicting sepsis onset. Under a five-fold cross-validation scheme, the 120 latent features obtained from the reshaped core tensor are fed to Light Gradient Boosting Machine (LightGBM) models for binary classification, which further alleviates the class imbalance involved. The framework is designed via Bayesian optimization and yields an average normalized utility score of 0.4519, as defined by the challenge organizers, and an area under the receiver operating characteristic curve (AUROC) of 0.8621 on the publicly available PhysioNet/Computing in Cardiology Challenge 2019 training data.
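The abstract does not spell out how each PMI matrix is built, so the following is only a minimal sketch of one plausible reading. It assumes each 6-hour window is an hours-by-covariates table whose entries have already been rescaled to non-negative values and can be treated as co-occurrence weights; the function names `pmi_matrix` and `pmi_slices`, the `step` parameter, and the convention of setting undefined PMI entries to zero are all hypothetical choices, not the authors' implementation.

```python
import math


def pmi_matrix(window):
    """PMI of a non-negative hours x covariates table (list of equal-length rows).

    Assumption: entries act as co-occurrence weights, so the joint
    probability is p(h, c) = window[h][c] / total.
    """
    total = sum(sum(row) for row in window)
    if total <= 0:
        raise ValueError("window must contain positive mass")
    n_cols = len(window[0])
    row_p = [sum(row) / total for row in window]
    col_p = [sum(row[j] for row in window) / total for j in range(n_cols)]

    pmi = []
    for i, row in enumerate(window):
        out = []
        for j, value in enumerate(row):
            joint = value / total
            if joint <= 0 or row_p[i] <= 0 or col_p[j] <= 0:
                out.append(0.0)  # assumed convention: undefined PMI -> 0
            else:
                out.append(math.log(joint / (row_p[i] * col_p[j])))
        pmi.append(out)
    return pmi


def pmi_slices(record, width=6, step=1):
    """One PMI matrix per `width`-hour window of an hours x covariates record.

    The returned list of matrices is the stack of slices that would form
    the 3-way tensor; `step=1` (overlapping windows) is an assumption.
    """
    last_start = len(record) - width
    return [pmi_matrix(record[s:s + width]) for s in range(0, last_start + 1, step)]
```

Under this reading, a record with L hours yields L - 5 slices when `step=1`, so the depth of the stacked tensor grows with the length of stay.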
The proposed tensor decomposition of the 3-way fused tensor built from PMI matrices leverages higher-order temporal interactions among the pairwise associations between clinical values for the early prediction of sepsis. This is validated by improved risk prediction for every hour of ICU admission in terms of utility score, AUROC, and F1 score. The results show a significant improvement, particularly in utility score (approximately 1.5-2%), under 5-fold cross-validation on the entire training data when compared with a top-entrant study from the challenge.
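For reference, the two quantities named in the abstract can be written in their standard textbook form. The symbols below (the tensor $\mathcal{X}$, core $\mathcal{G}$, factor matrices $U^{(n)}$, and ranks $R_n$) are notation introduced here for illustration; the abstract reports only that the reshaped core yields 120 features and does not give the individual ranks.

```latex
% Pointwise mutual information between outcomes x and y
\mathrm{PMI}(x, y) = \log \frac{p(x, y)}{p(x)\, p(y)}

% Tucker decomposition of a 3-way tensor X of size I x J x K
\mathcal{X} \approx \mathcal{G} \times_1 U^{(1)} \times_2 U^{(2)} \times_3 U^{(3)},
\qquad
\mathcal{G} \in \mathbb{R}^{R_1 \times R_2 \times R_3},\quad
U^{(1)} \in \mathbb{R}^{I \times R_1},\;
U^{(2)} \in \mathbb{R}^{J \times R_2},\;
U^{(3)} \in \mathbb{R}^{K \times R_3}

% Latent feature vector: the vectorized core, with the factor matrices discarded
\mathbf{f} = \operatorname{vec}(\mathcal{G}) \in \mathbb{R}^{R_1 R_2 R_3},
\qquad R_1 R_2 R_3 = 120
```

Here $\times_n$ denotes the mode-$n$ product, and each $U^{(n)}$ is taken to have orthonormal columns, which matches the abstract's description of the discarded matrices as unitary.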