Jajcay Nikola, Bezak Branislav, Segev Amitai, Matetzky Shlomi, Jankova Jana, Spartalis Michael, El Tahlawi Mohammad, Guerra Federico, Friebel Julian, Thevathasan Tharusan, Berta Imrich, Pölzl Leo, Nägele Felix, Pogran Edita, Cader F Aaysha, Jarakovic Milana, Gollmann-Tepeköylü Can, Kollarova Marta, Petrikova Katarina, Tica Otilia, Krychtiuk Konstantin A, Tavazzi Guido, Skurk Carsten, Huber Kurt, Böhm Allan
Premedix Academy, Bratislava, Slovakia.
Department of Complex Systems, Institute of Computer Science, Czech Academy of Sciences, Prague, Czech Republic.
Front Cardiovasc Med. 2023 Mar 23;10:1132680. doi: 10.3389/fcvm.2023.1132680. eCollection 2023.
Recent advances in machine learning provide new possibilities to process and analyse observational patient data to predict patient outcomes. In this paper, we introduce a data processing pipeline for cardiogenic shock (CS) prediction in intensive cardiac care unit patients with acute coronary syndrome, using data from the MIMIC III database. Identifying high-risk patients early could allow pre-emptive measures to be taken and thus prevent the development of CS.
We focus mainly on the imputation of missing data: we build an imputation pipeline and compare the performance of several multivariate imputation algorithms, including k-nearest neighbours, two singular value decomposition (SVD)-based methods, and Multiple Imputation by Chained Equations (MICE). After imputation, we select the final subjects and variables from the imputed dataset and showcase the performance of a gradient-boosted framework that uses a tree-based classifier for CS prediction.
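To make the idea of multivariate imputation concrete, the following is a minimal sketch of k-nearest-neighbour imputation in plain Python. It is not the authors' implementation: the function name `knn_impute`, the choice of k = 2, the root-mean-square distance over co-observed features, and the toy values are all assumptions made for illustration.

```python
import math

def knn_impute(rows, k=2):
    """Fill None entries with the mean of that feature over the k nearest rows.

    Distance is the root-mean-square difference over the features observed
    in both rows (an illustrative choice, not taken from the paper).
    """
    def distance(a, b):
        shared = [(x - y) ** 2 for x, y in zip(a, b)
                  if x is not None and y is not None]
        return math.sqrt(sum(shared) / len(shared)) if shared else math.inf

    imputed = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate donors: other rows in which feature j is observed
            # and which share at least one observed feature with this row.
            donors = [(distance(row, other), other[j])
                      for m, other in enumerate(rows)
                      if m != i and other[j] is not None]
            donors = [d for d in donors if d[0] != math.inf]
            donors.sort(key=lambda d: d[0])
            nearest = [v for _, v in donors[:k]]
            if nearest:  # leave the gap if no usable donor exists
                imputed[i][j] = sum(nearest) / len(nearest)
    return imputed

# Hypothetical toy data: three features per patient, one missing value.
toy = [
    [1.0, 2.0, 3.0],
    [1.1, None, 3.1],
    [5.0, 6.0, 7.0],
    [0.9, 2.2, 2.9],
]
filled = knn_impute(toy, k=2)
```

Here the missing value in the second row is filled from the two rows closest to it on the observed features (the first and fourth), so the distant third row does not influence the estimate.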
With data cleaning and imputation, and without hyperparameter optimization, we achieved good classification performance (cross-validated mean area under the curve of 0.805).
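As an illustration of what a cross-validated mean area under the curve measures, here is a minimal plain-Python sketch. It rests on assumptions not stated in the abstract: the fold scheme (interleaved folds), the helper names `roc_auc`, `cross_validated_mean_auc`, and `fit_single_feature`, and the toy data are hypothetical, and the stand-in scorer is not the gradient-boosted tree classifier used in the paper.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs both classes in the evaluation fold")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def cross_validated_mean_auc(X, y, fit, n_folds=5):
    """Average held-out AUC over n_folds interleaved folds.

    `fit(X_train, y_train)` must return a function mapping one row to a score.
    """
    aucs = []
    for fold in range(n_folds):
        test_idx = [i for i in range(len(y)) if i % n_folds == fold]
        train_idx = [i for i in range(len(y)) if i % n_folds != fold]
        score = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        aucs.append(roc_auc([y[i] for i in test_idx],
                            [score(X[i]) for i in test_idx]))
    return sum(aucs) / len(aucs), aucs

def fit_single_feature(X_train, y_train):
    """Hypothetical stand-in model: score by feature 0, oriented by class means."""
    pos = [x[0] for x, l in zip(X_train, y_train) if l == 1]
    neg = [x[0] for x, l in zip(X_train, y_train) if l == 0]
    sign = 1.0 if sum(pos) / len(pos) >= sum(neg) / len(neg) else -1.0
    return lambda row: sign * row[0]

# Hypothetical toy data: one feature, binary outcome.
X = [[0.1], [0.4], [0.8], [0.3], [0.2], [0.5], [0.9], [0.7]]
y = [0, 0, 1, 1, 0, 0, 1, 1]
mean_auc, fold_aucs = cross_validated_mean_auc(X, y, fit_single_feature, n_folds=2)
```

Each fold's AUC is computed only on rows the scorer did not see during fitting, and the reported figure is the average over folds.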
We believe our pre-processing pipeline will also prove helpful for other classification and regression experiments.