Ayad Ahmad, Hallawa Ahmed, Peine Arne, Martin Lukas, Fazlic Lejla Begic, Dartmann Guido, Marx Gernot, Schmeink Anke
Chair of Information Theory and Data Analytics, Rheinisch-Westfälische Technische Hochschule Aachen, Aachen, Germany.
Department of Intensive Care and Intermediate Care, University Hospital Rheinisch-Westfälische Technische Hochschule Aachen, Aachen, Germany.
JMIR Med Inform. 2022 Aug 24;10(8):e37658. doi: 10.2196/37658.
In recent years, the volume of medical knowledge and health data has increased rapidly. For example, the increased availability of electronic health records (EHRs) provides accurate, up-to-date, and complete information about patients at the point of care and gives medical staff quick access to patient records for more coordinated and efficient care. As this knowledge grows, so does the complexity of practicing accurate, evidence-based medicine. Health care workers must deal with an increasing amount of data and documentation. Meanwhile, relevant patient data are frequently buried under a layer of less relevant data, so medical staff often miss important values or abnormal trends and their significance for the progression of the patient's case.
The goal of this work is to analyze the current laboratory results of patients in the intensive care unit (ICU) and classify which of these lab values could be abnormal the next time the test is done. Detecting near-future abnormalities can support clinicians' decision-making in the ICU by drawing their attention to the important values and focusing future lab testing, saving both time and money. Additionally, it will give doctors more time to spend with patients, rather than skimming through a long list of lab values.
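The classification target described above can be sketched as a multilabel vector: one binary label per lab test, set when the next measurement of that test falls outside a reference range. This is only an illustration, not the authors' labeling code. The names `REFERENCE_RANGES`, `is_abnormal`, and `next_abnormal_labels` are hypothetical, and the two ranges are illustrative placeholders rather than the thresholds used in the study.

```python
# Illustrative reference ranges (assumed values, not taken from the paper).
REFERENCE_RANGES = {
    "potassium": (3.5, 5.0),    # mmol/L
    "sodium": (135.0, 145.0),   # mmol/L
}


def is_abnormal(lab, value):
    """Return True if the value lies outside the lab's reference range."""
    low, high = REFERENCE_RANGES[lab]
    return not (low <= value <= high)


def next_abnormal_labels(series_by_lab, t):
    """Build the multilabel target for time step t.

    Each label is 1 if that lab's value at step t + 1 is abnormal, else 0.
    A model would see the values up to and including step t as input.
    """
    return {
        lab: int(is_abnormal(lab, values[t + 1]))
        for lab, values in series_by_lab.items()
    }


# Toy, made-up series on a regular time grid (one value per step).
series = {
    "potassium": [4.1, 4.3, 5.6],
    "sodium": [138.0, 140.0, 139.0],
}
labels = next_abnormal_labels(series, t=1)
print(labels)
```

Here the potassium label is set because the value at the following step (5.6) exceeds the assumed upper bound, while sodium stays within its range.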
We used Structured Query Language (SQL) to extract 25 lab values for mechanically ventilated patients in the ICU from the MIMIC-III and eICU data sets. We then applied time-windowed sampling and holding, together with a support vector machine, to fill in the missing values in the sparse time series, and used the Tukey range to detect and delete anomalies. Finally, we used the data to train 4 deep learning models for time series classification, as well as a gradient boosting-based algorithm, and compared their performance on both data sets.
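Two of the preprocessing steps named above, anomaly removal with the Tukey range and time-windowed sampling and holding, can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the authors' pipeline: the fence multiplier `k=1.5`, the 4-hour window, the order of the two steps, and the function names are all assumptions, and the support vector machine imputation step is omitted.

```python
from statistics import quantiles


def tukey_fences(values, k=1.5):
    """Return (low, high) Tukey fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def drop_outliers(samples, k=1.5):
    """Delete (hour, value) samples whose value lies outside the fences."""
    low, high = tukey_fences([v for _, v in samples], k)
    return [(t, v) for t, v in samples if low <= v <= high]


def sample_and_hold(samples, window_hours, n_windows):
    """Resample irregular (hour, value) samples onto fixed time windows.

    Each window takes the last measurement that falls inside it; an empty
    window holds the most recent earlier value. Windows before the first
    measurement stay None.
    """
    grid = [None] * n_windows
    for t, v in sorted(samples):
        idx = int(t // window_hours)
        if 0 <= idx < n_windows:
            grid[idx] = v
    last = None
    for i, v in enumerate(grid):
        if v is None:
            grid[i] = last
        else:
            last = v
    return grid


# Toy, made-up potassium samples as (hours since admission, value).
raw = [(0.5, 4.1), (3.0, 4.3), (9.0, 40.0), (13.0, 4.0), (14.5, 4.2)]
clean = drop_outliers(raw)
grid = sample_and_hold(clean, window_hours=4, n_windows=4)
print(grid)
```

In this toy series the implausible value 40.0 falls outside the fences and is deleted, and the two empty middle windows then hold the last value observed before them.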
The models tested in this work (deep neural networks and gradient boosting), combined with the preprocessing pipeline, achieved an accuracy of at least 80% on the multilabel classification task. Moreover, the model based on the multiple convolutional neural network outperformed the other algorithms on both data sets, with the accuracy exceeding 89%.
In this work, we show that using machine learning and deep neural networks to predict near-future abnormalities in lab values can achieve satisfactory results. Our system was trained, validated, and tested on 2 well-known data sets so that it bridges the reality gap as much as possible. Finally, the model can be used in combination with our preprocessing pipeline on real-life EHRs to improve patients' diagnosis and treatment.