Jauk Stefanie, Kramer Diether, Quehenberger Franz, Veeranki Sai Pavan Kumar, Hayn Dieter, Schreier Günter, Leodolter Werner
CBmed, Graz, Austria.
Steiermärkische Krankenanstaltengesellschaft m.b.H. (KAGes), Graz, Austria.
Stud Health Technol Inform. 2019;260:65-72.
In a database of electronic health records, the amount of available information varies widely between patients. In a real-time prediction scenario, a machine learning model may receive limited information for some patients.
Our aim was to evaluate the influence of missing data on real-time prediction of delirium, and to detect changes in prediction performance when separate models are trained for patients with missing data.
We compared a model trained specifically on data with missing values to the currently implemented delirium prediction model. In addition, we simulated five test data sets with different amounts of missing data and compared the predictions of a single model on each of them to its predictions on the complete data set.
For patients with missing laboratory and nursing assessment data, a model trained specifically for this scenario performed significantly better than the implemented model. The combination of procedure data and demographic data achieved results closest to those of a prediction with the complete data set.
An ongoing evaluation of real-time prediction is indispensable. Additional models adapted to the information available might improve prediction performance.
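The simulation described in the methods, in which parts of the test data are hidden and one fixed model is scored on each reduced data set, might be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the study's implementation: the feature names, the feature groups, the toy linear risk score, the mean-based fallback for missing values, the synthetic data, and the five scenarios are all hypothetical and chosen only to show the comparison against the complete data set.

```python
import random

# Hypothetical feature groups; the names are illustrative, not the study's variables.
FEATURE_GROUPS = {
    "demographic": ["age"],
    "procedure": ["n_procedures"],
    "laboratory": ["crp"],
    "nursing": ["care_score"],
}

def mask_groups(record, groups):
    """Return a copy of record with every feature of the given groups set to None."""
    hidden = {f for g in groups for f in FEATURE_GROUPS[g]}
    return {k: (None if k in hidden else v) for k, v in record.items()}

def predict(record, weights, defaults):
    """Toy linear risk score; a missing value falls back to a default (assumed: the mean)."""
    return sum(w * (record[f] if record[f] is not None else defaults[f])
               for f, w in weights.items())

def auroc(labels, scores):
    """Area under the ROC curve by pairwise comparison; ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Synthetic patients (assumption): cases have slightly shifted feature means.
rng = random.Random(0)
records, labels = [], []
for i in range(200):
    y = i % 2  # alternate labels so both classes are present
    labels.append(y)
    records.append({
        "age": rng.gauss(70 + 5 * y, 10),
        "n_procedures": rng.gauss(2 + y, 1),
        "crp": rng.gauss(10 + 5 * y, 8),
        "care_score": rng.gauss(3 + y, 1),
    })

# Fixed, hand-picked weights stand in for the single trained model.
WEIGHTS = {"age": 0.05, "n_procedures": 0.5, "crp": 0.05, "care_score": 0.5}
DEFAULTS = {f: sum(r[f] for r in records) / len(records) for f in WEIGHTS}

def evaluate(hidden_groups):
    """Score the same model on the test set with the given groups hidden."""
    scores = [predict(mask_groups(r, hidden_groups), WEIGHTS, DEFAULTS)
              for r in records]
    return auroc(labels, scores)

# Complete data plus five hypothetical missing-data scenarios.
SCENARIOS = {
    "complete": [],
    "no_laboratory": ["laboratory"],
    "no_nursing": ["nursing"],
    "no_laboratory_no_nursing": ["laboratory", "nursing"],
    "demographic_only": ["procedure", "laboratory", "nursing"],
    "procedure_only": ["demographic", "laboratory", "nursing"],
}
results = {name: evaluate(groups) for name, groups in SCENARIOS.items()}

for name, value in results.items():
    print(f"{name}: AUROC = {value:.3f}")
```

Comparing each scenario's AUROC with the `complete` entry shows how far performance drifts when a given block of information is unavailable. A separately trained model for one scenario could be evaluated with the same `auroc` helper to mirror the first comparison in the methods.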