Department of Medicine, Shahid Beheshti University of Medical Sciences, Tehran, Iran.
Department of Physiology, McGill University, 3655 Promenade Sir William Osler, Montreal, QC, H3G1Y6, Canada.
Sci Rep. 2023 Feb 17;13(1):2827. doi: 10.1038/s41598-023-29334-0.
Medical machine learning frameworks have received much attention in recent years. The COVID-19 pandemic was accompanied by a surge in proposed machine learning algorithms for tasks such as diagnosis and mortality prognosis. Machine learning frameworks can serve as helpful medical assistants by extracting data patterns that are otherwise hard for humans to detect. Efficient feature engineering and dimensionality reduction are major challenges in most medical machine learning frameworks. Autoencoders are novel unsupervised tools that can perform data-driven dimensionality reduction with minimal prior assumptions. This study, in a novel approach, investigated the predictive power of latent representations obtained from a hybrid autoencoder (HAE) framework, which combines variational autoencoder (VAE) characteristics with mean squared error (MSE) and triplet loss, for identifying COVID-19 patients at high mortality risk in a retrospective setting. Electronic laboratory and clinical data from 1474 patients were used in the study. Logistic regression with elastic net regularization (EN) and random forest (RF) models served as the final classifiers. We also investigated the contribution of the input features to the latent representations via mutual information analysis. On the hold-out data, the HAE latent representation models achieved an area under the ROC curve (AUC) of 0.921 (±0.027) with the EN predictor and 0.910 (±0.036) with the RF predictor, compared with 0.913 (±0.022) for EN and 0.903 (±0.020) for RF in the raw-data models. The study aims to provide an interpretable feature engineering framework for the medical environment, with the potential to integrate imaging data for efficient feature engineering in rapid triage and other clinical predictive models.
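As an illustration of how a hybrid objective of the kind described above might be assembled, the following minimal Python sketch adds an MSE reconstruction term, a VAE-style KL divergence term, and a triplet term computed on the latent means. The abstract does not specify how the terms are weighted or combined, so the additive form, the weights `beta` and `gamma`, the `margin`, the Euclidean distance, and all function names are assumptions made for this example and not the authors' implementation.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def mse(x: Vector, x_hat: Vector) -> float:
    """Mean squared error between an input and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def kl_to_standard_normal(mu: Vector, log_var: Vector) -> float:
    """KL divergence from N(mu, diag(exp(log_var))) to N(0, I)."""
    return -0.5 * sum(
        1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, log_var)
    )


def triplet(anchor: Vector, positive: Vector, negative: Vector,
            margin: float = 1.0) -> float:
    """Triplet margin loss with Euclidean distance (assumed metric)."""
    d_pos = math.dist(anchor, positive)  # same-class latent point
    d_neg = math.dist(anchor, negative)  # other-class latent point
    return max(0.0, d_pos - d_neg + margin)


def hybrid_loss(x: Vector, x_hat: Vector, mu: Vector, log_var: Vector,
                positive: Vector, negative: Vector,
                beta: float = 1.0, gamma: float = 1.0,
                margin: float = 1.0) -> float:
    """Assumed additive combination: MSE + beta * KL + gamma * triplet."""
    return (
        mse(x, x_hat)
        + beta * kl_to_standard_normal(mu, log_var)
        + gamma * triplet(mu, positive, negative, margin)
    )


if __name__ == "__main__":
    # Toy values only; they are not taken from the study's data.
    loss = hybrid_loss(
        x=[1.0, 2.0], x_hat=[1.0, 4.0],
        mu=[0.0, 0.0], log_var=[0.0, 0.0],
        positive=[3.0, 4.0], negative=[0.0, 0.0],
    )
    print(loss)
```

In this sketch the triplet term uses the latent mean of a patient as the anchor, with a latent point from the same outcome class as the positive and one from the other class as the negative. That is one plausible way to encourage class separation in the latent space, under the stated assumptions.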