Department of Occupational Disease Prevention, Jiangsu Provincial Center for Disease Control and Prevention, Nanjing, China.
Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing, China.
J Med Internet Res. 2021 Jan 6;23(1):e25535. doi: 10.2196/25535.
Effectively identifying patients with COVID-19 using non-polymerase chain reaction (non-PCR) biomedical data is critical for achieving optimal clinical outcomes. Currently, there is no comprehensive understanding of which biomedical features and analytical approaches best enable the early detection and effective diagnosis of patients with COVID-19.
We aimed to combine low-dimensional clinical and lab testing data, as well as high-dimensional computed tomography (CT) imaging data, to accurately differentiate between healthy individuals, patients with COVID-19, and patients with non-COVID viral pneumonia, especially at the early stage of infection.
In this study, we recruited 214 patients with nonsevere COVID-19, 148 patients with severe COVID-19, 198 noninfected healthy participants, and 129 patients with non-COVID viral pneumonia. The participants' clinical information (ie, 23 features), lab testing results (ie, 10 features), and CT scans upon admission were acquired and used as 3 input feature modalities. To enable the late fusion of multimodal features, we constructed a deep learning model to extract a 10-feature high-level representation of CT scans. We then developed 3 machine learning models (ie, k-nearest neighbor, random forest, and support vector machine models) based on the combined 43 features from all 3 modalities to differentiate between the following 4 classes: nonsevere, severe, healthy, and viral pneumonia.
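The fusion step described above can be illustrated with a minimal sketch, assuming that fusion means concatenating the 23 clinical, 10 lab, and 10 CT-derived features into a single 43-feature vector per participant. The function names (`fuse`, `knn_predict`, `synthetic_patient`), the value of `k`, and the synthetic Gaussian data are all hypothetical placeholders and not the study's code or data; only the k-nearest neighbor model is shown, written with the Python standard library.

```python
import math
import random
from collections import Counter

CLASSES = ["nonsevere", "severe", "healthy", "viral_pneumonia"]
N_CLINICAL, N_LAB, N_CT = 23, 10, 10  # feature counts stated in the abstract


def fuse(clinical, lab, ct_repr):
    """Concatenate the three per-participant feature vectors (23 + 10 + 10 = 43)."""
    assert len(clinical) == N_CLINICAL
    assert len(lab) == N_LAB
    assert len(ct_repr) == N_CT
    return list(clinical) + list(lab) + list(ct_repr)


def knn_predict(train_x, train_y, query, k=5):
    """Majority vote among the k nearest training vectors (Euclidean distance)."""
    ranked = sorted(zip(train_x, train_y), key=lambda pair: math.dist(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def synthetic_patient(class_index, rng):
    """Placeholder data: one Gaussian blob per class. NOT real patient data."""
    def draw(n):
        return [rng.gauss(class_index * 3.0, 1.0) for _ in range(n)]
    return fuse(draw(N_CLINICAL), draw(N_LAB), draw(N_CT))


rng = random.Random(0)
train_x, train_y = [], []
for idx, name in enumerate(CLASSES):
    for _ in range(20):
        train_x.append(synthetic_patient(idx, rng))
        train_y.append(name)

# Classify one new synthetic participant drawn from the "severe" blob.
query = synthetic_patient(1, rng)
prediction = knn_predict(train_x, train_y, query)
print(len(query), prediction)
```

In practice the 10 CT features would come from a trained deep learning model rather than a random generator, and features on different scales would likely need standardizing before a distance-based classifier is applied.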
Multimodal features provided a substantial performance gain over the use of any single feature modality. All 3 machine learning models had high overall prediction accuracy (95.4%-97.7%) and high class-specific prediction accuracy (90.6%-99.9%).
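The difference between overall and class-specific accuracy can be sketched as follows. The abstract does not define "class-specific accuracy"; this sketch assumes it means the fraction of each true class that is predicted correctly (per-class recall). The confusion matrix is a toy example with made-up counts, not the study's results, and the function names are hypothetical.

```python
def overall_accuracy(confusion):
    """Correct predictions (the diagonal) divided by all predictions."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


def per_class_accuracy(confusion):
    """For each true class (row), the fraction predicted as that class."""
    return [row[i] / sum(row) for i, row in enumerate(confusion)]


# Toy 4-class confusion matrix: rows are true classes, columns are predictions.
# Class order: nonsevere, severe, healthy, viral pneumonia. Counts are invented.
toy = [
    [9, 1, 0, 0],
    [1, 8, 0, 1],
    [0, 0, 10, 0],
    [0, 1, 0, 9],
]

print(overall_accuracy(toy))
print(per_class_accuracy(toy))
```

A single overall figure can hide a weak class, which is why the two ranges are reported separately.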
Compared with existing binary classification benchmarks, which often focus on a single feature modality, this study's hybrid deep learning-machine learning framework offers a novel and effective approach for clinical applications. Our findings, which come from a relatively large sample, and our analytical workflow can supplement current COVID-19 diagnostic methods and assist clinical decision support, both for COVID-19 and for other clinical applications with high-dimensional multimodal biomedical features.