Department of Medical Physics, Faculty of Medical Sciences, Tarbiat Modares University, Tehran, Iran.
Cellular and Molecular Research Center, Yasuj University of Medical Sciences, Yasuj, Iran.
J Xray Sci Technol. 2021;29(2):229-243. doi: 10.3233/XST-200831.
Radiomics has been widely used in quantitative analysis of medical images for disease diagnosis and prognosis assessment. The objective of this study is to test a machine-learning (ML) method based on radiomics features extracted from chest CT images for screening COVID-19 cases.
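The simplest radiomics feature family is first-order statistics of the voxel intensities inside a segmented region. The sketch below computes a few of them from a flattened CT volume (Hounsfield units) and a binary lung mask. The abstract does not list its 86 features, so this subset, the 25 HU bin width used for the entropy estimate, and the function name `first_order_features` are assumptions for illustration only.

```python
import math
from collections import Counter


def first_order_features(image, mask, bin_width=25.0):
    """First-order statistics of voxels where mask is truthy (flattened lists).

    Hypothetical helper: feature subset and bin width are illustrative assumptions.
    """
    voxels = [v for v, m in zip(image, mask) if m]
    if not voxels:
        raise ValueError("mask selects no voxels")
    n = len(voxels)
    mean = sum(voxels) / n
    # Central moments of order 2, 3 and 4.
    m2 = sum((v - mean) ** 2 for v in voxels) / n
    m3 = sum((v - mean) ** 3 for v in voxels) / n
    m4 = sum((v - mean) ** 4 for v in voxels) / n
    # Discretize intensities into fixed-width bins for the entropy estimate.
    counts = Counter(math.floor(v / bin_width) for v in voxels)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return {
        "mean": mean,
        "variance": m2,
        "skewness": m3 / m2 ** 1.5 if m2 > 0 else 0.0,
        "kurtosis": m4 / m2 ** 2 if m2 > 0 else 0.0,  # not excess kurtosis
        "energy": sum(v * v for v in voxels),  # unshifted sum of squares
        "entropy": entropy,  # in bits
    }
```

Each segmented volume yields one such feature vector; stacking the vectors for all patients gives the matrix on which feature selection and classification operate.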
The study is carried out on two groups of patients: 138 patients with confirmed COVID-19 and 140 patients with suspected COVID-19. We focus on distinguishing pneumonia caused by COVID-19 from the suspected cases by segmenting the whole lung volume and extracting 86 radiomics features. Following feature extraction, nine feature-selection procedures are used to identify informative features. Then, ten ML classifiers are applied to classify and predict COVID-19 cases. Each ML model is trained and tested using ten-fold cross-validation, and its predictive performance is evaluated using the area under the curve (AUC) and accuracy.
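The evaluation protocol can be sketched in plain Python: stratified ten-fold cross-validation around a KNN classifier, with accuracy computed at a 0.5 score threshold and AUC computed as the probability that a randomly chosen positive case scores higher than a randomly chosen negative one (ties count one half). This is a minimal sketch under stated assumptions: the label encoding (1 = confirmed, 0 = suspected), the fold assignment, `k=5`, the threshold, and pooling out-of-fold scores into one AUC rather than averaging per-fold AUCs are illustrative choices not reported in the abstract, and all function names are hypothetical.

```python
import math
import random


def knn_score(train_X, train_y, x, k=5):
    """Fraction of the k nearest training samples (Euclidean) labelled positive."""
    nearest = sorted((math.dist(x, xi), yi) for xi, yi in zip(train_X, train_y))[:k]
    return sum(yi for _, yi in nearest) / len(nearest)


def auc(scores, labels):
    """Rank-based AUC: P(random positive outscores random negative), ties = 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def stratified_folds(labels, n_folds=10, seed=0):
    """Shuffle each class separately, then deal its indices round-robin into folds."""
    rng = random.Random(seed)
    folds = [[] for _ in range(n_folds)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % n_folds].append(i)
    return folds


def cross_validate(X, y, n_folds=10, k=5, threshold=0.5):
    """Out-of-fold KNN scores pooled over all folds -> (accuracy, AUC)."""
    scores = [0.0] * len(y)
    for test_idx in stratified_folds(y, n_folds):
        held_out = set(test_idx)
        train = [i for i in range(len(y)) if i not in held_out]
        train_X = [X[i] for i in train]
        train_y = [y[i] for i in train]
        for i in test_idx:
            scores[i] = knn_score(train_X, train_y, X[i], k)
    preds = [1 if s >= threshold else 0 for s in scores]
    accuracy = sum(p == t for p, t in zip(preds, y)) / len(y)
    return accuracy, auc(scores, y)
```

Under this definition, an AUC of 0.5 corresponds to chance-level ranking, and an AUC below 0.5 means the scores rank cases worse than chance.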
Accuracy ranges from 0.32 (recursive feature elimination [RFE]+Multinomial Naive Bayes [MNB] classifier) to 0.984 (RFE+bagging [BAG] and RFE+decision tree [DT] classifiers), and AUC ranges from 0.27 (mutual information [MI]+MNB classifier) to 0.997 (RFE+k-nearest neighbors [KNN] classifier). There is no direct correlation between the number of selected features and either accuracy or AUC; however, accuracy and AUC values do change as the number of selected features changes. RFE feature selection combined with the BAG or DT classifier achieves the highest prediction accuracy (accuracy: 0.984), followed by MI+Gaussian Naive Bayes (GNB) and logistic regression (LGR)+DT (accuracy: 0.976). RFE combined with the KNN classifier achieves the highest AUC (AUC: 0.997), followed by RFE+BAG (AUC: 0.991) and RFE+gradient boosting decision tree (GBDT) (AUC: 0.99).
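Recursive feature elimination repeatedly fits a model on the surviving features, ranks them by importance, and discards the weakest until the target count remains. The sketch below assumes a plain gradient-descent logistic regression on standardized features as the ranking estimator; the abstract does not say which estimator drove RFE in the study, and the helper names (`standardize`, `fit_logistic`, `rfe`) and hyperparameters are hypothetical. scikit-learn's `sklearn.feature_selection.RFE` implements the same loop around any estimator that exposes `coef_` or `feature_importances_`.

```python
import math


def standardize(X):
    """Z-score each column so weight magnitudes are comparable across features."""
    cols = list(zip(*X))
    means = [sum(c) / len(c) for c in cols]
    # Fall back to 1.0 for constant columns to avoid division by zero.
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]


def fit_logistic(X, y, lr=0.1, epochs=500):
    """Batch gradient-descent logistic regression (assumed ranking estimator)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            z = max(-30.0, min(30.0, z))  # clamp to keep exp() in range
            err = 1.0 / (1.0 + math.exp(-z)) - yi  # predicted prob minus label
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def rfe(X, y, n_keep, step=1):
    """Return the original column indices that survive recursive elimination."""
    remaining = list(range(len(X[0])))
    while len(remaining) > n_keep:
        # Refit on the surviving columns only.
        sub = [[row[j] for j in remaining] for row in X]
        w, _ = fit_logistic(sub, y)
        # Positions within `remaining`, weakest |weight| first.
        order = sorted(range(len(remaining)), key=lambda p: abs(w[p]))
        drop = set(order[:min(step, len(remaining) - n_keep)])
        remaining = [f for p, f in enumerate(remaining) if p not in drop]
    return remaining
```

For example, `rfe(standardize(X), y, n_keep=10)` would keep ten features. In a cross-validated setting, running the selection inside each training fold keeps the held-out fold from influencing which features survive.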
This study demonstrates that the ML model combining RFE feature selection with a KNN classifier achieves the highest performance in differentiating patients with confirmed COVID-19 infection from suspected cases.