High-dimensional multinomial multiclass severity scoring of COVID-19 pneumonia using CT radiomics features and machine learning algorithms.

Author information

Division of Nuclear Medicine and Molecular Imaging, Geneva University Hospital, 1211, Geneva, Switzerland.

Division of Clinical Geriatrics, Department of Neurobiology, Care Sciences and Society, Karolinska Institutet, Stockholm, Sweden.

Publication information

Sci Rep. 2022 Sep 1;12(1):14817. doi: 10.1038/s41598-022-18994-z.

Abstract

We aimed to construct a prediction model based on computed tomography (CT) radiomics features to classify COVID-19 patients into four categories: severe, moderate, mild, and non-pneumonic. A total of 1110 patients were studied from a publicly available dataset with 4-class severity scoring performed by a radiologist (based on CT images and clinical features). The entire lungs were segmented, followed by resizing, bin discretization, and radiomic feature extraction. We used two feature selection algorithms, bagging random forest (BRF) and multivariate adaptive regression splines (MARS), each coupled to a multinomial logistic regression (MLR) classifier, to construct multiclass classification models. The dataset was divided into 50% (555 samples), 20% (223 samples), and 30% (332 samples) for the training, validation, and untouched test sets, respectively. Nested cross-validation was then performed on the training and validation sets to select the features and tune the models. All performance metrics were reported on the test set. The performance of the multiclass models was assessed using precision, recall, F1-score, and accuracy based on the 4 × 4 confusion matrices. In addition, the areas under the receiver operating characteristic curves (AUCs) for multiclass classification were calculated and compared for both models. Using BRF, 23 radiomic features were selected: 11 from first-order, 9 from GLCM, 1 from GLRLM, 1 from GLDM, and 1 from shape. Ten features were selected using the MARS algorithm: 3 from first-order, 1 from GLDM, 1 from GLRLM, 1 from GLSZM, 1 from shape, and 3 from GLCM. The following features were selected by both BRF and MARS: mean absolute deviation, skewness, and variance (first-order), flatness (shape), cluster prominence (GLCM), and Gray Level Non-Uniformity Normalized (GLRLM).
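To make the evaluation step concrete, the following is a minimal sketch of how precision, recall, F1-score, and accuracy can be derived from a 4 × 4 confusion matrix. The abstract does not state which averaging scheme was used, so the macro averaging and the one-vs-rest accuracy shown here are assumptions, and the matrix `example_cm` is hypothetical illustration data, not the study's results.

```python
from typing import Dict, List


def multiclass_metrics(cm: List[List[int]]) -> Dict[str, float]:
    """Compute macro-averaged metrics from a square confusion matrix.

    cm[i][j] is the number of samples whose true class is i and whose
    predicted class is j. Each class is scored one-vs-rest, then the
    per-class values are averaged with equal weight (macro averaging,
    an assumption; the abstract does not name the scheme).
    """
    k = len(cm)
    total = sum(sum(row) for row in cm)
    precisions, recalls, f1s, accuracies = [], [], [], []
    for c in range(k):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                        # true c, predicted other
        fp = sum(cm[r][c] for r in range(k)) - tp   # predicted c, true other
        tn = total - tp - fn - fp
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f1)
        accuracies.append((tp + tn) / total)        # one-vs-rest accuracy
    return {
        "macro_precision": sum(precisions) / k,
        "macro_recall": sum(recalls) / k,
        "macro_f1": sum(f1s) / k,
        "mean_per_class_accuracy": sum(accuracies) / k,
        "overall_accuracy": sum(cm[c][c] for c in range(k)) / total,
    }


# Hypothetical 4-class matrix (rows: true class, columns: predicted class).
example_cm = [
    [50, 5, 3, 2],
    [4, 40, 4, 2],
    [2, 6, 45, 7],
    [1, 2, 5, 30],
]
example_metrics = multiclass_metrics(example_cm)
for name, value in example_metrics.items():
    print(f"{name}: {value:.3f}")
```

Note that the one-vs-rest accuracy averaged over classes counts true negatives for every class, so it is never lower than the overall fraction of correctly classified samples. The two definitions should therefore not be compared directly.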
All features selected by BRF or MARS were significantly associated with the four-class outcome as assessed within MLR (all p values < 0.05). BRF + MLR and MARS + MLR achieved pseudo-R² values of 0.305 and 0.253, respectively. A likelihood ratio test showed a significant difference between the two feature selection models (p value = 0.046). Based on the confusion matrices for BRF + MLR and MARS + MLR, the precision was 0.856 and 0.728, the recall was 0.852 and 0.722, and the accuracy was 0.921 and 0.861, respectively. AUCs (95% CI) for multiclass classification were 0.846 (0.805-0.887) and 0.807 (0.752-0.861) for BRF + MLR and MARS + MLR, respectively. Our models, based on radiomic features coupled with machine learning, were able to accurately classify patients according to the severity of pneumonia, highlighting the potential of this emerging paradigm in the prognostication and management of COVID-19 patients.
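The abstract reports a pseudo-R² for each multinomial model but does not name the variant. As an illustration only, the sketch below computes McFadden's pseudo-R², one common definition, which compares the log-likelihood of the fitted model with that of an intercept-only model that predicts the observed class frequencies. The function names and the toy labels and probabilities are hypothetical and are not taken from the study.

```python
import math
from collections import Counter
from typing import Sequence


def log_likelihood(y_true: Sequence[int], probs: Sequence[Sequence[float]]) -> float:
    """Sum of log predicted probabilities assigned to the true classes."""
    return sum(math.log(p[y]) for y, p in zip(y_true, probs))


def mcfadden_pseudo_r2(y_true: Sequence[int], probs: Sequence[Sequence[float]]) -> float:
    """McFadden's pseudo-R2 = 1 - LL(model) / LL(null).

    The null model is intercept-only: it assigns every sample the
    observed class frequencies. Assumes at least two classes occur
    in y_true, otherwise LL(null) is zero and the ratio is undefined.
    """
    n = len(y_true)
    counts = Counter(y_true)
    ll_null = sum(c * math.log(c / n) for c in counts.values())
    ll_model = log_likelihood(y_true, probs)
    return 1.0 - ll_model / ll_null


# Toy data: 8 samples, 4 balanced classes, model puts 0.7 on the true class.
toy_labels = [0, 1, 2, 3, 0, 1, 2, 3]
toy_probs = [[0.7 if j == y else 0.1 for j in range(4)] for y in toy_labels]
toy_r2 = mcfadden_pseudo_r2(toy_labels, toy_probs)
print(f"McFadden pseudo-R2 on toy data: {toy_r2:.3f}")
```

A model that does no better than the class frequencies scores 0 under this definition, and values move toward 1 as the model assigns higher probability to the true classes.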

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cb1f/9437017/0511e53af981/41598_2022_18994_Fig1_HTML.jpg
