Astaraki Mehdi, Yang Guang, Zakko Yousuf, Toma-Dasu Iuliana, Smedby Örjan, Wang Chunliang
Department of Biomedical Engineering and Health Systems, KTH Royal Institute of Technology, Huddinge, Sweden.
Department of Oncology-Pathology, Karolinska Institutet, Stockholm, Sweden.
Front Oncol. 2021 Dec 17;11:737368. doi: 10.3389/fonc.2021.737368. eCollection 2021.
Both radiomics and deep learning methods have shown great promise in predicting lesion malignancy in various image-based oncology studies. However, it remains unclear which method to choose for a specific clinical problem when the same amount of training data is available. In this study, we compare the performance of a series of carefully selected conventional radiomics methods, end-to-end deep learning models, and deep-feature-based radiomics pipelines for pulmonary nodule malignancy prediction on an open database of 1297 manually delineated lung nodules.
Conventional radiomics analysis was conducted by extracting standard handcrafted features from the target nodule images. Several end-to-end deep classifier networks, including VGG, ResNet, DenseNet, and EfficientNet, were also employed to identify lung nodule malignancy. In addition to the baseline implementations, we investigated the importance of feature selection and class balancing, as well as separating the features learned in the nodule target region from those learned in the background/context region. By pooling the radiomics and deep features into a hybrid feature set, we investigated the compatibility of these two feature sets with respect to malignancy prediction.
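As a rough illustration of the kind of pipeline described above, the Python sketch below pools handcrafted radiomics features (here via the PyRadiomics library) and deep features from a pretrained CNN backbone into one hybrid feature vector per nodule. The extractor settings, backbone choice (ResNet-18), and file names are illustrative assumptions, not the authors' exact implementation.

```python
# Illustrative sketch only: pooling handcrafted radiomics features and deep
# CNN features into one hybrid feature set. Settings and model choice are
# assumptions, not the exact pipeline used in the study.
import numpy as np
import torch
import torchvision.models as models
from radiomics import featureextractor

# 1) Handcrafted radiomics features for one nodule (CT image + delineation mask).
extractor = featureextractor.RadiomicsFeatureExtractor()  # default feature classes
result = extractor.execute("nodule_ct.nrrd", "nodule_mask.nrrd")
radiomics_vec = np.array(
    [v for k, v in result.items() if not k.startswith("diagnostics")], dtype=float
)

# 2) Deep features from a pretrained 2D backbone applied to a nodule patch
#    (a 3D network could be used instead; this is only a sketch).
backbone = models.resnet18(weights="IMAGENET1K_V1")
backbone.fc = torch.nn.Identity()      # strip the classification head
backbone.eval()
patch = torch.rand(1, 3, 224, 224)     # placeholder for a preprocessed nodule patch
with torch.no_grad():
    deep_vec = backbone(patch).squeeze(0).numpy()

# 3) Hybrid feature set: concatenate the two representations per nodule.
hybrid_vec = np.concatenate([radiomics_vec, deep_vec])
print(hybrid_vec.shape)
```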
The best baseline conventional radiomics model, deep learning model, and deep-feature-based radiomics model achieved AUROC values (mean ± standard deviation) of 0.792 ± 0.025, 0.801 ± 0.018, and 0.817 ± 0.032, respectively, in 5-fold cross-validation analyses. However, after applying several optimization techniques, such as feature selection and data balancing, and adding context features, the corresponding best radiomics, end-to-end deep learning, and deep-feature-based models achieved AUROC values of 0.921 ± 0.010, 0.824 ± 0.021, and 0.936 ± 0.011, respectively. The best prediction accuracy was achieved with the hybrid feature set (AUROC: 0.938 ± 0.010).
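A minimal sketch of how AUROC values of this form (mean ± standard deviation over 5-fold cross-validation) can be computed with scikit-learn follows. The classifier, feature selection step, and class-balancing choice are placeholders for illustration and are not necessarily those used in the study.

```python
# Minimal sketch: 5-fold cross-validated AUROC (mean ± std) for a malignancy
# classifier on a pooled feature matrix. Classifier, feature selection, and
# class balancing choices here are assumptions for illustration.
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

def cv_auroc(X, y, n_splits=5, seed=0):
    """Return per-fold AUROC scores for a simple radiomics-style pipeline."""
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    aucs = []
    for train_idx, test_idx in skf.split(X, y):
        model = make_pipeline(
            StandardScaler(),
            SelectKBest(f_classif, k=min(50, X.shape[1])),   # feature selection
            RandomForestClassifier(class_weight="balanced",  # simple class balancing
                                   n_estimators=500, random_state=seed),
        )
        model.fit(X[train_idx], y[train_idx])
        probs = model.predict_proba(X[test_idx])[:, 1]
        aucs.append(roc_auc_score(y[test_idx], probs))
    return np.array(aucs)

# Example with synthetic data standing in for a 1297-nodule feature matrix.
rng = np.random.default_rng(0)
X = rng.normal(size=(1297, 200))
y = rng.integers(0, 2, size=1297)
scores = cv_auroc(X, y)
print(f"AUROC: {scores.mean():.3f} ± {scores.std():.3f}")
```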
The end-to-end deep learning model outperforms conventional radiomics out of the box, without much fine-tuning. On the other hand, fine-tuning the models leads to significant improvements in prediction performance, with the conventional and deep-feature-based radiomics models achieving comparable results. In this comparative study, the hybrid radiomics method appears to be the most promising model for lung nodule malignancy prediction.