A comparison between deep learning convolutional neural networks and radiologists in the differentiation of benign and malignant thyroid nodules on CT images.

Author information

Zhao Hong-Bo, Liu Chang, Ye Jing, Chang Lu-Fan, Xu Qing, Shi Bo-Wen, Liu Lu-Lu, Yin Yi-Li, Shi Bin-Bin

Affiliations

Department of Radiology, Second Affiliated Hospital of Dalian Medical University, Dalian, China.

Department of Radiology, Subei People's Hospital of Jiangsu province, Yangzhou, China.

Publication information

Endokrynol Pol. 2021;72(3):217-225. doi: 10.5603/EP.a2021.0015. Epub 2021 Feb 23.

DOI:10.5603/EP.a2021.0015

PMID:33619712

Abstract

INTRODUCTION

We designed 5 convolutional neural network (CNN) models and an ensemble model to differentiate malignant from benign thyroid nodules on CT, and compared the diagnostic performance of these models with that of radiologists.

MATERIAL AND METHODS

We retrospectively included CT images of 880 patients with 986 thyroid nodules confirmed by surgical pathology between July 2017 and December 2019. Two radiologists retrospectively classified the thyroid nodules in the test set as benign or malignant on CT images. Five CNNs (ResNet50, DenseNet121, DenseNet169, SE-ResNeXt50, and Xception) were trained and validated on 788 thyroid nodule CT images and tested on the remaining 198. We then selected the 3 models with the best diagnostic performance on the test set to build an ensemble model, and compared the diagnostic performance of the 2 radiologists with that of the 5 CNN models and the ensemble model.
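The ensemble step described above can be illustrated with a minimal sketch. The abstract does not state how the three selected models were combined, so averaging their predicted malignancy probabilities is an assumption here, as are the model names, the toy labels and scores, and the helper names `auc` and `ensemble_top_k`. The AUC is computed with the standard rank-based (Mann-Whitney) formulation.

```python
from statistics import mean

def auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney statistic (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def ensemble_top_k(labels, model_scores, k=3):
    """Pick the k models with the highest AUC and average their probabilities.

    Averaging is an assumed combination rule; the abstract does not specify one.
    """
    ranked = sorted(model_scores,
                    key=lambda name: auc(labels, model_scores[name]),
                    reverse=True)
    chosen = ranked[:k]
    averaged = [mean(model_scores[name][i] for name in chosen)
                for i in range(len(labels))]
    return chosen, averaged

# Hypothetical toy data: 1 = malignant, 0 = benign (not taken from the study).
labels = [1, 1, 1, 0, 0, 0]
model_scores = {
    "model_a": [0.9, 0.8, 0.4, 0.3, 0.2, 0.1],
    "model_b": [0.7, 0.6, 0.5, 0.6, 0.3, 0.2],
    "model_c": [0.8, 0.3, 0.7, 0.4, 0.5, 0.1],
    "model_d": [0.5, 0.4, 0.3, 0.6, 0.7, 0.2],
}

chosen, averaged = ensemble_top_k(labels, model_scores, k=3)
print("selected models:", chosen)
print("ensemble AUC:", auc(labels, averaged))
```

In this toy example the weakest model is dropped and the averaged scores separate the two classes at least as well as the best single model, which is the behaviour an ensemble of this kind is meant to achieve.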

RESULTS

Of the 986 thyroid nodules, 541 were malignant and 445 were benign. The areas under the curve (AUCs) for diagnosing thyroid malignancy were 0.587-0.754 for the 2 radiologists and 0.901-0.947 for the 5 CNN models and the ensemble model. The differences in AUC between the radiologists and the CNN models were significant (p < 0.05). The ensemble model had the highest AUC.
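As a quick consistency check on the reported counts, the short sketch below confirms that the class counts and the training-validation/test split both sum to 986 nodules, and derives the implied proportions. The variable names are illustrative; the numbers are those given in the abstract.

```python
total_nodules = 986
malignant, benign = 541, 445   # pathology-confirmed classes
train_val, test = 788, 198     # training-validation set and test set

# Both partitions should account for every nodule.
assert malignant + benign == total_nodules
assert train_val + test == total_nodules

malignant_fraction = malignant / total_nodules
test_fraction = test / total_nodules
print(f"malignant: {malignant_fraction:.1%}, test share: {test_fraction:.1%}")
# prints "malignant: 54.9%, test share: 20.1%"
```

The split therefore corresponds to roughly 80% of nodules for training and validation and 20% for testing, with a slight majority of malignant nodules overall.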

CONCLUSIONS

The 5 CNN models and the ensemble model performed better than radiologists in distinguishing malignant from benign thyroid nodules on CT. The ensemble model further improved diagnostic performance and showed good potential.
