Wang Benjamin, Perronne Laetitia, Burke Christopher, Adler Ronald S
Department of Radiology, Division of Musculoskeletal Radiology, NYU Langone Health, 301 E 17th St, 6th Floor, New York, NY, 10003 (B.W., C.B., R.S.A.); and Department of Musculoskeletal Imaging, Hôpital Lariboisière, Paris, France (L.P.).
Radiol Artif Intell. 2020 Dec 2;3(1):e200125. doi: 10.1148/ryai.2020200125. eCollection 2021 Jan.
To train convolutional neural network (CNN) models to classify benign and malignant soft-tissue masses at US and to differentiate three commonly observed benign masses.
In this retrospective study, US images obtained between May 2010 and June 2019 from 419 patients (mean age, 52 years ± 18 [standard deviation]; 250 women) were included. Each patient's mass either had a histologic diagnosis confirmed at biopsy or surgical excision (n = 227) or demonstrated the imaging characteristics of lipoma, benign peripheral nerve sheath tumor, or vascular malformation (n = 192). Images from patients with a histologic diagnosis (n = 227) were used to train and evaluate a CNN model to distinguish malignant from benign lesions. Twenty percent of cases were withheld as a test dataset, and the remaining cases were used to train the model with a 75%-25% training-validation split and fourfold cross-validation. Performance of the model was compared with retrospective interpretation of the same dataset by two experienced musculoskeletal radiologists who were blinded to clinical history. A second group of US images from 275 of the 419 patients, containing the three common benign masses, was used to train and evaluate a separate model to differentiate among these masses. The models were trained on the Keras machine learning platform (version 2.3.1) with a modified pretrained VGG16 network. Performance metrics of the model and of the radiologists were compared by using the McNemar test, and 95% CIs for performance metrics were estimated by using the Clopper-Pearson method (accuracy, recall, specificity, and precision) and the DeLong method (area under the receiver operating characteristic curve).
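The evaluation scheme described above (20% of cases held out for testing, then fourfold cross-validation with a 75%-25% training-validation split on the remaining cases) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function name `split_cases`, the fixed random seed, the assignment of cases to folds by interleaving shuffled case IDs, and the choice to split by case rather than by image (so that images from one patient never fall on both sides of a split) are all assumptions that the abstract does not specify.

```python
import random
from typing import Hashable, Sequence


def split_cases(case_ids: Sequence[Hashable], test_frac: float = 0.2,
                n_folds: int = 4, seed: int = 0):
    """Hold out a test set, then build n_folds (train, validation) folds.

    Assumption: the split is made on case (patient) IDs, so all images
    from one patient land in the same partition.
    """
    ids = list(case_ids)
    # Hypothetical fixed seed so the split is reproducible.
    random.Random(seed).shuffle(ids)
    n_test = round(len(ids) * test_frac)
    test, rest = ids[:n_test], ids[n_test:]
    folds = []
    for k in range(n_folds):
        # Assumption: fold k validates on every n_folds-th remaining case,
        # giving about 25% validation and 75% training cases per fold.
        val = rest[k::n_folds]
        val_set = set(val)
        train = [c for c in rest if c not in val_set]
        folds.append((train, val))
    return test, folds
```

With this sketch, `split_cases(range(227))` holds out 45 of the 227 histologically confirmed cases and yields four folds of 136 or 137 training and 45 or 46 validation cases. A split stratified by benign or malignant label may be preferable in practice; the abstract does not say whether stratification was used.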
The model trained to classify malignant and benign masses demonstrated an accuracy of 79% (95% CI: 68, 88) on the test data, with an area under the receiver operating characteristic curve of 0.91 (95% CI: 0.84, 0.98), matching the performance of two expert readers. Performance of the model distinguishing three benign masses was lower, with an accuracy of 71% (95% CI: 61, 80) on the test data.
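The 95% CIs for accuracy, recall, specificity, and precision were estimated with the Clopper-Pearson (exact binomial) method. A minimal, dependency-free sketch of that interval follows; it finds each bound by bisection on the binomial cumulative distribution function. The function names and the bisection tolerance are illustrative assumptions. Because the interval depends only on the number of correct predictions k and the number of test items n, which the abstract does not report, this sketch does not by itself reproduce the reported interval of 68% to 88%.

```python
from math import comb
from typing import Callable, Tuple


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _bisect_decreasing(f: Callable[[float], float], target: float,
                       tol: float) -> float:
    """Find p in [0, 1] with f(p) = target, for f decreasing in p."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid  # f is still above the target, so the root lies to the right
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k: int, n: int, alpha: float = 0.05,
                    tol: float = 1e-10) -> Tuple[float, float]:
    """Exact two-sided (1 - alpha) Clopper-Pearson CI for the proportion k/n."""
    if not 0 <= k <= n or n == 0:
        raise ValueError("need 0 <= k <= n and n > 0")
    # Lower bound p_L solves P(X >= k | p_L) = alpha/2,
    # equivalently P(X <= k - 1 | p_L) = 1 - alpha/2.
    lower = 0.0 if k == 0 else _bisect_decreasing(
        lambda p: binom_cdf(k - 1, n, p), 1 - alpha / 2, tol)
    # Upper bound p_U solves P(X <= k | p_U) = alpha/2.
    upper = 1.0 if k == n else _bisect_decreasing(
        lambda p: binom_cdf(k, n, p), alpha / 2, tol)
    return lower, upper
```

For example, `clopper_pearson(5, 10)` returns approximately (0.187, 0.813), the standard exact interval for 5 successes in 10 trials. SciPy offers the same interval through `scipy.stats.binomtest(k, n).proportion_ci(method="exact")`.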
The trained CNN was capable of differentiating between benign and malignant soft-tissue masses depicted on US images, with performance matching that of two experienced musculoskeletal radiologists. © RSNA, 2020.
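The comparison between the model and the two radiologists used the McNemar test, which, for paired classifications of the same cases, uses only the discordant pairs: cases that the model classified correctly and the reader did not (b), and the reverse (c). The sketch below implements the exact (binomial) form of the test. The abstract does not state whether the exact or the asymptotic χ² form was used, and treating per-case correctness as the paired outcome is likewise an assumption; the function name `mcnemar_exact` is hypothetical.

```python
from math import comb
from typing import Sequence, Tuple


def mcnemar_exact(model_correct: Sequence[bool],
                  reader_correct: Sequence[bool]) -> Tuple[int, int, float]:
    """Exact two-sided McNemar test on paired per-case correctness.

    Returns (b, c, p): b = cases the model got right and the reader got wrong,
    c = the reverse, p = the two-sided exact P value.
    """
    if len(model_correct) != len(reader_correct):
        raise ValueError("outcomes must be paired case by case")
    pairs = list(zip(model_correct, reader_correct))
    # Concordant cases (both right or both wrong) carry no information.
    b = sum(1 for m, r in pairs if m and not r)
    c = sum(1 for m, r in pairs if r and not m)
    n = b + c
    if n == 0:
        return b, c, 1.0  # no discordant cases: no evidence of a difference
    # Under the null hypothesis, b ~ Binomial(b + c, 0.5);
    # double the smaller tail probability and cap the result at 1.
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2**n
    return b, c, min(1.0, 2 * tail)
```

With 1 discordant case favoring the model and 5 favoring the reader, for example, the exact two-sided P value is 14/64 ≈ 0.22. The statsmodels function `statsmodels.stats.contingency_tables.mcnemar(table, exact=True)` computes the same exact test from a 2 × 2 table.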