Wood David A, Kafiabadi Sina, Al Busaidi Aisha, Guilhem Emily L, Lynch Jeremy, Townend Matthew K, Montvila Antanas, Kiik Martin, Siddiqui Juveria, Gadapa Naveen, Benger Matthew D, Mazumder Asif, Barker Gareth, Ourselin Sebastian, Cole James H, Booth Thomas C
School of Biomedical Engineering & Imaging Sciences, King's College London, Rayne Institute, 4th Floor, Lambeth Wing, London, SE1 7EH, UK.
Department of Neuroradiology, Ruskin Wing, King's College Hospital NHS Foundation Trust, London, SE5 9RS, UK.
Eur Radiol. 2022 Jan;32(1):725-736. doi: 10.1007/s00330-021-08132-0. Epub 2021 Jul 20.
The purpose of this study was to build a deep learning model to derive labels from neuroradiology reports and assign these to the corresponding examinations, overcoming a bottleneck to computer vision model development.
Reference-standard labels were generated by a team of neuroradiologists for model training and evaluation. Three thousand examinations were labelled for the presence or absence of any abnormality by manually scrutinising the corresponding radiology reports ('reference-standard report labels'); a subset of these examinations (n = 250) were assigned 'reference-standard image labels' by interrogating the actual images. Separately, 2000 reports were labelled for the presence or absence of 7 specialised categories of abnormality (acute stroke, mass, atrophy, vascular abnormality, small vessel disease, white matter inflammation, encephalomalacia), with a subset of these examinations (n = 700) also assigned reference-standard image labels. A deep learning model was trained using labelled reports and validated in two ways: comparing predicted labels to (i) reference-standard report labels and (ii) reference-standard image labels. The area under the receiver operating characteristic curve (AUC-ROC) was used to quantify model performance. Accuracy, sensitivity, specificity, and F1 score were also calculated.
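The abstract reports four threshold-based metrics plus AUC-ROC for each abnormality category. As an illustrative sketch only (not the authors' code, and using hypothetical toy labels and scores), the metrics can be computed for one binary category as follows:

```python
# Illustrative sketch (not the study's implementation): computing the
# evaluation metrics reported in the abstract -- AUC-ROC, accuracy,
# sensitivity, specificity, and F1 -- for one binary abnormality category.

def auc_roc(y_true, y_score):
    """AUC-ROC via the rank-statistic (Mann-Whitney U) formulation:
    the probability that a random positive is scored above a random negative."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity, and F1 from hard predictions."""
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    sensitivity = tp / (tp + fn)   # true-positive rate
    specificity = tn / (tn + fp)   # true-negative rate
    f1 = 2 * tp / (2 * tp + fp + fn)
    return accuracy, sensitivity, specificity, f1

# Hypothetical reference-standard labels and model scores for one category
labels = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.2, 0.8, 0.6, 0.4, 0.1, 0.7, 0.3]
preds = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print(auc_roc(labels, scores))        # 1.0 on this toy data
print(binary_metrics(labels, preds))  # (1.0, 1.0, 1.0, 1.0) on this toy data
```

In practice each of the seven specialised categories would be evaluated this way against both the report-derived and image-derived reference standards.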
Accurate classification (AUC-ROC > 0.95) was achieved for all categories when tested against reference-standard report labels. A drop in performance (ΔAUC-ROC > 0.02) was seen for three categories (atrophy, encephalomalacia, vascular abnormality) when tested against reference-standard image labels, highlighting discrepancies in the original reports. Once trained, the model assigned labels to 121,556 examinations in under 30 min.
Our model accurately classifies head MRI examinations, enabling automated dataset labelling for downstream computer vision applications.
• Deep learning is poised to revolutionise image recognition tasks in radiology; however, a barrier to clinical adoption is the difficulty of obtaining large labelled datasets for model training. • We demonstrate a deep learning model which can derive labels from neuroradiology reports and assign these to the corresponding examinations at scale, facilitating the development of downstream computer vision models. • We rigorously tested our model by comparing labels predicted on the basis of neuroradiology reports with two sets of reference-standard labels: (1) labels derived by manually scrutinising each radiology report and (2) labels derived by interrogating the actual images.