Faculty of Health and Medical Sciences, Copenhagen University, Copenhagen, Denmark.
Department of Radiology and Nuclear Medicine, Hospital of South West Jutland, University Hospital of Southern Denmark, Esbjerg, Denmark; Department of Regional Health Research, Faculty of Health Sciences, University of Southern Denmark, Odense, Denmark.
Eur J Radiol. 2022 Jan;146:110073. doi: 10.1016/j.ejrad.2021.110073. Epub 2021 Nov 24.
To compare the diagnostic accuracy of convolutional neural networks (CNNs) against radiologists as the reference standard in diagnosing intracranial hemorrhage (ICH) on non-contrast computed tomography of the cerebrum (NCTC).
PubMed, Embase, Scopus, and Web of Science were searched for the period from 1 January 2012 to 20 July 2020. Eligible studies included patients with and without ICH (the target condition) undergoing NCTC, used deep learning algorithms based on CNNs, and had radiologists' reports as the minimum reference standard. Pooled sensitivities, pooled specificities, and a summary receiver operating characteristic (SROC) curve were used for meta-analysis.
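As a rough illustration of what pooling sensitivities and specificities involves, the sketch below combines per-study 2x2 counts with a simple fixed-effect, inverse-variance average on the logit scale. This is a simplified stand-in chosen for brevity, not necessarily the model used in the review (diagnostic accuracy meta-analyses often fit a bivariate random-effects model instead). The function name `pooled_proportion`, the 0.5 continuity correction, and every count in `hypothetical_studies` are assumptions made up for the example and are not data from the included studies.

```python
import math


def inv_logit(x):
    """Map a logit-scale value back to a proportion in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def pooled_proportion(events, totals):
    """Fixed-effect inverse-variance pooling of proportions on the logit scale.

    events[i] / totals[i] is the per-study proportion (e.g. TP / (TP + FN)).
    Returns (estimate, lower, upper) with an approximate 95% interval.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for e, n in zip(events, totals):
        # Assumed 0.5 continuity correction so studies with 0 or n events
        # still yield a finite logit and variance.
        e_c = e + 0.5
        f_c = (n - e) + 0.5
        estimate = math.log(e_c / f_c)      # logit of the corrected proportion
        variance = 1.0 / e_c + 1.0 / f_c    # approximate variance of the logit
        weight = 1.0 / variance
        weighted_sum += weight * estimate
        weight_total += weight
    pooled = weighted_sum / weight_total
    se = math.sqrt(1.0 / weight_total)
    return (
        inv_logit(pooled),
        inv_logit(pooled - 1.96 * se),
        inv_logit(pooled + 1.96 * se),
    )


# Hypothetical 2x2 counts per study: (TP, FN, FP, TN). Illustrative only.
hypothetical_studies = [
    (90, 5, 4, 101),
    (180, 9, 12, 299),
    (45, 3, 2, 50),
]

tp = [s[0] for s in hypothetical_studies]
fn = [s[1] for s in hypothetical_studies]
fp = [s[2] for s in hypothetical_studies]
tn = [s[3] for s in hypothetical_studies]

# Sensitivity pools TP over all diseased; specificity pools TN over all non-diseased.
sensitivity = pooled_proportion(tp, [a + b for a, b in zip(tp, fn)])
specificity = pooled_proportion(tn, [a + b for a, b in zip(tn, fp)])

print("pooled sensitivity (est, low, high):", sensitivity)
print("pooled specificity (est, low, high):", specificity)
```

Pooling on the logit scale keeps the estimate and its interval inside the 0 to 1 range, which matters when proportions sit close to 100%, as they do here. Pooling sensitivity and specificity separately ignores the correlation between them, which is one reason an SROC curve is usually reported alongside the pooled values.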
A total of 5,119 records were identified through database searching. Title screening left 47 studies for full-text assessment, of which 6 were included in the meta-analysis. Comparing CNN performance with the reference standards in the retrospective studies gave a pooled sensitivity of 96.00% (95% CI: 93.00% to 97.00%), a pooled specificity of 97.00% (95% CI: 90.00% to 99.00%), and an SROC of 98.00% (95% CI: 97.00% to 99.00%). Combining the retrospective studies with the studies using external datasets gave a pooled sensitivity of 95.00% (95% CI: 91.00% to 97.00%), a pooled specificity of 96.00% (95% CI: 91.00% to 98.00%), and a pooled SROC of 98.00% (95% CI: 97.00% to 99.00%).
This review found the diagnostic performance of CNNs to be equivalent to that of radiologists in retrospective studies. When out-of-sample external validation studies were pooled with the retrospective studies, CNN performance was slightly worse. There is a critical need for studies with a robust reference standard and external dataset validation.