European Commission, Joint Research Centre (JRC), Via E. Fermi 2749 (TP 281) Ispra, Lombardy, Italy.
Eur J Radiol. 2021 Dec;145:110028. doi: 10.1016/j.ejrad.2021.110028. Epub 2021 Nov 16.
A growing number of studies have examined whether Artificial Intelligence (AI) systems can support imaging-based diagnosis of COVID-19-caused pneumonia, with potential gains in both diagnostic performance and speed. However, a combined assessment of studies comparing human readers and AI is currently missing.
We followed the PRISMA-DTA guidelines for our systematic review and searched the EMBASE, PubMed and Scopus databases. To gain insights into the potential value of AI methods, we focused on studies comparing the performance of human readers with that of AI models or of AI-supported human readings.
Our search identified 1270 studies, of which 12 fulfilled specific selection criteria. Concerning diagnostic performance, sensitivity reported on testing datasets was 42-100% (human readers, n = 9 studies), 60-95% (AI systems, n = 10) and 81-98% (AI-supported readers, n = 3), whilst reported specificity was 26-100% (human readers, n = 8), 61-96% (AI systems, n = 10) and 78-99% (AI-supported readings, n = 2). One study highlighted the potential of AI-supported readings for assessing changes in lung lesion burden, whilst two studies indicated potential time savings for detection with AI.
Our review indicates that AI systems and AI-supported human readings generally show less performance variability (interquartile range) than human readers alone, and may support the differentiation of COVID-19 pneumonia from other forms of pneumonia when used in high-prevalence and symptomatic populations. However, inconsistencies in study design and data reporting, areas of risk of bias, and limitations of the statistical analyses complicate clear conclusions. We therefore support efforts to develop critical elements of study design for assessing the value of AI in diagnostic imaging.
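As an illustration of the metrics discussed above, the following minimal Python sketch shows how sensitivity and specificity follow from confusion-matrix counts, how an interquartile range summarises variability across studies, and why prevalence matters when interpreting a positive result. The function names and all numeric inputs are hypothetical and chosen only for illustration; they are not data from the reviewed studies.

```python
from statistics import quantiles


def sensitivity(tp, fn):
    """Fraction of diseased cases correctly flagged: TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of non-diseased cases correctly cleared: TN / (TN + FP)."""
    return tn / (tn + fp)


def positive_predictive_value(sens, spec, prevalence):
    """Probability of disease given a positive reading (Bayes' rule)."""
    true_pos = sens * prevalence
    false_pos = (1 - spec) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


def interquartile_range(values):
    """Q3 - Q1, using the inclusive quartile method."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1


if __name__ == "__main__":
    # Hypothetical counts for a single reader or model on one test set.
    sens = sensitivity(tp=90, fn=10)
    spec = specificity(tn=80, fp=20)
    print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")

    # The same test performs very differently at high and low prevalence.
    for prev in (0.50, 0.05):
        ppv = positive_predictive_value(sens, spec, prev)
        print(f"prevalence={prev:.2f} -> PPV={ppv:.2f}")

    # Hypothetical per-study sensitivities; a smaller IQR means less spread.
    print(interquartile_range([0.60, 0.75, 0.80, 0.85, 0.95]))
```

With identical sensitivity and specificity, the positive predictive value drops sharply as prevalence falls, which is one reason results obtained in high-prevalence, symptomatic populations may not transfer to screening settings.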