School of Computer Science, The University of Sydney, Australia; Sydney Law School, The University of Sydney, Australia.
School of Computer Science, The University of Sydney, Australia.
Int J Med Inform. 2023 Sep;177:105159. doi: 10.1016/j.ijmedinf.2023.105159. Epub 2023 Aug 2.
The global market for AI systems used in lung tuberculosis (TB) detection has expanded significantly in recent years. Verifying their performance across diverse settings is crucial before medical organisations can invest in them and pursue safe, wide-scale deployment. The goal of this research was to synthesise the clinical evidence for the diagnostic accuracy of certified AI products designed to screen for TB on chest X-rays (CXRs), compared to a microbiological reference standard.
Four databases were searched between June and September 2022. Data concerning study methodology, system characteristics, and diagnostic accuracy metrics were extracted and summarised. Study bias was evaluated using QUADAS-2 and by examining sources of funding. Forest plots for diagnostic odds ratio (DOR) and summary receiver operating characteristic (SROC) curves were constructed for the AI products individually and collectively.
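For readers less familiar with the accuracy metrics named above, the following is a minimal sketch of how sensitivity, specificity, and DOR are conventionally derived from a single study's 2x2 table against the reference standard. The function name and the counts are hypothetical and chosen only for illustration; they are not taken from any study in the review, and this is not the pooling procedure used by the authors.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Return (sensitivity, specificity, DOR) from 2x2 counts.

    tp/fn: reference-positive cases flagged / missed by the AI product.
    tn/fp: reference-negative cases cleared / flagged by the AI product.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # DOR = odds of a positive call in disease / odds of a positive call without disease
    dor = (tp * tn) / (fp * fn)
    return sensitivity, specificity, dor


# Hypothetical counts, for illustration only.
sens, spec, dor = diagnostic_metrics(tp=90, fp=50, fn=10, tn=50)
print(f"sensitivity={sens:.3f}, specificity={spec:.3f}, DOR={dor:.2f}")
```

A study with a zero cell would need a continuity correction before the DOR could be computed; the sketch leaves that out.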
Of 3642 studies, 10 satisfied the review criteria; however, only 8 were included in the meta-analysis following bias assessment. Three AI products were evaluated, producing the following pooled estimates (with 95% confidence intervals), ranked by accuracy: qXR v2 (sensitivity of 0.944 [0.887-0.973], specificity of 0.692 [0.549-0.805], DOR of 3.63 [3.17-4.09]), Lunit INSIGHT CXR v3.1 (sensitivity of 0.853 [0.787-0.901], specificity of 0.646 [0.627-0.665], DOR of 2.37 [1.96-2.78]), and CAD4TB v3.07 (sensitivity of 0.917 [0.848-0.956], specificity of 0.371 [0.336-0.408], DOR of 1.91 [1.4-2.47]). Overall, the products had a sensitivity of 0.903 [0.859-0.934], specificity of 0.526 [0.409-0.641], and DOR of 2.31 [1.78-2.84].
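The abstract does not say on which scale the DOR values are reported. As a rough sanity check, and not a reproduction of the authors' meta-analytic model, the sketch below computes the natural-log DOR implied by each pooled sensitivity/specificity pair and prints it next to the reported figure. The helper name `log_dor` is an assumption made for this example. Under these assumptions the implied values fall close to the reported ones, which suggests, but does not confirm, that the reported DORs are on the natural-log scale.

```python
import math


def log_dor(sensitivity, specificity):
    """Natural log of the DOR implied by one sensitivity/specificity pair."""
    odds_positive_if_diseased = sensitivity / (1 - sensitivity)
    odds_positive_if_healthy = (1 - specificity) / specificity
    return math.log(odds_positive_if_diseased / odds_positive_if_healthy)


# (pooled sensitivity, pooled specificity, reported DOR) as given in the results.
pooled = {
    "qXR v2": (0.944, 0.692, 3.63),
    "Lunit INSIGHT CXR v3.1": (0.853, 0.646, 2.37),
    "CAD4TB v3.07": (0.917, 0.371, 1.91),
    "Overall": (0.903, 0.526, 2.31),
}

for name, (sens, spec, reported) in pooled.items():
    implied = log_dor(sens, spec)
    print(f"{name}: implied ln(DOR) = {implied:.2f}, reported = {reported}")
```

A pooled DOR from a meta-analysis need not equal the DOR computed from the pooled sensitivity and specificity, so exact agreement is not expected.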
Current publicly available evidence indicates considerable variability in the diagnostic accuracy of available AI products, although overall they have high sensitivity and modest specificity, which is improving with time. These preliminary results are limited by the small number of studies and poor coverage of low TB burden settings. More research is needed to expand the clinical evidence base for the performance of AI products.