Hansun Seng, Argha Ahmadreza, Bakhshayeshi Ivan, Wicaksana Arya, Alinejad-Rokny Hamid, Fox Greg J, Liaw Siaw-Teng, Celler Branko G, Marks Guy B
School of Clinical Medicine, South West Sydney, UNSW Medicine & Health, UNSW Sydney, Sydney, Australia.
Woolcock Vietnam Research Group, Woolcock Institute of Medical Research, Sydney, Australia.
J Med Internet Res. 2025 Mar 7;27:e69068. doi: 10.2196/69068.
Tuberculosis (TB) remains a significant health concern, contributing to the highest mortality among infectious diseases worldwide. However, none of the various TB diagnostic tools introduced is deemed sufficient on its own for the diagnostic pathway, so various artificial intelligence (AI)-based methods have been developed to address this issue.
We aimed to provide a comprehensive evaluation of AI-based algorithms for TB detection across various data modalities.
Following the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) 2020 guidelines, we conducted a systematic review to synthesize current knowledge on this topic. Our search across 3 major databases (Scopus, PubMed, and the Association for Computing Machinery [ACM] Digital Library) yielded 1146 records, of which we included 152 (13.3%) studies in our analysis. QUADAS-2 (Quality Assessment of Diagnostic Accuracy Studies version 2) was used to assess the risk of bias in all included studies.
Radiographic biomarkers (n=129, 84.9%) and deep learning (DL; n=122, 80.3%) approaches were predominantly used, with convolutional neural networks (CNNs) using Visual Geometry Group (VGG)-16 (n=37, 24.3%), ResNet-50 (n=33, 21.7%), and DenseNet-121 (n=19, 12.5%) architectures being the most common DL approach. The majority of studies focused on model development (n=143, 94.1%) and used a single modality approach (n=141, 92.8%). AI methods demonstrated good performance in all studies: mean accuracy=91.93% (SD 8.10%, 95% CI 90.52%-93.33%; median 93.59%, IQR 88.33%-98.32%), mean area under the curve (AUC)=93.48% (SD 7.51%, 95% CI 91.90%-95.06%; median 95.28%, IQR 91%-99%), mean sensitivity=92.77% (SD 7.48%, 95% CI 91.38%-94.15%; median 94.05%, IQR 89%-98.87%), and mean specificity=92.39% (SD 9.4%, 95% CI 90.30%-94.49%; median 95.38%, IQR 89.42%-99.19%). AI performance across different biomarker types showed mean accuracies of 92.45% (SD 7.83%), 89.03% (SD 8.49%), and 84.21% (SD 0%); mean AUCs of 94.47% (SD 7.32%), 88.45% (SD 8.33%), and 88.61% (SD 5.9%); mean sensitivities of 93.8% (SD 6.27%), 88.41% (SD 10.24%), and 93% (SD 0%); and mean specificities of 94.2% (SD 6.63%), 85.89% (SD 14.66%), and 95% (SD 0%) for radiographic, molecular/biochemical, and physiological types, respectively. AI performance across various reference standards showed mean accuracies of 91.44% (SD 7.3%), 93.16% (SD 6.44%), and 88.98% (SD 9.77%); mean AUCs of 90.95% (SD 7.58%), 94.89% (SD 5.18%), and 92.61% (SD 6.01%); mean sensitivities of 91.76% (SD 7.02%), 93.73% (SD 6.67%), and 91.34% (SD 7.71%); and mean specificities of 86.56% (SD 12.8%), 93.69% (SD 8.45%), and 92.7% (SD 6.54%) for bacteriological, human reader, and combined reference standards, respectively. The transfer learning (TL) approach showed increasing popularity (n=89, 58.6%). Notably, only 1 (0.7%) study conducted domain-shift analysis for TB detection.
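For readers who want to reproduce this style of summary on their own extracted data, the following is a minimal sketch of how a mean, SD, 95% CI, median, and IQR could be computed from per-study metric values. It is not the review's analysis code: the function name `summarize`, the sample values, and the normal-approximation interval (mean ± 1.96 × SD/√n) are all assumptions made here for illustration, and the abstract does not state which CI method or quartile convention the authors used.

```python
import math
import statistics


def summarize(values, z=1.96):
    """Summarize per-study metric values (in percent).

    Assumption: the 95% CI is a normal approximation around the mean,
    mean +/- z * SD / sqrt(n). The review may have used another method.
    """
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1 denominator)
    half_width = z * sd / math.sqrt(n)
    # quantiles(..., n=4) returns the three quartile cut points
    # (default "exclusive" method; other conventions give other IQRs).
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "n": n,
        "mean": mean,
        "sd": sd,
        "ci_low": mean - half_width,
        "ci_high": mean + half_width,
        "median": statistics.median(values),
        "iqr": (q1, q3),
    }


# Hypothetical per-study accuracies, NOT data from the review.
example_accuracies = [88.3, 91.0, 93.6, 95.2, 98.3]
summary = summarize(example_accuracies)
print({key: summary[key] for key in ("mean", "sd", "ci_low", "ci_high")})
```

Because diagnostic accuracy values are bounded at 100% and often skewed, the median and IQR reported alongside the mean are a useful check on how far a symmetric interval can be trusted.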
Findings from this review underscore the considerable promise of AI-based methods for TB detection. Future research should prioritize domain-shift analyses to better simulate real-world scenarios in TB detection.
PROSPERO CRD42023453611; https://www.crd.york.ac.uk/PROSPERO/view/CRD42023453611.