Takita Hirotaka, Kabata Daijiro, Walston Shannon L, Tatekawa Hiroyuki, Saito Kenichi, Tsujimoto Yasushi, Miki Yukio, Ueda Daiju
Department of Diagnostic and Interventional Radiology, Graduate School of Medicine, Osaka Metropolitan University, Osaka, Japan.
Center for Mathematical and Data Science, Kobe University, Kobe, Japan.
NPJ Digit Med. 2025 Mar 22;8(1):175. doi: 10.1038/s41746-025-01543-z.
While generative artificial intelligence (AI) has shown potential in medical diagnostics, comprehensive evaluation of its diagnostic performance and comparison with physicians have not been extensively explored. We conducted a systematic review and meta-analysis of studies validating generative AI models on diagnostic tasks, published between June 2018 and June 2024. Analysis of 83 studies revealed an overall diagnostic accuracy of 52.1%. No significant performance difference was found between AI models and physicians overall (p = 0.10) or between AI models and non-expert physicians (p = 0.93). However, AI models performed significantly worse than expert physicians (p = 0.007). Several models demonstrated slightly higher performance than non-experts, although the differences were not significant. Generative AI demonstrates promising diagnostic capabilities, with accuracy varying by model. Although it has not yet achieved expert-level reliability, these findings suggest potential for enhancing healthcare delivery and medical education when implemented with an appropriate understanding of its limitations.