Artificial intelligence for breast cancer detection and its health technology assessment: A scoping review.

Author information

Uwimana Anisie, Gnecco Giorgio, Riccaboni Massimo

Affiliations

IMT School for Advanced Studies, Lucca, Italy.

IMT School for Advanced Studies, Lucca, Italy; IUSS University School for Advanced Studies, Pavia, Italy.

Publication information

Comput Biol Med. 2025 Jan;184:109391. doi: 10.1016/j.compbiomed.2024.109391. Epub 2024 Nov 22.

Abstract

BACKGROUND

Recent healthcare advancements highlight the potential of Artificial Intelligence (AI) - and especially, among its subfields, Machine Learning (ML) - in enhancing Breast Cancer (BC) clinical care, leading to improved patient outcomes and increased radiologists' efficiency. While medical imaging techniques have significantly contributed to BC detection and diagnosis, their synergy with AI algorithms has consistently demonstrated superior diagnostic accuracy, reduced False Positives (FPs), and enabled personalized treatment strategies. Despite the burgeoning enthusiasm for leveraging AI for early and effective BC clinical care, its widespread integration into clinical practice is yet to be realized, and the evaluation of AI-based health technologies in terms of health and economic outcomes remains an ongoing endeavor.

OBJECTIVES

This scoping review aims to investigate AI (and especially ML) applications that have been implemented and evaluated across diverse clinical tasks or decisions in breast imaging and to explore the current state of evidence concerning the assessment of AI-based technologies for BC clinical care within the context of Health Technology Assessment (HTA).

METHODS

We conducted a systematic literature search in PubMed and Scopus, following the Preferred Reporting Items for Systematic review and Meta-Analysis Protocols (PRISMA-P) checklist, to identify relevant studies on AI (and particularly ML) applications in BC detection and diagnosis. We limited our search to studies published from January 2015 to October 2023. The Minimum Information about CLinical Artificial Intelligence Modeling (MI-CLAIM) checklist was used to assess the quality of AI algorithm development, evaluation, and reporting in the reviewed articles. The HTA Core Model® was also used to analyze the comprehensiveness, robustness, and reliability of the results and evidence reported in evaluations of AI systems, to ensure rigorous assessment of their utility and cost-effectiveness in clinical practice.

RESULTS

Of the 1652 initially identified articles, 104 were deemed eligible for inclusion in the review. Most studies examined the clinical effectiveness of AI-based systems (78.84%, n = 82), one study focused on safety in clinical settings, and 13.46% (n = 14) focused on patients' benefits. Of the studies, 31.73% (n = 33) had ethical approval to be carried out in clinical practice, whereas 25% (n = 26) evaluated AI systems legally approved for clinical use. Notably, none of the studies addressed the organizational implications of AI systems in clinical practice. Only two of the 104 studies focused on cost-effectiveness analysis; these were analyzed separately. For the remaining 102 AI-based studies, the average quality-assessment scores under the MI-CLAIM checklist criteria were 84.12% for study design, 83.92% for data and optimization, 83.98% for model performance, 74.51% for model examination, and 14.7% for reproducibility. Notably, 20.59% (n = 21) of these studies relied on large-scale, representative real-world breast screening datasets, and only 10.78% (n = 11) demonstrated the robustness and generalizability of the evaluated AI systems.
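
The percentages reported above can be checked against the stated counts. The following is a minimal sketch, not the authors' analysis code. The choice of denominator for each figure (104 included studies, or 102 once the two cost-effectiveness studies are set aside) is an assumption inferred from the text, and the dictionary and function names are illustrative.

```python
# Counts as reported in the abstract. Which denominator applies to each
# count is an assumption inferred from the wording, not stated by the authors.
TOTAL_INCLUDED = 104   # studies included in the review
AI_STUDIES = 102       # assumed: 104 minus the two cost-effectiveness studies

COUNTS_OF_INCLUDED = {
    "clinical effectiveness": 82,
    "patients' benefits": 14,
    "ethically approved": 33,
    "legally approved AI system": 26,
}
COUNTS_OF_AI_STUDIES = {
    "large-scale real-world datasets": 21,
    "robustness and generalizability": 11,
}


def percentage(count: int, total: int) -> float:
    """Return count/total as a percentage rounded to two decimals."""
    return round(100 * count / total, 2)


def summarize() -> dict:
    """Map each reported category to its recomputed percentage."""
    rows = {}
    for label, n in COUNTS_OF_INCLUDED.items():
        rows[label] = percentage(n, TOTAL_INCLUDED)
    for label, n in COUNTS_OF_AI_STUDIES.items():
        rows[label] = percentage(n, AI_STUDIES)
    return rows


if __name__ == "__main__":
    for label, pct in summarize().items():
        print(f"{label}: {pct}%")
```

Under these assumptions the recomputed values match the reported ones, with one exception: 82/104 rounds to 78.85%, whereas the abstract gives 78.84%, which would be consistent with truncation rather than rounding.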

CONCLUSION

In bridging the gap between cutting-edge developments and seamless integration of AI systems into clinical workflows, persistent challenges include data quality and availability, ethical and legal considerations, robustness and trustworthiness, scalability, and alignment with radiologists' existing workflows. These hurdles impede the synthesis of comprehensive, robust, and reliable evidence to substantiate these systems' clinical utility, relevance, and cost-effectiveness in real-world clinical workflows. Consequently, evaluating AI-based health technologies through established HTA methodologies becomes complicated. We also highlight that various factors may significantly influence the effectiveness of AI systems, such as operational dynamics, organizational structure, the application context of AI systems, and practices in breast screening or examination reading with AI support tools in radiology. Furthermore, we emphasize substantial reciprocal influences on decision-making processes between AI systems and radiologists. Thus, we advocate for an adapted assessment framework specifically designed to address these potential influences on AI systems' effectiveness, one that mainly addresses the system-level transformative implications of AI systems rather than focusing solely on technical performance and task-level evaluations.
