Luo Xufei, Wang Bingyi, Shi Qianling, Wang Zijun, Lai Honghao, Liu Hui, Qin Yishan, Chen Fengxian, Song Xuping, Ge Long, Zhang Lu, Bian Zhaoxiang, Chen Yaolong
Evidence-Based Medicine Center, School of Basic Medical Sciences, Lanzhou University, Lanzhou, China; Research Unit of Evidence-Based Evaluation and Guidelines, Chinese Academy of Medical Sciences (2021RU017), School of Basic Medical Sciences, Lanzhou University, Lanzhou, China; World Health Organization Collaboration Center for Guideline Implementation and Knowledge Translation, Lanzhou, China; Institute of Health Data Science, Lanzhou University, Lanzhou, China; Key Laboratory of Evidence Based Medicine of Gansu Province, Lanzhou University, Lanzhou, China.
The First School of Clinical Medicine, Lanzhou University, Lanzhou, China.
J Clin Epidemiol. 2025 Jul 18;186:111903. doi: 10.1016/j.jclinepi.2025.111903.
This study aimed to systematically map the development methods, scope, and limitations of existing artificial intelligence (AI) reporting guidelines in medicine and to explore their applicability to generative AI (GAI) tools, such as large language models (LLMs).
We conducted a scoping review adhering to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews (PRISMA-ScR). Five information sources were searched, including MEDLINE (via PubMed), the Enhancing the QUAlity and Transparency Of health Research (EQUATOR) Network, China National Knowledge Infrastructure, FAIRsharing, and Google Scholar, from inception to December 31, 2024. Two reviewers independently screened records and extracted data using a predefined Excel template. Data included guideline characteristics (eg, development methods, target audience, AI domain), adherence to EQUATOR Network recommendations, and consensus methodologies. Discrepancies were resolved by a third reviewer.
Sixty-eight AI reporting guidelines were included; 48.5% focused on general AI, whereas only 7.4% addressed GAI/LLMs. Methodological rigor was limited: only 39.7% described their development processes, 42.6% involved multidisciplinary experts, and 33.8% followed EQUATOR Network recommendations. Substantial overlap existed across guidelines, particularly in medical imaging (20.6% of guidelines). GAI-specific guidelines (14.7%) lacked comprehensive coverage and methodological transparency.
Existing AI reporting guidelines in medicine show suboptimal methodological rigor, substantial redundancy, and insufficient coverage of GAI applications. Future and updated guidelines should prioritize standardized development processes, multidisciplinary collaboration, and an expanded focus on emerging AI technologies such as LLMs.
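The percentages reported above can be cross-checked against the stated denominator of 68 included guidelines. The following sketch back-computes the implied counts; note that the counts themselves are our inference from the rounded percentages, not figures stated in the abstract.

```python
# Consistency check: each reported percentage should equal its
# inferred count divided by the 68 included guidelines, rounded
# to one decimal place. Counts are inferred, not stated in the text.
N = 68

inferred = [
    ("general AI focus", 33, 48.5),
    ("addressed GAI/LLMs", 5, 7.4),
    ("described development processes", 27, 39.7),
    ("involved multidisciplinary experts", 29, 42.6),
    ("followed EQUATOR recommendations", 23, 33.8),
    ("medical imaging", 14, 20.6),
    ("GAI-specific guidelines", 10, 14.7),
]

for label, count, reported_pct in inferred:
    computed = round(count / N * 100, 1)
    assert computed == reported_pct, (label, computed, reported_pct)
    print(f"{label}: {count}/{N} = {computed}%")
```

Each reported figure is consistent with a whole-number count out of 68, which supports the internal coherence of the results section.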