APPRAISE-AI Tool for Quantitative Evaluation of AI Studies for Clinical Decision Support.

Affiliations

Division of Urology, Department of Surgery, University of Toronto, Toronto, Ontario, Canada.

Temerty Centre for AI Research and Education in Medicine, University of Toronto, Toronto, Ontario, Canada.

Publication information

JAMA Netw Open. 2023 Sep 5;6(9):e2335377. doi: 10.1001/jamanetworkopen.2023.35377.

DOI: 10.1001/jamanetworkopen.2023.35377
PMID: 37747733
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10520738/

Abstract

IMPORTANCE

Artificial intelligence (AI) has gained considerable attention in health care, yet concerns have been raised around appropriate methods and fairness. Current AI reporting guidelines do not provide a means of quantifying overall quality of AI research, limiting their ability to compare models addressing the same clinical question.

OBJECTIVE

To develop a tool (APPRAISE-AI) to evaluate the methodological and reporting quality of AI prediction models for clinical decision support.

DESIGN, SETTING, AND PARTICIPANTS

This quality improvement study evaluated AI studies in the model development, silent, and clinical trial phases using the APPRAISE-AI tool, a quantitative method for evaluating quality of AI studies across 6 domains: clinical relevance, data quality, methodological conduct, robustness of results, reporting quality, and reproducibility. These domains included 24 items with a maximum overall score of 100 points. Points were assigned to each item, with higher points indicating stronger methodological or reporting quality. The tool was applied to a systematic review on machine learning to estimate sepsis that included articles published until September 13, 2019. Data analysis was performed from September to December 2022.

MAIN OUTCOMES AND MEASURES

The primary outcomes were interrater and intrarater reliability and the correlation between APPRAISE-AI scores and expert scores, 3-year citation rate, number of Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) low risk-of-bias domains, and overall adherence to the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) statement.

RESULTS

A total of 28 studies were included. Overall APPRAISE-AI scores ranged from 33 (low quality) to 67 (high quality). Most studies were moderate quality. The 5 lowest scoring items included source of data, sample size calculation, bias assessment, error analysis, and transparency. Overall APPRAISE-AI scores were associated with expert scores (Spearman ρ, 0.82; 95% CI, 0.64-0.91; P < .001), 3-year citation rate (Spearman ρ, 0.69; 95% CI, 0.43-0.85; P < .001), number of QUADAS-2 low risk-of-bias domains (Spearman ρ, 0.56; 95% CI, 0.24-0.77; P = .002), and adherence to the TRIPOD statement (Spearman ρ, 0.87; 95% CI, 0.73-0.94; P < .001). Intraclass correlation coefficient ranges for interrater and intrarater reliability were 0.74 to 1.00 for individual items, 0.81 to 0.99 for individual domains, and 0.91 to 0.98 for overall scores.
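
The associations above are reported as Spearman rank correlations. As a rough illustration of what that statistic measures, the following minimal sketch computes Spearman ρ in plain Python by ranking both variables (tied values receive the mean of their ranks) and taking the Pearson correlation of the ranks. The function names `average_ranks` and `spearman_rho` and the two score lists are hypothetical and chosen only for illustration; they are not the study's data or code, and the sketch does not compute confidence intervals or P values.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rho: the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / (sx * sy)


# Hypothetical scores for seven studies (illustrative only, not study data).
tool_scores = [33, 41, 48, 52, 55, 60, 67]
expert_scores = [30, 45, 40, 50, 58, 57, 70]
print(round(spearman_rho(tool_scores, expert_scores), 3))
```

Because only rank order enters the calculation, ρ reflects whether two measures order the studies similarly, not whether the scores agree in absolute value.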

CONCLUSIONS AND RELEVANCE

In this quality improvement study, APPRAISE-AI demonstrated strong interrater and intrarater reliability and correlated well with several study quality measures. This tool may provide a quantitative approach for investigators, reviewers, editors, and funding organizations to compare the research quality across AI studies for clinical decision support.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fb52/10520738/33034b3546d3/jamanetwopen-e2335377-g001.jpg
