

AI Workflow, External Validation, and Development in Eye Disease Diagnosis.

Authors

Chen Qingyu, Keenan Tiarnan D L, Agron Elvira, Allot Alexis, Guan Emily, Duong Bryant, Elsawy Amr, Hou Benjamin, Xue Cancan, Bhandari Sanjeeb, Broadhead Geoffrey, Cousineau-Krieger Chantal, Davis Ellen, Gensheimer William G, Golshani Cyrus A, Grasic David, Gupta Seema, Haddock Luis, Konstantinou Eleni, Lamba Tania, Maiberger Michele, Mantopoulos Dimosthenis, Mehta Mitul C, Elnahry Ayman G, Al-Nawaflh Mutaz, Oshinsky Arnold, Powell Brittany E, Purt Boonkit, Shin Soo, Stiefel Hillary, Thavikulwat Alisa T, Wroblewski Keith James, Tham Yih Chung, Cheung Chui Ming Gemmy, Cheng Ching-Yu, Chew Emily Y, Hribar Michelle R, Chiang Michael F, Lu Zhiyong

Affiliations

National Library of Medicine, National Institutes of Health, Bethesda, Maryland.

Department of Biomedical Informatics and Data Science, Yale School of Medicine, Yale University, New Haven, Connecticut.

Publication information

JAMA Netw Open. 2025 Jul 1;8(7):e2517204. doi: 10.1001/jamanetworkopen.2025.17204.

DOI:10.1001/jamanetworkopen.2025.17204
PMID:40668583
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12268484/
Abstract

IMPORTANCE

Timely disease diagnosis is challenging due to limited clinical availability and growing burdens. Although artificial intelligence (AI) has shown expert-level diagnostic accuracy, a lack of downstream accountability, including workflow integration, external validation, and further development, continues to hinder its clinical adoption.

OBJECTIVE

To address gaps in the downstream accountability of medical AI through a case study on age-related macular degeneration (AMD) diagnosis and severity classification.

DESIGN, SETTING, AND PARTICIPANTS

This diagnostic study developed and evaluated an AI-assisted diagnostic and classification workflow for AMD. Four rounds of diagnostic assessments (accuracy and time) were conducted with 24 clinicians from 12 institutions. Each round was randomized and alternated between manual (clinician diagnosis) and manual plus AI (clinician assisted by AI diagnosis), with a 1-month washout period. In total, 2880 AMD risk features were evaluated across 960 images from 240 Age-Related Eye Disease Study patient samples, both with and without AI assistance. For further development, the original DeepSeeNet model was enhanced into the DeepSeeNet+ model using 39 196 additional images from the US population and tested on 3 datasets, including an external set from Singapore.

EXPOSURE

Age-related macular degeneration risk features.

MAIN OUTCOMES AND MEASURES

The F1 score for accuracy (Wilcoxon rank sum test) and diagnostic time (linear mixed-effects model) were measured, comparing manual vs manual plus AI. For further development, the F1 score (Wilcoxon rank sum test) was again used.
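As a rough illustration of the two accuracy statistics named above, the sketch below computes a binary F1 score from paired labels and a two-sided Wilcoxon rank sum test using the normal approximation. The function names and all data are hypothetical; the abstract does not state how F1 was averaged across risk features or which software was used, so this is a sketch under those assumptions rather than the study's procedure.

```python
import math


def f1_score(y_true, y_pred, positive=1):
    """Binary F1: harmonic mean of precision and recall for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank sum test (normal approximation).

    Returns (z, p). Tied values receive average ranks; the variance
    is not tie-corrected, which is an acceptable simplification here.
    """
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    n1, n2 = len(x), len(y)
    w = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    expected = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - expected) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p


# Hypothetical gradings for one risk feature (1 = present, 0 = absent).
truth = [1, 1, 1, 0, 0, 1]
graded = [1, 0, 1, 0, 1, 1]
toy_f1 = f1_score(truth, graded)

# Hypothetical per-clinician F1 scores (as percentages), not study data.
manual_scores = [30.1, 35.4, 38.2, 41.0]
assisted_scores = [39.5, 44.8, 47.3, 52.6]
z_stat, p_value = rank_sum_test(manual_scores, assisted_scores)

print(f"toy F1 = {toy_f1:.2f}")
print(f"rank sum z = {z_stat:.3f}, p = {p_value:.4f}")
```

A negative z here means the first sample (manual) tends to rank lower than the second (manual plus AI). With samples this small, an exact test would normally be preferred over the normal approximation.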

RESULTS

Among 240 patients (mean [SD] age, 68.5 [5.0] years; 127 female [53%]), AI assistance significantly improved accuracy for 23 of 24 clinicians, increasing the mean F1 score from 37.71 (95% CI, 27.83-44.17) to 45.52 (95% CI, 39.01-51.61), with some improvements exceeding 50%. Manual diagnosis initially took an estimated 39.8 seconds (95% CI, 34.1-45.6 seconds) per patient, whereas manual plus AI saved 10.3 seconds (95% CI, -15.1 to -5.5 seconds) and remained faster by 6.9 seconds (95% CI, 0.2-13.7 seconds) to 8.6 seconds (95% CI, 1.8-15.3 seconds) in subsequent rounds. However, combining manual and AI did not always yield the highest accuracy or efficiency, underscoring challenges in explainability and trust. The DeepSeeNet+ model performed better in 3 test sets, achieving a significantly higher F1 score in the Singapore cohort (52.43 [95% CI, 44.38-61.00] vs 38.95 [95% CI, 30.50-47.45]).
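To put the reported point estimates on a common scale, the short calculation below derives absolute and relative changes from the figures quoted in the Results. These derived values are simple arithmetic on the published means and are not themselves reported in the abstract; the variable names are illustrative, and confidence intervals are ignored.

```python
def relative_change(before, after):
    """Relative change of `after` versus `before`, in percent."""
    return (after - before) / before * 100.0


# Point estimates quoted in the Results paragraph.
mean_f1_manual = 37.71
mean_f1_with_ai = 45.52
singapore_f1_lower = 38.95
singapore_f1_higher = 52.43
manual_seconds = 39.8
first_round_saving = 10.3

f1_gain = mean_f1_with_ai - mean_f1_manual
f1_gain_pct = relative_change(mean_f1_manual, mean_f1_with_ai)
singapore_gain = singapore_f1_higher - singapore_f1_lower
singapore_gain_pct = relative_change(singapore_f1_lower, singapore_f1_higher)
assisted_seconds = manual_seconds - first_round_saving
time_saving_pct = first_round_saving / manual_seconds * 100.0

print(f"Mean F1 gain: {f1_gain:.2f} points ({f1_gain_pct:.1f}% relative)")
print(f"Singapore F1 gain: {singapore_gain:.2f} points ({singapore_gain_pct:.1f}% relative)")
print(f"Initial assisted time: {assisted_seconds:.1f} s ({time_saving_pct:.1f}% faster)")
```

On these means, the average relative gain in F1 is about 20.7%, so the "exceeding 50%" figure describes individual clinicians rather than the group average. The initial time saving works out to roughly a quarter of the manual reading time.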

CONCLUSIONS AND RELEVANCE

In this diagnostic study, AI assistance was associated with improved accuracy and time efficiency for AMD diagnosis. Further development is essential for enhancing AI generalizability across diverse populations. These findings highlight the need for downstream accountability during early-stage clinical evaluations of medical AI.



