Accuracy of Artificial Intelligence vs Professionally Translated Discharge Instructions.

Authors

Martos Melissa, Fields Blanca, Finlayson Samuel G, Hartell Nigel, Kim Theresa, Larimer Emily, Lau Jason J, Lin Yu-Hsiang, Salaguinto Taylor, Tran Nguyen, Lion K Casey

Affiliations

University of Washington, Seattle.

Seattle Children's Hospital, Seattle, Washington.

Publication information

JAMA Netw Open. 2025 Sep 2;8(9):e2532312. doi: 10.1001/jamanetworkopen.2025.32312.

DOI: 10.1001/jamanetworkopen.2025.32312
PMID: 40960827
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12444566/
Abstract

IMPORTANCE

Patients using languages other than English are a group at risk of poor health outcomes and encounter barriers to access of translated written materials. Although artificial intelligence (AI) may offer an opportunity to improve access, few studies have evaluated the accuracy and safety of AI translation for clinical care under routine practice conditions.

OBJECTIVE

To investigate the accuracy of AI translation compared with professional human translation of patient-specific issued pediatric inpatient discharge instructions.

DESIGN, SETTING, AND PARTICIPANTS

This comparative effectiveness analysis compared translations by a neural machine translation model vs professional translators using patient-specific pediatric inpatient discharge instructions received by families between May 18, 2023, and May 18, 2024, at a single center academic pediatric hospital. Instructions were translated to Simplified Chinese, Somali, Spanish, and Vietnamese by professional translators and the Azure AI system and then broken into scoring sections. Two professional translators per language evaluated translations (blinded to source) on an established 5-point scale for fluency, adequacy, meaning, and error severity, with 1 indicating worst performance and 5 indicating best performance.

EXPOSURE

AI vs professional translation.

MAIN OUTCOME AND MEASURE

Quality of discharge instruction translation, including fluency, adequacy, meaning, and severity of errors.

RESULTS

A total of 148 sections from 34 discharge instructions were analyzed. When considering all 4 languages together, average fluency, adequacy, and meaning were lower among AI compared with professional human translations. Among all tested languages, mean (SD) fluency for AI translations was 2.98 (1.12) compared with 3.90 (0.96) for professional translations (difference, 0.92; 95% CI, 0.83-1.01; P < .001), adequacy was 3.81 (1.14) compared with 4.56 (0.70) (difference, 0.74; 95% CI, 0.65-0.83; P < .001), meaning was 3.38 (1.15) compared with 4.28 (0.84) (difference, 0.90; 95% CI, 0.80-0.99; P < .001), and error severity was 3.53 (1.28) compared with 4.48 (0.88) (difference, 0.95; 95% CI, 0.85-1.06; P < .001). Compared with professional translations, the Spanish AI translations were noninferior in adequacy (difference, 0.08; 95% CI, -0.02 to 0.19) and error severity (difference, 0.03; 95% CI, -0.09 to 0.14) but inferior in fluency (difference, 0.38; 95% CI, 0.23-0.53) and just crossed the inferiority threshold in meaning (difference, 0.08; 95% CI, -0.04 to 0.20). The Chinese, Vietnamese, and Somali AI translations were inferior to the professional translations across all metrics, with the greatest differences for Somali.
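
The arithmetic behind these results can be sketched in a few lines of Python. This is only an illustrative check, not the authors' analysis: it recomputes each pooled difference from the rounded means quoted above, and it classifies the Spanish results with a simple rule (the AI translation counts as noninferior when the upper bound of the 95% CI for the professional-minus-AI difference stays below a margin). The abstract does not state the noninferiority margin; the value 0.20 used here is an assumption, chosen only because it reproduces the classifications reported above. The function and variable names are hypothetical.

```python
# Pooled means quoted in the abstract: (AI mean, professional mean, reported difference).
POOLED = {
    "fluency": (2.98, 3.90, 0.92),
    "adequacy": (3.81, 4.56, 0.74),
    "meaning": (3.38, 4.28, 0.90),
    "error_severity": (3.53, 4.48, 0.95),
}

# Spanish results quoted in the abstract: 95% CI (lower, upper) of the
# professional-minus-AI difference for each metric.
SPANISH_CI = {
    "adequacy": (-0.02, 0.19),
    "error_severity": (-0.09, 0.14),
    "fluency": (0.23, 0.53),
    "meaning": (-0.04, 0.20),
}

# ASSUMPTION: the abstract does not report the margin. 0.20 is a hypothetical
# value picked because it is consistent with the reported classifications.
ASSUMED_MARGIN = 0.20


def mean_difference(ai_mean: float, pro_mean: float) -> float:
    """Professional mean minus AI mean, rounded to two decimals."""
    return round(pro_mean - ai_mean, 2)


def is_noninferior(ci_upper: float, margin: float = ASSUMED_MARGIN) -> bool:
    """Noninferior when the upper CI bound stays strictly below the margin."""
    return ci_upper < margin


if __name__ == "__main__":
    for metric, (ai, pro, reported) in POOLED.items():
        # Differences recomputed from rounded means can be off by 0.01
        # from the published value, which was computed on unrounded data.
        print(metric, "recomputed:", mean_difference(ai, pro), "reported:", reported)

    for metric, (_, upper) in SPANISH_CI.items():
        label = "noninferior" if is_noninferior(upper) else "not noninferior"
        print("Spanish", metric, "->", label)
```

Under the assumed margin, adequacy and error severity come out noninferior, while fluency and meaning do not, with meaning failing only because its upper bound sits exactly at the margin. That matches the "just crossed the inferiority threshold" wording in the abstract. The recomputed adequacy difference (0.75) differs from the published 0.74 by rounding alone.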

CONCLUSIONS AND RELEVANCE

In this comparative effectiveness analysis of AI- vs professionally translated issued discharge instructions, AI-translated instructions performed similarly for Spanish but worse for other languages tested. Validation and clinical implementation of AI-based translation will require special attention to languages of lesser diffusion to prevent creating new inequities.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fd60/12444566/5dd74e29e23e/jamanetwopen-e2532312-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fd60/12444566/ce5f757baf46/jamanetwopen-e2532312-g001.jpg

Similar articles

1
Accuracy of Artificial Intelligence vs Professionally Translated Discharge Instructions.
JAMA Netw Open. 2025 Sep 2;8(9):e2532312. doi: 10.1001/jamanetworkopen.2025.32312.
2
Evaluating a Large Language Model in Translating Patient Instructions to Spanish Using a Standardized Framework.
JAMA Pediatr. 2025 Jul 7. doi: 10.1001/jamapediatrics.2025.1729.
3
Prescription of Controlled Substances: Benefits and Risks
4
Translation from english into Urdu of a clinical decision tool to screen older women with back pain for osteoporotic-related vertebral fragility fractures.
BMC Musculoskelet Disord. 2025 Jul 18;26(1):691. doi: 10.1186/s12891-025-08837-z.
5
A systematic multimodal assessment of AI machine translation tools for enhancing access to critical care education internationally.
BMC Med Educ. 2025 Jul 8;25(1):1022. doi: 10.1186/s12909-025-07452-9.
6
Evaluation of the accuracy and safety of machine translation of patient-specific discharge instructions: a comparative analysis.
BMJ Qual Saf. 2025 Jul 9. doi: 10.1136/bmjqs-2024-018384.
7
Sexual Harassment and Prevention Training
8
Using AI to Translate and Simplify Spanish Orthopedic Medical Text: Instrument Validation Study.
JMIR AI. 2025 Mar 21;4:e70222. doi: 10.2196/70222.
9
Can machine translation match human expertise? Quantifying the performance of large language models in the translation of patient-reported outcome measures (PROMs).
J Patient Rep Outcomes. 2025 Jul 25;9(1):94. doi: 10.1186/s41687-025-00926-w.
10
A New Measure of Quantified Social Health Is Associated With Levels of Discomfort, Capability, and Mental and General Health Among Patients Seeking Musculoskeletal Specialty Care.
Clin Orthop Relat Res. 2025 Apr 1;483(4):647-663. doi: 10.1097/CORR.0000000000003394. Epub 2025 Feb 5.
