A systematic multimodal assessment of AI machine translation tools for enhancing access to critical care education internationally.

Authors

Chen Christine L, Dong Yue, Castillo-Zambrano Claudia, Bencheqroun Hassan, Barwise Amelia, Hoffman Adria, Nalaie Keivan, Qiu Yishu, Boulekbache Oualid, Niven Alexander S

Affiliations

Division of Internal Medicine, Mayo Clinic, 200 First St. SW, Rochester, MN, 55905, USA.

Division of Pulmonary and Critical Care Medicine, Mayo Clinic, 200 First St. SW, Rochester, MN, 55905, USA.

Publication information

BMC Med Educ. 2025 Jul 8;25(1):1022. doi: 10.1186/s12909-025-07452-9.

Abstract

BACKGROUND

Language barriers are a significant obstacle to expanding access to critical care education worldwide. Machine translation (MT) offers considerable promise for making critical care content more accessible, and has evolved rapidly with newer artificial intelligence frameworks and large language models. The best approach to systematically apply and evaluate these tools, however, remains unclear.

METHODS

We developed a multimodal method to evaluate translations of critical care content used as part of an established international critical care education program. Four freely available MT tools were selected (DeepL™, Google Gemini™, Google Translate™, Microsoft CoPilot™) and used to translate selected phrases and paragraphs into Chinese (Mandarin), Spanish, and Ukrainian. A human translation performed by a professional medical translator was used for comparison. These translations were compared using 1) blinded bilingual clinician evaluations using anchored Likert domains of fluency, adequacy, and meaning; 2) automated BiLingual Evaluation Understudy (BLEU) scores; and 3) a validated System Usability Scale to assess the ease of use of the MT tools. Blinded bilingual clinician evaluations were reported both as individual domain scores and as averaged composite scores.
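As an illustration of two of the measures described above, the following is a minimal Python sketch, not the authors' implementation. It assumes the composite score is an unweighted mean of the three Likert domains (the abstract says only that composite scores were "averaged") and that usability was scored with the standard ten-item System Usability Scale formula. The function names `composite_score` and `sus_score` are hypothetical.

```python
from statistics import mean


def composite_score(fluency: float, adequacy: float, meaning: float) -> float:
    """Unweighted mean of the three Likert domains.

    Assumption: the abstract does not specify any weighting.
    """
    return mean([fluency, adequacy, meaning])


def sus_score(responses: list[int]) -> float:
    """Standard System Usability Scale score (0-100) from ten items rated 1-5."""
    if len(responses) != 10 or any(r < 1 or r > 5 for r in responses):
        raise ValueError("SUS needs ten responses, each between 1 and 5")
    # Odd-numbered items (1, 3, 5, 7, 9) are positively worded: contribute r - 1.
    odd = sum(r - 1 for r in responses[0::2])
    # Even-numbered items (2, 4, 6, 8, 10) are negatively worded: contribute 5 - r.
    even = sum(5 - r for r in responses[1::2])
    return (odd + even) * 2.5
```

For example, a rater who answers 3 on every item produces a SUS score of 50, the midpoint of the scale.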

RESULTS

Blinded clinician composite scores were highest for human translation (Chinese), Google Gemini (Spanish), and Microsoft CoPilot (Ukrainian). Microsoft CoPilot (Chinese) and Google Translate (Spanish and Ukrainian) earned the lowest scores. All Chinese and Spanish versions received "understandable to good" or "high quality" BLEU scores, while Ukrainian versions overall scored "hard to get the gist," with the exception of the Microsoft CoPilot translation. Usability scores were highest with DeepL (Chinese), Google Gemini (Spanish), and Google Translate (Ukrainian), and lower with Microsoft CoPilot (Chinese and Ukrainian) and Google Translate (Spanish).
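The quoted BLEU labels can be read as score bands. The abstract does not state which interpretation scale or cutoffs were used, so the sketch below assumes the rule-of-thumb bands from the Google Cloud Translation evaluation guide, whose wording matches the labels quoted here, with BLEU on a 0-100 scale. The name `bleu_label` and the handling of scores that fall exactly on a boundary are illustrative choices, not taken from the paper.

```python
# Assumed bands: (lower bound on a 0-100 BLEU scale, interpretation label).
BLEU_BANDS = [
    (60, "quality often better than human"),
    (50, "very high quality, adequate, and fluent"),
    (40, "high quality"),
    (30, "understandable to good"),
    (20, "gist is clear, but significant grammatical errors"),
    (10, "hard to get the gist"),
    (0, "almost useless"),
]


def bleu_label(score: float) -> str:
    """Map a BLEU score (0-100) to its assumed interpretation band."""
    if not 0 <= score <= 100:
        raise ValueError("BLEU score must be between 0 and 100")
    # Bands are listed from highest to lowest, so the first match wins.
    for lower, label in BLEU_BANDS:
        if score >= lower:
            return label
    raise AssertionError("unreachable: the lowest band starts at 0")
```

Under these assumed cutoffs, a score of 35 would be labeled "understandable to good" and a score of 15 "hard to get the gist".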

CONCLUSION

No single MT tool performed best across all metrics and languages, highlighting the importance of routine assessment of these tools during educational activities given their rapid ongoing evolution. We offer a multimodal evaluation methodology to aid this assessment as medical educators expand their use of MT in international educational programs.

