Andalib Saman, Spina Aidin, Picton Bryce, Solomon Sean S, Scolaro John A, Nelson Ariana M
UCI School of Medicine, University of California, 1001 Health Sciences Rd, Irvine, CA, 92617, United States, 1 (949) 824-6119.
Department of Orthopaedic Surgery, UC Irvine Health, Orange, United States.
JMIR AI. 2025 Mar 21;4:e70222. doi: 10.2196/70222.
Language barriers contribute significantly to health care disparities in the United States, where a sizable proportion of patients speak only Spanish. In orthopedic surgery, such barriers affect both patients' comprehension of and their engagement with available resources. Studies have explored the utility of large language models (LLMs) for medical translation but have not yet robustly evaluated artificial intelligence (AI)-driven translation and simplification of orthopedic materials for Spanish speakers.
This study used the bilingual evaluation understudy (BLEU) method to assess translation quality and investigated the ability of AI to simplify patient education materials (PEMs) in Spanish.
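BLEU scores a candidate translation by its clipped ("modified") n-gram precision against a reference translation. It combines the precisions for several n-gram orders by a geometric mean and multiplies the result by a brevity penalty for candidates shorter than the reference. The sketch below is a minimal sentence-level version. It assumes whitespace tokenization, a single human reference, uniform weights for 1- to 4-grams, and no smoothing. The abstract does not state which BLEU implementation or settings the study used, and the function names here are hypothetical.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count every contiguous n-gram in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu_score(candidate, reference, max_n=4):
    """Unsmoothed sentence-level BLEU against one reference (assumed setup)."""
    cand = candidate.split()  # assumption: whitespace tokenization
    ref = reference.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngram_counts(cand, n)
        ref_counts = ngram_counts(ref, n)
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0:
            return 0.0  # without smoothing, any zero precision makes BLEU 0
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty: penalize candidates shorter than the reference.
    c, r = len(cand), len(ref)
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(sum(log_precisions) / max_n)
```

Because unsmoothed BLEU drops to 0 whenever any n-gram order has no match, short texts often score 0. Established BLEU libraries add corpus-level aggregation and smoothing options, which matter when comparing whole documents such as PEMs.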
PEMs (n=78) from the American Academy of Orthopaedic Surgeons were translated from English to Spanish using 2 LLMs (GPT-4 and Google Translate). The BLEU methodology was applied to compare AI translations with professionally human-translated PEMs. The Friedman test and Dunn multiple comparisons test were used to statistically quantify differences in translation quality. A readability analysis and a feature analysis were then performed to evaluate the success of text simplification and the impact of English text features on BLEU scores. The capability of an LLM to simplify medical language written in Spanish was also assessed.
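The Friedman test is a rank-based test for differences among k related conditions. Here each PEM would be one block scored under every condition. Within each block the conditions are ranked, and with rank sums R_j over n blocks the statistic is Q = 12/(n k (k+1)) * sum_j R_j^2 - 3 n (k+1), which is approximately chi-square with k - 1 degrees of freedom. The sketch below is a minimal standard-library version. It assumes each row holds one PEM's scores under k >= 3 conditions (the abstract does not list which conditions were compared). It uses average ranks for ties but omits the tie correction that statistical packages usually apply, and it does not implement the Dunn post hoc test. All function names are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks within one block; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for m in range(i, j + 1):
            ranks[order[m]] = avg
        i = j + 1
    return ranks


def chi2_sf(x, df):
    """Chi-square upper-tail probability via the regularized upper gamma recursion."""
    y = x / 2.0
    # Start from Q(0, y) = 0 for even df, or Q(1/2, y) = erfc(sqrt(y)) for odd df.
    a, sf = (0.0, 0.0) if df % 2 == 0 else (0.5, math.erfc(math.sqrt(y)))
    while a < df / 2.0:
        sf += y ** a * math.exp(-y) / math.gamma(a + 1)  # Q(a+1) = Q(a) + y^a e^-y / Gamma(a+1)
        a += 1.0
    return sf


def friedman_test(scores):
    """scores: one row per block (e.g. per PEM), one column per condition."""
    n, k = len(scores), len(scores[0])
    rank_sums = [0.0] * k
    for block in scores:
        for j, r in enumerate(average_ranks(block)):
            rank_sums[j] += r
    sum_sq = sum(r * r for r in rank_sums)
    q = 12.0 * sum_sq / (n * k * (k + 1)) - 3.0 * n * (k + 1)  # no tie correction
    return q, chi2_sf(q, k - 1)
```

For a significant Q, a post hoc procedure such as Dunn's test (as named in the abstract) is then needed to identify which pairs of conditions differ.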
As measured by BLEU scores, GPT-4 showed moderate success in translating PEMs into Spanish but was less successful than Google Translate. Simplified PEMs demonstrated improved readability when compared to original versions (P<.001) but were unable to reach the targeted grade level for simplification. The feature analysis revealed that the total number of syllables and average number of syllables per sentence had the highest impact on BLEU scores. GPT-4 was able to significantly reduce the complexity of medical text written in Spanish (P<.001).
Although Google Translate outperformed GPT-4 in translation accuracy, LLMs such as GPT-4 may provide significant utility in translating medical texts into Spanish and simplifying them. We recommend considering a dual approach (Google Translate for translation and GPT-4 for simplification) to improve access to medical information and orthopedic surgery education among Spanish-speaking patients.