Balas Michael, Kaplan Alexander J, Esmail Kaisra, Saleh Solin, Sharma Rahul A, Yan Peng, Arjmand Parnian
Department of Ophthalmology and Vision Sciences, University of Toronto, Toronto, ON, Canada.
Department of Ophthalmology and Vision Sciences, University of Toronto, Toronto, ON, Canada; Kensington Eye Institute, Toronto, ON, Canada.
Can J Ophthalmol. 2024 Dec 9. doi: 10.1016/j.jcjo.2024.11.003.
Our goal was to evaluate the efficacy of OpenAI's ChatGPT-4.0 large language model (LLM) in translating technical ophthalmology terminology into more comprehensible language for allied health care professionals and compare it with other LLMs.
Observational cross-sectional study.
Five ophthalmologists each contributed three clinical encounter notes, totaling 15 reports for analysis.
Notes were translated into more comprehensible language using ChatGPT-4.0, ChatGPT-4o, Claude 3 Sonnet, and Google Gemini. Ten family physicians, masked to whether each note was an original or an LLM translation, independently rated both sets on Likert scales for comprehension and utility in clinical decision-making. Readability was evaluated using Flesch Reading Ease and Flesch-Kincaid Grade Level scores. Five ophthalmologist raters compared performance across LLMs and identified translation errors.
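The two readability metrics used here are standard closed-form formulas over sentence, word, and syllable counts. As a minimal sketch of how such scores are computed (the syllable counter below is a crude vowel-group heuristic of our own, not the tokenizer used in the study; published tools such as `textstat` use more careful counting):

```python
import re

def count_syllables(word: str) -> int:
    # Crude heuristic: one syllable per run of consecutive vowels, minimum 1.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def readability(text: str) -> tuple[float, float]:
    """Return (Flesch Reading Ease, Flesch-Kincaid Grade Level)."""
    n_sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    n_words = max(1, len(words))
    n_syllables = sum(count_syllables(w) for w in words)
    wps = n_words / n_sentences      # average words per sentence
    spw = n_syllables / n_words      # average syllables per word
    # Standard published coefficients for the two Flesch formulas:
    fre = 206.835 - 1.015 * wps - 84.6 * spw
    fkgl = 0.39 * wps + 11.8 * spw - 15.59
    return fre, fkgl
```

Higher Flesch Reading Ease means easier text; higher Flesch-Kincaid Grade Level means harder text, so the two metrics move in opposite directions on the same passage.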
LLM translations significantly outperformed the original notes in comprehension (mean score 4.7/5.0 vs 3.7/5.0; p < 0.001) and perceived usefulness (mean score 4.6/5.0 vs 3.8/5.0; p < 0.005). Readability analysis demonstrated mildly increased linguistic complexity in the translated notes. ChatGPT-4.0 was preferred in 8 of 15 cases, ChatGPT-4o in 4, Gemini in 3, and Claude 3 Sonnet in none. All models exhibited some translation errors, but ChatGPT-4o and ChatGPT-4.0 had fewer inaccuracies.
ChatGPT-4.0 can significantly enhance the comprehensibility of ophthalmic notes, facilitating better interprofessional communication and suggesting a promising role for LLMs in medical translation. However, the results also underscore the need for ongoing refinement and careful implementation of such technologies. Further research is needed to validate these findings across a broader range of specialties and languages.