Can popular AI large language models provide reliable answers to frequently asked questions about rotator cuff tears?

Author Information

Kolac Ulas Can, Karademir Orhan Mete, Ayik Gokhan, Kaymakoglu Mehmet, Familiari Filippo, Huri Gazi

Affiliations

Department of Orthopedics and Traumatology, Hacettepe University Faculty of Medicine, Ankara, Turkey.

Faculty of Medicine, Hacettepe University, Ankara, Turkey.

Publication Information

JSES Int. 2024 Nov 29;9(2):390-397. doi: 10.1016/j.jseint.2024.11.012. eCollection 2025 Mar.

Abstract

BACKGROUND

Rotator cuff tears are common upper-extremity injuries that significantly impair shoulder function, leading to pain, reduced range of motion, and decreased quality of life. With the increasing reliance on artificial intelligence large language models (AI LLMs) for health information, it is crucial to evaluate the quality and readability of the information these models provide.

METHODS

A pool of 50 questions related to rotator cuff tears was generated by querying popular AI LLMs (ChatGPT 3.5, ChatGPT 4, Gemini, and Microsoft CoPilot) and by using Google search. The responses from the AI LLMs were then saved and evaluated. Information quality was assessed with the DISCERN tool and a Likert scale; readability was assessed with the Patient Education Materials Assessment Tool for Printable Materials (PEMAT) Understandability Score and the Flesch-Kincaid Reading Ease Score. Two orthopedic surgeons assessed the responses, and discrepancies were resolved by a senior author.
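For context, the Flesch-Kincaid Reading Ease score used above is computed from average sentence length and average syllables per word. The sketch below shows that standard calculation; the regex-based syllable counter is a rough assumption for illustration, not the tool the authors used.

```python
import re

def count_syllables(word: str) -> int:
    # Rough heuristic: each run of consecutive vowels counts as one
    # syllable, with a minimum of one syllable per word.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(text: str) -> float:
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    # Standard Flesch Reading Ease formula:
    # 206.835 - 1.015 * (words per sentence) - 84.6 * (syllables per word)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))

# Example usage; lower scores indicate harder-to-read text.
print(round(flesch_reading_ease(
    "Rotator cuff tears impair shoulder function and reduce quality of life."), 1))
```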

RESULTS

Of the 198 answers, the median DISCERN score was 40, with 56.6% considered sufficient; the Likert scale showed 96% sufficiency. The median PEMAT Understandability score was 83.33, with 77.3% sufficiency, while the Flesch-Kincaid Reading Ease score had a median of 42.05, with 88.9% sufficiency. Overall, 39.8% of the answers were sufficient in both information quality and readability. Differences were found among the AI models in DISCERN, Likert, PEMAT Understandability, and Flesch-Kincaid scores.
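As an illustration of how such summary figures are derived, the sketch below computes a median and a percent-sufficient value from per-answer scores. The scores and the cutoff of 39 are hypothetical placeholders; the abstract does not state the exact thresholds used.

```python
from statistics import median

def summarize(scores: list[float], cutoff: float) -> tuple[float, float]:
    """Return (median score, percentage of answers at or above cutoff)."""
    pct_sufficient = 100.0 * sum(s >= cutoff for s in scores) / len(scores)
    return median(scores), pct_sufficient

# Hypothetical per-answer DISCERN scores and an assumed sufficiency
# cutoff of 39; both are illustrative only.
discern_scores = [36.0, 38.0, 40.0, 41.0, 45.0]
med, pct = summarize(discern_scores, cutoff=39.0)
print(f"median={med}, sufficient={pct:.1f}%")
```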

CONCLUSION

AI LLMs generally do not offer sufficient information quality and readability. While they are not yet ready for use in the medical field, they show promise. Because these models evolve rapidly, they require continuous re-evaluation. Developing new, comprehensive tools for evaluating medical information quality and readability is crucial to ensuring these models can effectively support patient education. Future research should focus on improving readability and delivering consistent information quality to better serve patients.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/79de/11962600/3d29d4d30612/gr1.jpg
