

Assessing the Responses of Large Language Models (ChatGPT-4, Gemini, and Microsoft Copilot) to Frequently Asked Questions in Breast Imaging: A Study on Readability and Accuracy.

Author Information

Tepe Murat, Emekli Emre

Affiliations

Radiology, Mediclinic City Hospital, Dubai, ARE.

Radiology, Eskişehir Osmangazi University Health Practice and Research Hospital, Eskişehir, TUR.

Publication Information

Cureus. 2024 May 9;16(5):e59960. doi: 10.7759/cureus.59960. eCollection 2024 May.

Abstract

Background
Large language models (LLMs) such as ChatGPT-4, Gemini, and Microsoft Copilot have become instrumental in various domains, including healthcare, where they can enhance health literacy and aid patient decision-making. Given the complexity of breast imaging procedures, accurate and comprehensible information is vital for patient engagement and compliance. This study evaluates the readability and accuracy of the information provided by three prominent LLMs (ChatGPT-4, Gemini, and Microsoft Copilot) in response to frequently asked questions in breast imaging, assessing their potential to improve patient understanding and facilitate healthcare communication.

Methodology
We collected the most common breast imaging questions from clinical practice and posed them to each LLM. The responses were analyzed for readability using the Flesch Reading Ease and Flesch-Kincaid Grade Level tests, and for accuracy using a radiologist-developed Likert-type scale.

Results
The study found significant variations among the LLMs. Gemini and Microsoft Copilot scored higher on the readability scales (p < 0.001), indicating that their responses were easier to understand. In contrast, ChatGPT-4 demonstrated greater accuracy in its responses (p < 0.001).

Conclusions
While LLMs such as ChatGPT-4 show promise in providing accurate responses, readability issues may limit their utility in patient education. Conversely, Gemini and Microsoft Copilot, despite being less accurate, are more accessible to a broader patient audience. Ongoing adjustment and evaluation of these models are essential to ensure they meet the diverse needs of patients, underscoring the need for continuous improvement and oversight in deploying artificial intelligence technologies in healthcare.
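The Flesch Reading Ease and Flesch-Kincaid Grade Level scores used in the study are standard formulas based on words per sentence and syllables per word. A minimal Python sketch of those published formulas follows; the naive vowel-group syllable counter is an illustrative assumption, not the study's actual tooling, which is not described at this level of detail.

```python
import re

def count_syllables(word: str) -> int:
    """Naive English syllable estimate: count vowel groups, minimum 1."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def flesch_metrics(text: str) -> tuple[float, float]:
    """Return (Flesch Reading Ease, Flesch-Kincaid Grade Level) for text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    wps = len(words) / len(sentences)   # average words per sentence
    spw = syllables / len(words)        # average syllables per word
    fre = 206.835 - 1.015 * wps - 84.6 * spw
    fkgl = 0.39 * wps + 11.8 * spw - 15.59
    return fre, fkgl
```

Higher Reading Ease scores (roughly 0-100) indicate easier text, while the Grade Level maps the same inputs onto a US school-grade scale, which is why the two metrics generally move in opposite directions.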


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8885/11080394/a3d6ac4e7eb2/cureus-0016-00000059960-i01.jpg
