Assessing large language models' accuracy in providing patient support for choroidal melanoma.

Affiliations

Moorfields Eye Hospital NHS Foundation Trust, City Road, London, UK.

Department of Ophthalmology, Inselspital University Hospital of Bern, Bern, Switzerland.

Publication information

Eye (Lond). 2024 Nov;38(16):3113-3117. doi: 10.1038/s41433-024-03231-w. Epub 2024 Jul 13.

Abstract

PURPOSE

This study aimed to evaluate the accuracy of information that patients can obtain from large language models (LLMs) when seeking answers to common questions about choroidal melanoma.

METHODS

In this comparative study, frequently asked questions from choroidal melanoma patients were put to three major LLMs: ChatGPT 3.5, Bing AI, and DocsGPT. Answers were reviewed by three ocular oncology experts and scored as accurate, partially accurate, or inaccurate. Statistical analysis compared the quality of responses across models.

RESULTS

For medical advice questions, 92% of ChatGPT responses were accurate, compared with 58% for both Bing AI and DocsGPT. For pre- and postoperative questions, ChatGPT and Bing AI were each 86% accurate, while DocsGPT was 73% accurate. There were no statistically significant differences between models. ChatGPT responses were the longest and Bing AI responses the shortest, but length did not affect accuracy. All LLMs appropriately directed patients to seek medical advice from professionals.

CONCLUSION

LLMs show promising capability to address common choroidal melanoma patient questions at generally acceptable accuracy levels. However, inconsistent and inaccurate responses do occur, highlighting the need for improved fine-tuning and oversight before integration into clinical practice.
