ChatGPT and Lacrimal Drainage Disorders: Performance and Scope of Improvement.

Affiliation

Govindram Seksaria Institute of Dacryology, L.V. Prasad Eye Institute, Hyderabad, India.

Publication information

Ophthalmic Plast Reconstr Surg. 2023;39(3):221-225. doi: 10.1097/IOP.0000000000002418. Epub 2023 May 10.

Abstract

PURPOSE

This study aimed to report the performance of the large language model ChatGPT (OpenAI, San Francisco, CA, U.S.A.) in the context of lacrimal drainage disorders.

METHODS

A set of prompts was constructed from questions and statements spanning common and uncommon aspects of lacrimal drainage disorders. Care was taken to avoid prompts that depended on significant or new knowledge from after 2020. Each prompt was presented three times to ChatGPT. The questions covered common disorders such as primary acquired nasolacrimal duct obstruction and congenital nasolacrimal duct obstruction, along with their causes and management. The prompts also tested ChatGPT on certain specifics, such as the history of dacryocystorhinostomy (DCR) surgery, lacrimal pump anatomy, and human canalicular surfactants. ChatGPT was also quizzed on controversial topics such as silicone intubation and the use of mitomycin C in DCR surgery. The responses were analyzed for evidence-based content, specificity, presence of generic text, disclaimers, factual inaccuracies, and the ability to admit mistakes and challenge incorrect premises. Three lacrimal surgeons graded the responses into three categories: correct, partially correct, and factually incorrect.
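The grading scheme described above can be illustrated with a minimal Python sketch. The abstract does not state how the three surgeons' grades were combined or how agreement was calculated, so the majority-vote rule, the "all three graders agree" definition of agreement, the function names, and the example grades below are all assumptions made for illustration, not the authors' method or data.

```python
from collections import Counter

# The three grading categories named in the abstract.
CATEGORIES = ("correct", "partially correct", "factually incorrect")


def consensus(grades):
    """Return the majority grade, or None if all three graders disagree.

    Majority voting is an assumed rule; the abstract does not specify one.
    """
    label, count = Counter(grades).most_common(1)[0]
    return label if count >= 2 else None


def unanimous_rate(all_grades):
    """Fraction of responses on which every grader chose the same category."""
    unanimous = sum(1 for grades in all_grades if len(set(grades)) == 1)
    return unanimous / len(all_grades)


# Made-up grades for four hypothetical responses (NOT study data).
example = [
    ("correct", "correct", "correct"),
    ("partially correct", "partially correct", "correct"),
    ("factually incorrect", "factually incorrect", "factually incorrect"),
    ("correct", "partially correct", "factually incorrect"),
]

for grades in example:
    print(grades, "->", consensus(grades))
print("unanimous rate:", unanimous_rate(example))
```

Other agreement measures, such as pairwise agreement or a kappa statistic, would give different numbers on the same grades; the sketch only shows one simple way such a figure could be derived.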

RESULTS

A total of 21 prompts were presented to ChatGPT. The responses were detailed and followed the structure of the prompt. In response to most questions, ChatGPT provided a generic disclaimer that it could not give medical advice or professional opinion but then answered the question in detail. Specific prompts such as "how can I perform an external DCR?" were answered with a sequential listing of all the surgical steps. However, factual inaccuracies were noted across many ChatGPT replies. Several responses on controversial topics such as silicone intubation and mitomycin C were generic and not precisely evidence-based. ChatGPT's responses to specific questions, such as those on canalicular surfactants and idiopathic canalicular inflammatory disease, were poor. Presenting varied prompts on a single topic led to responses that repeated or recycled phrases. Citations were uniformly missing across all responses. Agreement among the three observers in grading the responses was high (95%). The responses of ChatGPT were graded as correct for only 40% of the prompts, partially correct for 35%, and outright factually incorrect for 25%. Hence, counting the partially correct responses, some degree of factual inaccuracy was present in 60% of the responses. The exciting aspect was that ChatGPT was able to admit mistakes and correct them when presented with counterarguments. It was also capable of challenging incorrect prompts and premises.
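The 60% figure follows directly from the reported grade distribution, as the short Python check below shows. Only the three percentages come from the abstract; the variable names are illustrative, and no per-prompt counts are assumed because the abstract does not report them.

```python
# Grade distribution as reported in the abstract (percentages).
reported = {
    "correct": 40,
    "partially correct": 35,
    "factually incorrect": 25,
}

# The three categories should account for every graded response.
assert sum(reported.values()) == 100

# "Some degree of factual inaccuracy" = partially correct + factually incorrect.
some_inaccuracy = reported["partially correct"] + reported["factually incorrect"]
print(f"Responses with some inaccuracy: {some_inaccuracy}%")
```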

CONCLUSION

The performance of ChatGPT in the context of lacrimal drainage disorders, at best, can be termed average. However, the potential of this AI chatbot to influence medicine is enormous. There is a need for it to be specifically trained and retrained for individual medical subspecialties.
