

Dr. Google vs. Dr. ChatGPT: Exploring the Use of Artificial Intelligence in Ophthalmology by Comparing the Accuracy, Safety, and Readability of Responses to Frequently Asked Patient Questions Regarding Cataracts and Cataract Surgery.

Author Information

Cohen Samuel A, Brant Arthur, Fisher Ann Caroline, Pershing Suzann, Do Diana, Pan Carolyn

Affiliation

Byers Eye Institute, Stanford University School of Medicine, Stanford, CA, USA.

Publication Information

Semin Ophthalmol. 2024 Aug;39(6):472-479. doi: 10.1080/08820538.2024.2326058. Epub 2024 Mar 22.

Abstract

PURPOSE

Patients are using online search modalities to learn about their eye health. While Google remains the most popular search engine, the use of large language models (LLMs) such as ChatGPT has increased. Cataract surgery is the most common surgical procedure in the US, yet there are limited data on the quality of the online information returned by cataract surgery-related searches on search engines such as Google and on LLM platforms such as ChatGPT. We identified the most frequently asked patient questions (FAQs) about cataracts and cataract surgery and evaluated the accuracy, safety, and readability of the answers to these questions provided by both Google and ChatGPT. We also demonstrated the utility of ChatGPT in writing notes and creating patient education materials.

METHODS

The top 20 FAQs related to cataracts and cataract surgery were recorded from Google. Responses to the questions provided by Google and ChatGPT were evaluated by a panel of ophthalmologists for accuracy and safety. Evaluators were also asked to distinguish between Google and LLM chatbot answers. Five validated readability indices were used to assess the readability of responses. ChatGPT was instructed to generate operative notes, post-operative instructions, and customizable patient education materials according to specific readability criteria.
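The abstract does not name the five validated readability indices that were applied. The Flesch-Kincaid Grade Level is one widely used index of this kind, and the sketch below (an illustrative assumption, not the study's actual instrument) shows how such a grade-level score is computed from sentence, word, and syllable counts, using a simple heuristic syllable counter.

```python
import re


def count_syllables(word: str) -> int:
    """Rough syllable count: number of vowel groups, minus a trailing silent 'e'."""
    word = word.lower()
    groups = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and groups > 1:
        groups -= 1  # drop a likely silent final 'e'
    return max(groups, 1)


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid Grade Level:
    0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    n_sent = max(len(sentences), 1)
    n_words = max(len(words), 1)
    return 0.39 * (n_words / n_sent) + 11.8 * (syllables / n_words) - 15.59
```

Short, simple sentences score near early grade levels, while dense clinical prose scores far higher; a score around 14.8, as reported in the results, corresponds to college-level reading material.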

RESULTS

Responses to 20 patient FAQs generated by ChatGPT were significantly longer and written at a higher reading level than responses provided by Google (p < .001), with an average grade level of 14.8 (college level). Expert reviewers were able to correctly distinguish between a human-reviewed and a chatbot-generated response an average of 31% of the time. Google answers contained incorrect or inappropriate material 27% of the time, compared with 6% of LLM-generated answers (p < .001). When expert reviewers were asked to compare the responses directly, chatbot responses were favored 66% of the time.

CONCLUSIONS

When comparing the responses to patients' cataract FAQs provided by ChatGPT and Google, practicing ophthalmologists overwhelmingly preferred ChatGPT responses. LLM chatbot responses were less likely to contain inaccurate information. ChatGPT represents a viable source of eye health information for patients with higher health literacy. ChatGPT may also be used by ophthalmologists to create customizable patient education materials for patients with varying health literacy.

