Samuel A. Cohen, Arthur Brant, Ann Caroline Fisher, Suzann Pershing, Diana Do, Carolyn Pan
Byers Eye Institute, Stanford University School of Medicine, Stanford, CA, USA.
Semin Ophthalmol. 2024 Aug;39(6):472-479. doi: 10.1080/08820538.2024.2326058. Epub 2024 Mar 22.
Patients are using online search modalities to learn about their eye health. While Google remains the most popular search engine, use of large language models (LLMs) such as ChatGPT has increased. Cataract surgery is the most common surgical procedure in the US, yet there are limited data on the quality of the online information returned by cataract surgery-related searches on search engines such as Google and LLM platforms such as ChatGPT. We identified the most common patient frequently asked questions (FAQs) about cataracts and cataract surgery and evaluated the accuracy, safety, and readability of the answers to these questions provided by both Google and ChatGPT. We also demonstrated the utility of ChatGPT for writing operative notes and creating patient education materials.
The top 20 FAQs related to cataracts and cataract surgery were recorded from Google. A panel of ophthalmologists evaluated the responses provided by Google and ChatGPT for accuracy and safety. Evaluators were also asked to distinguish between Google and LLM chatbot answers. Five validated readability indices were used to assess the readability of the responses. ChatGPT was then instructed to generate operative notes, post-operative instructions, and customizable patient education materials according to specific readability criteria.
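The abstract does not name the five readability indices the authors used. As an illustrative sketch only, the following Python snippet scores a response with five commonly used validated indices via the textstat package; the function name readability_report and the specific choice of indices are assumptions, not the authors' protocol.

    # A minimal sketch of scoring answer readability, assuming the
    # textstat package and five commonly used validated indices
    # (not necessarily the ones used in the study).
    import textstat

    def readability_report(text: str) -> dict:
        # Each index estimates the U.S. school grade level needed to
        # comprehend the text. Note: very short texts can yield
        # unstable scores (SMOG in particular assumes longer samples).
        return {
            "flesch_kincaid_grade": textstat.flesch_kincaid_grade(text),
            "gunning_fog": textstat.gunning_fog(text),
            "smog_index": textstat.smog_index(text),
            "coleman_liau_index": textstat.coleman_liau_index(text),
            "automated_readability_index": textstat.automated_readability_index(text),
        }

    sample = ("Cataract surgery replaces the eye's clouded natural lens "
              "with a clear artificial intraocular lens.")
    print(readability_report(sample))

Averaging grade-level estimates across several such indices is one plausible way to arrive at a single reading-level figure like the grade level of 14.8 reported below.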
Responses to the 20 patient FAQs generated by ChatGPT were significantly longer and written at a higher reading level than responses provided by Google (p < .001), with an average grade level of 14.8 (college level). Expert reviewers were able to correctly distinguish between human-reviewed and chatbot-generated responses an average of 31% of the time. Google answers contained incorrect or inappropriate material 27% of the time, compared with 6% of LLM-generated answers (p < .001). When expert reviewers were asked to compare the responses directly, chatbot responses were favored 66% of the time.
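For context, the abstract does not specify which index produced the 14.8 average; the Flesch-Kincaid Grade Level, one widely used readability index, is computed as:

    FKGL = 0.39 × (total words / total sentences) + 11.8 × (total syllables / total words) − 15.59

A score of 14.8 corresponds to a reader roughly two to three years into college, well above the sixth-grade reading level often recommended for patient education materials.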
When comparing the responses to patients' cataract FAQs provided by ChatGPT and Google, practicing ophthalmologists overwhelmingly preferred ChatGPT responses. LLM chatbot responses were less likely to contain inaccurate information. ChatGPT represents a viable eye health information source for patients with higher health literacy. Ophthalmologists may also use ChatGPT to create customizable education materials for patients with varying health literacy.