Athavale Anand, Baier Jonathan, Ross Elsie, Fukaya Eri
Division of Vascular Surgery, Stanford University School of Medicine, Palo Alto.
NextNext LLC, Lovettsville.
JVS Vasc Insights. 2023;1. doi: 10.1016/j.jvsvi.2023.100019. Epub 2023 Jun 19.
Health care providers and recipients have long used artificial intelligence and its subfields, such as natural language processing and machine learning, in the form of search engines to obtain medical information. Whereas a search engine returns a ranked list of webpages in response to a query and leaves the user to extract information from those links directly, ChatGPT has elevated the interface between humans and artificial intelligence by attempting to provide relevant information in a human-like textual conversation. This technology is being adopted rapidly and has enormous potential to impact various aspects of health care, including patient education, research, scientific writing, pre-visit/post-visit queries, documentation assistance, and more. The objective of this study was to assess whether chatbots could assist with answering patient questions and with electronic health record inbox management.
We devised two questionnaires: (1) administrative and non-complex medical questions (based on actual inbox questions); and (2) complex medical questions on the topic of chronic venous disease. We graded the performance of publicly available chatbots on their potential to assist with electronic health record inbox management; responses were graded independently by an internist and a vascular medicine specialist.
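The abstract does not describe the exact submission workflow, and the study presumably used each chatbot's public interface. As a minimal illustrative sketch only, the following Python snippet shows how such a questionnaire could be run programmatically against OpenAI models via the official openai client; the model names, the example questions, and the downstream 1-to-4 human grading convention are assumptions, not details from the paper.

```python
# Hypothetical sketch: submit inbox-style patient questions to chat models
# and collect the responses for independent human grading. The 1-4 rubric
# referenced in the results is applied afterward by the two clinician graders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

QUESTIONS = [
    "How should I care for my compression stockings?",   # illustrative only
    "When can I resume exercise after vein ablation?",    # illustrative only
]

def ask(model: str, question: str) -> str:
    """Send one patient question to the given model and return its reply."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": "Answer as a clinic assistant."},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

for model in ("gpt-4", "gpt-3.5-turbo"):
    for q in QUESTIONS:
        print(f"[{model}] Q: {q}\nA: {ask(model, q)}\n")
```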
On administrative and non-complex medical questions, ChatGPT 4.0 performed better than ChatGPT 3.5. ChatGPT 4.0 received a grade of 1 on all questions: 20 of 20 (100%). ChatGPT 3.5 received a grade of 1 on 14 of 20 questions (70%), grade 2 on 4 of 20 questions (20%), grade 3 on 0 of 20 questions (0%), and grade 4 on 2 of 20 questions (10%). On complex medical questions, ChatGPT 4.0 performed the best: it received a grade of 1 on 15 of 20 questions (75%), grade 2 on 2 of 20 questions (10%), grade 3 on 2 of 20 questions (10%), and grade 4 on 1 of 20 questions (5%). ChatGPT 3.5 received a grade of 1 on 9 of 20 questions (45%), grade 2 on 4 of 20 questions (20%), grade 3 on 4 of 20 questions (20%), and grade 4 on 3 of 20 questions (15%). Clinical Camel received a grade of 1 on 0 of 20 questions (0%), grade 2 on 5 of 20 questions (25%), grade 3 on 5 of 20 questions (25%), and grade 4 on 10 of 20 questions (50%).
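As a small worked example of the percentage arithmetic above, a tally like the following reproduces the reported distributions; the grade list is reconstructed from the published counts, not from raw study data.

```python
from collections import Counter

def grade_distribution(grades: list[int], total: int = 20) -> dict[int, str]:
    """Summarize per-grade counts as 'n of total (percent)' strings."""
    counts = Counter(grades)  # Counter returns 0 for grades never assigned
    return {g: f"{counts[g]} of {total} ({counts[g] / total:.0%})"
            for g in (1, 2, 3, 4)}

# ChatGPT 4.0 on complex medical questions, per the reported counts:
gpt4_complex = [1] * 15 + [2] * 2 + [3] * 2 + [4] * 1
print(grade_distribution(gpt4_complex))
# {1: '15 of 20 (75%)', 2: '2 of 20 (10%)', 3: '2 of 20 (10%)', 4: '1 of 20 (5%)'}
```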
Based on our interactions with ChatGPT on the topic of chronic venous disease, it is plausible that in the future this technology may be used to assist with electronic health record inbox management and offload work from medical staff. However, for this technology to receive regulatory approval for that purpose, it will require extensive supervised training by subject experts, guardrails to prevent "hallucinations" and to maintain confidentiality, and proof that it can perform at a level comparable to (if not better than) that of humans. (JVS-Vascular Insights 2023;1:100019.)