Department of Pathology, University of Michigan, Ann Arbor, MI, United States.
Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA, United States.
Clin Chem. 2024 Sep 3;70(9):1122-1139. doi: 10.1093/clinchem/hvae093.
The integration of ChatGPT, a large language model (LLM) developed by OpenAI, into healthcare has sparked significant interest because of its potential to enhance patient care and medical education. As more patients access their laboratory results online, there is a pressing need to evaluate how accurately ChatGPT conveys laboratory medicine information. Our study evaluates ChatGPT's effectiveness in answering patient questions in this area, comparing its performance with that of medical professionals on social media.
This study sourced patient questions and medical professional responses from Reddit and Quora and compared them with responses generated by ChatGPT versions 3.5 and 4.0. Experienced laboratory medicine professionals evaluated the responses for quality and overall preference. Evaluation results were analyzed using R software.
The study analyzed 49 questions, with evaluators reviewing responses from both medical professionals and ChatGPT. ChatGPT's responses were preferred by 75.9% of evaluators and generally received higher ratings for quality. They were noted for their comprehensive and accurate information, whereas responses from medical professionals were valued for their conciseness. The interrater agreement was fair, indicating some subjectivity but a consistent preference for ChatGPT's detailed responses.
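The abstract reports "fair" interrater agreement but does not name the statistic used; agreement among multiple evaluators is commonly quantified with Fleiss' kappa, where values near 0.21–0.40 are conventionally labeled "fair." As an illustration only (the study's actual analysis was done in R, and the rating data below are hypothetical), a minimal Python sketch of Fleiss' kappa:

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for agreement among multiple raters.

    counts[i][j] = number of raters assigning item i to category j.
    Assumes the same number of raters for every item.
    """
    N = len(counts)            # number of items rated
    n = sum(counts[0])         # raters per item
    k = len(counts[0])         # number of categories

    # Mean observed per-item agreement
    P_bar = sum(
        (sum(c * c for c in row) - n) / (n * (n - 1)) for row in counts
    ) / N

    # Chance agreement from marginal category proportions
    p = [sum(row[j] for row in counts) / (N * n) for j in range(k)]
    P_e = sum(pj * pj for pj in p)

    return (P_bar - P_e) / (1 - P_e)


# Hypothetical example: 2 items, 3 raters, 2 categories.
# Perfect agreement (all raters concur on each item) yields kappa = 1.
print(fleiss_kappa([[3, 0], [0, 3]]))  # → 1.0
# Disagreement lowers kappa below 1.
print(fleiss_kappa([[2, 1], [1, 2]]))
```

Kappa corrects the raw proportion of agreement for the agreement expected by chance, which is why a "fair" value can coexist with a consistent majority preference, as reported above.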
ChatGPT demonstrates potential as an effective tool for addressing queries in laboratory medicine, often surpassing medical professionals in response quality. These results support the need for further research to confirm ChatGPT's utility and explore its integration into healthcare settings.