Assessing ChatGPT as a Medical Consultation Assistant for Chronic Hepatitis B: Cross-Language Study of English and Chinese.

Author Information

Wang Yijie, Chen Yining, Sheng Jifang

Affiliations

State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Disease, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China.

Department of Urology, Sir Run Run Shaw Hospital, Zhejiang University School of Medicine, Hangzhou, China.

Publication Information

JMIR Med Inform. 2024 Aug 8;12:e56426. doi: 10.2196/56426.

Abstract

BACKGROUND

Chronic hepatitis B (CHB) imposes substantial economic and social burdens globally. The management of CHB involves intricate monitoring and adherence challenges, particularly in regions like China, where a high prevalence of CHB intersects with health care resource limitations. This study explores the potential of ChatGPT-3.5, an emerging artificial intelligence (AI) assistant, to address these complexities. Given ChatGPT-3.5's notable capabilities in medical education and practice, its role in managing CHB is examined, particularly in regions with distinct health care landscapes.

OBJECTIVE

This study aimed to uncover insights into ChatGPT-3.5's potential and limitations in delivering personalized medical consultation assistance for CHB patients across diverse linguistic contexts.

METHODS

Questions sourced from published guidelines, online CHB communities, and search engines in English and Chinese were refined, translated, and compiled into 96 inquiries. Subsequently, these questions were presented to both ChatGPT-3.5 and ChatGPT-4.0 in independent dialogues. The responses were then evaluated by senior physicians, focusing on informativeness, emotional management, consistency across repeated inquiries, and cautionary statements regarding medical advice. Additionally, a true-or-false questionnaire was employed to further discern the variance in information accuracy for closed questions between ChatGPT-3.5 and ChatGPT-4.0.
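
The abstract describes the querying step only as presenting each inquiry "in independent dialogues." As a rough illustration of that workflow, the Python sketch below sends every question to both models with no shared conversation history and stores the replies for later physician review. The OpenAI Python SDK, the model identifiers gpt-3.5-turbo and gpt-4, and the questions.txt / responses.csv file names are assumptions made for this example, not the authors' actual pipeline.

```python
# Minimal sketch of the evaluation workflow: each question is posed to
# both models in a fresh, independent dialogue (no carried-over context).
# The SDK, model names, and file names below are illustrative assumptions.
import csv
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
MODELS = ["gpt-3.5-turbo", "gpt-4"]  # stand-ins for ChatGPT-3.5 / ChatGPT-4.0


def ask_in_fresh_dialogue(model: str, question: str) -> str:
    """Send a single question with no prior conversation history."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": question}],
    )
    return response.choices[0].message.content


with open("questions.txt", encoding="utf-8") as f:
    questions = [line.strip() for line in f if line.strip()]

with open("responses.csv", "w", newline="", encoding="utf-8") as out:
    writer = csv.writer(out)
    writer.writerow(["model", "question", "response"])
    for question in questions:
        for model in MODELS:
            writer.writerow([model, question, ask_in_fresh_dialogue(model, question)])
```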

RESULTS

Over half of the responses (228/370, 61.6%) from ChatGPT-3.5 were considered comprehensive. In contrast, ChatGPT-4.0 exhibited a higher percentage at 74.5% (172/222; P<.001). Notably, superior performance was evident in English, particularly in terms of informativeness and consistency across repeated queries. However, deficiencies were identified in emotional management guidance, which appeared in only 3.2% (6/186) of ChatGPT-3.5 responses and 8.1% (15/154) of ChatGPT-4.0 responses (P=.04). ChatGPT-3.5 included a disclaimer in 10.8% (24/222) of responses, while ChatGPT-4.0 included a disclaimer in 13.1% (29/222) of responses (P=.46). When responding to true-or-false questions, ChatGPT-4.0 achieved an accuracy rate of 93.3% (168/180), significantly surpassing ChatGPT-3.5's accuracy rate of 65.0% (117/180; P<.001).
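
The abstract does not name the statistical test behind these comparisons. Assuming a standard chi-square test of independence on the reported counts, the headline comprehensiveness comparison can be reproduced as in the sketch below; scipy and the choice of test are assumptions for illustration only.

```python
# Re-deriving the comprehensiveness comparison from the counts reported
# above, assuming a chi-square test of independence (the exact procedure
# used by the authors is not stated in the abstract).
from scipy.stats import chi2_contingency

# Rows: ChatGPT-3.5, ChatGPT-4.0; columns: comprehensive, not comprehensive.
table = [
    [228, 370 - 228],  # ChatGPT-3.5: 228/370 responses rated comprehensive
    [172, 222 - 172],  # ChatGPT-4.0: 172/222 responses rated comprehensive
]
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, p = {p:.2e}")  # p falls well below .001, matching the reported P<.001
```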

CONCLUSIONS

In this study, ChatGPT demonstrated basic capabilities as a medical consultation assistant for CHB management. The choice of working language for ChatGPT-3.5 was considered a potential factor influencing its performance, particularly in the use of terminology and colloquial language, and this potentially affects its applicability within specific target populations. However, as an updated model, ChatGPT-4.0 exhibits improved information processing capabilities, overcoming the language impact on information accuracy. This suggests that the implications of model advancement on applications need to be considered when selecting large language models as medical consultation assistants. Given that both models performed inadequately in emotional management guidance, this study highlights the importance of providing specific language training and emotional management strategies when deploying ChatGPT for medical purposes. Furthermore, the tendency of these models to use disclaimers in conversations should be further investigated to understand the impact on patients' experiences in practical applications.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7318/11342014/fb831ea8ac31/medinform_v12i1e56426_fig1.jpg
