
Large Language Models for Pre-mediation Counseling in Medical Disputes: A Comparative Evaluation against Human Experts.

Authors

Kim Min Seo, Lee Jung Su, Bae Hyuna

Affiliations

College of Medicine, Kangwon National University, Chuncheon, Korea.

Korea Medical Dispute Mediation and Arbitration Agency, Seoul, Korea.

Publication information

Healthc Inform Res. 2025 Apr;31(2):200-208. doi: 10.4258/hir.2025.31.2.200. Epub 2025 Apr 30.

DOI:10.4258/hir.2025.31.2.200
PMID:40384071
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12086436/
Abstract

OBJECTIVES

Assessing medical disputes requires both medical and legal expertise, presenting challenges for patients seeking clarity regarding potential malpractice claims. This study aimed to develop and evaluate a chatbot based on a chain-of-thought pipeline using a large language model (LLM) for providing medical dispute counseling and compare its performance with responses from human experts.

METHODS

Retrospective counseling cases (n = 279) were collected from the Korea Medical Dispute Mediation and Arbitration Agency's website, from which 50 cases were randomly selected as a validation dataset. The Claude 3.5 Sonnet model processed each counseling request through a five-step chain-of-thought pipeline. Thirty-eight experts evaluated the chatbot's responses against the original human expert responses, rating them across four dimensions on a 5-point Likert scale. Statistical analyses were conducted using Wilcoxon signed-rank tests.
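The sampling and testing steps described in the Methods can be illustrated with a minimal, standard-library sketch. It is not the authors' code: the seed, the function names, and the example ratings are all hypothetical, and the test uses the textbook normal approximation with a tie correction, which may differ from the exact procedure the study used.

```python
import math
import random


def sample_validation_cases(n_total=279, n_validation=50, seed=0):
    """Pick validation case indices at random, without replacement.

    The seed is an assumption added here for reproducibility.
    """
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_total), n_validation))


def wilcoxon_signed_rank(x, y):
    """Two-sided Wilcoxon signed-rank test for paired samples.

    Zero differences are dropped, tied absolute differences receive
    average ranks, and the variance is corrected for ties. Returns
    (W, p), where W is the smaller of the two signed rank sums and p
    comes from the normal approximation.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    w = min(w_plus, w_minus)
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (w - mean) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return w, p


# Hypothetical 5-point Likert ratings for ten paired responses.
# These numbers are made up for illustration; they are not study data.
chatbot_ratings = [5, 4, 5, 4, 5, 5, 4, 5, 4, 5]
human_ratings = [3, 3, 4, 4, 3, 4, 3, 4, 2, 3]

validation_ids = sample_validation_cases()
w_stat, p_value = wilcoxon_signed_rank(chatbot_ratings, human_ratings)
print(len(validation_ids), w_stat, p_value)
```

A paired, rank-based test suits this design because each evaluator rates both responses to the same case and Likert scores are ordinal, so a test on signed ranks avoids assuming normally distributed differences.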

RESULTS

The chatbot significantly outperformed human experts in quality of information (p < 0.001), understanding and reasoning (p < 0.001), and overall satisfaction (p < 0.001). It also demonstrated a stronger tendency to produce opinion-driven content (p < 0.001). Despite generally high scores, evaluators noted specific instances where the chatbot encountered difficulties.

CONCLUSIONS

A chain-of-thought-based LLM chatbot shows promise for enhancing the quality of medical dispute counseling, outperforming human experts across key evaluation metrics. Future research should address inaccuracies resulting from legal and contextual variability, investigate patient acceptance, and further refine the chatbot's performance in domain-specific applications.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cf9f/12086436/50c8d22c66a0/hir-2025-31-2-200f4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cf9f/12086436/8733cdc74b7e/hir-2025-31-2-200f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cf9f/12086436/124d807f5be5/hir-2025-31-2-200f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cf9f/12086436/b65abc9ade08/hir-2025-31-2-200f3.jpg
