Large Language Models and Empathy: Systematic Review.

Author information

Sorin Vera, Brin Dana, Barash Yiftach, Konen Eli, Charney Alexander, Nadkarni Girish, Klang Eyal

Affiliations

Department of Radiology, Mayo Clinic, Rochester, MN, United States.

Department of Diagnostic Imaging, Sheba Medical Center, Ramat Gan, Israel.

Publication information

J Med Internet Res. 2024 Dec 11;26:e52597. doi: 10.2196/52597.


DOI:10.2196/52597
PMID:39661968
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11669866/
Abstract

BACKGROUND: Empathy, a fundamental aspect of human interaction, is characterized as the ability to experience another being's emotions within oneself. In health care, empathy is fundamental to the interaction between health care professionals and patients. It is a quality unique to humans that large language models (LLMs) are believed to lack.

OBJECTIVE: We aimed to review the literature on the capacity of LLMs to demonstrate empathy.

METHODS: We conducted a literature search on MEDLINE, Google Scholar, PsyArXiv, medRxiv, and arXiv between December 2022 and February 2024. We included English-language full-length publications that evaluated empathy in LLMs' outputs. We excluded papers evaluating other topics related to emotional intelligence that did not specifically address empathy. We summarized the included studies' results (the LLMs used, performance on empathy tasks, and model limitations) along with study metadata.

RESULTS: A total of 12 studies published in 2023 met the inclusion criteria. ChatGPT-3.5 (OpenAI) was evaluated in all studies, with 6 studies comparing it with other LLMs such as GPT-4, LLaMA (Meta), and fine-tuned chatbots. Seven studies focused on empathy within a medical context. The studies reported that LLMs exhibit elements of empathy, including emotion recognition and emotional support, in diverse contexts. Evaluation metrics included automatic metrics, such as Recall-Oriented Understudy for Gisting Evaluation (ROUGE) and Bilingual Evaluation Understudy (BLEU), and subjective human evaluation. Some studies compared LLMs' empathy performance with that of humans, while others compared different models. In some cases, LLMs were observed to outperform humans in empathy-related tasks. For example, ChatGPT-3.5 was evaluated on its responses to patients' questions from social media, and its responses were preferred over those of humans in 78.6% of cases. Other studies used scores assigned by human readers. One study reported a mean empathy score of 1.84 to 1.9 (on a 0-2 scale) for its fine-tuned LLM, while a different study evaluating ChatGPT-based chatbots reported a mean human rating of 3.43 out of 4 for empathetic responses. Other evaluations used the Levels of Emotional Awareness Scale, on which ChatGPT-3.5 was reported to score higher than humans. Another study evaluated ChatGPT and GPT-4 on soft-skills questions from the United States Medical Licensing Examination, where GPT-4 answered 90% of the questions correctly. Limitations were noted, including repetitive use of empathic phrases, difficulty following initial instructions, overly lengthy responses, sensitivity to prompts, and evaluation metrics that were largely subjective and influenced by the evaluator's background.

CONCLUSIONS: LLMs exhibit elements of cognitive empathy, recognizing emotions and providing emotionally supportive responses in various contexts. Since social skills are an integral part of intelligence, these advancements bring LLMs closer to human-like interactions and expand their potential use in applications requiring emotional intelligence. However, there remains room for improvement in both the performance of these models and the evaluation strategies used to assess soft skills.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a7c5/11669866/91d5643eab50/jmir_v26i1e52597_fig1.jpg



