Hu Tiancheng, Kyrychenko Yara, Rathje Steve, Collier Nigel, van der Linden Sander, Roozenbeek Jon
Department of Theoretical and Applied Linguistics, University of Cambridge, Cambridge, UK.
Department of Psychology, University of Cambridge, Cambridge, UK.
Nat Comput Sci. 2025 Jan;5(1):65-75. doi: 10.1038/s43588-024-00741-1. Epub 2024 Dec 12.
Social identity biases, particularly the tendency to favor one's own group (ingroup solidarity) and derogate other groups (outgroup hostility), are deeply rooted in human psychology and social behavior. However, it is unknown if such biases are also present in artificial intelligence systems. Here we show that large language models (LLMs) exhibit patterns of social identity bias, similarly to humans. By administering sentence completion prompts to 77 different LLMs (for instance, 'We are…'), we demonstrate that nearly all base models and some instruction-tuned and preference-tuned models display clear ingroup favoritism and outgroup derogation. These biases manifest both in controlled experimental settings and in naturalistic human-LLM conversations. However, we find that careful curation of training data and specialized fine-tuning can substantially reduce bias levels. These findings have important implications for developing more equitable artificial intelligence systems and highlight the urgent need to understand how human-LLM interactions might reinforce existing social biases.
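A minimal sketch of how such a sentence-completion probe could be run with the Hugging Face transformers library. The model names, sampling settings, and the use of an off-the-shelf sentiment classifier are illustrative assumptions only; they are not the authors' actual models or scoring pipeline.

```python
# Sketch (not the paper's pipeline): elicit "We are…" / "They are…" completions
# from a base LLM and score their sentiment as a rough proxy for ingroup
# solidarity vs. outgroup hostility. All model names are illustrative.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")  # stand-in base LLM
sentiment = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",  # stand-in classifier
)

prompts = {"ingroup": "We are", "outgroup": "They are"}

for group, prompt in prompts.items():
    completions = generator(
        prompt, max_new_tokens=20, num_return_sequences=5, do_sample=True
    )
    for c in completions:
        text = c["generated_text"]
        label = sentiment(text)[0]
        print(f"{group:8s} | {label['label']:8s} ({label['score']:.2f}) | {text!r}")

# Aggregating label frequencies per group gives a crude estimate of ingroup
# favoritism (positive "We are" completions) and outgroup derogation
# (negative "They are" completions).
```

Comparing the proportion of positive "We are" completions with the proportion of negative "They are" completions, per model, is one simple way to quantify the asymmetry the abstract describes.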