
Cultural bias and cultural alignment of large language models.

Author information

Yan Tao, Olga Viberg, Ryan S. Baker, René F. Kizilcec

Affiliations

Department of Information Science, Cornell University, Ithaca, NY 14853, USA.

Department of Human Centered Technology, KTH Royal Institute of Technology, Stockholm 10044, Sweden.

Publication information

PNAS Nexus. 2024 Sep 17;3(9):pgae346. doi: 10.1093/pnasnexus/pgae346. eCollection 2024 Sep.

Abstract

Culture fundamentally shapes people's reasoning, behavior, and communication. As people increasingly use generative artificial intelligence (AI) to expedite and automate personal and professional tasks, cultural values embedded in AI models may bias people's authentic expression and contribute to the dominance of certain cultures. We conduct a disaggregated evaluation of cultural bias for five widely used large language models (OpenAI's GPT-4o/4-turbo/4/3.5-turbo/3) by comparing the models' responses to nationally representative survey data. All models exhibit cultural values resembling English-speaking and Protestant European countries. We test cultural prompting as a control strategy to increase cultural alignment for each country/territory. For later models (GPT-4, 4-turbo, 4o), this improves the cultural alignment of the models' output for 71-81% of countries and territories. We suggest using cultural prompting and ongoing evaluation to reduce cultural bias in the output of generative AI.
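The control strategy the abstract calls "cultural prompting" amounts to conditioning the model on a target country before it answers a survey item. A minimal sketch of the idea follows; the prompt wording and the helper name `build_cultural_prompt` are illustrative assumptions, not the exact prompt used in the paper.

```python
def build_cultural_prompt(country: str, survey_item: str) -> list[dict]:
    """Build a chat-message list that asks the model to respond to a
    survey question from the perspective of a given country/territory.

    The system-message phrasing here is an assumption for illustration;
    the paper's exact wording may differ.
    """
    system = (
        f"You are an average human being born in {country} and living in "
        f"{country}, responding to the following survey question."
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": survey_item},
    ]

# Example: condition the model on one country for one survey item.
messages = build_cultural_prompt(
    "Japan",
    "How important is family in your life? Answer on a scale from "
    "1 (very important) to 4 (not at all important).",
)
# The resulting messages would then be sent to a chat-completion API,
# e.g. client.chat.completions.create(model="gpt-4o", messages=messages),
# and the answers compared against nationally representative survey data.
```

Under this setup, alignment can be scored per country by comparing the model's distribution of answers (with and without the cultural prompt) to the corresponding national survey responses.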


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bd79/11407280/4f8076ef6d7f/pgae346f1.jpg
