Bias of AI-generated content: an examination of news produced by large language models.

Author information

Fang Xiao, Che Shangkun, Mao Minjia, Zhang Hongzhe, Zhao Ming, Zhao Xiaohang

Affiliations

University of Delaware, Newark, USA.

Tsinghua University, Beijing, China.

Publication information

Sci Rep. 2024 Mar 4;14(1):5224. doi: 10.1038/s41598-024-55686-2.

DOI: 10.1038/s41598-024-55686-2
PMID: 38433238
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10909834/
Abstract

Large language models (LLMs) have the potential to transform our lives and work through the content they generate, known as AI-Generated Content (AIGC). To harness this transformation, we need to understand the limitations of LLMs. Here, we investigate the bias of AIGC produced by seven representative LLMs, including ChatGPT and LLaMA. We collect news articles from The New York Times and Reuters, both known for their dedication to provide unbiased news. We then apply each examined LLM to generate news content with headlines of these news articles as prompts, and evaluate the gender and racial biases of the AIGC produced by the LLM by comparing the AIGC and the original news articles. We further analyze the gender bias of each LLM under biased prompts by adding gender-biased messages to prompts constructed from these news headlines. Our study reveals that the AIGC produced by each examined LLM demonstrates substantial gender and racial biases. Moreover, the AIGC generated by each LLM exhibits notable discrimination against females and individuals of the Black race. Among the LLMs, the AIGC generated by ChatGPT demonstrates the lowest level of bias, and ChatGPT is the sole model capable of declining content generation when provided with biased prompts.
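
The abstract summarizes the evaluation procedure at a high level: each LLM is prompted with a news headline, and the gender and racial bias of the generated article is assessed by comparing it with the original news article. As a rough, hypothetical illustration of that comparison step only, the sketch below scores a generated text against its source by the share of female- versus male-associated words. The word lists, the female_share and bias_gap functions, and the toy texts are assumptions made for this example; they are not the measures used in the study.

```python
# Minimal sketch (not the paper's actual pipeline): compare the relative
# frequency of female- vs male-associated words in a headline-prompted,
# AI-generated article and in the original news article it was prompted from.
# Word lists and the bias score are illustrative assumptions only.

from collections import Counter
import re

FEMALE_TERMS = {"she", "her", "hers", "woman", "women", "female", "girl", "girls"}
MALE_TERMS = {"he", "him", "his", "man", "men", "male", "boy", "boys"}


def gender_counts(text: str) -> tuple[int, int]:
    """Count female- and male-associated tokens in a text."""
    counts = Counter(re.findall(r"[a-z']+", text.lower()))
    female = sum(counts[t] for t in FEMALE_TERMS)
    male = sum(counts[t] for t in MALE_TERMS)
    return female, male


def female_share(text: str) -> float:
    """Share of gendered tokens that are female-associated (0.5 = balanced)."""
    female, male = gender_counts(text)
    total = female + male
    return female / total if total else 0.5


def bias_gap(original: str, generated: str) -> float:
    """Positive gap: the generated text under-represents female-associated
    terms relative to the original article."""
    return female_share(original) - female_share(generated)


if __name__ == "__main__":
    original = "She led the rescue effort; the women on her team were praised."
    generated = "He led the rescue effort; the men on his team were praised."
    print(f"bias gap = {bias_gap(original, generated):+.2f}")
```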


Figures 1-10:
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b6c1/10909834/a4c72629d7d1/41598_2024_55686_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b6c1/10909834/038973e341d5/41598_2024_55686_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b6c1/10909834/27c44c29778a/41598_2024_55686_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b6c1/10909834/9476c362b3bc/41598_2024_55686_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b6c1/10909834/4015e7f712d1/41598_2024_55686_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b6c1/10909834/326063a25fb7/41598_2024_55686_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b6c1/10909834/5a92f3c33fdc/41598_2024_55686_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b6c1/10909834/4758e60a28b3/41598_2024_55686_Fig8_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b6c1/10909834/26bd3699a445/41598_2024_55686_Fig9_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b6c1/10909834/39bfb94ecdaa/41598_2024_55686_Fig10_HTML.jpg

Similar articles

1. Bias of AI-generated content: an examination of news produced by large language models.
Sci Rep. 2024 Mar 4;14(1):5224. doi: 10.1038/s41598-024-55686-2.
2. Exploring Biases of Large Language Models in the Field of Mental Health: Comparative Questionnaire Study of the Effect of Gender and Sexual Orientation in Anorexia Nervosa and Bulimia Nervosa Case Vignettes.
JMIR Ment Health. 2025 Mar 20;12:e57986. doi: 10.2196/57986.
3. What's in a Name? Experimental Evidence of Gender Bias in Recommendation Letters Generated by ChatGPT.
J Med Internet Res. 2024 Mar 5;26:e51837. doi: 10.2196/51837.
4. Assessing Racial and Ethnic Bias in Text Generation by Large Language Models for Health Care-Related Tasks: Cross-Sectional Study.
J Med Internet Res. 2025 Mar 13;27:e57257. doi: 10.2196/57257.
5. Using artificial intelligence to generate medical literature for urology patients: a comparison of three different large language models.
World J Urol. 2024 Jul 29;42(1):455. doi: 10.1007/s00345-024-05146-3.
6. Assessment of decision-making with locally run and web-based large language models versus human board recommendations in otorhinolaryngology, head and neck surgery.
Eur Arch Otorhinolaryngol. 2025 Mar;282(3):1593-1607. doi: 10.1007/s00405-024-09153-3. Epub 2025 Jan 10.
7. Current safeguards, risk mitigation, and transparency measures of large language models against the generation of health disinformation: repeated cross sectional analysis.
BMJ. 2024 Mar 20;384:e078538. doi: 10.1136/bmj-2023-078538.
8. Fairness in AI-Driven Oncology: Investigating Racial and Gender Biases in Large Language Models.
Cureus. 2024 Sep 16;16(9):e69541. doi: 10.7759/cureus.69541. eCollection 2024 Sep.
9. Assessing the Application of Large Language Models in Generating Dermatologic Patient Education Materials According to Reading Level: Qualitative Study.
JMIR Dermatol. 2024 May 16;7:e55898. doi: 10.2196/55898.
10. Leveraging Large Language Models (LLM) for the Plastic Surgery Resident Training: Do They Have a Role?
Indian J Plast Surg. 2023 Aug 28;56(5):413-420. doi: 10.1055/s-0043-1772704. eCollection 2023 Oct.

Cited by

1. Impact of YouTube User-Generated Content on News Dissemination and Youth Information Reception.
Health Expect. 2025 Oct;28(5):e70408. doi: 10.1111/hex.70408.
2. Improving large language models accuracy for aortic stenosis treatment via Heart Team simulation: a prompt design analysis.
Eur Heart J Digit Health. 2025 Jun 16;6(4):665-674. doi: 10.1093/ehjdh/ztaf068. eCollection 2025 Jul.
3. Facial Analysis for Plastic Surgery in the Era of Artificial Intelligence: A Comparative Evaluation of Multimodal Large Language Models.
J Clin Med. 2025 May 16;14(10):3484. doi: 10.3390/jcm14103484.
4. Delving into the Practical Applications and Pitfalls of Large Language Models in Medical Education: Narrative Review.
Adv Med Educ Pract. 2025 Apr 18;16:625-636. doi: 10.2147/AMEP.S497020. eCollection 2025.
5. Stereotypical bias amplification and reversal in an experimental model of human interaction with generative artificial intelligence.
R Soc Open Sci. 2025 Apr 9;12(4):241472. doi: 10.1098/rsos.241472. eCollection 2025 Apr.
6. Academic Psychiatry in the Age of Artificial Intelligence.
Acad Psychiatry. 2025 Feb;49(1):1-4. doi: 10.1007/s40596-025-02112-y.
7. Exploring the potential of large language model-based chatbots in challenges of ribosome profiling data analysis: a review.
Brief Bioinform. 2024 Nov 22;26(1). doi: 10.1093/bib/bbae641.
8. Harnessing the Power of ChatGPT in Cardiovascular Medicine: Innovations, Challenges, and Future Directions.
J Clin Med. 2024 Oct 31;13(21):6543. doi: 10.3390/jcm13216543.
9. Political biases and inconsistencies in bilingual GPT models-the cases of the U.S. and China.
Sci Rep. 2024 Oct 23;14(1):25048. doi: 10.1038/s41598-024-76395-w.
10. Perforator Selection with Computed Tomography Angiography for Unilateral Breast Reconstruction: A Clinical Multicentre Analysis.
Medicina (Kaunas). 2024 Sep 14;60(9):1500. doi: 10.3390/medicina60091500.

References

1. Contrasting Linguistic Patterns in Human and LLM-Generated News Text.
Artif Intell Rev. 2024;57(10):265. doi: 10.1007/s10462-024-10903-2. Epub 2024 Aug 23.
2. Gender composition predicts gender bias: A meta-reanalysis of hiring discrimination audit experiments.
Sci Adv. 2023 May 5;9(18):eade7979. doi: 10.1126/sciadv.ade7979.
3. Sentiments analysis of fMRI using automatically generated stimuli labels under naturalistic paradigm.
Sci Rep. 2023 May 4;13(1):7267. doi: 10.1038/s41598-023-33734-7.
4. The reduction of race and gender bias in clinical treatment recommendations using clinician peer networks in an experimental setting.
Nat Commun. 2021 Nov 15;12(1):6585. doi: 10.1038/s41467-021-26905-5.
5. The Psychology of Fake News.
Trends Cogn Sci. 2021 May;25(5):388-402. doi: 10.1016/j.tics.2021.02.007. Epub 2021 Mar 15.
6. Gender bias goes away when grant reviewers focus on the science.
Nature. 2018 Feb;554(7690):14-15. doi: 10.1038/d41586-018-01212-0.
7. Dissecting racial bias in an algorithm used to manage the health of populations.
Science. 2019 Oct 25;366(6464):447-453. doi: 10.1126/science.aax2342.
8. The effect of publishing peer review reports on referee behavior in five scholarly journals.
Nat Commun. 2019 Jan 18;10(1):322. doi: 10.1038/s41467-018-08250-2.