
AI generates covertly racist decisions about people based on their dialect.

Affiliations

Allen Institute for AI, Seattle, WA, USA.

University of Oxford, Oxford, UK.

Publication information

Nature. 2024 Sep;633(8028):147-154. doi: 10.1038/s41586-024-07856-5. Epub 2024 Aug 28.

Abstract

Hundreds of millions of people now interact with language models, with uses ranging from help with writing to informing hiring decisions. However, these language models are known to perpetuate systematic racial prejudices, making their judgements biased in problematic ways about groups such as African Americans. Although previous research has focused on overt racism in language models, social scientists have argued that racism with a more subtle character has developed over time, particularly in the United States after the civil rights movement. It is unknown whether this covert racism manifests in language models. Here, we demonstrate that language models embody covert racism in the form of dialect prejudice, exhibiting raciolinguistic stereotypes about speakers of African American English (AAE) that are more negative than any human stereotypes about African Americans ever experimentally recorded. By contrast, the language models' overt stereotypes about African Americans are more positive. Dialect prejudice has the potential for harmful consequences: language models are more likely to suggest that speakers of AAE be assigned less-prestigious jobs, be convicted of crimes and be sentenced to death. Finally, we show that current practices of alleviating racial bias in language models, such as human preference alignment, exacerbate the discrepancy between covert and overt stereotypes, by superficially obscuring the racism that language models maintain on a deeper level. Our findings have far-reaching implications for the fair and safe use of language technology.
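The abstract does not spell out the probing procedure, but the kind of measurement it describes can be illustrated. Below is a minimal sketch, in the spirit of a matched-guise experiment, of how one might compare the trait associations a causal language model produces for the same statement rendered in African American English (AAE) versus Standardized American English. Everything here is an illustrative assumption, not the authors' materials: the probe model (GPT-2 via Hugging Face transformers), the prompt template, the sentence pair and the trait adjectives.

# Minimal sketch of matched-guise-style probing for dialect prejudice.
# Assumptions (not the paper's exact setup): GPT-2 as the probe model,
# an illustrative AAE/SAE sentence pair, and a toy list of trait words.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

# Hypothetical prompt template; the dialect text is the only thing varied.
TEMPLATE = 'A person who says "{text}" tends to be'

def trait_logprob(text: str, trait: str) -> float:
    """Sum of log-probabilities the model assigns to the trait's tokens."""
    prefix_ids = tokenizer(TEMPLATE.format(text=text),
                           return_tensors="pt").input_ids
    trait_ids = tokenizer(" " + trait, return_tensors="pt").input_ids
    input_ids = torch.cat([prefix_ids, trait_ids], dim=1)
    with torch.no_grad():
        logprobs = torch.log_softmax(model(input_ids).logits, dim=-1)
    # Logits at position j predict the token at position j + 1.
    total, start = 0.0, prefix_ids.shape[1]
    for i in range(trait_ids.shape[1]):
        token_id = input_ids[0, start + i]
        total += logprobs[0, start + i - 1, token_id].item()
    return total

# Illustrative minimal pair: same meaning, different dialect.
aae = "he be workin hard every day"
sae = "he is working hard every day"

for trait in ["lazy", "intelligent", "aggressive", "brilliant"]:
    shift = trait_logprob(aae, trait) - trait_logprob(sae, trait)
    print(f"{trait:12s} log-prob shift (AAE - SAE): {shift:+.3f}")

A positive shift for a negative trait (and a negative shift for a positive one) would indicate the covert, dialect-conditioned stereotyping the paper reports; the actual study aggregates such comparisons over many texts, trait lists drawn from the stereotype literature, and multiple models.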


Fig. 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e525/11374696/da149a89b628/41586_2024_7856_Fig1_HTML.jpg
