Semantics derived automatically from language corpora contain human-like biases.

Affiliations

Center for Information Technology Policy, Princeton University, Princeton, NJ, USA.

Department of Computer Science, University of Bath, Bath BA2 7AY, UK.

Publication

Science. 2017 Apr 14;356(6334):183-186. doi: 10.1126/science.aal4230.

DOI: 10.1126/science.aal4230
PMID: 28408601
Abstract

Machine learning is a means to derive artificial intelligence by discovering patterns in existing data. Here, we show that applying machine learning to ordinary human language results in human-like semantic biases. We replicated a spectrum of known biases, as measured by the Implicit Association Test, using a widely used, purely statistical machine-learning model trained on a standard corpus of text from the World Wide Web. Our results indicate that text corpora contain recoverable and accurate imprints of our historic biases, whether morally neutral as toward insects or flowers, problematic as toward race or gender, or even simply veridical, reflecting the status quo distribution of gender with respect to careers or first names. Our methods hold promise for identifying and addressing sources of bias in culture, including technology.

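The bias measurement the abstract describes is the paper's Word-Embedding Association Test (WEAT), an analogue of the Implicit Association Test computed over word vectors. A minimal sketch of the WEAT effect size, using toy 2-D vectors rather than trained embeddings (the vectors and set names below are illustrative, not from the paper):

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def assoc(w, A, B):
    """s(w, A, B): mean similarity of w to attribute set A minus set B."""
    return np.mean([cosine(w, a) for a in A]) - np.mean([cosine(w, b) for b in B])

def weat_effect_size(X, Y, A, B):
    """Cohen's-d-style effect size over target sets X and Y."""
    s_X = [assoc(x, A, B) for x in X]
    s_Y = [assoc(y, A, B) for y in Y]
    return (np.mean(s_X) - np.mean(s_Y)) / np.std(s_X + s_Y, ddof=1)

# Toy vectors: targets X lie near attribute A, targets Y near attribute B
A = [np.array([1.0, 0.1]), np.array([0.9, 0.0])]   # attribute words, set A
B = [np.array([0.1, 1.0]), np.array([0.0, 0.9])]   # attribute words, set B
X = [np.array([1.0, 0.2]), np.array([0.8, 0.1])]   # target words, set X
Y = [np.array([0.2, 1.0]), np.array([0.1, 0.8])]   # target words, set Y

print(weat_effect_size(X, Y, A, B))  # positive: X associates with A, Y with B
```

A large positive effect size indicates that target set X is differentially associated with attribute set A (and Y with B), which is how the paper quantifies, for example, flower/insect or career/gender associations in web-corpus embeddings.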

Similar Articles

1. Semantics derived automatically from language corpora contain human-like biases.
   Science. 2017 Apr 14;356(6334):183-186. doi: 10.1126/science.aal4230.
2. Gender bias at scale: Evidence from the usage of personal names.
   Behav Res Methods. 2019 Aug;51(4):1601-1618. doi: 10.3758/s13428-019-01234-0.
3. Context Matters: Recovering Human Semantic Structure from Machine Learning Analysis of Large-Scale Text Corpora.
   Cogn Sci. 2022 Feb;46(2):e13085. doi: 10.1111/cogs.13085.
4. The semantic representation of prejudice and stereotypes.
   Cognition. 2017 Jul;164:46-60. doi: 10.1016/j.cognition.2017.03.016. Epub 2017 Mar 31.
5. Semantic Space models for classification of consumer webpages on metadata attributes.
   J Biomed Inform. 2010 Oct;43(5):725-35. doi: 10.1016/j.jbi.2010.06.005. Epub 2010 Jun 23.
6. How useful are corpus-based methods for extrapolating psycholinguistic variables?
   Q J Exp Psychol (Hove). 2015;68(8):1623-42. doi: 10.1080/17470218.2014.988735. Epub 2015 Feb 19.
7. The Moral Choice Machine.
   Front Artif Intell. 2020 May 20;3:36. doi: 10.3389/frai.2020.00036. eCollection 2020.
8. A system for de-identifying medical message board text.
   BMC Bioinformatics. 2011 Jun 9;12 Suppl 3(Suppl 3):S2. doi: 10.1186/1471-2105-12-S3-S2.
9. The role of corpus size and syntax in deriving lexico-semantic representations for a wide range of concepts.
   Q J Exp Psychol (Hove). 2015;68(8):1643-64. doi: 10.1080/17470218.2014.994098. Epub 2015 Feb 26.
10. The influence of place and time on lexical behavior: A distributional analysis.
    Behav Res Methods. 2019 Dec;51(6):2438-2453. doi: 10.3758/s13428-019-01289-z.

Cited By

1. FanFAIR: sensitive data sets semi-automatic fairness assessment.
   BMC Med Inform Decis Mak. 2025 Sep 12;25(Suppl 3):329. doi: 10.1186/s12911-025-03184-4.
2. Disembodied creativity in generative AI: challenges and limitations of prompting in creative practice.
   Front Artif Intell. 2025 Aug 14;8:1651354. doi: 10.3389/frai.2025.1651354. eCollection 2025.
3. Evaluating gender bias in large language models in long-term care.
   BMC Med Inform Decis Mak. 2025 Aug 11;25(1):274. doi: 10.1186/s12911-025-03118-0.
4. Benchmarking bias in embeddings of healthcare AI models: using SD-WEAT for detection and measurement across sensitive populations.
   BMC Med Inform Decis Mak. 2025 Jul 10;25(1):258. doi: 10.1186/s12911-025-03102-8.
5. Biased echoes: Large language models reinforce investment biases and increase portfolio risks of private investors.
   PLoS One. 2025 Jun 27;20(6):e0325459. doi: 10.1371/journal.pone.0325459. eCollection 2025.
6. Perceptual interventions ameliorate statistical discrimination in learning agents.
   Proc Natl Acad Sci U S A. 2025 Jun 24;122(25):e2319933121. doi: 10.1073/pnas.2319933121. Epub 2025 Jun 16.
7. Disparate Model Performance and Stability in Machine Learning Clinical Support for Diabetes and Heart Diseases.
   AMIA Jt Summits Transl Sci Proc. 2025 Jun 10;2025:95-104. eCollection 2025.
8. Bringing AI participation down to scale.
   Patterns (N Y). 2025 May 9;6(5):101241. doi: 10.1016/j.patter.2025.101241.
9. Gender differences in resume language and gender gaps in salary expectations.
   J R Soc Interface. 2025 Jun;22(227):20240784. doi: 10.1098/rsif.2024.0784. Epub 2025 Jun 4.
10. Talk to your data: Introducing text embedding similarity analysis (TESA) in psychological research.
    Behav Res Methods. 2025 May 28;57(7):179. doi: 10.3758/s13428-025-02698-z.