Bhatia Sudeep
Department of Psychology, University of Pennsylvania, Philadelphia, PA, United States.
Cognition. 2017 Jul;164:46-60. doi: 10.1016/j.cognition.2017.03.016. Epub 2017 Mar 31.
We use a theory of semantic representation to study prejudice and stereotyping. Specifically, we take large datasets of newspaper articles published in the United States and apply latent semantic analysis (LSA), a prominent model of human semantic memory, to learn representations for common male and female, White, African American, and Latino names. LSA performs a singular value decomposition on word distribution statistics to recover word vector representations, and we find that the recovered representations display the kinds of biases observed in human participants on tasks such as the implicit association test. Importantly, these biases are strongest for vector representations of moderate dimensionality, and they weaken or disappear for representations of very high or very low dimensionality. Moderate-dimensional LSA models are also the best at learning race-, ethnicity-, and gender-based categories, suggesting that social category knowledge, acquired through dimensionality reduction on word distribution statistics, can facilitate prejudiced and stereotyped associations.
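The core mechanism described above, a truncated singular value decomposition of word distribution statistics followed by a comparison of word associations, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline used in the study. The toy count matrix, the placeholder vocabulary (`name_x`, `name_y`), the helper names, and the cosine-difference score are all hypothetical, and the term weighting usually applied before the decomposition in LSA is omitted for brevity.

```python
import numpy as np

def lsa_vectors(counts, k):
    """Return k-dimensional word vectors from a word-by-document count matrix.

    Keeps the top-k left singular vectors, scaled by their singular values.
    """
    u, s, _ = np.linalg.svd(counts, full_matrices=False)
    return u[:, :k] * s[:k]

def cosine(a, b):
    """Cosine similarity between two vectors (assumes neither is all zeros)."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def association_bias(vecs, vocab, name, attrs_a, attrs_b):
    """Mean similarity of `name` to attrs_a minus its mean similarity to attrs_b.

    A hypothetical score, used here only to illustrate comparing associations.
    """
    idx = {w: i for i, w in enumerate(vocab)}
    sims_a = [cosine(vecs[idx[name]], vecs[idx[w]]) for w in attrs_a]
    sims_b = [cosine(vecs[idx[name]], vecs[idx[w]]) for w in attrs_b]
    return float(np.mean(sims_a) - np.mean(sims_b))

# Toy data (made up): rows are words, columns are documents.
vocab = ["name_x", "name_y", "pleasant", "unpleasant"]
counts = np.array([
    [2.0, 1.0, 0.0, 0.0],  # name_x
    [0.0, 0.0, 1.0, 3.0],  # name_y
    [1.0, 2.0, 0.0, 0.0],  # pleasant
    [0.0, 0.0, 3.0, 1.0],  # unpleasant
])

# Compare the score at a reduced and at the full dimensionality.
for k in (2, 4):
    vecs = lsa_vectors(counts, k)
    score = association_bias(vecs, vocab, "name_x", ["pleasant"], ["unpleasant"])
    print(f"k={k}: bias score for name_x = {score:.3f}")
```

Varying `k` in this sketch shows only that the association score depends on how many dimensions are retained. It does not reproduce the reported pattern across low, moderate, and high dimensionality, which was obtained from large newspaper corpora.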