Department of Electrical Engineering, Stanford University, Stanford, CA 94305;
Department of History, Stanford University, Stanford, CA 94305.
Proc Natl Acad Sci U S A. 2018 Apr 17;115(16):E3635-E3644. doi: 10.1073/pnas.1720347115. Epub 2018 Apr 3.
Word embeddings are a powerful machine-learning framework that represents each English word as a vector. The geometric relationships between these vectors capture meaningful semantic relationships between the corresponding words. In this paper, we develop a framework to demonstrate how the temporal dynamics of the embedding help to quantify changes in stereotypes and attitudes toward women and ethnic minorities in the 20th and 21st centuries in the United States. We integrate word embeddings trained on 100 years of text data with the US Census to show that changes in the embedding track closely with demographic and occupational shifts over time. The embedding captures societal shifts (e.g., the women's movement in the 1960s and Asian immigration into the United States) and also illuminates how specific adjectives and occupations became more closely associated with certain populations over time. Our framework for temporal analysis of word embeddings opens up a fruitful intersection between machine learning and quantitative social science.
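To make the geometric idea concrete, the following minimal sketch shows one way an association between a neutral word and two groups could be compared across decades. The abstract does not state the exact measure used, so the distance-difference score, the function names, and the toy 2-D vectors below are all illustrative assumptions, not the paper's method or data.

```python
import math

def mean_vector(vectors):
    """Component-wise average of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def distance(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def relative_distance(neutral, group_a, group_b):
    """Distance to group A's average minus distance to group B's average.

    Negative: the neutral word sits closer to group A.
    Positive: the neutral word sits closer to group B.
    (Hypothetical score chosen for illustration only.)
    """
    return distance(neutral, mean_vector(group_a)) - distance(neutral, mean_vector(group_b))

# Toy 2-D embeddings for two decades: made-up numbers, not real data.
embeddings_by_decade = {
    1910: {"she": [0.0, 1.0], "he": [1.0, 0.0], "engineer": [0.9, 0.1]},
    1990: {"she": [0.0, 1.0], "he": [1.0, 0.0], "engineer": [0.6, 0.4]},
}

# One score per decade; tracking it over time is the "temporal dynamics" idea.
bias_by_decade = {}
for decade, emb in embeddings_by_decade.items():
    bias_by_decade[decade] = relative_distance(
        emb["engineer"], [emb["she"]], [emb["he"]]
    )
```

In this toy setup the score for "engineer" is positive in both decades but smaller in the later one, meaning the word moves toward the "she" vector over time. A real analysis would use embeddings trained separately on each decade's text, with several representative words per group.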