Program for Evolutionary Dynamics, Harvard University, Cambridge, MA 02138, USA.
Science. 2011 Jan 14;331(6014):176-82. doi: 10.1126/science.1199644. Epub 2010 Dec 16.
We constructed a corpus of digitized texts containing about 4% of all books ever printed. Analysis of this corpus enables us to investigate cultural trends quantitatively. We survey the vast terrain of 'culturomics,' focusing on linguistic and cultural phenomena that were reflected in the English language between 1800 and 2000. We show how this approach can provide insights about fields as diverse as lexicography, the evolution of grammar, collective memory, the adoption of technology, the pursuit of fame, censorship, and historical epidemiology. Culturomics extends the boundaries of rigorous quantitative inquiry to a wide array of new phenomena spanning the social sciences and the humanities.