University of St. Francis, 500 Wilcox St, Joliet, IL, 60435, USA.
Behav Res Methods. 2019 Aug;51(4):1619-1635. doi: 10.3758/s13428-018-1185-6.
Psychological researchers have traditionally focused on lab-based experiments to test their theories and hypotheses. Although the lab provides excellent facilities for controlled testing, some questions are best explored by collecting information that is difficult to obtain in the lab. The vast amounts of data now available to researchers can be a valuable resource in this respect. By incorporating this new realm of data and translating it into traditional laboratory methods, we can expand the reach of the lab into the wilderness of human society. This study demonstrates how the troves of linguistic data generated by humans can be used to test theories about cognition and representation. It also suggests how similar interpretations can be made of other research in cognition. The first case tests a long-standing prediction of Gentner's natural partition hypothesis: that verb meaning is more subject to change due to the textual context in which it appears than is the meaning of nouns. Within a diachronic corpus, verbs and other relational words indeed showed more evidence of semantic change than did concrete nouns. In the second case, corpus statistics were employed to empirically support the existence of phonesthemes, nonmorphemic units of sound that are associated with aspects of meaning. A third study also supported this measure, by demonstrating that it corresponds with performance in a lab experiment. Neither of these questions can be adequately explored without the use of big data in the form of linguistic corpora.