Department of Psychology, Pennsylvania State University, 140 Moore Building, University Park, PA, 16802, USA.
Department of Cognitive and Behavioral Science, Washington and Lee University, Lexington, VA, 24450, USA.
Behav Res Methods. 2021 Apr;53(2):757-780. doi: 10.3758/s13428-020-01453-w.
Creativity research requires assessing the quality of ideas and products. In practice, conducting creativity research often involves asking several human raters to judge participants' responses to creativity tasks, such as judging the novelty of ideas from the alternate uses task (AUT). Although such subjective scoring methods have proved useful, they have two inherent limitations: labor cost (raters typically code thousands of responses) and subjectivity (raters vary in their perceptions and preferences), raising classic psychometric threats to reliability and validity. We sought to address these limitations by capitalizing on recent developments in automated scoring of verbal creativity via semantic distance, a computational method that uses natural language processing to quantify the semantic relatedness of texts. In five studies, we compare the top-performing semantic models (e.g., GloVe, continuous bag of words) previously shown to have the highest correspondence to human relatedness judgments. We assessed these semantic models in relation to human creativity ratings from a canonical verbal creativity task (the AUT; Studies 1-3) and to novelty/creativity ratings from two word association tasks (Studies 4-5). We find that a latent semantic distance factor, composed of the common variance of five semantic models, reliably and strongly predicts human creativity and novelty ratings across a range of creativity tasks. We also replicate an established experimental effect in the creativity literature (the serial order effect) and show that semantic distance correlates with other creativity measures, demonstrating convergent validity. We provide an open platform to efficiently compute semantic distance, including tutorials and documentation ( https://osf.io/gz4fc/ ).
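To make the semantic distance measure described in the abstract concrete, the sketch below computes distance as 1 minus the cosine similarity of averaged word vectors for a prompt and a response. This is a minimal illustration, not the authors' pipeline: the 3-dimensional toy vectors stand in for a trained embedding space such as GloVe (real embeddings are typically 100-300 dimensional), and the vocabulary and values are invented for the example.

```python
import math

# Toy word vectors standing in for a trained embedding model (e.g., GloVe).
# These 3-d vectors are illustrative placeholders, not actual model outputs.
EMBEDDINGS = {
    "brick": [0.90, 0.10, 0.20],
    "build": [0.80, 0.20, 0.30],
    "wall": [0.85, 0.15, 0.25],
    "paper": [0.10, 0.90, 0.30],
    "weight": [0.30, 0.40, 0.80],
}


def text_vector(words):
    """Average the word vectors of a text (a common composition method)."""
    vecs = [EMBEDDINGS[w] for w in words if w in EMBEDDINGS]
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def semantic_distance(prompt_words, response_words):
    """Semantic distance = 1 - cosine similarity of the two text vectors."""
    return 1.0 - cosine(text_vector(prompt_words), text_vector(response_words))


# A far-fetched AUT response ("paper weight") should score as more distant
# from the prompt "brick" than a mundane one ("build wall").
common = semantic_distance(["brick"], ["build", "wall"])
novel = semantic_distance(["brick"], ["paper", "weight"])
```

In this toy space the novel response receives a larger distance than the common one, mirroring how greater semantic distance is taken as a proxy for novelty; the latent-factor approach in the abstract would repeat this computation across five embedding models and extract their shared variance.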