Alderete John, Davies Monica
Simon Fraser University, Canada.
Lang Speech. 2019 Jun;62(2):281-317. doi: 10.1177/0023830918765012. Epub 2018 Apr 6.
This work describes a methodology for collecting speech errors from audio recordings and investigates how some of its assumptions affect data quality and composition. Speech errors of all types (sound, lexical, syntactic, etc.) were collected by eight data collectors from audio recordings of unscripted English speech. Analysis of these errors showed that: (i) different listeners find different errors in the same audio recordings, but (ii) the frequencies of error patterns are similar across listeners; (iii) errors collected "online" using on-the-spot observational techniques are more likely to be affected by perceptual biases than "offline" errors collected from audio recordings; and (iv) datasets built from audio recordings can be explored and extended in a number of ways that traditional corpus studies cannot.
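Findings (i) and (ii) can be made concrete with a small sketch. The abstract does not say how listener agreement or pattern similarity was measured, so the two measures below (Jaccard overlap of the errors each listener found, and total variation distance between their error-type proportions) are assumptions chosen for illustration. The listener data, error IDs, and function names are hypothetical and are not taken from the study.

```python
from collections import Counter

# Hypothetical toy data, NOT from the study: each listener maps an error ID
# (e.g. a position in the recording) to the error type they assigned.
listener_a = {"e1": "sound", "e2": "sound", "e3": "lexical", "e4": "syntactic"}
listener_b = {"e2": "sound", "e5": "sound", "e6": "lexical", "e7": "syntactic"}


def jaccard_overlap(a, b):
    """Share of all detected errors that both listeners found (assumed measure)."""
    ids_a, ids_b = set(a), set(b)
    union = ids_a | ids_b
    return len(ids_a & ids_b) / len(union) if union else 1.0


def type_proportions(errors):
    """Relative frequency of each error type in one listener's collection."""
    counts = Counter(errors.values())
    total = sum(counts.values())
    return {error_type: n / total for error_type, n in counts.items()}


def total_variation(p, q):
    """Total variation distance between two error-type distributions (assumed measure)."""
    types = set(p) | set(q)
    return 0.5 * sum(abs(p.get(t, 0.0) - q.get(t, 0.0)) for t in types)


overlap = jaccard_overlap(listener_a, listener_b)
distance = total_variation(type_proportions(listener_a), type_proportions(listener_b))
print(f"overlap of detected errors: {overlap:.2f}")
print(f"distance between error-type distributions: {distance:.2f}")
```

In this toy case the two listeners share only one of seven detected errors, yet their error-type proportions are identical. That is the kind of pattern the abstract reports: low agreement on individual errors alongside similar frequencies of error patterns.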