Grabe Esther, Kochanski Greg, Coleman John
University of Oxford, UK.
Lang Speech. 2007;50(Pt 3):281-310. doi: 10.1177/00238309070500030101.
The mathematical models of intonation used in speech technology are often inaccessible to linguists. By the same token, phonological descriptions of intonation are rarely used by speech technologists, as they cannot be implemented directly in applications. Consequently, these research communities do not benefit much from each other's insights. In this paper, we explore the interface between the disciplines, in search of bridges between intonational phonology and speech technology. In a corpus of speech data from seven dialects of English, we hand-labeled over 700 sentences and identified seven nuclear accent types. Then we fitted a third-order polynomial to the fundamental frequency (F0) contour in the region around the accent mark. The polynomial captures the local shape (time-dependence) of F0 in a few numbers (in our case, four coefficients). We then analyzed the coefficients statistically. Nineteen of the 21 pairs of accent types differed significantly in one or more coefficients. Our approach bridges the gap between intonational phonology and speech technology. It provides quantitative, empirically testable models of intonation labels that can be implemented in applications.
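The abstract does not state the window size, how time is normalized, or which polynomial basis was used. What follows is a minimal sketch under those gaps, not the authors' code. The function names `fit_accent_shape`, `legendre_basis` and `solve`, the `half_width` parameter, the choice of a Legendre basis on time rescaled to [-1, 1], and the convention that unvoiced frames carry F0 ≤ 0 are all hypothetical. It fits a cubic to the voiced F0 samples inside a window centred on the accent mark by ordinary least squares and returns the four coefficients that summarize the local contour shape.

```python
from typing import List, Sequence


def legendre_basis(x: float) -> List[float]:
    """Legendre polynomials P0..P3 evaluated at x in [-1, 1].

    Assumption: an orthogonal basis is used so the four coefficients are less
    correlated than raw powers of time; the abstract only says "third-order".
    """
    return [1.0, x, 0.5 * (3 * x * x - 1), 0.5 * (5 * x ** 3 - 3 * x)]


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: too few distinct time points")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / m[r][r]
    return x


def fit_accent_shape(times: Sequence[float], f0: Sequence[float],
                     accent_time: float, half_width: float) -> List[float]:
    """Least-squares cubic fit to F0 within +/- half_width seconds of the accent.

    Hypothetical conventions: unvoiced frames have f0 <= 0 and are skipped;
    time is rescaled so the window maps onto [-1, 1].
    """
    pts = [(t, f) for t, f in zip(times, f0)
           if abs(t - accent_time) <= half_width and f > 0]
    if len(pts) < 4:
        raise ValueError("need at least 4 voiced frames in the window")
    # Accumulate the normal equations (B^T B) c = B^T y.
    btb = [[0.0] * 4 for _ in range(4)]
    bty = [0.0] * 4
    for t, f in pts:
        phi = legendre_basis((t - accent_time) / half_width)
        for i in range(4):
            bty[i] += phi[i] * f
            for j in range(4):
                btb[i][j] += phi[i] * phi[j]
    return solve(btb, bty)  # four coefficients: level, slope, curvature, cubic term
```

For example, with 10 ms frames, an accent mark at 1.0 s and `half_width=0.2`, `fit_accent_shape(times, f0, 1.0, 0.2)` returns four numbers per accent. With seven accent types there are 7 × 6 / 2 = 21 possible pairs, which matches the 21 pairs compared in the abstract. The abstract does not name the statistical test that was applied to the coefficients.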