Institute of Linguistics, University of Cologne, Cologne, Germany.
J Acoust Soc Am. 2023 Jul 1;154(1):95-107. doi: 10.1121/10.0019850.
Recently, cluster analysis on f0 contours has become a popular method in phonetic research. Cluster analysis provides an automated way of categorising f0 contours, which gives new insights into (phonological) categories of intonation that vary across languages. As cluster analysis can be performed in many different ways, it is important to understand the extent to which these analyses can capture human perception of f0. This study focuses on the way in which f0 contours and differences among them are represented numerically, i.e., a crucial methodological choice preceding cluster analysis. These representations are then compared to the way in which f0 contour differences are perceived by human listeners of two different languages. To this end, four time-series contour representations (equivalent rectangular bandwidth, standardisation, octave-median rescaling, first derivative) and three distance measures [Euclidean distance (L2 norm), Pearson correlation, and dynamic time warping] were tested. The perceived differences were obtained from listeners of German and Papuan Malay, two typologically different languages. Results show that computed contour differences reflect human perception moderately, with dynamic time warping applied to the first derivative of the contour performing best, and showing minimal differences between the languages.
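To make the four contour representations and three distance measures concrete, the following is a minimal Python sketch, not the authors' implementation. The abstract does not give formulas, so everything here is an assumption: the ERB-rate conversion follows the commonly used Glasberg and Moore formula, standardisation is taken to be a per-contour z-score, octave-median rescaling is taken to be log2 of f0 divided by the contour median, the Pearson measure is turned into a distance as 1 - r, and dynamic time warping is the unconstrained textbook variant with absolute difference as local cost. All function names and the toy contours are hypothetical.

```python
import numpy as np


def hz_to_erb(f0_hz):
    """Convert Hz to ERB-rate (assumed formula: 21.4 * log10(0.00437 * f + 1))."""
    return 21.4 * np.log10(0.00437 * np.asarray(f0_hz, dtype=float) + 1.0)


def standardise(f0):
    """Z-score a contour: subtract its mean, divide by its standard deviation."""
    f0 = np.asarray(f0, dtype=float)
    return (f0 - f0.mean()) / f0.std()


def octave_median(f0_hz):
    """Express a contour in octaves relative to its own median (assumed definition)."""
    f0_hz = np.asarray(f0_hz, dtype=float)
    return np.log2(f0_hz / np.median(f0_hz))


def first_derivative(f0):
    """Approximate the first derivative by differences between adjacent samples."""
    return np.diff(np.asarray(f0, dtype=float))


def euclidean(a, b):
    """Euclidean (L2) distance between two equal-length contours."""
    return float(np.linalg.norm(np.asarray(a, dtype=float) - np.asarray(b, dtype=float)))


def pearson_distance(a, b):
    """Correlation-based distance, assumed here to be 1 - r (0 = identical shape)."""
    return float(1.0 - np.corrcoef(a, b)[0, 1])


def dtw(a, b):
    """Unconstrained dynamic time warping cost with absolute-difference local cost."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor alignments.
            cost[i, j] = local + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(cost[n, m])


# Toy contours in Hz (invented for illustration, not data from the study).
t = np.linspace(0.0, 1.0, 30)
rise = 180.0 + 40.0 * t
fall = 220.0 - 40.0 * t
late_rise = 180.0 + 40.0 * t ** 2

# The combination the abstract reports as performing best:
# dynamic time warping applied to the first derivative of the contour.
d_rise_fall = dtw(first_derivative(rise), first_derivative(fall))
d_rise_late = dtw(first_derivative(rise), first_derivative(late_rise))
print(d_rise_fall, d_rise_late)
```

Each representation can be paired with each distance measure by composing the functions, for example `euclidean(hz_to_erb(rise), hz_to_erb(fall))` or `pearson_distance(octave_median(rise), octave_median(late_rise))`. Note that Euclidean and Pearson distances require contours sampled at the same number of points, whereas the warping step in `dtw` also accepts contours of different lengths.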