Department of Electrical and Computer Engineering, Marquette University, 1515 West Wisconsin Avenue, Milwaukee, Wisconsin 53233, USA.
J Acoust Soc Am. 2013 Mar;133(3):1762-9. doi: 10.1121/1.4789936.
This paper investigates the extent of tiger (Panthera tigris) vocal individuality through both qualitative and quantitative approaches using long-distance roars from six individual tigers at Omaha's Henry Doorly Zoo in Omaha, NE. The framework for comparison across individuals includes statistical and discriminant function analysis across whole-vocalization measures and statistical pattern classification using a hidden Markov model (HMM) with frame-based spectral features composed of Greenwood frequency cepstral coefficients. Individual discrimination accuracy is evaluated as a function of spectral model complexity, represented by the number of mixtures in the underlying Gaussian mixture model (GMM), and temporal model complexity, represented by the number of sequential states in the HMM. Results indicate that the temporal pattern of the vocalization is the most significant factor in accurate discrimination. Overall baseline discrimination accuracy for this data set is about 70% using high-level features without complex spectral or temporal models. Accuracy increases to about 80% when more complex spectral models (multiple-mixture GMMs) are incorporated, and to a final accuracy of 90% when more detailed temporal models (10-state HMMs) are used. Classification accuracy is stable across a relatively wide range of configurations in terms of spectral and temporal model resolution.
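The Greenwood frequency warping that underlies the cepstral features named above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the commonly used form of the Greenwood function, f = A(10^(ax) - k), with k = 0.88 and with A and a chosen so that a species' frequency range maps onto [0, 1]. The function names and the 50 to 4000 Hz range in the usage lines are hypothetical and are not taken from the paper.

```python
import math


def greenwood_constants(f_min, f_max, k=0.88):
    """Derive A and a so that f_min maps to 0 and f_max maps to 1.

    Assumption: k = 0.88 is a commonly used default, not a value
    reported in the abstract.
    """
    A = f_min / (1.0 - k)
    a = math.log10(f_max / A + k)
    return A, a


def hz_to_greenwood(f, f_min, f_max, k=0.88):
    """Map a frequency in Hz to the warped (perceptual) axis."""
    A, a = greenwood_constants(f_min, f_max, k)
    return math.log10(f / A + k) / a


def greenwood_to_hz(x, f_min, f_max, k=0.88):
    """Inverse mapping: warped position back to Hz."""
    A, a = greenwood_constants(f_min, f_max, k)
    return A * (10.0 ** (a * x) - k)


def filter_centers(n_filters, f_min, f_max, k=0.88):
    """Center frequencies equally spaced on the warped axis.

    A filterbank built on these centers would play the role that the
    mel filterbank plays in ordinary MFCC extraction.
    """
    step = 1.0 / (n_filters + 1)
    return [greenwood_to_hz(i * step, f_min, f_max, k)
            for i in range(1, n_filters + 1)]


# Hypothetical frequency range, chosen only for illustration.
F_MIN, F_MAX = 50.0, 4000.0
centers = filter_centers(8, F_MIN, F_MAX)
```

Because the centers are evenly spaced on the warped axis, they sit closer together at low frequencies and farther apart at high frequencies when expressed in Hz.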