San Segundo Eugenia, Tsanas Athanasios, Gómez-Vilda Pedro
Department of Language and Linguistic Science, University of York, Heslington, York, YO10 5DD, UK.
Institute of Biomedical Engineering, Department of Engineering Science, University of Oxford, Oxford, UK; Wolfson Centre for Mathematical Biology, Mathematical Institute, University of Oxford, Oxford, UK; Sleep and Circadian Neuroscience Institute, Nuffield Department of Medicine, University of Oxford, UK.
Forensic Sci Int. 2017 Jan;270:25-38. doi: 10.1016/j.forsciint.2016.11.020. Epub 2016 Nov 17.
There is a growing consensus that hybrid approaches are necessary for successful speaker characterization in Forensic Speaker Comparison (FSC); hence this study explores the forensic potential of voice features combining source and filter characteristics. The former relate to the action of the vocal folds while the latter reflect the geometry of the speaker's vocal tract. These features were extracted from pause fillers, which are long enough for robust feature estimation yet spontaneous enough to be found in voice samples from real forensic casework. Speaker similarity was measured using standardized Euclidean distances (ED) between pairs of speakers: 54 different-speaker (DS) comparisons, 54 same-speaker (SS) comparisons and 12 comparisons between monozygotic twins (MZ). Results revealed that the differences between DS and SS comparisons were significant in both high-quality and telephone-filtered recordings, with no false rejections and limited false acceptances; this finding suggests that this set of voice features is highly speaker-dependent and therefore forensically useful. The mean ED for MZ pairs lies between the mean ED for SS comparisons and that for DS comparisons, as expected from the literature on twin voices. Specific cases of MZ speakers with very high ED (i.e. strong dissimilarity) are discussed in the context of sociophonetic and twin studies. A preliminary simplification of the Vocal Profile Analysis (VPA) Scheme is proposed, which enables the quantification of voice quality features in the perceptual assessment of speaker similarity and allows for the calculation of perceptual-acoustic correlations. The adequacy of z-score normalization for this study is also discussed, as well as the relevance of heat maps for detecting the so-called phantoms in recent approaches to the biometric menagerie.
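The distance measure mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the recording labels, the two feature columns and all numeric values are hypothetical, and the sketch assumes that "standardized" means each feature is z-scored across all recordings before an ordinary Euclidean distance is taken.

```python
import math
from statistics import mean, stdev


def zscore_columns(rows):
    """Z-score each feature (column) across all recordings (rows)."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [stdev(c) for c in cols]  # sample standard deviation per feature
    return [[(x - m) / s for x, m, s in zip(row, mus, sds)] for row in rows]


def euclidean(a, b):
    """Plain Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical toy data: one row per recording, two made-up voice features
# (for example, one source-related and one filter-related measurement).
features = {
    "spk1_session1": [120.0, 0.50],
    "spk1_session2": [122.0, 0.52],
    "spk2_session1": [180.0, 0.90],
    "spk2_session2": [178.0, 0.88],
}

names = list(features)
z = dict(zip(names, zscore_columns([features[n] for n in names])))

# Same-speaker (SS) and different-speaker (DS) comparisons on z-scored features
ss_distance = euclidean(z["spk1_session1"], z["spk1_session2"])
ds_distance = euclidean(z["spk1_session1"], z["spk2_session1"])
print(f"SS distance: {ss_distance:.3f}  DS distance: {ds_distance:.3f}")
```

In this toy setup the SS distance is smaller than the DS distance, which mirrors the direction of the reported result. Z-scoring puts features measured on very different scales on a comparable footing, so no single feature dominates the distance.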