Li Xiaochang, Mills Mara
Technol Cult. 2019;60(2S):S129-S160. doi: 10.1353/tech.2019.0066.
This article considers machine methods used in the collection, processing, and application of vocal recordings for speaker identification and speech recognition between 1908 and 1970. The first phonographic archives featured collections of "vocal portraits" that prompted international investigations into the essential features of human voices for individual identification. Visual records of speech later found the same applications, but as "voiceprint identification" via sound spectrography began to achieve legal and commercial success in the 1960s, the procedure attracted more widespread scientific attention, which ultimately discredited both its accuracy and its rationale. At the same time, spectrogram collections spurred a new application: speech recognition by machine. The changing status of the speech spectrogram, from a record of unique features of individual voices to a model of fundamental invariants in speech sounds, was rooted in the demands of automated processing and a corresponding shift from the sound archive to the acoustic database.