Department of Measurement, Faculty of Electrical Engineering, Czech Technical University in Prague, Technická 2, Prague, 166 27, Czech Republic.
Sci Rep. 2021 Oct 12;11(1):20185. doi: 10.1038/s41598-021-99811-x.
New methods of securing the distribution of audio content have been widely deployed in the last twenty years. Their impact on perceived quality has, however, seldom been the subject of extensive recent research. We review the state of the art in digital speech watermarking and report subjective tests of watermarked speech samples. The latest speech watermarking techniques are listed, along with their specifics and potential for further development, and their current and possible applications are evaluated. Open-source software designed to embed watermarking patterns in audio files is used to produce a set of samples that satisfies the requirements of modern subjective speech-quality assessment. The analysis focuses mainly on the patchwork algorithm implemented in that application. Different watermark robustness levels are used, which allows the threshold of detection by human listeners to be determined. The subjective listening tests follow ITU-T Recommendation P.800, which precisely defines the conditions and requirements for subjective testing. Further analysis seeks to determine the effects of noise and various disturbances on the perceived quality of watermarked speech. A threshold of intelligibility is estimated to inform future work on combining speech compression techniques with watermarking. The impact of language or social background is evaluated through an additional experiment involving two groups of listeners. Results show significant robustness of the watermarking implementation, which retains both reasonable overall subjective audio quality and its security attributes despite mild levels of distortion and noise. Extended experiments with Chinese listeners make it possible to formulate a hypothesis on how perception varies with geographical and social background.