Forczmański Paweł
Faculty of Computer Science and Information Technology, West Pomeranian University of Technology, Szczecin, 52 Zolnierska St., 71-210 Szczecin, Poland.
J Voice. 2016 Jan;30(1):127.e21-30. doi: 10.1016/j.jvoice.2015.03.001. Epub 2015 Apr 30.
This article describes an algorithm for assessing singing voice quality that uses selected methods from digital image processing and recognition. It assumes that an audio signal containing a recorded vocal exercise can be converted into a visual representation and then processed as an image. The approach generates a sound spectrogram of each sample in the form of a rectangular matrix, objectively improves its visual quality through local changes in brightness and contrast, and scales it to a fixed size. It then proceeds in two steps: construction of a representative database of reference samples, and identification of test samples. The database is built using two-dimensional linear discriminant analysis. Recognition is then carried out in a reduced feature space obtained by two-dimensional Karhunen-Loève projection. Classification is performed by a variant of the Support Vector Machine approach. The results are very encouraging and competitive with the most powerful state-of-the-art methods.
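The front end of such a pipeline (spectrogram, contrast adjustment, fixed-size scaling, two-dimensional Karhunen-Loève projection) might be sketched as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the function names, frame length, hop size, 32x32 image size, and 4x4 feature size are all hypothetical; a global min-max stretch stands in for the local brightness and contrast enhancement; nearest-neighbour sampling stands in for whatever scaling the paper uses; and the two-dimensional linear discriminant analysis and Support Vector Machine stages are omitted.

```python
import numpy as np


def spectrogram(signal, frame_len=256, hop=128):
    """Log-magnitude spectrogram (frequency x time) from a Hann-windowed STFT."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    magnitude = np.abs(np.fft.rfft(frames, axis=1)).T
    return 20.0 * np.log10(magnitude + 1e-10)


def stretch_contrast(img):
    """Global min-max stretch to [0, 1] (assumed stand-in for local enhancement)."""
    lo, hi = img.min(), img.max()
    return (img - lo) / (hi - lo) if hi > lo else np.zeros_like(img)


def resize_nearest(img, shape):
    """Scale a 2D array to a fixed size by nearest-neighbour sampling."""
    rows = np.arange(shape[0]) * img.shape[0] // shape[0]
    cols = np.arange(shape[1]) * img.shape[1] // shape[1]
    return img[np.ix_(rows, cols)]


def fit_2d_klt(images, n_row, n_col):
    """Fit a two-sided 2D Karhunen-Loeve projection from row and column covariances."""
    stack = np.asarray(images)
    mean = stack.mean(axis=0)
    centered = stack - mean
    # Row covariance: average of D @ D.T; column covariance: average of D.T @ D.
    row_cov = np.einsum("nij,nkj->ik", centered, centered) / len(stack)
    col_cov = np.einsum("nij,nik->jk", centered, centered) / len(stack)
    # eigh returns eigenvalues in ascending order, so reverse to get the leading ones.
    _, row_vecs = np.linalg.eigh(row_cov)
    _, col_vecs = np.linalg.eigh(col_cov)
    return mean, row_vecs[:, ::-1][:, :n_row], col_vecs[:, ::-1][:, :n_col]


def project(img, mean, row_basis, col_basis):
    """Reduce one image to an n_row x n_col feature matrix."""
    return row_basis.T @ (img - mean) @ col_basis


# Synthetic tones stand in for recorded vocal exercises (illustrative data only).
rng = np.random.default_rng(0)
t = np.arange(8000) / 8000.0
signals = [
    np.sin(2 * np.pi * f * t) + 0.01 * rng.standard_normal(t.size)
    for f in (220.0, 440.0, 880.0)
]
images = [resize_nearest(stretch_contrast(spectrogram(s)), (32, 32)) for s in signals]
mean, row_basis, col_basis = fit_2d_klt(images, n_row=4, n_col=4)
features = [project(img, mean, row_basis, col_basis) for img in images]
```

Each entry of `features` is a small matrix that could be flattened and passed to a classifier. Projecting from both sides keeps the image's two-dimensional structure and needs only two small covariance matrices rather than one covariance matrix over all pixels.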