Brown J C, Houix O, McAdams S
Physics Department, Wellesley College, Massachusetts 02181, USA.
J Acoust Soc Am. 2001 Mar;109(3):1064-72. doi: 10.1121/1.1342075.
The automatic identification of musical instruments is a relatively unexplored and potentially very important field, because it promises to free humans from time-consuming Internet searches and from indexing audio material. In this paper, speaker identification techniques are used to determine which properties (features) are most effective for identifying a statistically significant number of sounds, excerpted from actual performances, that represent four classes of musical instruments (oboe, saxophone, clarinet, flute). Features examined include cepstral coefficients, constant-Q coefficients, spectral centroid, autocorrelation coefficients, and moments of the time wave. The number of coefficients was varied; for cepstral coefficients, ten were sufficient for identification. Correct identification rates of 79% to 84% were obtained with cepstral coefficients, bin-to-bin differences of the constant-Q coefficients, and autocorrelation coefficients; autocorrelation coefficients had not previously been used in either speaker or instrument identification. These results depended on the choice of training sounds and on the number of clusters used in the calculation. A comparison with a human perception experiment using sounds produced by the same instruments indicates that, under these conditions, computers identify woodwind instruments as well as humans do.
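To make three of the listed features concrete, here is a minimal plain-Python sketch of how they might be computed on a single analysis frame. It is not the authors' implementation: the function names are hypothetical, the DFT is a naive O(N²) version chosen only to keep the example dependency-free, and the cepstrum shown is a plain real cepstrum (inverse DFT of the log-magnitude spectrum), which may differ from the cepstral variant used in the paper. Constant-Q coefficients, moments of the time wave, and the cluster-based classifier are not shown.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform (adequate for short frames)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def spectral_centroid(frame, sample_rate):
    """Amplitude-weighted mean frequency (Hz) over bins 0..N/2."""
    n = len(frame)
    mags = [abs(c) for c in dft(frame)[: n // 2 + 1]]
    total = sum(mags)
    if total == 0.0:
        return 0.0
    # Bin k corresponds to frequency k * sample_rate / n.
    return sum(k * sample_rate / n * m for k, m in enumerate(mags)) / total


def autocorrelation_coeffs(frame, num_lags):
    """Autocorrelation at lags 0..num_lags-1, normalized so lag 0 equals 1."""
    n = len(frame)
    r = [
        sum(frame[i] * frame[i + lag] for i in range(n - lag))
        for lag in range(num_lags)
    ]
    return [v / r[0] for v in r] if r[0] != 0.0 else r


def real_cepstrum(frame, num_coeffs, eps=1e-12):
    """First num_coeffs coefficients of the real cepstrum: IDFT of log|DFT|."""
    n = len(frame)
    # eps avoids log(0) for empty spectral bins.
    log_mag = [math.log(abs(c) + eps) for c in dft(frame)]
    # For a real input, log|X_k| is real and even, so its IDFT is real;
    # keep only the real part to drop rounding residue.
    return [
        (sum(log_mag[k] * cmath.exp(2j * math.pi * k * q / n)
             for k in range(n)) / n).real
        for q in range(num_coeffs)
    ]


if __name__ == "__main__":
    sr = 8000  # assumed sample rate for the demo
    frame = [math.sin(2 * math.pi * 1000 * t / sr) for t in range(64)]
    print("centroid (Hz):", spectral_centroid(frame, sr))
    print("autocorrelation:", autocorrelation_coeffs(frame, 5))
    print("cepstrum:", real_cepstrum(frame, 10))
```

For a 1000 Hz sine sampled at 8000 Hz, the centroid comes out at about 1000 Hz, and the autocorrelation turns negative at a lag of half a period (4 samples). Ten cepstral coefficients are requested in the demo only because the abstract reports that ten were sufficient; in a real system, per-frame vectors like these would then be passed to a trained classifier.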