Brown J C
Physics Department, Wellesley College, Massachusetts 02181, USA.
J Acoust Soc Am. 1999 Mar;105(3):1933-41. doi: 10.1121/1.426728.
Cepstral coefficients based on a constant-Q transform were calculated for 28 short (1-2 s) oboe sounds and 52 short saxophone sounds. These coefficients were used as features in a pattern analysis to determine whether each of these sounds, which together make up the test set, belongs to the oboe class or to the saxophone class. The training set consisted of longer sounds, 1 min or more, for each of the instruments. A k-means algorithm was used to calculate clusters for the training data, and Gaussian probability density functions were formed from the mean and variance of each cluster. Each member of the test set was then analyzed to determine the probability that it belonged to each of the two classes, and a Bayes decision rule was invoked to assign it to one of them. The results were extremely good and are compared with those of a human perception experiment in which listeners identified a subset of the same sounds.
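The classification pipeline described in the abstract (k-means clusters on training features, one Gaussian density per cluster, then a Bayes decision over the test sound) can be sketched roughly as follows. This is an illustrative sketch only, not the author's implementation. Several things are assumptions that the abstract does not state: the features here are synthetic random vectors standing in for constant-Q cepstral coefficients, the Gaussians use diagonal covariance, cluster densities are combined as a mixture weighted by cluster size, frames are treated as independent, and the class priors are equal. All function names (`kmeans`, `fit_class_model`, `log_likelihood`, `classify`) are hypothetical.

```python
import numpy as np


def kmeans(X, k, n_iter=50, seed=0):
    """Plain k-means; returns a cluster label for every row of X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    # Final assignment against the last set of centers.
    dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return dist.argmin(axis=1)


def fit_class_model(X, k):
    """One Gaussian (mean, variance) per non-empty cluster, plus a weight.

    Assumption: diagonal covariance and weights proportional to cluster size.
    """
    labels = kmeans(X, k)
    used = [j for j in range(k) if np.any(labels == j)]
    means = np.array([X[labels == j].mean(axis=0) for j in used])
    variances = np.array([X[labels == j].var(axis=0) for j in used]) + 1e-6
    weights = np.array([np.mean(labels == j) for j in used])
    return means, variances, weights


def log_likelihood(frames, model):
    """Log-likelihood of all frames of one sound under one class model."""
    means, variances, weights = model
    diff = frames[:, None, :] - means[None, :, :]
    log_pdf = -0.5 * (
        np.log(2.0 * np.pi * variances).sum(axis=1)[None, :]
        + (diff ** 2 / variances[None, :, :]).sum(axis=2)
    )
    # Log of the weighted sum over clusters, then sum over frames
    # (frames assumed independent).
    per_frame = np.logaddexp.reduce(log_pdf + np.log(weights)[None, :], axis=1)
    return per_frame.sum()


def classify(frames, models, priors=None):
    """Bayes decision rule: pick the class with the largest posterior."""
    names = list(models)
    if priors is None:  # assumption: equal priors
        priors = {name: 1.0 / len(names) for name in names}
    scores = [log_likelihood(frames, models[n]) + np.log(priors[n]) for n in names]
    return names[int(np.argmax(scores))]


# Synthetic stand-ins for cepstral feature frames (NOT real instrument data).
rng = np.random.default_rng(1)
train = {
    "oboe": rng.normal(0.0, 1.0, size=(600, 5)),
    "sax": rng.normal(3.0, 1.0, size=(600, 5)),
}
models = {name: fit_class_model(X, k=3) for name, X in train.items()}

oboe_test = rng.normal(0.0, 1.0, size=(40, 5))
sax_test = rng.normal(3.0, 1.0, size=(40, 5))
oboe_guess = classify(oboe_test, models)
sax_guess = classify(sax_test, models)
```

Working in the log domain avoids numerical underflow when the per-frame probabilities of a whole sound are multiplied together. The number of clusters and the feature dimension used here are arbitrary choices for the demonstration.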