Kim Ji Chul
Department of Psychological Sciences, University of Connecticut, Storrs, CT, USA.
Oscilloscape LLC, East Hartford, CT, USA.
Front Psychol. 2017 May 4;8:666. doi: 10.3389/fpsyg.2017.00666. eCollection 2017.
Tonal melody can imply vertical harmony through a sequence of tones. Current methods for automatic chord estimation commonly use chroma-based features extracted from audio signals. However, the implied harmony of unaccompanied melodies can be difficult to estimate on the basis of chroma content in the presence of frequent nonchord tones. Here we present a novel approach to automatic chord estimation based on the human perception of pitch sequences. We use cohesion and inhibition between pitches in auditory short-term memory to differentiate chord tones and nonchord tones in tonal melodies. We model short-term pitch memory as a gradient frequency neural network, which is a biologically realistic model of auditory neural processing. The model is a dynamical system consisting of a network of tonotopically tuned nonlinear oscillators driven by audio signals. The oscillators interact with each other through nonlinear resonance and lateral inhibition, and the pattern of oscillatory traces emerging from the interactions is taken as a measure of pitch salience. We test the model with a collection of unaccompanied tonal melodies to evaluate it as a feature extractor for chord estimation. We show that chord tones are selectively enhanced in the response of the model, thereby increasing the accuracy of implied harmony estimation. We also find that, like other existing features for chord estimation, the performance of the model can be improved by using segmented input signals. We discuss possible ways to expand the present model into a full chord estimation system within the dynamical systems framework.
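As a rough illustration of the kind of model the abstract describes, the sketch below drives a bank of tonotopically tuned nonlinear oscillators with a pure tone and reads the oscillator amplitudes as a crude pitch-salience profile. It is a heavily simplified stand-in, not the model from the paper: the oscillators are uncoupled (no nonlinear resonance or lateral inhibition between them), each is a generic Hopf-type oscillator, and the function name `oscillator_bank_response`, the parameters `alpha`, `beta`, and `force`, their values, and the semitone grid are all assumptions chosen for illustration.

```python
import cmath
import math


def oscillator_bank_response(freqs_hz, stimulus_hz, duration_s=0.5,
                             sample_rate=8000, alpha=-5.0, beta=-50.0,
                             force=5.0):
    """Drive uncoupled Hopf-type oscillators with a pure tone.

    Each oscillator follows dz/dt = z * (alpha + i*2*pi*f + beta*|z|^2) + x(t),
    where x(t) is the real-valued input signal. Returns the final amplitude
    |z| of every oscillator. All parameter values are illustrative assumptions.
    """
    dt = 1.0 / sample_rate
    z = [0j] * len(freqs_hz)
    n_steps = int(duration_s * sample_rate)
    for n in range(n_steps):
        # Real-valued sinusoidal input standing in for an audio signal.
        x = force * math.cos(2 * math.pi * stimulus_hz * n * dt)
        for k, f in enumerate(freqs_hz):
            # Exponential Euler step: the rotation and amplitude-dependent
            # damping are applied exactly over dt, which keeps the update
            # stable at audio frequencies; the input is added explicitly.
            rate = complex(alpha + beta * abs(z[k]) ** 2, 2 * math.pi * f)
            z[k] = z[k] * cmath.exp(rate * dt) + x * dt
    return [abs(v) for v in z]


# Hypothetical tonotopic grid: 25 oscillators, one per semitone from A3 to A5.
semitones = [220.0 * 2 ** (k / 12) for k in range(25)]
amps = oscillator_bank_response(semitones, stimulus_hz=440.0)
peak = max(range(len(amps)), key=amps.__getitem__)
print(round(semitones[peak], 1))
```

In this reduced setting the oscillator tuned to the stimulus frequency responds far more strongly than its semitone neighbors. The selective enhancement of chord tones reported in the abstract depends on the interactions between oscillators, which this sketch deliberately leaves out.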