Robust Real-Time Music Transcription with a Compositional Hierarchical Model.

Authors

Pesek Matevž, Leonardis Aleš, Marolt Matija

Affiliations

University of Ljubljana, Faculty of Computer and Information Science, Laboratory for computer graphics and multimedia, Ljubljana, Slovenia.

University of Birmingham, School of Computer Science, Centre for Computational Neuroscience and Cognitive Robotics, Birmingham, United Kingdom of Great Britain and Northern Ireland.

Publication information

PLoS One. 2017 Jan 3;12(1):e0169411. doi: 10.1371/journal.pone.0169411. eCollection 2017.

Abstract

The paper presents a new compositional hierarchical model for robust music transcription. Its main features are unsupervised learning of a hierarchical representation of the input data; transparency, which enables insight into the learned representation; and robustness and speed, which make it suitable for real-world and real-time use. The model consists of multiple layers, each composed of a number of parts. The hierarchical nature of the model corresponds well to hierarchical structures in music. The parts in lower layers correspond to low-level concepts (e.g. tone partials), while the parts in higher layers combine lower-level representations into more complex concepts (tones, chords). The layers are learned in an unsupervised manner from music signals. Parts in each layer are compositions of parts from previous layers, with statistical co-occurrences serving as the driving force of the learning process. In the paper, we present the model's structure and compare it to other hierarchical approaches in the field of music information retrieval. We evaluate the model's performance on multiple fundamental frequency estimation. Finally, we elaborate on extensions of the model towards other music information retrieval tasks.
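
The idea that higher-layer parts are compositions of co-occurring lower-layer parts can be illustrated with a toy sketch. This is not the authors' algorithm: the function names `learn_compositions` and `activate`, the integer part ids, and the example frames are all hypothetical, and a simple pair count stands in for whatever statistics the actual model uses.

```python
from collections import Counter
from itertools import combinations


def learn_compositions(frames, top_k=1):
    """Toy stand-in for unsupervised layer learning (an assumption, not the
    paper's method): count how often two lower-layer parts are active in the
    same frame and keep the top_k most frequent pairs as compositions."""
    counts = Counter()
    for active in frames:
        # Each frame is a set of active lower-layer part ids.
        for pair in combinations(sorted(active), 2):
            counts[pair] += 1
    return [pair for pair, _ in counts.most_common(top_k)]


def activate(compositions, frame):
    """A composition is active only when both of its subparts are active."""
    return [c for c in compositions if c[0] in frame and c[1] in frame]


# Hypothetical data: ids 1, 2, 3 stand for the first three partials of a
# tone, and 9 stands for an unrelated spectral component.
frames = [{1, 2}, {1, 2, 3}, {1, 2}, {3, 9}]

layer2 = learn_compositions(frames, top_k=1)
active_parts = activate(layer2, {1, 2, 3})
inactive_parts = activate(layer2, {1})
```

In this example the pair of the first two partials co-occurs most often, so it becomes the single higher-layer part; it is activated by a frame containing both partials and stays inactive when only one is present.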


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/74f7/5207709/66ef231b93d0/pone.0169411.g001.jpg
