

Time-frequency scattering accurately models auditory similarities between instrumental playing techniques.

Authors

Lostanlen Vincent, El-Hajj Christian, Rossignol Mathias, Lafay Grégoire, Andén Joakim, Lagrange Mathieu

Affiliations

LS2N, CNRS, Centrale Nantes, Nantes University, 1, rue de la Noe, Nantes, 44000 France.

Lonofi, 57 rue Letort, Paris, 75018 France.

Publication

EURASIP J Audio Speech Music Process. 2021;2021(1):3. doi: 10.1186/s13636-020-00187-z. Epub 2021 Jan 11.

DOI: 10.1186/s13636-020-00187-z
PMID: 33488686
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7801324/
Abstract

Instrumental playing techniques such as vibratos, glissandos, and trills often denote musical expressivity, both in classical and folk contexts. However, most existing approaches to music similarity retrieval fail to describe timbre beyond the so-called "ordinary" technique, use instrument identity as a proxy for timbre quality, and do not allow for customization to the perceptual idiosyncrasies of a new subject. In this article, we ask 31 human participants to organize 78 isolated notes into a set of timbre clusters. Analyzing their responses suggests that timbre perception operates within a more flexible taxonomy than those provided by instruments or playing techniques alone. In addition, we propose a machine listening model to recover the cluster graph of auditory similarities across instruments, mutes, and techniques. Our model relies on joint time-frequency scattering features to extract spectrotemporal modulations as acoustic features. Furthermore, it minimizes triplet loss in the cluster graph by means of the large-margin nearest neighbor (LMNN) metric learning algorithm. Over a dataset of 9346 isolated notes, we report a state-of-the-art average precision at rank five (AP@5) of 99.0 ± 1%. An ablation study demonstrates that removing either the joint time-frequency scattering transform or the metric learning algorithm noticeably degrades performance.
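The AP@5 evaluation metric reported in the abstract can be made concrete. Below is a minimal sketch (not the authors' code) of average precision at rank five for a single query, assuming a precomputed pairwise distance matrix and per-note cluster labels; exact AP@k conventions vary slightly across papers, so this follows the common information-retrieval definition (precision averaged over the ranks of relevant items among the five nearest neighbors).

```python
import numpy as np

def average_precision_at_5(distances, labels, query_idx):
    """AP@5 for one query: rank all items by distance to the query
    (excluding the query itself), take the 5 nearest neighbors, and
    average precision over the ranks where a same-cluster item appears."""
    d = distances[query_idx].astype(float).copy()
    d[query_idx] = np.inf                     # exclude the query itself
    ranking = np.argsort(d)[:5]               # indices of the 5 nearest neighbors
    relevant = (labels[ranking] == labels[query_idx]).astype(float)
    if relevant.sum() == 0:
        return 0.0                            # no same-cluster item retrieved
    precision_at_k = np.cumsum(relevant) / np.arange(1, 6)
    return float((precision_at_k * relevant).sum() / relevant.sum())
```

Averaging this quantity over all queries in the dataset yields the mean AP@5 figure quoted above; in the paper, the distance matrix would come from the learned LMNN metric on joint time-frequency scattering features.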


Figures:
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/63b0/7801324/6fb53b528473/13636_2020_187_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/63b0/7801324/e0bd8a1bd285/13636_2020_187_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/63b0/7801324/3283ededaf1e/13636_2020_187_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/63b0/7801324/02a40be7ce5b/13636_2020_187_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/63b0/7801324/17bd57fb7ded/13636_2020_187_Fig5_HTML.jpg

Similar articles

1. Time-frequency scattering accurately models auditory similarities between instrumental playing techniques.
EURASIP J Audio Speech Music Process. 2021;2021(1):3. doi: 10.1186/s13636-020-00187-z. Epub 2021 Jan 11.
2. Modeling Timbre Similarity of Short Music Clips.
Front Psychol. 2017 Apr 26;8:639. doi: 10.3389/fpsyg.2017.00639. eCollection 2017.
3. The Timbre Toolbox: extracting audio descriptors from musical signals.
J Acoust Soc Am. 2011 Nov;130(5):2902-16. doi: 10.1121/1.3642604.
4. Perception of musical timbre by cochlear implant listeners: a multidimensional scaling study.
Ear Hear. 2013 Jul-Aug;34(4):426-36. doi: 10.1097/AUD.0b013e31827535f8.
5. Learning metrics on spectrotemporal modulations reveals the perception of musical instrument timbre.
Nat Hum Behav. 2021 Mar;5(3):369-377. doi: 10.1038/s41562-020-00987-5. Epub 2020 Nov 30.
6. Biomimetic spectro-temporal features for music instrument recognition in isolated notes and solo phrases.
EURASIP J Audio Speech Music Process. 2015;2015. doi: 10.1186/s13636-015-0070-9.
7. Automatic Assessment of Tone Quality in Violin Music Performance.
Front Psychol. 2019 Mar 14;10:334. doi: 10.3389/fpsyg.2019.00334. eCollection 2019.
8. A Randomized Controlled Crossover Study of the Impact of Online Music Training on Pitch and Timbre Perception in Cochlear Implant Users.
J Assoc Res Otolaryngol. 2019 Jun;20(3):247-262. doi: 10.1007/s10162-018-00704-0. Epub 2019 Feb 27.
9. Examination of spectral timbre cues and musical instrument identification in cochlear implant recipients.
Cochlear Implants Int. 2014 Mar;15(2):78-86. doi: 10.1179/1754762813Y.0000000059. Epub 2014 Jan 3.
10. Encoding of natural timbre dimensions in human auditory cortex.
Neuroimage. 2018 Feb 1;166:60-70. doi: 10.1016/j.neuroimage.2017.10.050. Epub 2017 Nov 4.
