Bloorview Research Institute, Holland Bloorview Kids Rehabilitation Hospital, Toronto, Ontario, Canada.
Institute of Biomedical Engineering, University of Toronto, Toronto, Ontario, Canada.
PLoS One. 2024 Apr 2;19(4):e0299888. doi: 10.1371/journal.pone.0299888. eCollection 2024.
While the musical instrument classification task is well studied, a gap remains in identifying non-pitched percussion instruments, which show greater overlap in frequency bands and more variation in sound quality and play style than pitched instruments. In this paper, we present a musical instrument classifier for detecting tambourines, maracas and castanets, instruments that are often used in early childhood music education. We generated a dataset with diverse instruments (e.g., brand, materials, construction) played in different locations with varying background noise and play styles. We conducted sensitivity analyses to optimize feature selection, windowing time, and model selection. We deployed and evaluated our best model in a mixed reality music application with 12 families in a home setting. Our dataset comprised over 369,000 samples recorded in-lab and 35,361 samples recorded with families in a home setting. We observed the Light Gradient Boosting Machine (LGBM) model to perform best using a window of approximately 93 ms with only 12 mel-frequency cepstral coefficients (MFCCs) and signal entropy. Our best LGBM model performed with over 84% accuracy across all three instrument families in-lab and over 73% accuracy when deployed to the home. To our knowledge, this dataset of over 369,000 samples of non-pitched instruments is the first of its kind. This work also suggests that a low-dimensional feature space is sufficient for the recognition of non-pitched instruments. Lastly, real-world deployment and testing of the resulting algorithms with participants of diverse physical and cognitive abilities was an important contribution towards more inclusive design practices. This paper lays the technological groundwork for a mixed reality music application that can detect children's use of non-pitched percussion instruments to support early childhood music education and play.
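As a rough illustration of the windowing and entropy feature mentioned above, the following sketch splits a signal into windows of about 93 ms and computes a Shannon entropy value for each window. It is not the authors' implementation. The 44.1 kHz sampling rate, the 4096-sample window (4096 / 44100 ≈ 92.9 ms), the histogram-based entropy definition, and all function names are assumptions made for illustration; the abstract does not specify them. The 12 MFCCs would come from an audio library and are omitted here.

```python
import numpy as np

SAMPLE_RATE = 44_100  # assumed sampling rate in Hz (not stated in the abstract)
WINDOW = 4096         # assumed window length: 4096 / 44100 is about 92.9 ms


def window_ms(window=WINDOW, sample_rate=SAMPLE_RATE):
    """Return the window duration in milliseconds."""
    return 1000.0 * window / sample_rate


def frame_signal(signal, window=WINDOW):
    """Split a 1-D signal into non-overlapping windows, dropping the remainder."""
    n_frames = len(signal) // window
    return signal[: n_frames * window].reshape(n_frames, window)


def shannon_entropy(frame, bins=64):
    """Shannon entropy (in bits) of the amplitude histogram of one window.

    This is one possible definition of "signal entropy"; the paper may use another.
    """
    counts, _ = np.histogram(frame, bins=bins)
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log2(p)).sum())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    audio = rng.uniform(-1.0, 1.0, size=SAMPLE_RATE)  # one second of synthetic noise
    frames = frame_signal(audio)
    entropies = [shannon_entropy(f) for f in frames]
    print(f"{len(frames)} windows of {window_ms():.1f} ms each")
    print("entropy of first window (bits):", round(entropies[0], 3))
```

In a full pipeline, each window's entropy value would be concatenated with its 12 MFCCs to form a 13-dimensional feature vector for a gradient-boosted classifier.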