
Science of music-based citizen science: How seeing influences hearing.

Author information

Bedoya Daniel, Lascabettes Paul, Fyfe Lawrence, Chew Elaine

Affiliations

Laboratoire de Mécanique des Structures et des Systèmes Couplés (LMSSC), Conservatoire National des Arts et Métiers (CNAM), Paris, France.

Institut de Recherche Mathématique Avancée (IRMA), University of Strasbourg, Strasbourg, France.

Publication information

PLoS One. 2025 Sep 10;20(9):e0325019. doi: 10.1371/journal.pone.0325019. eCollection 2025.

Abstract

Citizen science engages volunteers to contribute data to scientific projects, often through visual annotation tasks. Hearing-based activities are rare and less well understood. High-quality annotations of performed music structures are essential for reliable algorithmic analysis of recorded music, with applications ranging from music information retrieval to music therapy. Music annotation typically begins with an aural input combined with a variety of visual representations, but the impact of the visual and aural inputs on the annotations is not known. Here, we present a study in which participants annotate music segmentation boundaries of variable strengths given only visuals (audio waveform or piano roll), only audio, or both visuals and audio simultaneously. Participants were presented with the 33 contrasting theme-and-variation segments extracted from a complete recorded performance of Beethoven's 32 Variations in C minor, WoO 80, under differing audiovisual conditions. Their segmentation boundaries were visualized using boundary credence profiles and compared using the unbalanced optimal transport distance, which tracks boundary weights and penalizes boundary removal, as well as the F-measure. Compared to annotations derived from audio-visual (cross-modal) input (considered the gold standard for our study), boundary annotations derived from visual (unimodal) input were closer than those derived from audio (unimodal) input. The presence of visuals led to larger peaks in boundary credence profiles, marking clearer global segmentations, while audio helped resolve discrepancies and capture subtle segmentation cues. We conclude that audio and visual inputs can be used as cognitive scaffolding to enhance results in large-scale citizen science annotation of music media and to support data analysis and interpretation. In summary, visuals provide cues for large-scale structure, but complex structural nuances are better discerned by ear.
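The two evaluation measures named in the abstract can be illustrated with a short sketch. The Python snippet below is not the authors' implementation: it shows (a) a standard boundary F-measure, where an annotated boundary counts as a hit if it falls within a tolerance window of an unmatched reference boundary, and (b) a simplified unbalanced optimal-transport cost between weighted boundary sets, in which moving weight between boundaries costs their time difference and any unmatched weight is removed at a fixed penalty lam. All function names, the window size, the penalty value, and the example data are illustrative assumptions, not taken from the paper.

import numpy as np
from scipy.optimize import linprog

def boundary_f_measure(ref, est, window=0.5):
    # A boundary in `est` is a hit if it lies within `window` seconds of a
    # reference boundary that has not already been matched (greedy matching).
    used, hits = set(), 0
    for e in est:
        candidates = [(abs(e - r), i) for i, r in enumerate(ref)
                      if i not in used and abs(e - r) <= window]
        if candidates:
            used.add(min(candidates)[1])
            hits += 1
    precision = hits / len(est) if est else 0.0
    recall = hits / len(ref) if ref else 0.0
    return 2 * precision * recall / (precision + recall) if hits else 0.0

def unbalanced_ot_cost(ref_t, ref_w, est_t, est_w, lam=1.0):
    # Toy unbalanced optimal transport between weighted boundary sets:
    # moving one unit of weight from time t_i to time t_j costs |t_i - t_j|;
    # weight left unmatched on either side is "removed" at cost lam per unit.
    ref_t, ref_w = np.asarray(ref_t, float), np.asarray(ref_w, float)
    est_t, est_w = np.asarray(est_t, float), np.asarray(est_w, float)
    m, n = len(ref_t), len(est_t)
    C = np.abs(ref_t[:, None] - est_t[None, :])      # m x n transport costs
    # Total cost = sum_ij C_ij P_ij + lam * (unmatched ref + unmatched est)
    #            = sum_ij (C_ij - 2*lam) P_ij + lam * (sum ref_w + sum est_w)
    c = (C - 2.0 * lam).ravel()                      # plan P flattened row-major
    A_ub = np.zeros((m + n, m * n))
    for i in range(m):                               # row sums of P <= ref_w
        A_ub[i, i * n:(i + 1) * n] = 1.0
    for j in range(n):                               # column sums of P <= est_w
        A_ub[m + j, j::n] = 1.0
    b_ub = np.concatenate([ref_w, est_w])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(0, None), method="highs")
    return res.fun + lam * (ref_w.sum() + est_w.sum())

# Hypothetical data: gold-standard boundaries vs. one annotator's boundaries,
# each boundary carrying a credence weight; times are in seconds.
ref_t, ref_w = [10.0, 32.5, 61.0], [1.0, 0.6, 0.9]
est_t, est_w = [10.2, 60.4], [0.8, 1.0]
print(boundary_f_measure(ref_t, est_t, window=0.5))  # 0.40
print(unbalanced_ot_cost(ref_t, ref_w, est_t, est_w))  # 1.60

The unbalanced formulation matters here because annotators may mark different numbers of boundaries: a balanced transport distance would force all weight to be matched, whereas the removal penalty lets a missing or spurious boundary be scored as such, and nearby boundaries with differing weights incur only a small cost.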
