AUDIAS Research Group, Escuela Politécnica Superior, Universidad Autónoma de Madrid, Madrid, Spain.
Faculty of IT, IT4I Centre of Excellence, Brno University of Technology, Brno, Czech Republic.
PLoS One. 2024 Jul 5;19(7):e0303994. doi: 10.1371/journal.pone.0303994. eCollection 2024.
In recent years, the relationship between Sound Event Detection (SED) and Source Separation (SSep) has received growing interest, particularly with the aim of enhancing SED performance by leveraging the synergies between the two tasks. In this paper, we present a detailed description of JSS (Joint Source Separation and Sound Event Detection), our joint-training scheme for SSep and SED, and we measure its performance on the DCASE Challenge task for SED in domestic environments. Our experiments demonstrate that JSS can improve SED performance, in terms of the Polyphonic Sound Detection Score (PSDS), even without additional training data. Additionally, we conduct a thorough analysis of JSS's effectiveness across different event classes and in scenarios with severe event overlap, where it is expected to yield further improvements. Furthermore, we introduce an objective measure to assess the diversity of event predictions across the estimated sources, shedding light on how different training strategies affect the separation of sound events. Finally, we provide graphical examples of the Source Separation and Sound Event Detection steps, aiming to facilitate the interpretation of the JSS method.