Interactive extraction of diverse vocal units from a planar embedding without the need for prior sound segmentation.

Authors

Lorenz Corinna, Hao Xinyu, Tomka Tomas, Rüttimann Linus, Hahnloser Richard H R

Affiliations

Institute of Neuroinformatics and Neuroscience Center Zurich, University of Zurich and ETH Zurich, Zurich, Switzerland.

Université Paris-Saclay, CNRS, Institut des Neurosciences Paris-Saclay, Saclay, France.

Publication information

Front Bioinform. 2023 Jan 13;2:966066. doi: 10.3389/fbinf.2022.966066. eCollection 2022.

DOI: 10.3389/fbinf.2022.966066

PMID: 36710910

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC9880044/

Abstract

Annotating and proofreading data sets of complex natural behaviors such as vocalizations are tedious tasks because instances of a given behavior need to be correctly segmented from background noise and must be classified with a minimal false-positive error rate. Low-dimensional embeddings have proven very useful for this task because they can provide a visual overview of a data set in which distinct behaviors appear in different clusters. However, low-dimensional embeddings introduce errors because they fail to preserve distances, and they represent only objects of fixed dimensionality, which conflicts with vocalizations whose dimensionality varies with their duration. To mitigate these issues, we introduce a semi-supervised, analytical method for simultaneous segmentation and clustering of vocalizations. We define a given vocalization type by specifying pairs of high-density regions in the embedding plane of sound spectrograms, one region associated with vocalization onsets and the other with offsets. We demonstrate our two-neighborhood (2N) extraction method on the task of clustering adult zebra finch vocalizations embedded with UMAP. We show that 2N extraction allows the identification of short and long vocal renditions from continuous data streams without initially committing to a particular segmentation of the data. Also, 2N extraction achieves a much lower false-positive error rate than comparable approaches based on a single defining region. Along with our method, we present a graphical user interface (GUI) for visualizing and annotating data.
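The idea of defining a vocalization type by a pair of regions, one for onsets and one for offsets, can be illustrated with a small sketch. This is not the authors' implementation: the circular regions, the pairing rule (first offset-region frame after an onset-region frame), the `max_frames` limit, and the names `in_disk` and `extract_2n` are all assumptions made for illustration, and the toy trajectory stands in for a real UMAP embedding of spectrogram frames.

```python
import numpy as np


def in_disk(points, center, radius):
    """Boolean mask of planar points that fall inside a circular region."""
    return np.linalg.norm(points - np.asarray(center), axis=1) <= radius


def extract_2n(embedding, onset_region, offset_region, max_frames):
    """Pair onset-region frames with the next offset-region frame.

    embedding: (T, 2) array, one embedded point per time frame, in time order.
    onset_region, offset_region: (center, radius) tuples (hypothetical disks).
    max_frames: longest allowed onset-to-offset gap, in frames (assumed rule).
    Returns a list of (onset_frame, offset_frame) index pairs.
    """
    on_idx = np.flatnonzero(in_disk(embedding, *onset_region))
    off_idx = np.flatnonzero(in_disk(embedding, *offset_region))
    segments = []
    last_end = -1
    for t in on_idx:
        if t <= last_end:
            continue  # frame lies inside a segment that was already extracted
        k = np.searchsorted(off_idx, t, side="right")  # first offset after t
        if k < len(off_idx) and off_idx[k] - t <= max_frames:
            segments.append((int(t), int(off_idx[k])))
            last_end = off_idx[k]
    return segments


# Toy trajectory: frames 1 and 6 visit the onset region near (0, 0),
# frames 4 and 10 visit the offset region near (10, 10).
toy = np.array([[5, 5], [0, 0], [2, 2], [3, 3], [10, 10], [5, 5],
                [0, 0.1], [5, 5], [5, 5], [5, 5], [10, 10]])

segments = extract_2n(toy, ((0, 0), 0.5), ((10, 10), 0.5), max_frames=3)
print(segments)  # [(1, 4)]
```

In this toy case a single onset region alone would flag two frames (1 and 6), whereas requiring a matching offset within three frames keeps only the first. That mirrors, in a very simplified way, why a second defining region can reduce false positives.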

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a1c5/9880044/6ab7239510f6/fbinf-02-966066-g001.jpg
