Identifying languages in a novel dataset: ASMR-whispered speech.

Authors

Song Meishu, Yang Zijiang, Parada-Cabaleiro Emilia, Jing Xin, Yamamoto Yoshiharu, Schuller Björn

Affiliations

Chair of Embedded Intelligence for Health Care and Wellbeing, University of Augsburg, Augsburg, Germany.

Educational Physiology Laboratory, The University of Tokyo, Tokyo, Japan.

Publication information

Front Neurosci. 2023 Jun 15;17:1120311. doi: 10.3389/fnins.2023.1120311. eCollection 2023.

Abstract

INTRODUCTION

The Autonomous Sensory Meridian Response (ASMR) is a combination of sensory phenomena involving electrostatic-like tingling sensations, which emerge in response to certain stimuli. Despite the overwhelming popularity of ASMR on social media, no open-source databases of ASMR-related stimuli are yet available, which leaves the phenomenon largely inaccessible to the research community and therefore almost completely unexplored. In this regard, we present the ASMR Whispered-Speech (ASMR-WS) database.

METHODS

ASMR-WS is a novel database of whispered speech, specifically tailored to promote the development of ASMR-like unvoiced Language Identification (unvoiced-LID) systems. The ASMR-WS database encompasses 38 videos, with a total duration of 10 h and 36 min, and includes seven target languages (Chinese, English, French, Italian, Japanese, Korean, and Spanish). Along with the database, we present baseline results for unvoiced-LID on the ASMR-WS database.

RESULTS

Our best results on the seven-class problem, obtained with segments of 2 s length, MFCC acoustic features, and a CNN classifier, achieved an unweighted average recall (UAR) of 85.74% and an accuracy of 90.83%.
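The two reported metrics can diverge when classes are imbalanced: accuracy counts every segment equally, whereas unweighted average recall averages the per-class recalls so that each language contributes equally. The following is a minimal sketch of that difference, not the authors' evaluation code; the function names and the toy labels are assumptions made purely for illustration and are not data from the ASMR-WS database.

```python
from collections import defaultdict


def accuracy(y_true, y_pred):
    """Fraction of all samples whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls; every class counts equally, whatever its size."""
    total = defaultdict(int)  # number of samples per true class
    hits = defaultdict(int)   # correctly classified samples per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            hits[t] += 1
    recalls = [hits[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Toy, made-up labels (hypothetical): 8 "en" segments and 2 "ja" segments.
y_true = ["en"] * 8 + ["ja"] * 2
y_pred = ["en"] * 8 + ["en", "ja"]  # one "ja" segment is misclassified as "en"

acc = accuracy(y_true, y_pred)
uar = unweighted_average_recall(y_true, y_pred)
print(f"accuracy={acc:.2f}, UAR={uar:.2f}")  # accuracy=0.90, UAR=0.75
```

In this toy case a single error in the smaller class lowers UAR far more than accuracy, which is one plausible reason for an accuracy that exceeds UAR, as in the figures reported above.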

DISCUSSION

In future work, we would like to examine the duration of speech samples more closely, as results varied across the combinations applied herein. To enable further research in this area, the ASMR-WS database, as well as the partitioning considered in the presented baseline, are made accessible to the research community.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7020/10308374/a6e6641b01b1/fnins-17-1120311-g0001.jpg
