Song Meishu, Yang Zijiang, Parada-Cabaleiro Emilia, Jing Xin, Yamamoto Yoshiharu, Schuller Björn
Chair of Embedded Intelligence for Health Care and Wellbeing, University of Augsburg, Augsburg, Germany.
Educational Physiology Laboratory, The University of Tokyo, Tokyo, Japan.
Front Neurosci. 2023 Jun 15;17:1120311. doi: 10.3389/fnins.2023.1120311. eCollection 2023.
The Autonomous Sensory Meridian Response (ASMR) is a combination of sensory phenomena involving electrostatic-like tingling sensations, which emerge in response to certain stimuli. Despite the overwhelming popularity of ASMR on social media, no open-source databases of ASMR-related stimuli are yet available, which leaves this phenomenon largely inaccessible to the research community and therefore almost completely unexplored. In this regard, we present the ASMR Whispered-Speech (ASMR-WS) database.
ASMR-WS is a novel database of whispered speech, specifically tailored to promote the development of ASMR-like unvoiced Language Identification (unvoiced-LID) systems. The ASMR-WS database encompasses 38 videos, for a total duration of 10 h and 36 min, and includes seven target languages (Chinese, English, French, Italian, Japanese, Korean, and Spanish). Along with the database, we present baseline results for unvoiced-LID on the ASMR-WS database.
Our best results on the seven-class problem, obtained with segments of 2 s length, a CNN classifier, and MFCC acoustic features, reached an unweighted average recall (UAR) of 85.74% and an accuracy of 90.83%.
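The two figures reported above measure different things: accuracy is the share of all segments classified correctly, whereas unweighted average recall (UAR) averages the per-class recall so that every language counts equally, regardless of how many segments it contributes. The following minimal sketch illustrates the difference. The function names and the toy labels are assumptions made for illustration only; they are not taken from the ASMR-WS baseline or its data.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of all samples whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls; each class has equal weight."""
    support = Counter(y_true)  # number of true samples per class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct per class
    recalls = [hits[c] / support[c] for c in support]
    return sum(recalls) / len(recalls)


# Toy, imbalanced two-class example (hypothetical labels, not real results).
y_true = ["en"] * 8 + ["ja"] * 2
y_pred = ["en"] * 8 + ["en", "ja"]

print(f"accuracy = {accuracy(y_true, y_pred):.2f}")
print(f"UAR      = {unweighted_average_recall(y_true, y_pred):.2f}")
```

In this toy case the majority class is classified perfectly while half of the minority class is missed, so accuracy (0.90) exceeds UAR (0.75). A gap in that direction, as in the reported results, is what one would expect when classes are imbalanced and the smaller classes are harder to recognise.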
In future work, we plan to examine the effect of speech-sample duration more closely, since results varied across the combinations tested here. To enable further research in this area, the ASMR-WS database, as well as the partitioning used for the presented baseline, is made accessible to the research community.