Praditor: A DBSCAN-based automation for speech onset detection.

Author information

Liu Zhengyuan, Yu Xinqi, Hu Wing Chung, Ma Yunxiao, Wang Ruiming, Zhang Haoyun

Affiliations

Centre for Cognitive and Brain Sciences, University of Macau, Taipa, Macau SAR, China.

School of Psychology, South China Normal University, Guangzhou, Guangdong, China.

Publication information

Behav Res Methods. 2025 Aug 4;57(9):247. doi: 10.3758/s13428-025-02776-2.

Abstract

Speech onset time (SOT) serves as a critical parameter in speech production research, marking the transition from background noise to the start of the speech signal. While manual annotation remains the gold standard for identifying SOT, its labor-intensive nature can result in considerable fatigue, thereby jeopardizing the accuracy of the annotation. Here, we present Praditor, a semi-automatic speech onset detection tool, leveraging a combination of algorithms consisting of density-based spatial clustering of applications with noise (DBSCAN) and first-derivative thresholding. Praditor offers a user-friendly experience across major platforms, including Windows and macOS, eliminating the need for complex setup procedures and offering a GUI that facilitates the tuning procedure. Furthermore, Praditor is capable of processing both multiple-onset and single-onset audio files regardless of language, and generates a TextGrid file for subsequent verification. To assess the accuracy of Praditor, we compared time difference (TD) scores and executed a linear regression analysis between manual and automatic annotations. Results showed that Praditor was highly accurate in both Mandarin and English datasets, as about 90% of the annotations fell within the range of ±20 ms, with corpus-level tuning achieving slightly lower but acceptable accuracy with respect to file-level tuning. This semi-automatic method is expected to offer a general solution for speech onset annotation in a language-independent manner, catering to not only experienced programmers but also users with little to no prior experience. Praditor is openly available on its official GitHub repository ( https://github.com/Paradeluxe/Praditor ).
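The evaluation described in the abstract (TD scores, the share of annotations within ±20 ms, and a linear regression between manual and automatic onsets) can be illustrated with a minimal sketch. This is not Praditor's own evaluation code: the function names, the sign convention (automatic minus manual), and the onset values are all assumptions made up for illustration.

```python
from typing import List, Sequence, Tuple


def time_differences(manual_ms: Sequence[float], auto_ms: Sequence[float]) -> List[float]:
    """TD score per annotation. Assumed convention: automatic minus manual, in ms."""
    if len(manual_ms) != len(auto_ms):
        raise ValueError("manual and automatic annotations must be paired")
    return [a - m for m, a in zip(manual_ms, auto_ms)]


def fraction_within(td_ms: Sequence[float], tolerance_ms: float = 20.0) -> float:
    """Share of annotations whose absolute TD is at most tolerance_ms."""
    if not td_ms:
        raise ValueError("no TD scores given")
    return sum(abs(td) <= tolerance_ms for td in td_ms) / len(td_ms)


def linear_fit(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y on x; returns (slope, intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical onsets in milliseconds (illustrative values, not data from the paper).
manual = [512.0, 640.0, 705.0, 830.0, 918.0]
auto = [518.0, 633.0, 731.0, 826.0, 930.0]

td = time_differences(manual, auto)
share = fraction_within(td, tolerance_ms=20.0)
slope, intercept = linear_fit(manual, auto)

print(f"TD scores (ms): {td}")
print(f"Share within ±20 ms: {share:.0%}")
print(f"Regression: auto = {slope:.3f} * manual + {intercept:.1f}")
```

A slope near 1 and an intercept near 0 would indicate that the automatic onsets track the manual ones without systematic drift or offset, while the ±20 ms share summarizes per-annotation agreement.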

