Praditor: A DBSCAN-based automation for speech onset detection.

Author information

Liu Zhengyuan, Yu Xinqi, Hu Wing Chung, Ma Yunxiao, Wang Ruiming, Zhang Haoyun

Affiliations

Centre for Cognitive and Brain Sciences, University of Macau, Taipa, Macau SAR, China.

School of Psychology, South China Normal University, Guangzhou, Guangdong, China.

Publication information

Behav Res Methods. 2025 Aug 4;57(9):247. doi: 10.3758/s13428-025-02776-2.

DOI:10.3758/s13428-025-02776-2

PMID:40759856

Abstract

Speech onset time (SOT) serves as a critical parameter in speech production research, marking the transition from background noise to the start of the speech signal. While manual annotation remains the gold standard for identifying SOT, its labor-intensive nature can result in considerable fatigue, thereby jeopardizing the accuracy of the annotation. Here, we present Praditor, a semi-automatic speech onset detection tool, leveraging a combination of algorithms consisting of density-based spatial clustering of applications with noise (DBSCAN) and first-derivative thresholding. Praditor offers a user-friendly experience across major platforms, including Windows and macOS, eliminating the need for complex setup procedures and offering a GUI that facilitates the tuning procedure. Furthermore, Praditor is capable of processing both multiple-onset and single-onset audio files regardless of language, and generates a TextGrid file for subsequent verification. To assess the accuracy of Praditor, we compared time difference (TD) scores and executed a linear regression analysis between manual and automatic annotations. Results showed that Praditor was highly accurate in both Mandarin and English datasets, as about 90% of the annotations fell within the range of ±20 ms, with corpus-level tuning achieving slightly lower but acceptable accuracy with respect to file-level tuning. This semi-automatic method is expected to offer a general solution for speech onset annotation in a language-independent manner, catering to not only experienced programmers but also users with little to no prior experience. Praditor is openly available on its official GitHub repository ( https://github.com/Paradeluxe/Praditor ).
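The evaluation described in the abstract (time difference scores, the share of annotations within ±20 ms, and a linear regression between manual and automatic onsets) can be illustrated with a minimal sketch. This is not Praditor's code: the function names, the sign convention (automatic minus manual), and the onset values below are assumptions made purely for illustration.

```python
from statistics import mean


def time_differences(manual_ms, auto_ms):
    """Signed time difference per onset (automatic minus manual), in ms."""
    if len(manual_ms) != len(auto_ms):
        raise ValueError("annotation lists must have equal length")
    return [a - m for m, a in zip(manual_ms, auto_ms)]


def share_within(td_ms, tolerance_ms=20.0):
    """Fraction of onsets whose absolute TD is within the tolerance."""
    return sum(abs(td) <= tolerance_ms for td in td_ms) / len(td_ms)


def linear_fit(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


# Hypothetical onsets in milliseconds (illustrative values, not data from the paper).
manual = [512.0, 1340.0, 2105.0, 2980.0, 3755.0]
auto = [518.0, 1335.0, 2131.0, 2984.0, 3749.0]

td = time_differences(manual, auto)
within = share_within(td)
slope, intercept = linear_fit(manual, auto)

print(f"TD (ms): {td}")
print(f"share within +/-20 ms: {within:.2f}")
print(f"slope={slope:.4f}, intercept={intercept:.2f} ms")
```

Under this reading, a slope near 1 and an intercept near 0 ms would indicate that automatic onsets track manual ones without systematic drift or offset, while the within-tolerance share summarizes per-onset agreement.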
