


TranStutter: A Convolution-Free Transformer-Based Deep Learning Method to Classify Stuttered Speech Using 2D Mel-Spectrogram Visualization and Attention-Based Feature Representation.

Affiliations

School of Computing Science & Engineering, VIT Bhopal University, Sehore 466114, India.

Bachelor Program in Artificial Intelligence, Chang Gung University, Taoyuan 333, Taiwan.

Publication Information

Sensors (Basel). 2023 Sep 22;23(19):8033. doi: 10.3390/s23198033.

DOI: 10.3390/s23198033
PMID: 37836863
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10575465/
Abstract

Stuttering, a prevalent neurodevelopmental disorder, profoundly affects fluent speech, causing involuntary interruptions and recurrent sound patterns. This study addresses the critical need for the accurate classification of stuttering types. The researchers introduce "TranStutter", a pioneering Convolution-free Transformer-based DL model, designed to excel in speech disfluency classification. Unlike conventional methods, TranStutter leverages Multi-Head Self-Attention and Positional Encoding to capture intricate temporal patterns, yielding superior accuracy. In this study, the researchers employed two benchmark datasets: the Stuttering Events in Podcasts Dataset (SEP-28k) and the FluencyBank Interview Subset. SEP-28k comprises 28,177 audio clips from podcasts, meticulously annotated into distinct dysfluent and non-dysfluent labels, including Block (BL), Prolongation (PR), Sound Repetition (SR), Word Repetition (WR), and Interjection (IJ). The FluencyBank subset encompasses 4144 audio clips from 32 People Who Stutter (PWS), providing a diverse set of speech samples. TranStutter's performance was assessed rigorously. On SEP-28k, the model achieved an impressive accuracy of 88.1%. Furthermore, on the FluencyBank dataset, TranStutter demonstrated its efficacy with an accuracy of 80.6%. These results highlight TranStutter's significant potential in revolutionizing the diagnosis and treatment of stuttering, thereby contributing to the evolving landscape of speech pathology and neurodevelopmental research. The innovative integration of Multi-Head Self-Attention and Positional Encoding distinguishes TranStutter, enabling it to discern nuanced disfluencies with unparalleled precision. This novel approach represents a substantial leap forward in the field of speech pathology, promising more accurate diagnostics and targeted interventions for individuals with stuttering disorders.
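The two mechanisms the abstract credits for TranStutter's accuracy — sinusoidal positional encoding plus multi-head self-attention applied to a sequence of mel-spectrogram frames — can be sketched in a few lines of NumPy. This is an illustrative toy, not the paper's implementation: the projection weights are random (untrained), and the frame count, embedding width, and function names are assumptions chosen for the example.

```python
import numpy as np

def sinusoidal_positional_encoding(n_frames, d_model):
    # Standard sinusoidal encoding (Vaswani et al., 2017): even dims get
    # sines, odd dims get cosines, at geometrically scaled frequencies.
    pos = np.arange(n_frames)[:, None]              # (T, 1)
    i = np.arange(d_model // 2)[None, :]            # (1, d/2)
    angles = pos / (10000 ** (2 * i / d_model))     # (T, d/2)
    pe = np.zeros((n_frames, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

def multi_head_self_attention(x, n_heads, rng):
    # x: (T, d) sequence of (projected) mel-spectrogram frame embeddings.
    T, d = x.shape
    assert d % n_heads == 0
    dh = d // n_heads
    # Random projections stand in for learned Q/K/V/output weights.
    Wq, Wk, Wv, Wo = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(4))
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    # Split each projection into heads: (n_heads, T, dh).
    split = lambda m: m.reshape(T, n_heads, dh).transpose(1, 0, 2)
    q, k, v = split(q), split(k), split(v)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(dh)      # (n_heads, T, T)
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)            # softmax over keys
    # Merge heads back into one (T, d) sequence and mix with Wo.
    out = (weights @ v).transpose(1, 0, 2).reshape(T, d)
    return out @ Wo

rng = np.random.default_rng(0)
T, d = 50, 64                                  # 50 mel frames, 64-dim embeddings
x = rng.standard_normal((T, d))                # stand-in for projected mel frames
x = x + sinusoidal_positional_encoding(T, d)   # inject temporal order
y = multi_head_self_attention(x, n_heads=8, rng=rng)
print(y.shape)                                 # (50, 64): one context vector per frame
```

Because self-attention is permutation-invariant, the positional term is what lets the model distinguish, say, a prolongation (one sound held over many consecutive frames) from scattered repetitions of the same sound — the temporal ordering of frames carries the disfluency signature.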


Similar Articles

1. TranStutter: A Convolution-Free Transformer-Based Deep Learning Method to Classify Stuttered Speech Using 2D Mel-Spectrogram Visualization and Attention-Based Feature Representation.
   Sensors (Basel). 2023 Sep 22;23(19):8033. doi: 10.3390/s23198033.
2. FluencyBank Timestamped: An Updated Data Set for Disfluency Detection and Automatic Intended Speech Recognition.
   J Speech Lang Hear Res. 2024 Nov 7;67(11):4203-4215. doi: 10.1044/2024_JSLHR-24-00070. Epub 2024 Oct 8.
3. Phonological neighborhood effect in spontaneous speech in adults who stutter.
   J Fluency Disord. 2018 Dec;58:86-93. doi: 10.1016/j.jfludis.2018.08.005. Epub 2018 Aug 30.
4. Beyond stuttering: Speech disfluencies in normally fluent French-speaking children at age 4.
   Clin Linguist Phon. 2018;32(2):166-179. doi: 10.1080/02699206.2017.1344878. Epub 2017 Aug 24.
5. Preliminary study of disfluency in school-aged children with autism.
   Int J Lang Commun Disord. 2014 Jan-Feb;49(1):75-89. doi: 10.1111/1460-6984.12048. Epub 2013 Sep 11.
6. Characteristics of articulatory gestures in stuttered speech: A case study using real-time magnetic resonance imaging.
   J Commun Disord. 2022 May-Jun;97:106213. doi: 10.1016/j.jcomdis.2022.106213. Epub 2022 Mar 18.
7. Judgments of disfluency by mothers of stuttering and normally fluent children.
   J Speech Hear Res. 1989 Sep;32(3):625-34. doi: 10.1044/jshr.3203.625.
8. Gaze aversion to stuttered speech: a pilot study investigating differential visual attention to stuttered and fluent speech.
   Int J Lang Commun Disord. 2010 Mar-Apr;45(2):133-44. doi: 10.3109/13682820902763951.
9. Emotional and physiological responses of fluent listeners while watching the speech of adults who stutter.
   Int J Lang Commun Disord. 2007 Mar-Apr;42(2):113-29. doi: 10.1080/10610270600850036.
10. Disfluency clusters of children who stutter: relation of stutterings to self-repairs.
   J Speech Hear Res. 1995 Oct;38(5):965-77. doi: 10.1044/jshr.3805.965.

Cited By

1. Automated Stuttering Detection Using Deep Learning Techniques.
   J Clin Med. 2025 May 19;14(10):3552. doi: 10.3390/jcm14103552.
2. Identification of the Biomechanical Response of the Muscles That Contract the Most during Disfluencies in Stuttered Speech.
   Sensors (Basel). 2024 Apr 20;24(8):2629. doi: 10.3390/s24082629.

References

1. Scientists, society, and stuttering.
   Int J Clin Pract. 2020 Nov;74(11):e13678. doi: 10.1111/ijcp.13678. Epub 2020 Sep 7.
2. A Novel Stuttering Disfluency Classification System Based on Respiratory Biosignals.
   Annu Int Conf IEEE Eng Med Biol Soc. 2019 Jul;2019:4660-4663. doi: 10.1109/EMBC.2019.8857891.
3. Management of stuttering using cognitive behavior therapy and mindfulness meditation.
   Ind Psychiatry J. 2019 Jan-Jun;28(1):4-12. doi: 10.4103/ipj.ipj_18_19. Epub 2019 Dec 11.
4. Fluency Bank: A new resource for fluency research and practice.
   J Fluency Disord. 2018 Jun;56:69-80. doi: 10.1016/j.jfludis.2018.03.002. Epub 2018 Mar 29.
5. Epidemiology of stuttering: 21st century advances.
   J Fluency Disord. 2013 Jun;38(2):66-87. doi: 10.1016/j.jfludis.2012.11.002. Epub 2012 Nov 27.
6. The University College London Archive of Stuttered Speech (UCLASS).
   J Speech Lang Hear Res. 2009 Apr;52(2):556-69. doi: 10.1044/1092-4388(07-0129).
7. Long short-term memory.
   Neural Comput. 1997 Nov 15;9(8):1735-80. doi: 10.1162/neco.1997.9.8.1735.