School of Computing Science & Engineering, VIT Bhopal University, Sehore 466114, India.
Bachelor Program in Artificial Intelligence, Chang Gung University, Taoyuan 333, Taiwan.
Sensors (Basel). 2023 Sep 22;23(19):8033. doi: 10.3390/s23198033.
Stuttering, a prevalent neurodevelopmental disorder, profoundly affects fluent speech, causing involuntary interruptions and recurrent sound patterns. This study addresses the critical need for accurate classification of stuttering types. The researchers introduce "TranStutter", a convolution-free Transformer-based deep learning (DL) model designed for speech disfluency classification. Unlike conventional methods, TranStutter leverages Multi-Head Self-Attention and Positional Encoding to capture intricate temporal patterns, yielding superior accuracy. In this study, the researchers employed two benchmark datasets: the Stuttering Events in Podcasts Dataset (SEP-28k) and the FluencyBank Interview Subset. SEP-28k comprises 28,177 audio clips from podcasts, meticulously annotated with distinct disfluent and non-disfluent labels, including Block (BL), Prolongation (PR), Sound Repetition (SR), Word Repetition (WR), and Interjection (IJ). The FluencyBank subset encompasses 4144 audio clips from 32 People Who Stutter (PWS), providing a diverse set of speech samples. TranStutter's performance was assessed rigorously: on SEP-28k, the model achieved an accuracy of 88.1%, and on the FluencyBank dataset it demonstrated its efficacy with an accuracy of 80.6%. These results highlight TranStutter's potential to improve the diagnosis and treatment of stuttering, contributing to the evolving landscape of speech pathology and neurodevelopmental research. The integration of Multi-Head Self-Attention and Positional Encoding enables TranStutter to discern nuanced disfluencies with high precision, promising more accurate diagnostics and targeted interventions for individuals with stuttering disorders.
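The abstract names two architectural ingredients: sinusoidal Positional Encoding added to the input sequence, and Multi-Head Self-Attention over the resulting frames. The paper's actual implementation is not reproduced here; the following is a minimal NumPy sketch under stated assumptions — frame counts, feature dimensions, random projection weights, and the mean-pool classifier head are all illustrative, with only the five-way label set (BL, PR, SR, WR, IJ) taken from the abstract.

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """Standard sinusoidal positional encoding (sin on even dims, cos on odd)."""
    pos = np.arange(seq_len)[:, None]                        # (T, 1)
    i = np.arange(d_model)[None, :]                          # (1, D)
    angle = pos / np.power(10000.0, (2 * (i // 2)) / d_model)
    return np.where(i % 2 == 0, np.sin(angle), np.cos(angle))  # (T, D)

def multi_head_self_attention(x, num_heads, rng):
    """One layer of multi-head self-attention with random (untrained) projections."""
    T, D = x.shape
    d_head = D // num_heads
    out = np.zeros_like(x)
    for h in range(num_heads):
        Wq, Wk, Wv = (rng.standard_normal((D, d_head)) / np.sqrt(D)
                      for _ in range(3))
        q, k, v = x @ Wq, x @ Wk, x @ Wv                     # (T, d_head) each
        scores = q @ k.T / np.sqrt(d_head)                   # (T, T)
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)                   # row-wise softmax
        out[:, h * d_head:(h + 1) * d_head] = w @ v
    return out

# Toy forward pass: 100 audio frames of 64-dim features -> 5 disfluency classes.
rng = np.random.default_rng(0)
frames = rng.standard_normal((100, 64))                      # stand-in features
x = frames + positional_encoding(100, 64)                    # inject order info
attended = multi_head_self_attention(x, num_heads=4, rng=rng)
pooled = attended.mean(axis=0)                               # (64,) clip summary
logits = pooled @ (rng.standard_normal((64, 5)) / 8.0)       # 5 labels: BL/PR/SR/WR/IJ
pred = int(np.argmax(logits))                                # predicted class index
```

Because the model is convolution-free, all temporal context comes from the attention weights; the positional encoding is what lets the otherwise permutation-invariant attention distinguish, say, a repeated sound early in a clip from one at the end.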