Abdulrahman Ayub Othman, Othman Shanga Ismail, Yasin Gazo Badran, Ali Meer Salam
Department of Computer Science, College of Science, University of Halabja, Kurdistan Region, F.R., Halabja, Iraq.
Data Brief. 2025 Jun 24;61:111826. doi: 10.1016/j.dib.2025.111826. eCollection 2025 Aug.
Speech is the most fundamental and sophisticated channel of human communication, and breakthroughs in Natural Language Processing (NLP) have substantially raised the quality of human-computer interaction. In particular, a new wave of deep learning methods has significantly advanced speech recognition by capturing fine-grained acoustic cues, including pitch, an acoustic feature that can be critical to understanding communicative intent. Pitch variation is particularly important for prosodic classification tasks (i.e., distinguishing statements, questions, and exclamations), which are crucial in tonal and low-resource languages such as Kurdish, where intonation carries significant semantic information. This paper presents the Statements, Questions, or Exclamations Based on Sound Pitch (SQEBSP) dataset, which contains 12,660 professionally recorded speech audio clips from 431 native Kurdish speakers who reside in the Kurdistan Region of Iraq. Each speaker articulated 10 new phrases in each of three prosodic categories: statements, questions, and exclamations. All utterances were digitized at 16 kHz and then manually checked for correct pitch-based classification. The dataset represents all three classes equally, with about 4,200 samples per class, and includes metadata such as speaker gender, age group, and sentence identifiers. The original audio files, alongside resources such as Mel-Frequency Cepstral Coefficients (MFCCs) and waveform visualizations, are available on Mendeley Data. The dataset supports the development and testing of pitch-based speech classification algorithms and furthers work on pronunciation modelling for languages lacking sufficient resources. Furthermore, it aids in developing speech technologies that are sensitive to dialects.
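As a rough illustration of what a pitch-based feature for 16 kHz audio might look like, the Python sketch below estimates a fundamental-frequency contour by frame-wise autocorrelation and fits a straight line to it. It is not the authors' method: the function names, the 75-400 Hz search range, the frame and hop sizes, and the use of contour slope as a feature are all assumptions made for this example, and the signal is a synthetic tone glide rather than a clip from the dataset.

```python
import numpy as np

SAMPLE_RATE = 16_000  # Hz, the digitization rate stated in the abstract


def synth_glide(f_start, f_end, duration=1.0, sample_rate=SAMPLE_RATE):
    """Synthetic tone whose frequency moves linearly from f_start to f_end (hypothetical test signal)."""
    t = np.arange(int(duration * sample_rate)) / sample_rate
    phase = 2 * np.pi * (f_start * t + (f_end - f_start) * t**2 / (2 * duration))
    return np.sin(phase)


def estimate_f0(frame, sample_rate=SAMPLE_RATE, fmin=75.0, fmax=400.0):
    """Estimate the fundamental frequency (Hz) of one frame by autocorrelation."""
    frame = np.asarray(frame, dtype=float)
    frame = frame - frame.mean()
    # Full autocorrelation; keep the non-negative lags only.
    acf = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_min = int(sample_rate / fmax)  # shortest period searched (assumed range)
    lag_max = int(sample_rate / fmin)  # longest period searched (assumed range)
    lag = lag_min + int(np.argmax(acf[lag_min:lag_max + 1]))
    return sample_rate / lag


def pitch_slope(signal, sample_rate=SAMPLE_RATE, frame_len=1024, hop=512):
    """Slope (Hz per second) of a straight line fitted to the frame-wise F0 contour."""
    starts = np.arange(0, len(signal) - frame_len + 1, hop)
    f0 = np.array([estimate_f0(signal[s:s + frame_len], sample_rate) for s in starts])
    times = (starts + frame_len / 2) / sample_rate  # frame centres in seconds
    slope, _intercept = np.polyfit(times, f0, 1)
    return slope
```

Calling `pitch_slope(synth_glide(150, 250))` gives a positive slope for the rising glide, and swapping the two frequencies gives a negative one. Real speech would additionally need voicing detection and a more robust pitch tracker, and a single slope value is unlikely to separate the three classes on its own.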