Department of Communication Sciences & Disorders, Bloomsburg University of Pennsylvania, Bloomsburg, Pennsylvania.
Department of Statistics, Penn State University, University Park, Pennsylvania.
J Voice. 2020 Jan;34(1):9-19. doi: 10.1016/j.jvoice.2018.07.003. Epub 2018 Nov 1.
The objective of this study was to investigate whether a two-stage method of cepstral peak identification can effectively discriminate rough vs breathy vs typical voice in sustained vowel productions. It was hypothesized that a dual-stage search for cepstral peak prominences (CPPs) above and below specified quefrency/F0 cutoffs would yield a CPP difference characteristic of the rough, diplophonic voice type.
Central one-second portions of sustained vowel /a/ productions were obtained from 90 subjects (rough, breathy, and normophonic voices). All voice samples were analyzed using a two-stage cepstral analysis process in which a CPP difference value was obtained by identifying cepstral peaks above and below a lower limit for expected F0 (150 Hz for females and 90 Hz for males), called CPP and CPP respectively.
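The two-stage search described above can be sketched in Python with NumPy. This is an illustrative sketch, not the authors' implementation: the function names, the use of a single regression line fitted over the 60 to 300 Hz search range, the Hann window, and the sign convention (peak above the cutoff minus peak below it) are all assumptions. Only the cutoffs (150 Hz for females, 90 Hz for males) and the 60 to 300 Hz range come from the abstract.

```python
import numpy as np


def real_cepstrum_db(signal, sample_rate):
    """Return (quefrency in s, cepstral magnitude in dB) for one frame.

    Assumption: a Hann-windowed frame and a dB-scaled real cepstrum.
    """
    windowed = signal * np.hanning(len(signal))
    log_mag = 20.0 * np.log10(np.abs(np.fft.fft(windowed)) + 1e-12)
    cepstrum = np.real(np.fft.ifft(log_mag))
    ceps_db = 20.0 * np.log10(np.abs(cepstrum) + 1e-12)
    quefrency = np.arange(len(signal)) / sample_rate
    return quefrency, ceps_db


def _prominence(quefrency, ceps_db, f_low_hz, f_high_hz, slope, intercept):
    """Height of the largest cepstral peak in a band above the trend line."""
    band = np.flatnonzero(
        (quefrency >= 1.0 / f_high_hz) & (quefrency <= 1.0 / f_low_hz)
    )
    peak = band[np.argmax(ceps_db[band])]
    return ceps_db[peak] - (slope * quefrency[peak] + intercept)


def two_stage_cpp(quefrency, ceps_db, f0_cutoff_hz,
                  f_min_hz=60.0, f_max_hz=300.0):
    """Return (cpp_above, cpp_below, difference) around an F0 cutoff.

    cpp_above: peak in the expected-F0 region (cutoff .. f_max_hz).
    cpp_below: peak in the subharmonic region (f_min_hz .. cutoff).
    Assumption: difference = cpp_above - cpp_below, so a dominant
    subharmonic gives a negative value.
    """
    fit = (quefrency >= 1.0 / f_max_hz) & (quefrency <= 1.0 / f_min_hz)
    slope, intercept = np.polyfit(quefrency[fit], ceps_db[fit], 1)
    cpp_above = _prominence(quefrency, ceps_db, f0_cutoff_hz, f_max_hz,
                            slope, intercept)
    cpp_below = _prominence(quefrency, ceps_db, f_min_hz, f0_cutoff_hz,
                            slope, intercept)
    return cpp_above, cpp_below, cpp_above - cpp_below


if __name__ == "__main__":
    # Hypothetical cepstra: a flat floor with one peak at 5 ms (200 Hz)
    # and one at 10 ms (100 Hz, a possible subharmonic).
    fs = 16000
    q = np.arange(1600) / fs
    typical = np.zeros(1600)
    typical[80], typical[160] = 20.0, 10.0
    rough = np.zeros(1600)
    rough[80], rough[160] = 10.0, 20.0
    print(two_stage_cpp(q, typical, f0_cutoff_hz=150.0))
    print(two_stage_cpp(q, rough, f0_cutoff_hz=150.0))
```

In this toy input the larger peak sits in the expected-F0 region for the "typical" cepstrum and in the subharmonic region for the "rough" one, so the sign of the difference flips between them, mirroring the negative values the abstract associates with diplophonic voices.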
The CPP difference value was observed to be a highly significant predictor, with negative values for this parameter characteristic of a dominant subharmonic in the voice signal and the perception of diplophonic, rough voice. Correct classification of rough vs nonrough voice samples was 82.2% (sensitivity 0.80 and specificity 0.833). For three-group classification (breathy vs normophonic vs rough), models incorporating two predictors (the CPP obtained from a single search through a 60 to 300 Hz frequency range (CPP) and the CPP difference value) correctly classified 78.88% of the voice samples.
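As a worked check on how the reported rates relate to one another, the short Python sketch below computes sensitivity, specificity, and overall accuracy from a confusion table. The counts are hypothetical: the abstract does not give group sizes, so a split of 30 rough and 60 nonrough samples is assumed here purely because it is one split of the 90 subjects that reproduces the reported figures.

```python
def classification_rates(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion counts."""
    sensitivity = tp / (tp + fn)   # rough samples correctly flagged
    specificity = tn / (tn + fp)   # nonrough samples correctly passed
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Hypothetical counts (assumption, not from the source):
# 30 rough samples (24 detected), 60 nonrough samples (50 detected).
sens, spec, acc = classification_rates(tp=24, fn=6, tn=50, fp=10)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} accuracy={acc:.3f}")
```

Under these assumed counts, 24 of 30 gives the sensitivity of 0.80, 50 of 60 gives the specificity of 0.833, and 74 of 90 gives the overall rate of 82.2%.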
Rough, diplophonic voices were consistently observed to have a subharmonic peak that was greater in amplitude than the cepstral peak obtained within the region of the expected F0, resulting in a negative value for the CPP difference. The two-stage cepstral analysis process described herein is visually intuitive from the graphical display of a cepstrum and is a simple extended calculation derived from cepstral analysis procedures that have been recommended as essential in the acoustic description of vocal quality.