Pommée Timothy, Renaud Sara-Eve, Verduyckt Ingrid
École d'orthophonie et d'audiologie, Faculté de médecine, Université de Montréal, Montréal, Québec, Canada.
J Voice. 2025 Mar 4. doi: 10.1016/j.jvoice.2025.02.020.
This study aimed to evaluate the inter- and intra-rater reliability of Consensus Auditory-Perceptual Evaluation of Voice (CAPE-V) ratings and to explore task-specific differences (sustained vowels versus sentences) in both the ratings and their reliability.
Cross-sectional reliability study using a curated subset of dysphonic voice samples from the Perceptual Voice Qualities Database (PVQD).
Thirty voice samples representing varying dysphonia severities were selected from the Perceptual Voice Qualities Database. Eight Quebecois speech-language pathologists (SLPs) rated the samples using the CAPE-V protocol on the Bridge2Practice platform. For each of six vocal features, raters gave a severity rating on a visual analog scale (VAS) and a binary consistent/intermittent (C/I) judgment. Reliability was assessed using intraclass correlation coefficients (ICCs) for VAS ratings and Gwet's AC1 for C/I ratings. Task effects were analyzed using Wilcoxon signed-rank tests and Spearman correlations.
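As a rough illustration of the two reliability statistics named above, the following Python sketch computes an ICC and Gwet's AC1 from a samples-by-raters table. The abstract does not state which ICC form was used, so the two-way random-effects, absolute-agreement, single-rater form, ICC(2,1), is an assumption here. The function names `icc2_1` and `gwet_ac1` and the toy ratings are hypothetical, and the sketch assumes complete data (every rater rates every sample).

```python
def icc2_1(ratings):
    """ICC(2,1) for a complete table: rows = voice samples, columns = raters.

    Assumption: two-way random effects, absolute agreement, single rater.
    The abstract does not say which ICC form the authors used.
    """
    n = len(ratings)          # number of samples
    k = len(ratings[0])       # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Mean squares from the two-way ANOVA decomposition
    msr = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)   # samples
    msc = n * sum((m - grand) ** 2 for m in col_means) / (k - 1)   # raters
    sse = sum(
        (ratings[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n)
        for j in range(k)
    )
    mse = sse / ((n - 1) * (k - 1))                                # residual

    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def gwet_ac1(labels, categories=("C", "I")):
    """Gwet's AC1 for several raters and nominal categories (complete data)."""
    n = len(labels)
    q = len(categories)
    p_a = 0.0                 # observed agreement, averaged over samples
    pi = [0.0] * q            # average proportion of raters per category
    for row in labels:
        r = len(row)
        counts = [sum(1 for v in row if v == c) for c in categories]
        p_a += sum(c * (c - 1) for c in counts) / (r * (r - 1))
        for idx, c in enumerate(counts):
            pi[idx] += c / r
    p_a /= n
    pi = [p / n for p in pi]
    p_e = sum(p * (1 - p) for p in pi) / (q - 1)   # chance agreement
    return (p_a - p_e) / (1 - p_e)


# Hypothetical toy data, not taken from the study
vas = [[12, 15, 10], [55, 60, 48], [80, 72, 90], [30, 41, 35]]
ci = [["C", "C", "C"], ["C", "I", "C"], ["I", "I", "I"], ["C", "C", "I"]]
print(round(icc2_1(vas), 3), round(gwet_ac1(ci), 3))
```

AC1 is often preferred over kappa-type coefficients for binary judgments such as C/I because its chance-agreement term stays small when one category dominates, which is common when most samples are judged consistent.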
Overall severity ratings demonstrated good inter-rater reliability for both vowels (ICC = 0.79) and sentences (ICC = 0.87). Pitch and loudness ratings showed low inter-rater reliability (ICCs < 0.5) across tasks. Vowels were rated as more impaired for most features, except strain, which showed higher impairment on sentences. Inter-rater reliability was higher for roughness and breathiness on vowels, whereas strain showed better reliability on sentences. Intra-rater reliability was consistently higher on sentences for all features (ICCs > 0.75 for most). Consistency ratings were more reliable on vowels than sentences for most features, except loudness.
Task type significantly affects CAPE-V ratings and their reliability. Vowels provided higher inter-rater reliability for roughness and breathiness, while sentences yielded higher intra-rater reliability and higher inter-rater reliability for strain. These findings highlight the need for ongoing refinement of assessment tools and training protocols to ensure accurate and reliable voice evaluations.