Peng Zong H, Ham Kathleen M, Ladlow Jane, Stefaniak Carrie, Jeffery Nicholas D, Thieman Mankin Kelley M
Department of Small Animal Clinical Sciences, College of Veterinary Medicine and Biomedical Sciences, Texas A&M University, College Station, Texas, USA.
Department of Small Animal Clinical Sciences, University of Florida, Gainesville, Florida, USA.
Vet Surg. 2025 Apr;54(3):573-580. doi: 10.1111/vsu.14171. Epub 2024 Oct 2.
Objective: To compare the reliability of respiratory function grading (RFG) scores assigned in person and remotely via video and electronic stethoscope recordings, as evaluated by novice and expert graders.
Study design: Prospective study.
Animals: Fifty-seven brachycephalic dogs.
Methods: Dogs were evaluated in person by expert graders, who assigned RFG scores. Audio and video recordings were made during the in-person evaluations. Four expert and four novice graders then evaluated the recordings and assigned an RFG score to each dog. Agreement between in-person and remote RFG scores was assessed with Cohen's kappa statistic; interobserver reliability among the remote graders was assessed with Fleiss' kappa statistic.
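Cohen's and Fleiss' kappa are standard chance-corrected agreement statistics. As an illustrative sketch only (the scores below are made up, not the study's data, and the authors do not state which software they used), the two statistics named in the methods could be computed in Python with scikit-learn and statsmodels:

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# Hypothetical RFG scores (grades 0-3) for a handful of dogs.
in_person = np.array([0, 1, 2, 2, 1, 3, 0, 2])   # reference in-person scores
remote    = np.array([0, 1, 1, 2, 2, 2, 0, 2])   # one grader's remote scores

# Agreement between one grader's remote scores and the in-person scores.
raw_agreement = np.mean(in_person == remote)      # raw percentage agreement
kappa = cohen_kappa_score(in_person, remote)      # chance-corrected agreement

# Interobserver reliability across several remote graders:
# rows = dogs (subjects), columns = graders.
remote_all = np.array([
    [0, 0, 1, 0],
    [1, 1, 1, 2],
    [1, 2, 2, 1],
    [2, 2, 2, 2],
    [2, 1, 2, 2],
    [2, 3, 2, 2],
    [0, 0, 0, 1],
    [2, 2, 1, 2],
])
table, _ = aggregate_raters(remote_all)           # counts per subject x grade
fleiss = fleiss_kappa(table, method="fleiss")

print(f"raw agreement {raw_agreement:.1%}, Cohen's kappa {kappa:.2f}, "
      f"Fleiss' kappa {fleiss:.2f}")
```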
Results: The median RFG score from the in-person assessment was 1 (range, 0-3). The distribution of RFG scores included 12 grade 0 scores, 19 grade 1 scores, 25 grade 2 scores, and 1 grade 3 score. The raw percentage agreements between remote and in-person scores were 68.4%, 59.6%, 64.9%, and 61.4% for the four experts, and 52.6%, 64.9%, 50.9%, and 42.1% for the four novices. Reliability between remote and in-person RFG scores was poor to moderate for both the experts (Cohen's kappa: .48, .37, .46, .41) and the novices (Cohen's kappa: .28, .47, .28, .21). Interobserver reliability was moderate among the experts (Fleiss' kappa: .59) and poor among the novices (Fleiss' kappa: .39).
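For context, the raw percentage agreement is the fraction of the 57 dogs on which a grader's remote score matched the in-person score (e.g., 68.4% corresponds to 39/57 dogs), while Cohen's kappa discounts the agreement expected by chance. A quick back-calculation from the first expert's figures (my arithmetic, not reported in the abstract):

```latex
\[
\kappa = \frac{p_o - p_e}{1 - p_e}
\quad\Longrightarrow\quad
p_e = \frac{p_o - \kappa}{1 - \kappa}
    = \frac{0.684 - 0.48}{1 - 0.48} \approx 0.39
\]
```

That is, roughly 39% agreement would be expected by chance alone, which is why a raw agreement of 68.4% translates into only a moderate kappa of .48.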
Conclusion: Remote RFG scores had poor to moderate interassessment and interobserver reliability. Novice evaluators performed worse than experts, showing both lower agreement with the in-person scores and lower interobserver reliability.
Clinical significance: Remote evaluation, as performed in this study, is not reliable for assigning RFG scores. Modifications to the remote evaluation protocol could improve reliability. Based on the performance of the novice evaluators, training of evaluators is justified.