University of Bristol Veterinary School, Langford, Somerset BS40 5DU, UK.
Newcastle University, School of Agriculture Food and Rural Development, Newcastle upon Tyne NE1 7RU, UK.
Vet J. 2011 Nov;190(2):e100-e109. doi: 10.1016/j.tvjl.2011.01.012. Epub 2011 Mar 4.
Consistency of assessment is essential to the farm assurance process. This study evaluated the inter-observer reliability of 31 farm assurance assessors, six veterinarians and four researchers for five pig welfare outcome measures proposed for inclusion in UK pig farm assurance schemes. These were (1) tail lesions, (2) body lesions, (3) lameness, (4) pigs requiring hospitalisation and (5) oral behaviour. Inter-observer reliability was tested against a gold standard Trainer using the following methods: a comparison of the farm prevalence and the number of affected pigs in each pen identified by observers, Cohen's kappa (κ), Kendall's W, proportional agreement, sensitivity, and specificity. All measures achieved potentially high levels of inter-observer reliability, and it was concluded that none should be excluded from farm assurance at this stage. However, across all the measures, 45% of observers did not record an overall farm prevalence 'close' to that of the gold standard Trainer. With the level of training and testing that took place in this study, there would be a danger of significant bias occurring in a national assessment scheme. The data collected enabled some comparison of the methods used to assess inter-observer reliability. It is suggested that, when the aim is to achieve agreement between observers on the overall farm prevalence, inter-observer reliability testing should focus on the closeness of the overall farm prevalence recorded by observers, but that other types of analysis may be helpful during training.
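As an illustration of how several of the statistics named above relate to one another, the following minimal Python sketch compares one observer's pig-level scores with those of the Trainer. It is not the authors' analysis: the function name `agreement_stats`, the binary coding (1 = affected, 0 = not affected) and the ten example scores are all assumptions made for illustration, and Kendall's W is not covered.

```python
from math import nan
from typing import Dict, Sequence


def agreement_stats(trainer: Sequence[int], observer: Sequence[int]) -> Dict[str, float]:
    """Compare one observer's binary scores with the Trainer's (hypothetical helper).

    Both sequences hold one score per pig: 1 = affected, 0 = not affected.
    """
    if len(trainer) != len(observer) or len(trainer) == 0:
        raise ValueError("score lists must be non-empty and of equal length")

    n = len(trainer)
    pairs = list(zip(trainer, observer))
    tp = sum(1 for t, o in pairs if t == 1 and o == 1)  # both score "affected"
    tn = sum(1 for t, o in pairs if t == 0 and o == 0)  # both score "not affected"
    fp = sum(1 for t, o in pairs if t == 0 and o == 1)  # observer only
    fn = sum(1 for t, o in pairs if t == 1 and o == 0)  # Trainer only

    # Proportional agreement: share of pigs scored identically.
    p_obs = (tp + tn) / n
    # Agreement expected by chance, from each rater's marginal totals.
    p_exp = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / n ** 2
    # Cohen's kappa: agreement beyond chance, undefined if p_exp is 1.
    kappa = (p_obs - p_exp) / (1 - p_exp) if p_exp != 1 else nan

    return {
        "trainer_prevalence": (tp + fn) / n,
        "observer_prevalence": (tp + fp) / n,
        "proportional_agreement": p_obs,
        "kappa": kappa,
        # Sensitivity and specificity treat the Trainer as the reference.
        "sensitivity": tp / (tp + fn) if (tp + fn) else nan,
        "specificity": tn / (tn + fp) if (tn + fp) else nan,
    }


# Made-up scores for ten pigs; these are not data from the study.
trainer_scores = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
observer_scores = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
stats = agreement_stats(trainer_scores, observer_scores)
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

In this made-up example the observer and the Trainer record the same prevalence (0.30) even though they disagree on two individual pigs, giving a kappa of about 0.52. Prevalence closeness and pig-level agreement are therefore separate questions, and a statistic should be chosen to match the aim of the reliability test.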