Department of Public Health and Institute of Epidemiology and Preventive Medicine, College of Public Health, National Taiwan University, Taipei, Taiwan.
J Clin Epidemiol. 2011 Jul;64(7):808-14. doi: 10.1016/j.jclinepi.2010.10.015. Epub 2011 Feb 2.
Assessing inter- and intrarater reliability usually involves data with more than one level of nesting, because multiple raters make repeated observations. Most approaches, however, are not designed to handle inter- and intrarater reliability jointly, and dichotomous responses raise further modeling difficulties. The multiple sources of dependence arising from the nesting structure, together with the presence of covariates, complicate inference.
We first establish the equivalence between correlation and kappa under common positive correlation models for multiple raters, and then apply a Bayesian generalized linear mixed-effects model to interpret both types of reproducibility simultaneously through different annotations of similarity. In addition to marginal correlations, correlated random effects among raters are used to infer similarity between raters, whereas the correlation among random time effects may contribute to test-retest reliability.
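The equivalence between correlation and kappa can be checked numerically in the simplest case. The sketch below assumes two raters, a binary response, a common prevalence p for both raters, and a common positive correlation rho between their ratings; the function name `pairwise_kappa` and this two-rater setup are illustrative assumptions, not the derivation used in the paper. Under these assumptions the observed agreement exceeds chance agreement by 2p(1-p)rho, and dividing by 1 minus the chance agreement, which is 2p(1-p), leaves rho.

```python
def pairwise_kappa(p: float, rho: float) -> float:
    """Kappa for two raters under an assumed common correlation model.

    p   : common probability that a rater gives a positive rating
    rho : common correlation between the two raters' binary ratings
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")

    # Joint probabilities of agreement implied by correlation rho:
    # Cov(Y1, Y2) = rho * p * (1 - p) is added to the independence terms.
    both_positive = p * p + rho * p * (1.0 - p)
    both_negative = (1.0 - p) ** 2 + rho * p * (1.0 - p)

    observed_agreement = both_positive + both_negative
    chance_agreement = p * p + (1.0 - p) ** 2

    # Standard chance-corrected agreement.
    return (observed_agreement - chance_agreement) / (1.0 - chance_agreement)
```

For any valid prevalence, `pairwise_kappa(p, rho)` returns rho up to floating-point error, which is the sense in which kappa and the correlation coincide under this simplified model.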
The model accounts for individual covariates and for random effects due to subjects, raters, and time, and it covers a wide variety of data structures and types. An application to endodontic radiographic examinations illustrates the approach.
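To make the data structure concrete, the following sketch simulates dichotomous ratings in which every rater scores every subject on every occasion. It is a simplified illustration under stated assumptions, not the model fitted in the paper: the logit link, the single subject-level covariate, the parameter values, and the function name `simulate_ratings` are all hypothetical, and the random effects are drawn independently here, whereas the paper allows correlated random effects among raters and across time.

```python
import math
import random


def simulate_ratings(n_subjects, n_raters, n_times,
                     beta0=-0.5, beta1=1.0,
                     sd_subject=1.0, sd_rater=0.5, sd_time=0.3,
                     seed=0):
    """Simulate binary ratings with subject, rater, and time random effects.

    Returns a list of (subject, rater, time, covariate, rating) tuples.
    All parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)

    # One covariate value and one random intercept per subject.
    covariate = [rng.gauss(0.0, 1.0) for _ in range(n_subjects)]
    subject_effect = [rng.gauss(0.0, sd_subject) for _ in range(n_subjects)]
    # Independent rater and time effects (a simplification of the paper's
    # correlated random effects).
    rater_effect = [rng.gauss(0.0, sd_rater) for _ in range(n_raters)]
    time_effect = [rng.gauss(0.0, sd_time) for _ in range(n_times)]

    rows = []
    for i in range(n_subjects):
        for j in range(n_raters):
            for t in range(n_times):
                # Linear predictor on the (assumed) logit scale.
                eta = (beta0 + beta1 * covariate[i]
                       + subject_effect[i] + rater_effect[j] + time_effect[t])
                prob = 1.0 / (1.0 + math.exp(-eta))
                rating = 1 if rng.random() < prob else 0
                rows.append((i, j, t, covariate[i], rating))
    return rows
```

Ratings that share a subject are dependent through the subject effect, ratings that share a rater through the rater effect, and ratings from the same occasion through the time effect, which is the kind of nested dependence the model is meant to separate.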
This Bayesian hierarchical correlation model offers wide applicability, flexibility, and feasibility for modeling inter- and intrarater reliability together.