Gilles F H, Sobel E L, Leviton A, Tavaré C J, Hedley-Whyte E T
Department of Pathology, Childrens Hospital Los Angeles, CA 90027.
J Neuropathol Exp Neurol. 1994 Nov;53(6):559-71. doi: 10.1097/00005072-199411000-00003.
We studied intraobserver reproducibility in recognizing the presence or absence of 57 histologic features or patterns in a random subset of tumors (822) from the Childhood Brain Tumor Consortium database. The study protocol maximized consistency of the observer. We found that only six histologic features had high (≥ 0.75) reliability estimates, while a large number had intermediate estimates of 0.50-0.74. Supratentorial or infratentorial tumor location sometimes altered reliability. Reliability estimates were unacceptable for certain histologic features often used as diagnostic criteria, descriptors of tumor characteristics, or markers of anaplasia. We hypothesize that low reliability reflects, in part, the need for more specific operational definitions; features with subjective boundaries (e.g. granular bodies) may also contribute to low reliability. We also show that the kappa statistic, a commonly used measure of reliability, is inappropriate for very common or very uncommon histologic features (i.e. features at the extremes of prevalence in the study cases), and we offer a simple empiric method for determining when an alternative measure, the Jaccard statistic, is appropriate.
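The contrast between kappa and the Jaccard statistic at extreme prevalence can be illustrated with a minimal sketch. It uses the standard textbook definitions of Cohen's kappa and the Jaccard index for two readings of the same cases by one observer. The function name and all counts are hypothetical and are not taken from the study, and the sketch does not reproduce the authors' empiric method for choosing between the two measures, which the abstract does not describe.

```python
import math


def agreement_stats(both_present, first_only, second_only, both_absent):
    """Return (kappa, jaccard) for two readings of a binary histologic feature.

    Arguments are the cells of a 2x2 table of paired readings (hypothetical
    layout): feature recorded on both readings, on the first only, on the
    second only, or on neither.
    """
    a, b, c, d = both_present, first_only, second_only, both_absent
    n = a + b + c + d
    if n == 0:
        raise ValueError("no cases supplied")

    observed = (a + d) / n            # raw proportion of agreement
    p_first = (a + b) / n             # prevalence on the first reading
    p_second = (a + c) / n            # prevalence on the second reading
    # Agreement expected by chance if the two readings were independent.
    expected = p_first * p_second + (1 - p_first) * (1 - p_second)

    # Kappa is undefined when chance agreement is already perfect.
    kappa = (observed - expected) / (1 - expected) if expected < 1 else math.nan
    # Jaccard ignores joint absences: agreement among cases where the
    # feature was recorded at least once.
    jaccard = a / (a + b + c) if (a + b + c) > 0 else math.nan
    return kappa, jaccard


# Hypothetical counts, 100 cases each, both with 90% raw agreement.
balanced = agreement_stats(45, 5, 5, 45)     # feature present in about half
very_common = agreement_stats(90, 5, 5, 0)   # feature present in nearly all

print("balanced:    kappa=%.3f jaccard=%.3f" % balanced)
print("very common: kappa=%.3f jaccard=%.3f" % very_common)
```

In these invented tables the raw agreement is identical, yet kappa falls from about 0.8 to slightly below zero when the feature is nearly universal, because chance-expected agreement approaches the observed agreement. The Jaccard value stays high in both cases since it does not depend on the joint-absence cell.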