Peter Bacchetti, Ross Boylan
University of California, San Francisco, USA.
Int J Biostat. 2009;5(1):Article 5. doi: 10.2202/1557-4679.1139.
For both clinical and research purposes, biopsies are used to classify liver damage known as fibrosis on an ordinal multi-state scale ranging from no damage to cirrhosis. Misclassification can arise from reading error (misreading of a specimen) or sampling error (the specimen does not accurately represent the liver). Studies of biopsy accuracy have not attempted to synthesize these two sources of error or to estimate actual misclassification rates from either source. Using data from two studies of reading error and two of sampling error, we find surprisingly large possible misclassification rates, including a greater than 50% chance of misclassification for one intermediate stage of fibrosis. We find that some readers tend to misclassify consistently low or consistently high, and some specimens tend to be misclassified low while others tend to be misclassified high. Non-invasive measures of liver fibrosis have generally been evaluated by comparison to simultaneous biopsy results, but biopsy appears to be too unreliable to be considered a gold standard. Non-invasive measures may therefore be more useful than such comparisons suggest. Both stochastic uncertainty and uncertainty about our model assumptions appear to be substantial. Improved studies of biopsy accuracy would include large numbers of both readers and specimens, greater effort to reduce or eliminate reading error in studies of sampling error, and careful estimation of misclassification rates rather than less useful quantities such as kappa statistics.