Department of Neurosurgery and.
J Neurosurg. 2014 May;120(5):1179-87. doi: 10.3171/2014.2.JNS131262. Epub 2014 Mar 14.
The aim of this study was to examine observer reliability of frequently used arteriovenous malformation (AVM) grading scales, including the 5-tier Spetzler-Martin scale, the 3-tier Spetzler-Ponce scale, and the Pollock-Flickinger radiosurgery-based scale, using current imaging modalities in a setting closely resembling routine clinical practice.
Five experienced raters (1 vascular neurosurgeon, 2 neuroradiologists, and 2 senior neurosurgical residents) independently reviewed 15 MRI studies, 15 CT angiograms, and 15 digital subtraction angiograms obtained at the time of initial diagnosis. Assessments of 5 scans of each imaging modality were repeated to measure intrarater reliability. Three months after the initial assessment, raters reassessed the scans on which there had been disagreement. In this second assessment, raters were asked to justify their ratings with comments and illustrations. Generalized kappa (κ) analysis for multiple raters, Kendall's coefficient of concordance (W), and the intraclass correlation coefficient (ICC) were applied to determine interrater reliability. For intrarater reliability analysis, Cohen's kappa (κ), Kendall's correlation coefficient (tau-b), and the ICC were used to assess repeat-measurement agreement for each rater.
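As an illustration of two of the interrater statistics named above, the following is a minimal sketch, not the authors' analysis code. It assumes that "generalized kappa for multiple raters" can be computed as Fleiss' kappa and that Kendall's W is computed with the usual correction for tied ranks; the function names and the toy ratings are hypothetical.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa. `ratings` is a list of subjects, each a list of
    category labels, one per rater (same number of raters per subject).
    Undefined (division by zero) if every rating is the same category."""
    n_subjects = len(ratings)
    n_raters = len(ratings[0])
    totals = Counter()
    p_sum = 0.0
    for row in ratings:
        counts = Counter(row)
        totals.update(counts)
        # Proportion of agreeing rater pairs for this subject.
        p_sum += (sum(c * c for c in counts.values()) - n_raters) / (
            n_raters * (n_raters - 1)
        )
    p_bar = p_sum / n_subjects
    # Chance agreement from the overall category proportions.
    p_e = sum((c / (n_subjects * n_raters)) ** 2 for c in totals.values())
    return (p_bar - p_e) / (1 - p_e)


def _average_ranks(values):
    """Return (ranks with ties averaged, sum of t**3 - t over tie groups)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_term = 0
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        t = j - i + 1
        tie_term += t**3 - t
        i = j + 1
    return ranks, tie_term


def kendalls_w(ratings):
    """Kendall's coefficient of concordance with tie correction.
    Undefined if any rater gives every subject the same grade and so do
    all others (denominator becomes zero)."""
    n = len(ratings)       # subjects
    m = len(ratings[0])    # raters
    rank_sums = [0.0] * n
    tie_total = 0
    for r in range(m):
        ranks, tie_term = _average_ranks([row[r] for row in ratings])
        tie_total += tie_term
        for i, rank in enumerate(ranks):
            rank_sums[i] += rank
    mean_rank_sum = sum(rank_sums) / n
    s = sum((x - mean_rank_sum) ** 2 for x in rank_sums)
    return 12 * s / (m * m * (n**3 - n) - m * tie_total)


if __name__ == "__main__":
    # Hypothetical grades: 4 scans (rows) rated by 3 raters (columns).
    toy = [[1, 1, 2], [2, 2, 2], [3, 3, 3], [1, 2, 2]]
    print("Fleiss kappa:", round(fleiss_kappa(toy), 3))
    print("Kendall W:", round(kendalls_w(toy), 3))
```

Kappa treats the grades as unordered categories, whereas W uses only their rank order, which is one reason the two statistics can give different impressions of agreement on the same ordinal scale.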
Interrater reliability for the overall 5-tier Spetzler-Martin scale ranged from fair to good (ICC = 0.69) to extremely strong (Kendall's W = 0.73) on initial assessment and improved on reassessment. Assessment of CT angiograms resulted in the highest agreement, followed by MRI and digital subtraction angiography. Agreement for the overall 3-tier Spetzler-Ponce grade ranged from fair to good (ICC = 0.68) to strong (Kendall's W = 0.70) on initial assessment, improved on reassessment, and was comparable to agreement for the 5-tier Spetzler-Martin scale. Agreement for the overall Pollock-Flickinger radiosurgery-based grade ranged from excellent (ICC = 0.89) to extremely strong (Kendall's W = 0.81). Intrarater reliability for the overall 5-tier Spetzler-Martin grade was excellent (ICC > 0.75) in 3 of the 5 raters and fair to good (ICC > 0.40) in the other 2 raters.
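The verbal labels attached to the ICC values above can be expressed as a small lookup. This is a sketch under stated assumptions: the cutoffs 0.75 and 0.40 are taken from the text, while the label "poor" for values below 0.40, the treatment of values exactly on a cutoff, and the function name `interpret_icc` are assumptions made for illustration.

```python
def interpret_icc(icc: float) -> str:
    """Map an ICC value to the verbal bands used in the abstract.

    Cutoffs 0.75 and 0.40 come from the text; the "poor" label and the
    handling of values exactly at a cutoff are assumptions.
    """
    if icc > 1.0:
        raise ValueError("ICC cannot exceed 1")
    if icc > 0.75:
        return "excellent"
    if icc >= 0.40:
        return "fair to good"
    return "poor"


if __name__ == "__main__":
    # ICC values reported for the three overall grades.
    for scale, icc in [
        ("Spetzler-Martin", 0.69),
        ("Spetzler-Ponce", 0.68),
        ("Pollock-Flickinger", 0.89),
    ]:
        print(scale, icc, interpret_icc(icc))
```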
The 5-tier Spetzler-Martin scale, the 3-tier Spetzler-Ponce scale, and the Pollock-Flickinger radiosurgery-based scale all showed a high level of agreement. The improved reliability on reassessment was explained by a training effect from the initial assessment and by the requirement to defend the rating, which highlights a potential drawback of using grades determined in routine clinical practice for scientific purposes.