Institute of Diagnostic and Interventional Radiology, University Hospital Zurich, Zurich, Switzerland.
Eur J Radiol. 2019 May;114:45-50. doi: 10.1016/j.ejrad.2019.02.023. Epub 2019 Feb 19.
To investigate and compare the reproducibility and accuracy of qualitative ratings and quantitative texture analysis (TA) in the detection and grading of lumbar spinal stenosis (LSS) on magnetic resonance (MR) scans of the lumbar spine.
From the register of a nationwide, multicenter, multidisciplinary lumbar stenosis outcome study (LSOS), 82 patients who underwent MR imaging of the lumbar spine for clinically indicated spinal claudication and who had a single-level severe central or lateral LSS were included. In total, 343 transaxial T2-weighted images of the lumbar spine were included, covering one to five levels (L1 to S1) per patient. One expert radiologist, serving as the reference standard, rated LSS grade on a standard four-point scale (normal to severe) and on an eight-point Schizas grading scale. DICOM data were then rescaled to a defined pixel size. Two independent readers performed the same qualitative ratings as the expert reader and, in addition, TA of the spinal canal by manually placing two regions of interest (ROIs) per image, chosen to reflect the qualitative scales: (1) the dural sac only and (2) the inner contour of the spinal canal, including epidural fat and bilateral recesses. Interreader agreement was assessed with Cohen's kappa (κ) for qualitative parameters and with the intraclass correlation coefficient (ICC) for quantitative parameters. TA features were reduced by retaining only those with an ICC > 0.75. The remaining features were analyzed with machine learning algorithms (Weka 3 tool) for correlation with LSS grades using 10-fold cross-validation.
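The agreement statistics and the ICC-based feature reduction described above can be sketched as follows. This is only an illustration under stated assumptions, not the study's implementation: the abstract does not say which ICC form was used, so the sketch assumes ICC(2,1) (two-way random effects, absolute agreement, single measurement), uses unweighted kappa, and the function and feature names are hypothetical.

```python
from collections import Counter
from statistics import mean


def cohens_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two raters grading the same items."""
    n = len(ratings_a)
    # Observed agreement: fraction of items with identical grades.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement from each rater's marginal grade frequencies.
    counts_a, counts_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n ** 2
    if expected == 1.0:
        return float("nan")  # undefined when both raters use one grade only
    return (observed - expected) / (1.0 - expected)


def icc_2_1(rows):
    """ICC(2,1) for a table with one row per image and one column per reader.

    Assumption: the abstract does not name the ICC form; ICC(2,1) is used
    here purely for illustration.
    """
    n, k = len(rows), len(rows[0])
    grand = mean(v for row in rows for v in row)
    row_means = [mean(row) for row in rows]
    col_means = [mean(row[j] for row in rows) for j in range(k)]
    ms_rows = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_cols = n * sum((m - grand) ** 2 for m in col_means) / (k - 1)
    ss_err = sum(
        (rows[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n)
        for j in range(k)
    )
    ms_err = ss_err / ((n - 1) * (k - 1))
    denom = ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    return (ms_rows - ms_err) / denom if denom else float("nan")


def reproducible_features(features, threshold=0.75):
    """Keep texture features whose interreader ICC exceeds the threshold.

    `features` maps a (hypothetical) feature name to a list of
    (reader_1_value, reader_2_value) pairs, one pair per image.
    """
    return [name for name, rows in features.items() if icc_2_1(rows) > threshold]
```

For example, `reproducible_features({"stable": [(1, 1), (2, 2), (3, 3)], "noisy": [(1, 4), (2, 1), (3, 3), (4, 2)]})` keeps only the feature on which the two readers agree.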
Qualitative ratings showed only moderate reproducibility for both LSS classification systems but high correlation with the cross-sectional area (CSA) cut-off of < 130 mm² for severe spinal stenosis. In quantitative TA of both ROIs, machine learning analysis with a decision tree classifier achieved higher performance for LSS grading than qualitative assessments using the reference CSA cut-off.
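To make the CSA cut-off concrete, the following minimal sketch computes a cross-sectional area from a binary ROI mask and applies the < 130 mm² threshold for severe stenosis. The abstract does not describe how CSA was measured, so the mask representation, the pixel spacing values, and the function names are assumptions made for illustration.

```python
# Cut-off for severe spinal stenosis as stated in the abstract (mm^2).
SEVERE_CSA_CUTOFF_MM2 = 130.0


def cross_sectional_area_mm2(mask, pixel_spacing_mm):
    """Area of a binary ROI mask in mm^2.

    `mask` is a list of rows of 0/1 values (hypothetical representation);
    `pixel_spacing_mm` is (row spacing, column spacing) in millimetres.
    """
    row_mm, col_mm = pixel_spacing_mm
    n_pixels = sum(1 for row in mask for value in row if value)
    return n_pixels * row_mm * col_mm


def is_severe(csa_mm2, cutoff_mm2=SEVERE_CSA_CUTOFF_MM2):
    """True when the area falls below the severe-stenosis cut-off."""
    return csa_mm2 < cutoff_mm2


# Usage: a 20 x 20 pixel ROI at an assumed 0.5 mm x 0.5 mm spacing.
roi = [[1] * 20 for _ in range(20)]
area = cross_sectional_area_mm2(roi, (0.5, 0.5))
print(area, is_severe(area))
```

Because the area scales with the product of the two pixel spacings, rescaling all images to one defined pixel size, as the methods describe, keeps pixel counts comparable across scanners.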
Qualitative LSS grading shows moderate reproducibility regardless of the classification system. TA with machine learning offers highly reproducible quantitative parameters that increase accuracy for severe LSS detection, with only minor impact of grading score and CSA border definition.