From the Department of Diagnostic Imaging, National University Hospital, 5 Lower Kent Ridge Rd, Singapore 119074 (J.T.P.D.H., A.M., Y.L.T., S.L., Y.S.C., S.E.E., S.T.Q.); Department of Diagnostic Radiology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore (J.T.P.D.H., A.M., Y.L.T., S.L., Y.S.C., S.E.E., S.T.Q.); NUS Graduate School, Integrative Sciences and Engineering Programme, National University of Singapore, Singapore (L.Z.); Department of Computer Science, School of Computing, National University of Singapore, Singapore (K.Y., B.C.O.); Department of Radiology, Dammam Medical Complex, Dammam, Saudi Arabia (D.A.R.A.); Biostatistics Unit, Yong Loo Lin School of Medicine, Singapore (Q.V.Y., Y.H.C.); University Spine Centre, Department of Orthopaedic Surgery, National University Health System, Singapore (J.H.T., N.K.); and Department of Radiological Sciences, University of California, Irvine, Orange, Calif (H.Y.).
Radiology. 2021 Jul;300(1):130-138. doi: 10.1148/radiol.2021204289. Epub 2021 May 11.
Background: Assessment of lumbar spinal stenosis at MRI is repetitive and time consuming. Deep learning (DL) could improve productivity and the consistency of reporting.
Purpose: To develop a DL model for automated detection and classification of lumbar central canal, lateral recess, and neural foraminal stenosis.
Materials and Methods: In this retrospective study, lumbar spine MRI scans obtained from September 2015 to September 2018 were included. Studies of patients with spinal instrumentation or scoliosis, studies with suboptimal image quality, and postgadolinium studies were excluded. Axial T2-weighted and sagittal T1-weighted images were used. Studies were split into an internal training set (80%), validation set (9%), and test set (11%). Training data were labeled by four radiologists using predefined gradings (normal, mild, moderate, and severe). A two-component DL model was developed: a first convolutional neural network (CNN) was trained to detect the region of interest (ROI), and a second CNN performed classification. An internal test set was labeled by a musculoskeletal radiologist with 31 years of experience (reference standard) and by two subspecialist radiologists (radiologist 1: A.M., 5 years of experience; radiologist 2: J.T.P.D.H., 9 years of experience). DL model performance was also evaluated on an external test set. Detection recall (as a percentage), interrater agreement (Gwet κ), sensitivity, and specificity were calculated.
Results: Overall, 446 lumbar spine MRI studies were analyzed (446 patients; mean age ± standard deviation, 52 years ± 19; 240 women), with 396 patients in the training (80%) and validation (9%) sets and 50 (11%) in the internal test set. For internal testing, central canal recall was greater than 99% for the DL model and both radiologists, whereas neural foraminal recall was lower for the DL model (84.5%) and radiologist 1 (83.9%) than for radiologist 2 (97.1%) (P < .001). For internal testing, dichotomous classification (normal or mild vs moderate or severe) showed almost perfect agreement for both radiologists and the DL model, with respective κ values of 0.98, 0.98, and 0.96 for the central canal; 0.92, 0.95, and 0.92 for lateral recesses; and 0.94, 0.95, and 0.89 for neural foramina (P < .001). External testing with 100 lumbar spine MRI scans showed almost perfect agreement for the DL model for dichotomous classification of all ROIs (κ, 0.95–0.96; P < .001).
Conclusion: A deep learning model showed comparable agreement with subspecialist radiologists for detection and classification of central canal and lateral recess stenosis, with slightly lower agreement for neural foraminal stenosis at lumbar spine MRI.
© RSNA, 2021
See also the editorial by Hayashi in this issue.
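Agreement in this study is reported as Gwet κ on grades collapsed to two classes (normal or mild vs moderate or severe). The Python sketch below illustrates how such a coefficient could be computed for two raters. It assumes Gwet's first-order agreement coefficient (AC1), the unweighted form; the abstract does not state which Gwet variant or weighting the authors used. The function names and example gradings are hypothetical and are not study data.

```python
from collections import Counter
from typing import Sequence

# Four-point grading scale from the abstract.
GRADES = ("normal", "mild", "moderate", "severe")


def dichotomize(grade: str) -> int:
    """Collapse a grade to 0 (normal or mild) or 1 (moderate or severe)."""
    if grade not in GRADES:
        raise ValueError(f"unknown grade: {grade!r}")
    return int(grade in ("moderate", "severe"))


def gwet_ac1(rater_a: Sequence[int], rater_b: Sequence[int],
             categories: Sequence[int]) -> float:
    """Gwet's AC1 for two raters (unweighted; an assumed variant, hypothetical helper)."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("ratings must be paired and non-empty")
    if len(categories) < 2:
        raise ValueError("need at least two categories")
    n = len(rater_a)
    # Observed agreement: share of items given the same category by both raters.
    p_a = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement uses each category's marginal proportion averaged over raters.
    p_e = 0.0
    for c in categories:
        pi_c = (counts_a[c] + counts_b[c]) / (2 * n)
        p_e += pi_c * (1 - pi_c)
    p_e /= len(categories) - 1
    return (p_a - p_e) / (1 - p_e)


# Hypothetical gradings for ten ROIs (not study data).
reference = ["normal", "mild", "moderate", "severe", "normal",
             "mild", "severe", "moderate", "normal", "normal"]
model = ["normal", "mild", "moderate", "moderate", "mild",
         "mild", "severe", "mild", "normal", "normal"]

ref_bin = [dichotomize(g) for g in reference]
model_bin = [dichotomize(g) for g in model]
ac1 = gwet_ac1(ref_bin, model_bin, categories=(0, 1))
print(f"Gwet AC1 (dichotomous): {ac1:.3f}")  # prints 0.817
```

After collapsing, a severe-versus-moderate disagreement (the fourth ROI) counts as agreement. Only the eighth ROI, where moderate and mild fall on opposite sides of the threshold, lowers the coefficient.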
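Sensitivity and specificity against the reference standard, together with detection recall, can be sketched in the same way. This minimal illustration rests on several assumptions. Labels are already dichotomized (1 = moderate or severe). Each ROI is identified by a hypothetical (level, location) key, because the abstract does not say how detected ROIs were matched to reference ROIs. All names and data are hypothetical.

```python
from typing import Sequence


def sensitivity_specificity(reference: Sequence[int],
                            predicted: Sequence[int]) -> tuple[float, float]:
    """Sensitivity and specificity of binary labels (1 = moderate or severe)."""
    if len(reference) != len(predicted):
        raise ValueError("labels must be paired")
    pairs = list(zip(reference, predicted))
    tp = sum(r == 1 and p == 1 for r, p in pairs)
    fn = sum(r == 1 and p == 0 for r, p in pairs)
    tn = sum(r == 0 and p == 0 for r, p in pairs)
    fp = sum(r == 0 and p == 1 for r, p in pairs)
    # Undefined (NaN) when the reference standard has no positives or no negatives.
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


def detection_recall(reference_rois: set[tuple[str, str]],
                     detected_rois: set[tuple[str, str]]) -> float:
    """Percentage of reference-standard ROIs that were also detected.

    ROIs are keyed by a hypothetical (level, location) pair; matching real
    detections to reference ROIs would need an explicit criterion.
    """
    if not reference_rois:
        raise ValueError("reference set is empty")
    return 100.0 * len(reference_rois & detected_rois) / len(reference_rois)


# Hypothetical dichotomized labels (not study data).
reference = [0, 0, 1, 1, 0, 0, 1, 1, 0, 0]
predicted = [0, 0, 1, 1, 0, 0, 1, 0, 0, 0]
print(sensitivity_specificity(reference, predicted))  # (0.75, 1.0)

# Hypothetical neural foraminal ROIs at two levels.
ref_rois = {("L3-L4", "left"), ("L3-L4", "right"),
            ("L4-L5", "left"), ("L4-L5", "right")}
found = {("L3-L4", "left"), ("L4-L5", "left"), ("L4-L5", "right"),
         ("L5-S1", "left")}
print(detection_recall(ref_rois, found))  # 75.0
```

A detection outside the reference set, such as the L5-S1 entry above, leaves recall unchanged. It would instead appear as a false positive in a precision measure, which the abstract does not report.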