From the Image Sciences Institute (S.G.M.v.V., N.L., M.A.V., I.I.), Departments of Radiology (B.K.V., T.L., P.A.d.J., W.B.V.), Experimental Cardiology (I.E.M.B.), and Radiotherapy (D.H.J.G.v.d.B.), and Imaging Division (H.M.V.), University Medical Center Utrecht, Heidelberglaan 100, 3584 CX Utrecht, the Netherlands; Department of Radiology and Nuclear Medicine, Radboud University Medical Center, Nijmegen, the Netherlands (N.L.); Departments of Biomedical Engineering and Physics (S.G.M.v.V., I.I.) and Radiology and Nuclear Medicine (I.I.), and Amsterdam Cardiovascular Sciences (I.I.), Amsterdam University Medical Center, University of Amsterdam, the Netherlands; Department of Cardiology, Meander Medical Center, Amersfoort, the Netherlands (I.E.M.B.); Department of Medicine, University of Mississippi Medical Center, Jackson, Miss (A.C.); and Department of Radiology and Radiological Sciences, Vanderbilt University Medical Center, Nashville, Tenn (J.G.T., J.J.C.).
Radiology. 2020 Apr;295(1):66-79. doi: 10.1148/radiol.2020191621. Epub 2020 Feb 11.
Background: Although several deep learning (DL) calcium scoring methods have achieved excellent performance for specific CT protocols, their performance across a range of CT examination types is unknown.

Purpose: To evaluate the performance of a DL method for automatic calcium scoring across a wide range of CT examination types and to investigate whether the method can adapt to different types of CT examinations when representative images are added to the existing training data set.

Materials and Methods: The study included 7240 participants who underwent various types of nonenhanced CT examinations that included the heart: coronary artery calcium (CAC) scoring CT, diagnostic CT of the chest, PET attenuation correction CT, radiation therapy treatment planning CT, CAC screening CT, and low-dose CT of the chest. CAC and thoracic aorta calcification (TAC) were quantified with a convolutional neural network trained in three ways: with 1181 low-dose chest CT examinations (baseline), with the baseline set supplemented by a small set of examinations of the respective type (data specific), and with a combination of examinations of all available types (combined). Supplemental training sets contained 199-568 CT images, depending on the calcium burden of each population. DL algorithm performance was evaluated with intraclass correlation coefficients (ICCs) between DL and manual scoring (Agatston score for CAC, volume score for TAC) and with linearly weighted κ values for assignment to cardiovascular disease risk categories based on the Agatston score (0, 1-10, 11-100, 101-400, >400).

Results: At baseline, the DL algorithm yielded ICCs of 0.79-0.97 for CAC and 0.66-0.98 for TAC across the different types of CT examinations. ICCs improved to 0.84-0.99 (CAC) and 0.92-0.99 (TAC) with CT protocol-specific training and to 0.85-0.99 (CAC) and 0.96-0.99 (TAC) with combined training. For assignment of cardiovascular disease risk category across all test CT scans, the κ value was 0.90 (95% confidence interval [CI]: 0.89, 0.91) with baseline training and increased to 0.92 (95% CI: 0.91, 0.93) with both data-specific and combined training.

Conclusion: A deep learning calcium scoring algorithm for quantification of coronary and thoracic aorta calcium was robust despite substantial differences in CT protocols and variations in subject populations. Augmenting the algorithm training with CT protocol-specific images further improved its performance.

© RSNA, 2020. See also the editorial by Vannier in this issue.
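The risk-category agreement statistic can be illustrated with a short Python sketch. It is not the study's analysis code. It assumes NumPy, treats the category boundaries (0, 1-10, 11-100, 101-400, >400) as half-open intervals so that non-integer scores fall into a single category (an assumption), and uses hypothetical helper names (`agatston_category`, `linear_weighted_kappa`) and made-up example scores. It computes only the κ point estimate, not the reported confidence intervals.

```python
import numpy as np


# Hypothetical helper. The category edges follow the abstract (0, 1-10, 11-100,
# 101-400, >400). Treating them as intervals (0, 10], (10, 100], (100, 400]
# for non-integer scores is an assumption.
def agatston_category(score: float) -> int:
    """Return a risk-category index from 0 (score 0) to 4 (score > 400)."""
    if score <= 0:
        return 0
    if score <= 10:
        return 1
    if score <= 100:
        return 2
    if score <= 400:
        return 3
    return 4


def linear_weighted_kappa(rater_a, rater_b, n_categories: int = 5) -> float:
    """Cohen's kappa with linear weights |i - j| / (k - 1) for ordinal categories."""
    a = np.asarray(rater_a, dtype=int)
    b = np.asarray(rater_b, dtype=int)
    k = n_categories

    # Observed joint distribution of (rater_a, rater_b) category pairs.
    observed = np.zeros((k, k))
    np.add.at(observed, (a, b), 1)
    observed /= observed.sum()

    # Joint distribution expected by chance: outer product of the two marginals.
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0))

    # Linear disagreement weights: 0 on the diagonal, 1 between the extreme categories.
    idx = np.arange(k)
    weights = np.abs(idx[:, None] - idx[None, :]) / (k - 1)

    # If both raters put every case in the same single category, the expected
    # disagreement is 0 and kappa is undefined (the division yields NaN).
    return 1.0 - (weights * observed).sum() / (weights * expected).sum()


# Made-up scores for illustration only; these are not data from the study.
manual_scores = [0, 5, 50, 250, 800, 0, 120]
dl_scores = [0, 12, 45, 390, 700, 0, 95]

manual_cat = [agatston_category(s) for s in manual_scores]
dl_cat = [agatston_category(s) for s in dl_scores]
kappa = linear_weighted_kappa(manual_cat, dl_cat)
print(f"linearly weighted kappa: {kappa:.3f}")  # 0.821
```

Linear weights penalize a disagreement in proportion to how many categories apart the two ratings are. A one-category miss (for example, 95 vs 120, which straddles the 100 boundary) therefore costs less than a miss from category 0 to category 4.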