Federau Christian, Christensen Soren, Scherrer Nino, Ospel Johanna M, Schulze-Zachau Victor, Schmidt Noemi, Breit Hanns-Christian, Maclaren Julian, Lansberg Maarten, Kozerke Sebastian
Institute for Biomedical Engineering, ETH Zürich and University of Zürich, Gloriastrasse 35, 8092 Zürich, Switzerland (C.F., N. Scherrer, S.K.); Stanford Stroke Center, Department of Neurology, Stanford University, Stanford, Calif (S.C., J.M., M.L.); and Division of Diagnostic and Interventional Neuroradiology, Department of Radiology, University Hospital Basel, Basel, Switzerland (J.O., V.S.Z., N. Schmidt, H.C.B.).
Radiol Artif Intell. 2020 Sep 16;2(5):e190217. doi: 10.1148/ryai.2020190217. eCollection 2020 Sep.
To compare the segmentation and detection performance of a deep learning model trained on a database of human-labeled clinical stroke lesions on diffusion-weighted (DW) images to a model trained on the same database enhanced with synthetic stroke lesions.
In this institutional review board-approved study, a stroke database of 962 cases (mean patient age ± standard deviation, 65 years ± 17; 255 male patients; 449 scans with DW-positive stroke lesions) and a normal database of 2027 patients (mean age, 38 years ± 24; 1088 female patients) were used. Brain volumes with synthetic stroke lesions on DW images were produced by warping the relative signal increase of real strokes to normal brain volumes. A generic three-dimensional (3D) U-Net was trained on four different databases to generate four different models: 375 neuroradiologist-labeled clinical DW-positive stroke cases (CDB); 2000 synthetic cases (S2DB); CDB plus 2000 synthetic cases (CS2DB); and CDB plus 40 000 synthetic cases (CS40DB). The models were tested on 20% (n = 192) of the cases of the stroke database, which were excluded from the training set. Segmentation accuracy was characterized using Dice score and lesion volume of the stroke segmentation, and statistical significance was tested using a paired two-tailed Student t test. Detection sensitivity and specificity were compared with labeling done by three neuroradiologists.
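The segmentation metrics named above (Dice score, lesion volume, and a paired test across models) could be computed along the following lines. This is a minimal sketch and not the authors' code: the function names, the flat-list mask representation, the `voxel_volume_mm3` parameter, and the convention that two empty masks score 1.0 are all assumptions made for illustration. Only the paired t statistic is computed here; turning it into a two-tailed P value requires the t distribution with n - 1 degrees of freedom.

```python
import math
import statistics


def dice_score(pred, truth):
    """Dice = 2 * |A and B| / (|A| + |B|) for two flat binary masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Assumption: two empty masks are treated as perfect agreement.
    return 1.0 if total == 0 else 2.0 * intersection / total


def lesion_volume_ml(mask, voxel_volume_mm3):
    """Lesion volume in millilitres: voxel count times voxel volume."""
    return sum(1 for v in mask if v) * voxel_volume_mm3 / 1000.0


def paired_t_statistic(scores_a, scores_b):
    """t statistic of a paired test on per-case score differences."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equally long lists with at least 2 cases")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    mean_diff = statistics.mean(diffs)
    std_err = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return mean_diff / std_err


if __name__ == "__main__":
    # Toy masks, not study data.
    pred = [1, 1, 0, 0]
    truth = [1, 0, 1, 0]
    print(dice_score(pred, truth))
    print(lesion_volume_ml(pred, voxel_volume_mm3=2.0))
```

In practice each model would be scored per test case against the neuroradiologist label, and the per-case Dice scores of two models would be passed to `paired_t_statistic`.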
The performance of the 3D U-Net model trained on the CS40DB (mean Dice score, 0.72) was better than models trained on the CS2DB (Dice score, 0.70; P < .001) or the CDB (Dice score, 0.65; P < .001). The deep learning model (CS40DB) was also more sensitive (91% [95% confidence interval {CI}: 89%, 93%]) than each of the three human readers (human reader 3, 84% [95% CI: 81%, 87%]; human reader 1, 78% [95% CI: 75%, 81%]; human reader 2, 79% [95% CI: 76%, 82%]), but was less specific (75% [95% CI: 72%, 78%]) than each of the three human readers (human reader 3, 96% [95% CI: 94%, 98%]; human reader 1, 92% [95% CI: 90%, 94%]; human reader 2, 89% [95% CI: 86%, 91%]).
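Sensitivity and specificity with 95% CIs of the kind reported above can be derived from detection counts, as sketched below. The abstract does not state how the intervals were computed, so the Wilson score interval used here is an assumption, and the counts in the usage example are hypothetical rather than taken from the study.

```python
import math


def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)


def wilson_interval(successes, n, z=1.959964):
    """Wilson score interval for a proportion (z = 1.96 gives about 95%).

    Assumption: the source does not say which CI method was used.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2.0 * n)) / denom
    half = z * math.sqrt(p * (1.0 - p) / n + z * z / (4.0 * n * n)) / denom
    return center - half, center + half


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    sens, spec = sensitivity_specificity(tp=90, fn=10, tn=75, fp=25)
    print(sens, wilson_interval(90, 100))
    print(spec, wilson_interval(75, 100))
```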
Deep learning training for segmentation and detection of stroke lesions on DW images was significantly improved by enhancing the training set with synthetic lesions. © RSNA, 2020.