Blüthgen Christian, Becker Anton S, Vittoria de Martini Ilaria, Meier Andreas, Martini Katharina, Frauenfelder Thomas
Institute for Diagnostic and Interventional Radiology, University Hospital of Zurich, Switzerland.
Eur J Radiol. 2020 May;126:108925. doi: 10.1016/j.ejrad.2020.108925. Epub 2020 Mar 9.
To evaluate a deep learning-based image analysis software for the detection and localization of distal radius fractures.
A deep learning system (DLS) was trained on 524 wrist radiographs (166 showing fractures). Performance was tested on an internal test set (100 radiographs, 42 showing fractures) and an external test set (200 radiographs, 100 showing fractures). Single and combined views of the radiographs were shown to the DLS and to three readers. Readers were asked to indicate fracture location with regions of interest (ROIs). The DLS yielded a score (range 0-1) and a heatmap for each input. Detection performance was expressed as AUC and as sensitivity and specificity at the optimal threshold, and was compared to the radiologists' performance. Heatmaps were compared to the radiologists' ROIs.
The DLS showed excellent performance on the internal test set (AUC 0.93 (95% confidence interval (CI) 0.82-0.98) to 0.96 (0.87-1.00), sensitivity 0.81 (0.58-0.95) to 0.90 (0.70-0.99), specificity 0.86 (0.68-0.96) to 1.0 (0.88-1.0)). DLS performance decreased on the external test set (AUC 0.80 (0.71-0.88) to 0.89 (0.81-0.94), sensitivity 0.64 (0.49-0.77) to 0.92 (0.81-0.98), specificity 0.60 (0.45-0.74) to 0.90 (0.78-0.97)). Radiologists' performance was comparable on internal data (sensitivity 0.71 (0.48-0.89) to 0.95 (0.76-1.0), specificity 0.52 (0.32-0.71) to 0.97 (0.82-1.0)) and better on external data (sensitivity 0.88 (0.76-0.96) to 0.98 (0.89-1.0), specificity 0.66 (0.51-0.79) to 1.0 (0.93-1.0), p < 0.05). In over 90% of cases, the areas of peak activation aligned with the radiologists' annotations.
The DLS was able to detect and localize wrist fractures with a performance comparable to radiologists, using only a small dataset for training.
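The detection metrics named above (AUC, and sensitivity and specificity at an optimal threshold) can be illustrated with a minimal sketch. The abstract does not state how the optimal threshold was chosen; the sketch assumes Youden's J statistic (sensitivity + specificity - 1), which is one common choice. The function names and the eight example scores are hypothetical and are not data from the study.

```python
from typing import List, Tuple


def auc(scores: List[float], labels: List[int]) -> float:
    """AUC as the probability that a fracture case scores higher than a
    non-fracture case (ties count as one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sens_spec(scores: List[float], labels: List[int],
              threshold: float) -> Tuple[float, float]:
    """Sensitivity and specificity when score >= threshold means 'fracture'."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def youden_threshold(scores: List[float], labels: List[int]) -> float:
    """Assumed criterion: the observed score that maximizes
    sensitivity + specificity - 1 (lowest such score on ties)."""
    candidates = sorted(set(scores))
    return max(candidates, key=lambda t: sum(sens_spec(scores, labels, t)) - 1)


# Hypothetical scores in the range 0-1; label 1 = fracture, 0 = no fracture.
example_scores = [0.05, 0.20, 0.35, 0.50, 0.30, 0.70, 0.85, 0.95]
example_labels = [0, 0, 0, 0, 1, 1, 1, 1]

best = youden_threshold(example_scores, example_labels)
sensitivity, specificity = sens_spec(example_scores, example_labels, best)
print(auc(example_scores, example_labels), best, sensitivity, specificity)
```

Choosing the threshold on the same data used to report sensitivity and specificity tends to give optimistic values, which is one reason results on an external test set are informative.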