Clinic of Radiology and Nuclear Medicine, University Hospital Basel, University of Basel, Petersgraben 4, 4031 Basel, Switzerland.
Amsler Consulting Basel, Gundeldingerrain 111, 4059, Basel, Switzerland.
Eur Radiol. 2021 Sep;31(9):6816-6824. doi: 10.1007/s00330-021-07811-2. Epub 2021 Mar 19.
The primary aim was to evaluate the performance of a deep convolutional neural network (DCNN) in detecting and classifying distal radius fractures, metal, and cast on radiographs using labels based on radiology reports. The secondary aim was to evaluate the effect of the training set size on the algorithm's performance.
A total of 15,775 frontal and lateral radiographs, the corresponding radiology reports, and a ResNet18 DCNN were used. Fracture detection and classification models were developed per view and then merged. Incrementally sized subsets of the training data served to evaluate the effect of training set size. Two musculoskeletal radiologists set the standard of reference on the radiographs (test set A). A subset of it (test set B) was also rated by three radiology residents. For a per-study comparison with the radiology residents, the results of the best models were merged. Performance was assessed with receiver operating characteristic (ROC) curves and the area under the curve (AUC), Youden's J statistic (J), and Spearman's rank correlation coefficient (ρ).
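The ROC-based statistics named above can be illustrated with a minimal sketch. It is not the study's code: the function names, the toy labels, and the toy scores are hypothetical, and it assumes that J is taken as the maximum of sensitivity + specificity - 1 over all score thresholds, which the abstract does not state explicitly.

```python
def roc_points(labels, scores):
    """Return (fpr, tpr, threshold) triples, one per distinct score threshold.

    labels: 1 = finding present, 0 = absent (per the reference standard).
    scores: model output, higher means more likely positive.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0, float("inf"))]  # nothing is called positive
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= thr and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= thr and y == 0)
        points.append((fp / neg, tp / pos, thr))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0, _), (x1, y1, _) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


def youden_j(points):
    """Largest J = TPR - FPR (sensitivity + specificity - 1) and its threshold."""
    fpr, tpr, thr = max(points, key=lambda p: p[1] - p[0])
    return tpr - fpr, thr


# Hypothetical toy data, not taken from the study.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_curve = roc_points(toy_labels, toy_scores)
print(auc(toy_curve), youden_j(toy_curve))
```

An AUC of 1.0 means every positive case is scored above every negative case, and J = 1.0 means some threshold separates the two groups without error, which is how the perfect cast result reported in the abstract should be read.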
On test set A, the models' AUC/J were 0.99/0.98 for metal and 1.0/1.0 for cast. On test set B, the AUC/J of the models and of the residents (models; residents) were similar for fracture (0.98/0.91; 0.98/0.92) and multiple fragments (0.85/0.58; 0.91/0.70). Training set size correlated positively with AUC for metal (ρ = 0.740), cast (ρ = 0.722), fracture (frontal ρ = 0.947, lateral ρ = 0.946), multiple fragments (frontal ρ = 0.856), and fragment displacement (frontal ρ = 0.595).
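The correlation between training set size and AUC can be sketched as follows. This is an illustration only: the helper names and the size/AUC pairs are hypothetical and do not reproduce the study's data. Spearman's ρ is computed here as the Pearson correlation of average ranks, which is the standard definition and handles ties.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation applied to the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical training set sizes and the AUC of a model trained on each.
sizes = [500, 1000, 2000, 4000, 8000]
aucs = [0.80, 0.86, 0.85, 0.91, 0.95]
print(spearman_rho(sizes, aucs))
```

Because ρ depends only on rank order, a value near 1 indicates that AUC rises almost monotonically with training set size, without implying that the gain per added image is constant.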
DCNN models trained with report-based labels to detect distal radius fractures on radiographs are suitable as a secondary reading tool; the models for fracture classification are not ready for clinical use. Larger training sets led to better models in all categories except joint involvement.
• Detection of metal and cast on radiographs is excellent using AI and labels extracted from radiology reports.
• Automatic detection of distal radius fractures on radiographs is feasible, and its performance approximates that of radiology residents.
• Automatic classification of the type of distal radius fracture varies in accuracy and is inferior for joint involvement and fragment displacement.