Den Hengst Stella, Borren Noor, Van Lieshout Esther M M, Doornberg Job N, Van Walsum Theo, Wijffels Mathieu M E, Verhofstad Michael H J
Trauma Research Unit, Department of Surgery, Erasmus MC, University Medical Center.
Department of Trauma Surgery, University Medical Centre Groningen and Groningen University, Groningen, The Netherlands.
J Thorac Imaging. 2025 Sep 1;40(5):e0833. doi: 10.1097/RTI.0000000000000833.
Trauma-induced rib fractures are common injuries. The gold standard for diagnosing rib fractures is computed tomography (CT), but the sensitivity in the acute setting is low, and interpreting CT slices is labor-intensive. This has led to the development of new diagnostic approaches leveraging deep learning (DL) models. This systematic review and pooled analysis aimed to compare the performance of DL models in the detection, segmentation, and classification of rib fractures based on CT scans.
A literature search was performed using various databases for studies describing DL models detecting, segmenting, or classifying rib fractures from CT data. Reported performance metrics included sensitivity, false-positive rate, F1-score, precision, accuracy, and mean average precision. A meta-analysis was performed on the sensitivity scores to compare the DL models with clinicians.
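As an illustration of how several of the reported metrics relate to one another, the following is a minimal sketch, not the method used in any of the included studies. It assumes lesion-level counts of true positives, false positives, and false negatives, and it assumes the false-positive rate is expressed as false positives per scan, which is one common convention for detection models. All names and the example counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DetectionCounts:
    """Hypothetical lesion-level counts for one model on one test set."""
    true_positives: int
    false_positives: int
    false_negatives: int
    n_scans: int


def sensitivity(c: DetectionCounts) -> float:
    # Fraction of real fractures that the model found (also called recall).
    denom = c.true_positives + c.false_negatives
    return c.true_positives / denom if denom else 0.0


def precision(c: DetectionCounts) -> float:
    # Fraction of model detections that were real fractures.
    denom = c.true_positives + c.false_positives
    return c.true_positives / denom if denom else 0.0


def f1_score(c: DetectionCounts) -> float:
    # Harmonic mean of precision and sensitivity.
    denom = 2 * c.true_positives + c.false_positives + c.false_negatives
    return 2 * c.true_positives / denom if denom else 0.0


def false_positives_per_scan(c: DetectionCounts) -> float:
    # Assumed convention: average number of false detections per CT scan.
    return c.false_positives / c.n_scans if c.n_scans else 0.0


# Illustrative counts only; these are not data from the review.
example = DetectionCounts(true_positives=90, false_positives=30,
                          false_negatives=10, n_scans=60)
print(f"sensitivity: {sensitivity(example):.3f}")
print(f"precision:   {precision(example):.3f}")
print(f"F1-score:    {f1_score(example):.3f}")
print(f"FP per scan: {false_positives_per_scan(example):.2f}")
```

Sensitivity ignores false positives entirely, which is why it is typically reported alongside a false-positive measure or precision.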
Of the 323 identified records, 25 were included. Twenty-one studies reported on detection, 4 on segmentation, and 10 on classification. Twenty studies had adequate data for meta-analysis. The gold standard labels were provided by clinicians (radiologists and orthopedic surgeons). For detecting rib fractures, DL models had a higher sensitivity (86.7%; 95% CI: 82.6%-90.2%) than clinicians (75.4%; 95% CI: 68.1%-82.1%). In classification, the sensitivity of DL models for displaced rib fractures (97.3%; 95% CI: 95.6%-98.5%) was significantly higher than that of clinicians (88.2%; 95% CI: 84.8%-91.3%).
DL models for rib fracture detection and classification achieved promising results. Because their sensitivity exceeded that of clinicians both for detecting rib fractures and for classifying displaced rib fractures, future work should focus on implementing DL models in daily clinical practice.
Level III: systematic review and pooled analysis.