Pradella Maurice, Weikert Thomas, Sperl Jonathan I, Kärgel Rainer, Cyriac Joshy, Achermann Rita, Sauter Alexander W, Bremerich Jens, Stieltjes Bram, Brantner Philipp, Sommer Gregor
Department of Radiology, Clinic of Radiology & Nuclear Medicine, University Hospital Basel, University of Basel, Petersgraben 4, 4031 Basel, Switzerland.
Siemens Healthineers, Siemensstraße 3, 91301 Forchheim, Germany.
Quant Imaging Med Surg. 2021 Oct;11(10):4245-4257. doi: 10.21037/qims-21-142.
Manually performed diameter measurements on ECG-gated CT angiography (CTA) represent the gold standard for diagnosis of thoracic aortic dilatation. However, they are time-consuming and show high inter-reader variability. Therefore, we aimed to evaluate the accuracy of measurements made by a deep learning (DL) algorithm in comparison to those of radiologists, and to assess measurement times (MT).
We retrospectively analyzed 405 ECG-gated CTA exams of 371 consecutive patients with suspected aortic dilatation, acquired between May 2010 and June 2019. The DL-algorithm prototype detected aortic landmarks (deep reinforcement learning) and segmented the lumen of the thoracic aorta (multi-layer convolutional neural network). It performed measurements according to AHA guidelines and created visual outputs. Manual measurements were performed by radiologists using the centerline technique. Human performance variability (HPV), MT and DL performance were analyzed in a research setting using a linear mixed model based on 21 randomly selected, repeatedly measured cases. DL-algorithm results were then evaluated in a clinical setting using matched differences. If the differences were within 5 mm at all locations, the case was regarded as coherent; if there was a discrepancy >5 mm at one or more locations (incl. missing values), the case was completely reviewed.
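The coherence rule described above can be illustrated with a minimal Python sketch. It is not the study's implementation: the function name `needs_review`, the location labels, and the dictionary layout are hypothetical, and the sketch assumes a missing value is represented either by an absent key or by `None`.

```python
from typing import Mapping, Optional

THRESHOLD_MM = 5.0  # discrepancy threshold stated in the abstract


def needs_review(
    dl: Mapping[str, Optional[float]],
    manual: Mapping[str, Optional[float]],
    threshold_mm: float = THRESHOLD_MM,
) -> bool:
    """Return True if a case should be completely reviewed.

    A case is coherent only if, at every location, both diameters exist
    and differ by at most threshold_mm. A missing value at any location
    counts as a discrepancy.
    """
    # Check every location reported by either source (hypothetical labels).
    for location in set(dl) | set(manual):
        dl_value = dl.get(location)
        manual_value = manual.get(location)
        if dl_value is None or manual_value is None:
            return True  # missing value is treated as a discrepancy
        if abs(dl_value - manual_value) > threshold_mm:
            return True  # difference exceeds 5 mm at this location
    return False


# Made-up diameters in mm, for illustration only.
manual_case = {"aortic_sinus": 38.0, "ascending_aorta": 41.0}
dl_case = {"aortic_sinus": 44.5, "ascending_aorta": 41.5}
flagged = needs_review(dl_case, manual_case)  # sinus differs by 6.5 mm
```

A difference of exactly 5 mm is treated as coherent here, since the abstract flags only discrepancies greater than 5 mm.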
HPV ranged up to ±3.4 mm in repeated measurements under research conditions. In the clinical setting, 2,778/3,192 (87.0%) of the DL-algorithm's measurements were coherent. Mean differences of paired measurements between the DL-algorithm and radiologists at the aortic sinus and ascending aorta were -0.45±5.52 and -0.02±3.36 mm, respectively. Detailed analysis revealed that measurements at the aortic root were over-/underestimated due to a tilted measurement plane. In total, the calculated time saved by the DL-algorithm was 3:10 minutes/case.
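For readers unfamiliar with the "mean ± SD of paired differences" summary used above, the following Python sketch shows one way to compute it for a single measurement location. The sign convention (DL-algorithm minus radiologist) and the use of the sample standard deviation are assumptions, since the abstract does not state either; the function name and the numbers are made up for illustration.

```python
import statistics
from typing import Sequence, Tuple


def paired_difference_summary(
    dl_mm: Sequence[float], manual_mm: Sequence[float]
) -> Tuple[float, float]:
    """Return (mean, sample SD) of per-case differences in mm.

    Differences are computed as DL minus manual (assumed convention).
    """
    if len(dl_mm) != len(manual_mm):
        raise ValueError("paired inputs must have the same length")
    if len(dl_mm) < 2:
        raise ValueError("at least two pairs are needed for a sample SD")
    differences = [d - m for d, m in zip(dl_mm, manual_mm)]
    return statistics.mean(differences), statistics.stdev(differences)


# Made-up diameters in mm at one location across three cases.
mean_diff, sd_diff = paired_difference_summary(
    [40.0, 35.0, 30.0], [41.0, 34.0, 32.0]
)
print(f"{mean_diff:.2f} ± {sd_diff:.2f} mm")
```

A mean near zero with a large SD, as reported for the aortic sinus, indicates little systematic bias but wide case-to-case scatter.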
The DL-algorithm provided results coherent with those of radiologists at almost 90% of measurement locations, while the majority of discrepant measurements were located at the aortic root. In summary, the DL-algorithm assisted radiologists in performing AHA-compliant measurements, saving 50% of time per case.