Radiology, Alfred Health, Melbourne, Victoria, Australia
annalise.ai, Sydney, New South Wales, Australia.
BMJ Open. 2021 Dec 7;11(12):e053024. doi: 10.1136/bmjopen-2021-053024.
To evaluate the ability of a commercially available comprehensive chest radiography deep convolutional neural network (DCNN) to detect simple and tension pneumothorax, stratified by the following subgroups: presence of an intercostal drain; rib, clavicular, scapular or humeral fractures or rib resections; subcutaneous emphysema; and erect versus non-erect positioning. The hypothesis was that performance in each of these subgroups would not differ significantly from performance on the overall test dataset.
A retrospective case-control study was undertaken.
The study drew on community radiology clinics and hospitals in Australia and the USA.
A test dataset of 2557 chest radiography studies was ground-truthed by three subspecialty thoracic radiologists for the presence of simple or tension pneumothorax as well as each subgroup other than positioning. Radiograph positioning was derived from radiographer annotations on the images.
DCNN performance for detecting simple and tension pneumothorax was evaluated over the entire test set, as well as within each subgroup, using the area under the receiver operating characteristic curve (AUC). A difference in AUC of more than 0.05 was considered clinically significant.
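The comparison described above can be illustrated with a minimal sketch. It is not the study's code: the function names, the toy labels and scores, and the subgroup masks are all hypothetical, and the choice to flag only a drop in subgroup AUC (rather than any difference) is an assumption made for illustration. Only the 0.05 threshold comes from the text. AUC is computed here as the proportion of positive-negative pairs in which the positive case receives the higher score, with ties counted as half.

```python
from typing import Dict, List, Sequence, Tuple

MARGIN = 0.05  # AUC difference treated as clinically significant in the text


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def subgroup_report(
    labels: Sequence[int],
    scores: Sequence[float],
    subgroups: Dict[str, Sequence[bool]],
) -> Tuple[float, Dict[str, dict]]:
    """Compare each subgroup's AUC with the AUC over the whole test set."""
    overall = auc(labels, scores)
    report = {}
    for name, mask in subgroups.items():
        sub_labels: List[int] = [y for y, m in zip(labels, mask) if m]
        sub_scores: List[float] = [s for s, m in zip(scores, mask) if m]
        sub_auc = auc(sub_labels, sub_scores)
        drop = overall - sub_auc
        # Assumption: only a drop larger than MARGIN is flagged.
        report[name] = {"auc": sub_auc, "drop": drop, "flagged": drop > MARGIN}
    return overall, report


# Hypothetical toy data: 1 = pneumothorax present, scores = model outputs.
labels = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.5, 0.1, 0.7, 0.2]
subgroups = {
    "intercostal_drain": [True, True, False, True, True, False, False, False],
    "non_erect": [False, False, True, False, True, True, True, True],
}

overall_auc, report = subgroup_report(labels, scores, subgroups)
print(f"overall AUC = {overall_auc:.4f}")
for name, row in report.items():
    print(f"{name}: AUC = {row['auc']:.4f}, drop = {row['drop']:+.4f}, flagged = {row['flagged']}")
```

A point-estimate comparison like this says nothing about sampling uncertainty. A formal non-inferiority claim would also need a confidence interval on the AUC difference, which this sketch does not compute.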
When compared with the overall test set, performance of the DCNN for detecting simple and tension pneumothorax was statistically non-inferior in all subgroups. The DCNN had an AUC of 0.981 (0.976-0.986) for detecting simple pneumothorax and 0.997 (0.995-0.999) for detecting tension pneumothorax.
Hidden stratification has significant implications for potential failures of deep learning when applied in clinical practice. This study demonstrated that a comprehensively trained DCNN can be resilient to hidden stratification in several clinically meaningful subgroups in detecting pneumothorax.