Institute of Pediatrics, Guangzhou Women and Children's Medical Center, Guangzhou Medical University, Guangzhou, Guangdong, China.
Department of Anesthesiology, Guangzhou Women and Children's Medical Center, Guangzhou Medical University, Guangzhou, Guangdong, China.
Pediatr Pulmonol. 2021 May;56(5):1036-1044. doi: 10.1002/ppul.25229. Epub 2021 Mar 19.
To compare the efficacy of a deep-learning model in classifying the etiology of pneumonia on pediatric chest X-rays (CXRs) with that of human readers.
We assembled a clinical pediatric CXR dataset of 4035 patients to train a ResNet-50 deep-learning model for differentiating viral from bacterial pneumonia. The dataset was split into training (80%) and validation (20%) sets. Model performance was assessed by the receiver operating characteristic curve and the area under the curve (AUC) on a first test set of 400 CXRs collected from different studies. For a second test set of 100 independent examinations obtained from daily clinical practice at our institution, the kappa coefficient was used to measure pairwise interrater agreement among the reference standard, all reviewers, and the model. Gradient-weighted class activation mapping was used to visualize the areas contributing most to the model prediction.
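The pairwise agreement analysis described above can be illustrated with a minimal sketch. It assumes the statistic is Cohen's kappa computed for every pair of raters (the abstract says only "kappa coefficient"), and the function names `cohen_kappa` and `pairwise_kappa`, the rater names, and the toy labels are all hypothetical, not taken from the study.

```python
from itertools import combinations


def cohen_kappa(a, b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    a, b = list(a), list(b)
    if len(a) != len(b) or not a:
        raise ValueError("rating sequences must be non-empty and equal in length")
    n = len(a)
    # Observed agreement: fraction of cases where both raters give the same label.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    labels = set(a) | set(b)
    expected = sum((a.count(lab) / n) * (b.count(lab) / n) for lab in labels)
    if expected == 1.0:
        # Both raters used one identical label throughout; treat as full agreement.
        return 1.0
    return (observed - expected) / (1.0 - expected)


def pairwise_kappa(ratings):
    """Kappa for every pair of raters in a {rater_name: labels} mapping."""
    return {
        (r1, r2): cohen_kappa(ratings[r1], ratings[r2])
        for r1, r2 in combinations(ratings, 2)
    }


# Toy data for illustration only; these are not the study's readings.
toy_ratings = {
    "reference": ["viral", "viral", "bacterial", "bacterial"],
    "reviewer_1": ["viral", "bacterial", "bacterial", "bacterial"],
    "model": ["viral", "viral", "bacterial", "bacterial"],
}
kappas = pairwise_kappa(toy_ratings)
```

In this layout the reference standard, each reviewer, and the model are all treated as raters, so a single call yields the full pairwise agreement table.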
On the first test set, the best-performing classifier achieved an AUC of 0.919 (p < .001), with a sensitivity of 79.0% and a specificity of 88.9%. On the second test set, the classifier performed similarly to human experts, with a sensitivity of 74.3%, a specificity of 90.8%, and positive and negative likelihood ratios of 8.1 and 0.3, respectively. Contingency tables and kappa values further showed that the expert reviewers and the model reached substantial agreement in differentiating the etiology of pediatric pneumonia.
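The likelihood ratios reported for the second test set follow from the sensitivity and specificity through the standard definitions LR+ = sensitivity / (1 - specificity) and LR- = (1 - sensitivity) / specificity. The short sketch below checks this arithmetic; the function name `likelihood_ratios` is illustrative and not part of the study.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) from sensitivity and specificity given as fractions."""
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg


# Second test set values from the abstract: sensitivity 74.3%, specificity 90.8%.
lr_pos, lr_neg = likelihood_ratios(0.743, 0.908)
print(f"LR+ = {lr_pos:.1f}, LR- = {lr_neg:.1f}")  # LR+ = 8.1, LR- = 0.3
```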
This study demonstrated that the model performed similarly to human reviewers and recognized the regions of pathology on CXRs.