Mitra Arka, Chakravarty Arunava, Ghosh Nirmalya, Sarkar Tandra, Sethuraman Ramanathan, Sheet Debdoot
Annu Int Conf IEEE Eng Med Biol Soc. 2020 Jul;2020:1225-1228. doi: 10.1109/EMBC44109.2020.9175246.
Chest radiographs are primarily employed for the screening of pulmonary and cardiothoracic conditions. Because they are acquired at primary healthcare centers, they require an on-premise reporting radiologist, which is a challenge in low- and middle-income countries. This has inspired the development of machine learning based automation of the screening process. While recent efforts demonstrate a performance benchmark using an ensemble of deep convolutional neural networks (CNNs), our systematic search over multiple standard CNN architectures identified single candidate CNN models whose classification performance is on par with that of ensembles. Over 63 experiments spanning 400 hours, executed on an 11.3 FP32 TensorTFLOPS compute system, we found the Xception and ResNet-18 architectures to be consistent performers in identifying co-existing disease conditions, with an average AUC of 0.87 across nine pathologies. We assess the reliability of the models by examining their saliency maps, generated using the randomized input sampling for explanation (RISE) method, and qualitatively validating them against manual annotations locally sourced from an experienced radiologist. We also draw a critical note on the limitations of the publicly available CheXpert dataset, primarily the disparity in class distribution between the training and testing sets and the insufficient number of samples for a few classes, which hampers quantitative reporting.
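The RISE idea mentioned in the abstract can be illustrated with a minimal sketch: the input is occluded with many random binary masks, the model is scored on each masked image, and the masks are averaged with those scores as weights. The function name `rise_saliency`, the stand-in `toy_model`, and all parameter values below are assumptions made for illustration and do not come from the paper. The sketch also uses nearest-neighbour upsampling where the original RISE formulation uses bilinear interpolation.

```python
import numpy as np


def rise_saliency(model, image, n_masks=2000, grid=4, p_keep=0.5, seed=0):
    """Simplified RISE: average random binary masks weighted by the model score."""
    rng = np.random.default_rng(seed)
    h, w = image.shape
    # Size of one upsampled grid cell; one extra cell leaves room for a random shift.
    cell_h = int(np.ceil(h / grid))
    cell_w = int(np.ceil(w / grid))
    saliency = np.zeros((h, w), dtype=float)
    for _ in range(n_masks):
        # Low-resolution binary mask: each cell is kept with probability p_keep.
        coarse = (rng.random((grid + 1, grid + 1)) < p_keep).astype(float)
        # Nearest-neighbour upsampling (assumption: bilinear in the original method).
        full = np.kron(coarse, np.ones((cell_h, cell_w)))
        # Random shift so mask edges do not always fall on the same pixels.
        dy = rng.integers(0, cell_h)
        dx = rng.integers(0, cell_w)
        mask = full[dy:dy + h, dx:dx + w]
        # Weight the mask by the model score on the masked image.
        saliency += model(image * mask) * mask
    # Normalise by the expected number of times each pixel is kept.
    return saliency / (n_masks * p_keep)


def toy_model(x):
    # Hypothetical stand-in for a CNN class score: responds only to the top-left 8x8 patch.
    return float(x[:8, :8].mean())


image = np.ones((32, 32))
sal = rise_saliency(toy_model, image)
print(sal[:8, :8].mean(), sal[16:, 16:].mean())
```

In this toy setting the saliency should be higher over the patch the model depends on than over a distant region. With a real classifier, `model` would return the predicted probability for one pathology, and the resulting map could be compared against a radiologist's annotation, as the abstract describes.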