Center for Data Science, New York University, New York, NY, USA.
Department of Radiology, NYU Langone Health, New York, NY, USA.
Sci Rep. 2022 Apr 27;12(1):6877. doi: 10.1038/s41598-022-10526-z.
Deep neural networks (DNNs) show promise in image-based medical diagnosis, but cannot be fully trusted since they can fail for reasons unrelated to underlying pathology. Humans are less likely to make such superficial mistakes, since they use features that are grounded in medical science. It is therefore important to know whether DNNs use different features than humans. Towards this end, we propose a framework for comparing human and machine perception in medical diagnosis. We frame the comparison in terms of perturbation robustness, and mitigate Simpson's paradox by performing a subgroup analysis. The framework is demonstrated with a case study in breast cancer screening, where we separately analyze microcalcifications and soft tissue lesions. While it is inconclusive whether humans and DNNs use different features to detect microcalcifications, we find that for soft tissue lesions, DNNs rely on high-frequency components ignored by radiologists. Moreover, these features are located outside of the regions of the images found most suspicious by radiologists. This difference between humans and machines was only visible through subgroup analysis, which highlights the importance of incorporating medical domain knowledge into the comparison.