IEEE Trans Med Imaging. 2020 Apr;39(4):1184-1194. doi: 10.1109/TMI.2019.2945514. Epub 2019 Oct 7.
We present a deep convolutional neural network for breast cancer screening exam classification, trained and evaluated on over 200,000 exams (over 1,000,000 images). Our network achieves an AUC of 0.895 in predicting the presence of cancer in the breast when tested on the screening population. We attribute this high accuracy to several technical advances: 1) a novel two-stage architecture and training procedure, which allows us to use a high-capacity patch-level network to learn from pixel-level labels alongside a network learning from macroscopic breast-level labels; 2) a custom ResNet-based network, used as a building block of our model, whose balance of depth and width is optimized for high-resolution medical images; 3) pretraining the network on screening BI-RADS classification, a related task with noisier labels; and 4) selecting the best way to combine multiple input views from among several possible choices. To validate our model, we conducted a reader study with 14 readers, each reading 720 screening mammogram exams, and showed that our model is as accurate as experienced radiologists when presented with the same data. We also show that a hybrid model, which averages the probability of malignancy predicted by a radiologist with the prediction of our neural network, is more accurate than either of the two separately. To further understand our results, we thoroughly analyze our network's performance on different subpopulations of the screening population, as well as the model's design, training procedure, errors, and the properties of its internal representations. Our best models are publicly available at https://github.com/nyukat/breast_cancer_classifier.
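The hybrid model described above averages two malignancy probabilities per exam, and readers, network, and hybrid are all compared by AUC. The following is a minimal Python sketch under stated assumptions: `roc_auc` and `hybrid_probability` are hypothetical helpers, the equal weighting (`weight=0.5`) follows the abstract's description of "averaging", and the four-exam data are invented purely for illustration, not taken from the study. AUC is computed here as the probability that a randomly chosen cancer-positive exam scores higher than a randomly chosen negative one, with ties counted as one half, which is equivalent to the area under the ROC curve.

```python
from itertools import product


def roc_auc(labels, scores):
    """AUC as P(score of random positive > score of random negative); ties count 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative exam")
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


def hybrid_probability(p_radiologist, p_model, weight=0.5):
    """Hypothetical hybrid: weighted average of two malignancy probabilities.

    weight=0.5 gives the plain average; other weights are an assumption, not from the source.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return [weight * r + (1.0 - weight) * m for r, m in zip(p_radiologist, p_model)]


# Invented toy data: 1 = cancer present, 0 = no cancer.
labels = [1, 1, 0, 0]
radiologist = [0.9, 0.3, 0.4, 0.1]  # misranks the second positive below the first negative
model = [0.4, 0.8, 0.2, 0.5]        # misranks the first positive below the second negative
hybrid = hybrid_probability(radiologist, model)

print(roc_auc(labels, radiologist))  # 0.75
print(roc_auc(labels, model))        # 0.75
print(roc_auc(labels, hybrid))       # 1.0
```

On this toy data each source misranks a different positive/negative pair, so averaging the two scores ranks every pair correctly. This gives one intuition for how a hybrid can outperform either component alone; it is not a claim about the study's data.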