Siemens Healthineers, Digital Technology and Innovation, Princeton, NJ, USA.
Med Image Anal. 2021 Feb;68:101855. doi: 10.1016/j.media.2020.101855. Epub 2020 Oct 14.
The interpretation of medical images is a challenging task, often complicated by the presence of artifacts, occlusions, limited contrast, and other factors. Most notable is the case of chest radiography, where there is high inter-rater variability in the detection and classification of abnormalities. This is largely due to inconclusive evidence in the data or subjective definitions of disease appearance. An additional example is the classification of anatomical views based on 2D ultrasound images. Often, the anatomical context captured in a frame is not sufficient to recognize the underlying anatomy. Current machine learning solutions for these problems are typically limited to providing probabilistic predictions, relying on the capacity of the underlying models to adapt to limited information and a high degree of label noise. In practice, however, this leads to overconfident systems with poor generalization on unseen data. To account for this, we propose a system that learns not only the probabilistic estimate for classification, but also an explicit uncertainty measure which captures the confidence of the system in the predicted output. We argue that this approach is essential to account for the inherent ambiguity characteristic of medical images from different radiologic exams, including computed radiography, ultrasonography, and magnetic resonance imaging. In our experiments we demonstrate that sample rejection based on the predicted uncertainty can significantly improve the ROC-AUC for various tasks, e.g., by 8% to 0.91 with an expected rejection rate of under 25% for the classification of different abnormalities in chest radiographs. In addition, we show that by using uncertainty-driven bootstrapping to filter the training data, one can achieve a significant increase in robustness and accuracy. Finally, we present a multi-reader study showing that the predictive uncertainty is indicative of reader errors.
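The uncertainty-based sample rejection described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's method: the paper learns an explicit uncertainty output, whereas here we use binary predictive entropy as a stand-in uncertainty proxy; the function names and the Mann-Whitney formulation of ROC-AUC are illustrative choices.

```python
import numpy as np

def predictive_entropy(p):
    # Binary predictive entropy as a stand-in uncertainty proxy
    # (the paper instead learns an explicit uncertainty measure).
    p = np.clip(p, 1e-7, 1 - 1e-7)
    return -(p * np.log(p) + (1 - p) * np.log(1 - p))

def roc_auc(labels, scores):
    # ROC-AUC via the Mann-Whitney U statistic on score ranks.
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    pos = labels == 1
    n_pos, n_neg = pos.sum(), (~pos).sum()
    return (ranks[pos].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def auc_after_rejection(labels, probs, uncertainty, reject_rate=0.25):
    # Reject the reject_rate most uncertain samples and evaluate
    # the classifier only on the retained, more confident subset.
    cutoff = np.quantile(uncertainty, 1.0 - reject_rate)
    keep = uncertainty <= cutoff
    return roc_auc(labels[keep], probs[keep]), keep.mean()
```

On data where a subset of samples is inherently ambiguous (predictions near 0.5 with noisy labels), rejecting the most uncertain fraction before evaluation raises the ROC-AUC on the retained set, which is the effect reported in the abstract. The same uncertainty score can also drive the bootstrapping step: filtering high-uncertainty samples out of the training set before refitting the model.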