Cui Can, Wang Yaohong, Bao Shunxing, Tang Yucheng, Deng Ruining, Remedios Lucas W, Asad Zuhayr, Roland Joseph T, Lau Ken S, Liu Qi, Coburn Lori A, Wilson Keith T, Landman Bennett A, Huo Yuankai
Vanderbilt University, Nashville TN 37235, USA.
Vanderbilt University Medical Center, Nashville TN 37215, USA.
Med Image Learn Ltd Noisy Data (2023). 2023 Oct;14307:82-92. doi: 10.1007/978-3-031-44917-8_8. Epub 2023 Oct 8.
Many anomaly detection approaches, especially deep learning methods, have recently been developed to identify abnormal image morphology using only normal images during training. Unfortunately, many prior anomaly detection methods were optimized for a specific "known" abnormality (e.g., brain tumor, bone fracture, cell types). Moreover, even though only normal images were used for training, abnormal images were often used during validation (e.g., epoch selection, hyper-parameter tuning), which may unintentionally leak the supposedly "unknown" abnormality. In this study, we investigated these two essential aspects of universal anomaly detection in medical images by (1) comparing various anomaly detection methods across four medical datasets, (2) investigating the unavoidable but often neglected question of how to select the optimal anomaly detection model without bias during the validation phase using only normal images, and (3) proposing a simple decision-level ensemble method that leverages the advantages of different kinds of anomaly detection without knowledge of the abnormality. Our experiments indicate that none of the evaluated methods consistently achieved the best performance across all datasets. Our proposed method generally improved the robustness of performance (average AUC 0.956).
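The abstract does not state how the decision-level ensemble combines the individual detectors, so the following is only a minimal sketch of one plausible scheme, not the authors' implementation. It assumes each detector outputs one scalar anomaly score per image (higher means more anomalous), standardizes each detector's scores using statistics computed on normal validation images only, and averages the standardized scores. The function names and the example scores are hypothetical.

```python
from statistics import mean, pstdev


def zscore_by_normal(scores, normal_val_scores):
    """Standardize scores with the mean and spread of scores on normal validation images."""
    mu = mean(normal_val_scores)
    sigma = pstdev(normal_val_scores) or 1.0  # fall back to 1.0 if the spread is zero
    return [(s - mu) / sigma for s in scores]


def ensemble_scores(test_scores_per_model, normal_val_scores_per_model):
    """Average the standardized anomaly scores across models, image by image."""
    standardized = [
        zscore_by_normal(test, val)
        for test, val in zip(test_scores_per_model, normal_val_scores_per_model)
    ]
    # zip(*standardized) regroups the scores so each tuple belongs to one image.
    return [mean(per_image) for per_image in zip(*standardized)]


# Hypothetical scores from two detectors on the same three test images.
# The detectors use very different score scales, which is why each one
# is standardized before averaging.
test_scores = [[0.10, 0.12, 0.90], [5.0, 30.0, 6.0]]
val_scores = [[0.09, 0.11, 0.10, 0.12], [4.0, 6.0, 5.0, 5.5]]
fused = ensemble_scores(test_scores, val_scores)
```

Because the standardization relies only on normal validation images, no abnormal image is needed to put the detectors on a common scale, which matches the constraint described in the abstract. Other fusion rules (for example, rank averaging or taking the maximum) would fit the same interface.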