Out-of-distribution detection with in-distribution voting using the medical example of chest x-ray classification.

Affiliations

Munich Institute of Biomedical Engineering and the School of Computation, Information, and Technology, Technical University of Munich, Munich, Germany.

Institute for History and Ethics in Medicine and Munich School of Technology in Society, Technical University of Munich, Munich, Germany.

Publication information

Med Phys. 2024 Apr;51(4):2721-2732. doi: 10.1002/mp.16790. Epub 2023 Oct 13.

Abstract

BACKGROUND

Deep learning models are being applied to more and more use cases with astonishing success stories, but how do they perform in the real world? Models are typically tested on specific cleaned data sets, but when deployed in the real world, the model will encounter unexpected, out-of-distribution (OOD) data.

PURPOSE

To investigate the impact of OOD radiographs on existing chest x-ray classification models and to increase their robustness against OOD data.

METHODS

The study employed the commonly used chest x-ray classification model, CheXnet, trained on the chest x-ray 14 data set, and tested its robustness against OOD data using three public radiography data sets (IRMA, Bone Age, and MURA) and the ImageNet data set. To detect OOD data for multi-label classification, we proposed in-distribution voting (IDV). OOD detection performance was measured across data sets using the area under the receiver operating characteristic curve (AUC) and compared with Mahalanobis-based OOD detection, MaxLogit, MaxEnergy, self-supervised OOD detection (SS OOD), and CutMix.
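The abstract names IDV but does not define its voting rule. The sketch below is one plausible reading, stated purely as an assumption: OOD training images receive an all-zero multi-label target, and at inference an image counts as ID when at least one class probability reaches its per-class threshold. The function names `idv_training_target` and `idv_is_in_distribution`, and the threshold values, are hypothetical and are not taken from the authors' repository.

```python
from typing import List, Sequence


def idv_training_target(labels: Sequence[int], is_ood: bool) -> List[float]:
    """Build a multi-label training target.

    Assumption (not confirmed by the abstract): OOD images are trained
    with an all-zero target, so no class should fire for them.
    """
    if is_ood:
        return [0.0] * len(labels)
    return [float(x) for x in labels]


def idv_is_in_distribution(
    probs: Sequence[float], thresholds: Sequence[float]
) -> bool:
    """Hypothetical voting rule: each class votes 'ID' when its predicted
    probability reaches its threshold; one positive vote is enough."""
    if len(probs) != len(thresholds):
        raise ValueError("probs and thresholds must have the same length")
    return any(p >= t for p, t in zip(probs, thresholds))


if __name__ == "__main__":
    thresholds = [0.5, 0.5, 0.5]  # illustrative per-class thresholds
    print(idv_is_in_distribution([0.1, 0.7, 0.2], thresholds))  # one class votes ID
    print(idv_is_in_distribution([0.1, 0.2, 0.3], thresholds))  # no class votes ID
```

Under this reading the decision reuses the classifier's existing outputs, which would be consistent with the abstract's statement that IDV adds no inference overhead.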

RESULTS

Without additional OOD detection, the chest x-ray classifier failed to discard any OOD images, with an AUC of 0.5. The proposed IDV approach trained on ID (chest x-ray 14) and OOD data (IRMA and ImageNet) achieved, on average, 0.999 OOD AUC across the three data sets, surpassing all other OOD detection methods. Mahalanobis-based OOD detection achieved an average OOD detection AUC of 0.982. IDV trained solely with a few thousand ImageNet images had an AUC of 0.913, which was considerably higher than MaxLogit (0.726), MaxEnergy (0.724), SS OOD (0.476), and CutMix (0.376).
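To make the AUC figures concrete, here is a minimal sketch of how an OOD AUC can be computed from per-image scores. It assumes a higher score means "more likely ID" and uses the maximum logit as the score, which is the usual meaning of MaxLogit. The helper names `max_logit_score` and `ood_auc` and the toy logits are illustrative assumptions, not the study's evaluation code.

```python
from typing import Sequence


def max_logit_score(logits: Sequence[float]) -> float:
    """MaxLogit-style score: the largest class logit of one image."""
    return max(logits)


def ood_auc(id_scores: Sequence[float], ood_scores: Sequence[float]) -> float:
    """AUC as the probability that a random ID image scores higher than a
    random OOD image, counting ties as one half (rank-based formulation)."""
    if not id_scores or not ood_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for s_id in id_scores:
        for s_ood in ood_scores:
            if s_id > s_ood:
                wins += 1.0
            elif s_id == s_ood:
                wins += 0.5
    return wins / (len(id_scores) * len(ood_scores))


if __name__ == "__main__":
    # Toy logits for two ID and two OOD images (made-up values).
    id_logits = [[2.0, -1.0], [0.5, 3.0]]
    ood_logits = [[-2.0, -0.5], [1.0, -3.0]]
    id_scores = [max_logit_score(x) for x in id_logits]
    ood_scores = [max_logit_score(x) for x in ood_logits]
    print(ood_auc(id_scores, ood_scores))    # 1.0: perfect separation
    print(ood_auc([1.0, 1.0], [1.0, 1.0]))   # 0.5: scores carry no information
```

An AUC of 0.5, as reported for the classifier without OOD detection, corresponds to scores that do not separate ID from OOD images at all; values below 0.5 mean the score ranks OOD images above ID images more often than not.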

CONCLUSIONS

The performance of all tested OOD detection methods did not translate well to radiography data sets, except Mahalanobis-based OOD detection and the proposed IDV method. Consequently, training solely on ID data led to incorrect classification of OOD images as ID, resulting in increased false positive rates. IDV substantially improved the model's ID classification performance, even when trained with data that will not occur in the intended use case or test set (ImageNet), without additional inference overhead or performance decrease in the target classification. The corresponding code is available at https://gitlab.lrz.de/IP/a-knee-cannot-have-lung-disease.
