MOOD 2020: A Public Benchmark for Out-of-Distribution Detection and Localization on Medical Images.

Publication information

IEEE Trans Med Imaging. 2022 Oct;41(10):2728-2738. doi: 10.1109/TMI.2022.3170077. Epub 2022 Sep 30.

Abstract

Detecting Out-of-Distribution (OoD) data is one of the greatest challenges in safe and robust deployment of machine learning algorithms in medicine. When the algorithms encounter cases that deviate from the distribution of the training data, they often produce incorrect and over-confident predictions. OoD detection algorithms aim to catch erroneous predictions in advance by analysing the data distribution and detecting potential instances of failure. Moreover, flagging OoD cases may support human readers in identifying incidental findings. Due to the increased interest in OoD algorithms, benchmarks for different domains have recently been established. In the medical imaging domain, for which reliable predictions are often essential, an open benchmark has been missing. We introduce the Medical-Out-Of-Distribution-Analysis-Challenge (MOOD) as an open, fair, and unbiased benchmark for OoD methods in the medical imaging domain. The analysis of the submitted algorithms shows that performance has a strong positive correlation with the perceived difficulty, and that all algorithms show a high variance for different anomalies, making it yet hard to recommend them for clinical practice. We also see a strong correlation between challenge ranking and performance on a simple toy test set, indicating that this might be a valuable addition as a proxy dataset during anomaly detection algorithm development.
