Confidence Score: The Forgotten Dimension of Object Detection Performance Evaluation.

Affiliations

Marduk Technologies OÜ, 12618 Tallinn, Estonia.

National Center for Robotics and Internet of Things Technology, Communication and Information Technologies Research Institute, King Abdulaziz City for Science and Technology-KACST, Riyadh 11442, Saudi Arabia.

Publication information

Sensors (Basel). 2021 Jun 25;21(13):4350. doi: 10.3390/s21134350.

DOI:10.3390/s21134350

PMID:34202089

Original link: https://pmc.ncbi.nlm.nih.gov/articles/PMC8271464/

Abstract

When deploying a model for object detection, a confidence score threshold is chosen to filter out false positives and ensure that every predicted bounding box has a certain minimum score. To achieve state-of-the-art performance on benchmark datasets, most neural networks use a rather low threshold, because standard evaluation metrics do not penalize a high number of false positives. However, in Artificial Intelligence (AI) applications that require high confidence scores (e.g., because of legal requirements or because the consequences of incorrect detections are severe) or a certain level of model robustness, it is unclear which base model to use, since the available models were mainly optimized for benchmark scores. In this paper, we propose a method to find the optimum performance point of a model as a basis for fairer comparison and deeper insight into the trade-offs caused by selecting a confidence score threshold.
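
To make the threshold trade-off concrete, the following is a minimal sketch of sweeping a confidence score threshold and picking the value that maximizes a quality measure. The abstract does not specify the paper's metric or search procedure, so F1 is used here purely as an assumed stand-in; the names `f1_at_threshold` and `best_threshold` are hypothetical, and the detections are assumed to be already matched against ground truth.

```python
from typing import List, Sequence, Tuple

# One detection: (confidence score, whether it matched a ground-truth box).
# The matching step (e.g. by IoU) is assumed to have happened already.
Detection = Tuple[float, bool]


def f1_at_threshold(
    detections: List[Detection], num_ground_truth: int, threshold: float
) -> float:
    """F1 score after discarding detections scored below `threshold`."""
    kept = [is_tp for score, is_tp in detections if score >= threshold]
    tp = sum(kept)                 # kept detections that match ground truth
    fp = len(kept) - tp            # kept detections that match nothing
    fn = num_ground_truth - tp     # ground-truth objects left undetected
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_threshold(
    detections: List[Detection],
    num_ground_truth: int,
    thresholds: Sequence[float],
) -> float:
    """Return the candidate threshold with the highest F1 (first one on ties)."""
    return max(
        thresholds,
        key=lambda t: f1_at_threshold(detections, num_ground_truth, t),
    )
```

A low threshold keeps more true positives but also more false positives, while a high threshold does the reverse; the sweep shows where a given model balances the two under the chosen measure, which can differ from the threshold used for benchmark reporting.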
