Wang Xiaochuan, Zhang Bo, Wang Fei, Bao Tao, Lu Zhiqing, Bao Jiawei
China Ship Scientific Research Center, Wuxi, China.
Taihu Laboratory of Deepsea Technology Science, Wuxi, China.
PLoS One. 2025 Jul 7;20(7):e0327694. doi: 10.1371/journal.pone.0327694. eCollection 2025.
Traditional uncertainty sampling methods in active learning often neglect category information, leading to imbalanced sample selection in multi-class computer vision tasks. To address this limitation, we propose an active learning framework that integrates category information with uncertainty sampling. Our method uses a pre-trained VGG16 network and cosine similarity to extract category features efficiently, without requiring additional model training. The framework combines these features with conventional uncertainty measures to balance sampling across classes while remaining computationally efficient. Extensive experiments on both object detection and image classification demonstrate the method's effectiveness. For object detection, our approach achieves competitive mAP scores while maintaining balanced category representation. For image classification, it achieves accuracy comparable to state-of-the-art approaches while reducing computational overhead by up to 80%. These results show that our approach balances sampling efficiency with dataset representativeness across different computer vision tasks. This work offers a practical, efficient solution for large-scale data annotation in domains with limited labeled data and diverse class distributions.
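The abstract describes the mechanism only at a high level. The following is a minimal, illustrative sketch of the general idea of category-aware uncertainty sampling, not the authors' published algorithm. It assumes that feature vectors have already been extracted (for example, from the penultimate layer of a pre-trained VGG16, which is not shown here) and that each class has a prototype vector, such as the mean feature of its labeled samples. It uses predictive entropy as the uncertainty measure and round-robin selection over categories as the balancing rule; both are assumed choices. All function names are hypothetical.

```python
import math
from collections import defaultdict


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def entropy(probs):
    """Shannon entropy (in nats) of a model's predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def assign_category(feature, prototypes):
    """Hypothetical: pseudo-label a sample by its most cosine-similar class prototype."""
    return max(prototypes, key=lambda c: cosine_similarity(feature, prototypes[c]))


def balanced_uncertainty_sampling(features, probs, prototypes, budget):
    """Hypothetical: pick `budget` sample indices, cycling over categories and
    taking the most uncertain (highest-entropy) remaining sample from each."""
    buckets = defaultdict(list)
    for i, (f, p) in enumerate(zip(features, probs)):
        buckets[assign_category(f, prototypes)].append((entropy(p), i))
    # Within each category, order candidates by descending uncertainty.
    queues = {c: sorted(items, reverse=True) for c, items in buckets.items()}
    selected = []
    while len(selected) < budget and any(queues.values()):
        for c in sorted(queues):  # fixed category order for determinism
            if queues[c] and len(selected) < budget:
                selected.append(queues[c].pop(0)[1])
    return selected


if __name__ == "__main__":
    prototypes = {"cat": [1.0, 0.0], "dog": [0.0, 1.0]}
    features = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.1, 0.9]]
    probs = [[0.5, 0.5], [0.9, 0.1], [0.6, 0.4], [0.7, 0.3]]
    print(balanced_uncertainty_sampling(features, probs, prototypes, 2))  # → [0, 3]
```

In this toy example, selecting purely by entropy would return samples 0 and 2, which both fall in the "cat" category. The round-robin rule instead selects sample 3 from "dog", which illustrates how category information can counteract class imbalance in the selected batch.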