Liu Ziwei, Miao Zhongqi, Zhan Xiaohang, Wang Jiayun, Gong Boqing, Yu Stella X
IEEE Trans Pattern Anal Mach Intell. 2024 Mar;46(3):1836-1851. doi: 10.1109/TPAMI.2022.3200091. Epub 2024 Feb 6.
Real-world data often exhibit a long-tailed and open-ended (i.e., with unseen classes) distribution. A practical recognition system must balance between majority (head) and minority (tail) classes, generalize across the distribution, and acknowledge novelty when it encounters instances of unseen (open) classes. We define Open Long-Tailed Recognition++ (OLTR++) as learning from such naturally distributed data and optimizing for classification accuracy over a balanced test set that includes both known and open classes. OLTR++ handles imbalanced classification, few-shot learning, open-set recognition, and active learning in one integrated algorithm, whereas existing classification approaches often focus on only one or two of these aspects and perform poorly over the entire spectrum. The key challenges are: 1) how to share visual knowledge between head and tail classes, 2) how to reduce confusion between tail and open classes, and 3) how to actively explore open classes with the learned knowledge. Our algorithm, OLTR++, maps images to a feature space in which visual concepts relate to one another through a memory association mechanism and a learned metric (dynamic meta-embedding) that both respects the closed-world classification of seen classes and acknowledges the novelty of open classes. Additionally, we propose an active learning scheme based on visual memory, which learns to recognize open classes in a data-efficient manner for future expansion. On three large-scale open long-tailed datasets we curated from ImageNet (object-centric), Places (scene-centric), and MS1M (face-centric) data, as well as three standard benchmarks (CIFAR-10-LT, CIFAR-100-LT, and iNaturalist-18), our approach, as a unified framework, consistently demonstrates competitive performance. Notably, it also shows strong potential for the active exploration of open classes and the fairness analysis of minority groups.
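The dynamic meta-embedding described above can be sketched as a backbone feature enriched by attention over a visual memory of class centroids, then scaled by a reachability score that shrinks for samples far from every centroid. The PyTorch module below is a minimal illustration of that idea; the class name `MetaEmbedding`, the parameter `tau`, the tanh concept selector, and the exact combination rule are illustrative assumptions, not the authors' released implementation.

```python
# A minimal sketch of a dynamic meta-embedding, assuming learnable per-class
# centroids as the visual memory. Names and the exact combination rule are
# illustrative, not the paper's released code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MetaEmbedding(nn.Module):
    def __init__(self, feat_dim: int, num_classes: int):
        super().__init__()
        # Visual memory: one learnable centroid per seen (closed-world) class.
        self.centroids = nn.Parameter(torch.randn(num_classes, feat_dim))
        # Concept selector: gates, per dimension, how much memory to mix in.
        self.selector = nn.Linear(feat_dim, feat_dim)

    def forward(self, v_direct: torch.Tensor, tau: float = 16.0):
        # v_direct: (B, D) features from a backbone network.
        # Attend over class centroids to build a memory feature.
        attn = F.softmax(v_direct @ self.centroids.t(), dim=1)       # (B, C)
        v_memory = attn @ self.centroids                             # (B, D)
        # Gate decides how much memory to inject; tail-class samples can
        # borrow more knowledge from memory than data-rich head classes.
        gate = torch.tanh(self.selector(v_direct))
        v_meta = v_direct + gate * v_memory
        # Reachability: inverse distance to the nearest centroid. Samples far
        # from all seen classes get a small scale, shrinking their logits and
        # flagging likely open-class instances.
        dist = torch.cdist(v_direct, self.centroids)                 # (B, C)
        reachability = tau / dist.min(dim=1).values.clamp_min(1e-6)  # (B,)
        return reachability.unsqueeze(1) * v_meta, reachability
```

Because the reachability score drops for samples that lie far from all seen-class centroids, the same quantity could also serve as an open-class indicator or as an acquisition signal for the memory-based active learning scheme mentioned in the abstract.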