Kumar Sandeep, Sharma Amit, Shokeen Vikrant, Azar Ahmad Taher, Amin Syed Umar, Khan Zafar Iqbal
Maharaja Surajmal Institute of Technology, New Delhi, India.
IMS Engineering College, Ghaziabad, India.
Sci Rep. 2024 Oct 4;14(1):23092. doi: 10.1038/s41598-024-71125-8.
Modern state-of-the-art (SoTA) deep learning (DL) models for natural language processing (NLP) have hundreds of millions of parameters, making them extremely complex. Training these models requires large datasets, and while pretraining has reduced this requirement, human-labelled datasets are still necessary for fine-tuning. Few-shot learning (FSL) techniques, such as meta-learning, aim to mitigate this cost by training models from smaller datasets. However, the tasks used to evaluate these meta-learners frequently diverge from the real-world problems they are meant to solve. This work applies meta-learning to a problem of greater practical relevance: class incremental learning (IL), in which a model that has completed its training learns to classify newly introduced classes. A distinctive quality of meta-learners is their ability to generalise from a small sample to classes never seen before, which makes them especially well suited to class IL. The proposed method emulates class IL using proxy new classes, allowing a meta-learner to handle the task without retraining. To generate predictions, a transformer-based aggregation function is proposed for the meta-learner, one that refines example representations across all classes. The model's principal contributions are that it considers the entire support and query sets concurrently and that it directs attention to crucial samples, such as the query, increasing their influence during inference. The results show that the model surpasses prevailing state-of-the-art baselines. Notably, most meta-learners generalise well to class IL even without training specifically for this task. This paper establishes a high-performing baseline for subsequent transformer-based aggregation techniques, underscoring the practical value of meta-learners for class IL.
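The abstract describes the aggregation mechanism only at a high level. As a minimal PyTorch sketch of the general idea, the following jointly encodes the support and query embeddings of an N-way K-shot episode with a transformer, then classifies the query against per-class prototypes; all names, dimensions, and the prototype-based readout are illustrative assumptions rather than the paper's actual architecture.

```python
import torch
import torch.nn as nn

class TransformerAggregator(nn.Module):
    """Sketch of a transformer-based aggregation function for few-shot
    classification: support and query embeddings attend to one another,
    and the query is classified by similarity to the refined class
    prototypes. Hyper-parameters and readout are illustrative, not
    taken from the paper."""

    def __init__(self, dim=256, heads=4, layers=2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=layers)

    def forward(self, support, support_labels, query):
        # support: (N*K, dim) embeddings of an N-way K-shot support set
        # support_labels: (N*K,) integer class ids in [0, N)
        # query: (dim,) embedding of a single query example
        tokens = torch.cat([support, query.unsqueeze(0)], dim=0)  # joint set
        refined = self.encoder(tokens.unsqueeze(0)).squeeze(0)    # co-attend
        refined_support, refined_query = refined[:-1], refined[-1]

        # Average the refined support embeddings per class into prototypes.
        n_classes = int(support_labels.max()) + 1
        prototypes = torch.stack([
            refined_support[support_labels == c].mean(dim=0)
            for c in range(n_classes)])

        # Logits are negative squared distances to each prototype.
        return -((prototypes - refined_query) ** 2).sum(dim=-1)


# Example: a 5-way 2-shot episode with random embeddings.
model = TransformerAggregator(dim=256)
support = torch.randn(10, 256)
labels = torch.arange(5).repeat_interleave(2)  # [0,0,1,1,...,4,4]
query = torch.randn(256)
logits = model(support, labels, query)  # shape: (5,)
print(logits.argmax().item())           # predicted class id
```

Because prediction depends only on the support set presented at inference time, a model of this shape can classify classes introduced after training, which is the property the paper exploits for class IL.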