School of Automation, Guangdong University of Technology, Guangzhou 510006, China.
School of Computer Science and Technology, Guangdong University of Technology, Guangzhou 510006, China.
Sensors (Basel). 2022 Jun 11;22(12):4420. doi: 10.3390/s22124420.
Meta-learning frameworks have been proposed to generalize machine learning models for domain adaptation in computer vision when labeled data are scarce. Text classification with meta-learning, however, has received less attention. In this paper, we propose SumFS, which uses extractive summarization to find the globally top-ranked sentences and improves local, vocabulary-level category features. SumFS consists of three modules: (1) an unsupervised text summarizer that removes redundant information; (2) a weighting generator that associates feature words with attention scores to weight the lexical representations of words; and (3) a regular meta-learning framework that trains on limited labeled data using a ridge regression classifier. In addition, we built a marine news dataset with limited labeled data. The algorithm was evaluated on the THUCnews, Fudan, and marine news datasets. Experiments show that SumFS maintains or even improves accuracy while reducing the number of input features. Moreover, the training time per epoch is reduced by more than 50%.
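The abstract does not give the exact formulation of modules (2) and (3), so the following is only a minimal sketch of the general idea, assuming a softmax over attention scores to pool word vectors and a closed-form ridge regression fitted on a few-shot support set. The function names (`weighted_doc_embedding`, `ridge_fit`, `ridge_predict`) and the regularization value `lam` are hypothetical and are not taken from the paper.

```python
import numpy as np


def weighted_doc_embedding(word_vecs, scores):
    """Pool word vectors into one document vector using softmax-normalized scores.

    Assumption: attention scores are turned into weights with a softmax;
    the paper's weighting generator may differ.
    """
    word_vecs = np.asarray(word_vecs, dtype=float)
    scores = np.asarray(scores, dtype=float)
    weights = np.exp(scores - scores.max())  # subtract max for numerical stability
    weights /= weights.sum()
    return weights @ word_vecs


def ridge_fit(support_x, support_y, n_classes, lam=1.0):
    """Closed-form ridge regression on a support set.

    Solves W = X^T (X X^T + lam * I)^{-1} Y with one-hot targets Y.
    The N x N system is small when only a few labeled examples are available.
    """
    support_x = np.asarray(support_x, dtype=float)
    y_onehot = np.eye(n_classes)[np.asarray(support_y)]
    gram = support_x @ support_x.T
    alpha = np.linalg.solve(gram + lam * np.eye(len(support_x)), y_onehot)
    return support_x.T @ alpha


def ridge_predict(w, query_x):
    """Assign each query example to the class with the highest regression score."""
    return np.argmax(np.asarray(query_x, dtype=float) @ w, axis=1)


if __name__ == "__main__":
    # Toy two-class episode with two support examples per class (illustrative data).
    support_x = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    support_y = [0, 0, 1, 1]
    w = ridge_fit(support_x, support_y, n_classes=2, lam=1.0)
    print(ridge_predict(w, [[0.8, 0.2], [0.2, 0.8]]))
```

Because the classifier has a closed-form solution, it can be refitted on each episode's support set without iterative optimization, which is one reason ridge regression is a common choice for few-shot classification heads.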