Department of Computer Science, School of Science, Loughborough University, Loughborough, Leicestershire, United Kingdom.
Center for the Studies of Information Resources, Wuhan University, Wuhan, Hubei, China.
PLoS One. 2023 Jul 28;18(7):e0289204. doi: 10.1371/journal.pone.0289204. eCollection 2023.
Because hieroglyphic languages such as Chinese differ from alphabetic languages, researchers have long been interested in using internal glyph features to enhance semantic representation. However, the models used in such studies are becoming increasingly computationally expensive, even for simple tasks such as text classification. In this paper, we aim to balance model performance and computational cost in glyph-aware Chinese text classification. To this end, we propose LEGACT, a lightweight ensemble learning method for glyph-aware Chinese text classification that uses typical shallow networks as base learners and machine learning classifiers as meta-learners. Through model design and a series of experiments, we show that an ensemble of shallow neural networks can achieve results comparable to those of large-scale transformer models. The contributions of this paper are a lightweight yet powerful solution for glyph-aware Chinese text classification and empirical evidence of the significance of glyph features for representing hieroglyphic languages. Moreover, the paper highlights the importance of combining shallow neural networks with appropriate ensemble strategies to reduce the computational workload of predictive tasks.
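The abstract describes a stacking-style ensemble: shallow networks serve as base learners, and a conventional machine learning classifier combines their outputs as a meta-learner. The sketch below illustrates only that general stacking pattern and is not the LEGACT implementation. Everything in it is an assumption for illustration: `learner_a` and `learner_b` are hypothetical keyword rules standing in for trained shallow networks (the abstract does not specify them), and the meta-learner is a plain logistic regression fit by gradient descent, chosen only for brevity.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical type: a trained base learner maps a text to P(class = 1).
BaseLearner = Callable[[str], float]


def meta_features(learners: Sequence[BaseLearner], texts: Sequence[str]) -> List[List[float]]:
    """Collect every base learner's output for each text into one feature vector."""
    return [[learner(text) for learner in learners] for text in texts]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_meta_learner(
    X: List[List[float]], y: Sequence[int], lr: float = 0.5, epochs: int = 2000
) -> Tuple[List[float], float]:
    """Fit a logistic-regression meta-learner with full-batch gradient descent.

    X should hold base-learner predictions on data the base learners were NOT
    trained on (a held-out split or out-of-fold predictions), so the
    meta-learner does not learn from overfitted outputs.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log loss w.r.t. the logit is (prediction - label).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(learners: Sequence[BaseLearner], w: List[float], b: float, texts: Sequence[str]) -> List[int]:
    """Ensemble prediction: base learners -> meta features -> meta-learner."""
    return [
        1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0
        for x in meta_features(learners, texts)
    ]


# Hypothetical stand-ins for trained shallow networks (not from the paper).
def learner_a(text: str) -> float:
    return 0.9 if "好" in text else 0.2


def learner_b(text: str) -> float:
    return 0.9 if "差" in text else 0.1


if __name__ == "__main__":
    learners = [learner_a, learner_b]
    dev_texts, dev_labels = ["很好", "好极了", "太差", "很差劲"], [1, 1, 0, 0]
    w, b = fit_meta_learner(meta_features(learners, dev_texts), dev_labels)
    print(predict(learners, w, b, ["好", "差"]))  # [1, 0]
```

Swapping the keyword rules for real trained networks, or the hand-written logistic regression for any standard classifier, keeps the same two-level structure; this toy does not reproduce the accuracy or efficiency results reported in the paper.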