Assemble the shallow or integrate a deep? Toward a lightweight solution for glyph-aware Chinese text classification.

Affiliations

Department of Computer Science, School of Science, Loughborough University, Loughborough, Leicestershire, United Kingdom.

Center for the Studies of Information Resources, Wuhan University, Wuhan, Hubei, China.

Publication information

PLoS One. 2023 Jul 28;18(7):e0289204. doi: 10.1371/journal.pone.0289204. eCollection 2023.


DOI: 10.1371/journal.pone.0289204
PMID: 37506054
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10381045/

Abstract

Because hieroglyphic languages such as Chinese differ from alphabetic languages, researchers have long been interested in using internal glyph features to enhance semantic representation. However, the models used in such studies are becoming increasingly computationally expensive, even for simple tasks such as text classification. In this paper, we aim to balance model performance and computational cost in glyph-aware Chinese text classification. To this end, we propose a lightweight ensemble learning method for glyph-aware Chinese text classification (LEGACT) that uses typical shallow networks as base learners and machine learning classifiers as meta-learners. Through model design and a series of experiments, we demonstrate that an ensemble of shallow neural networks can achieve results comparable to those of large-scale transformer models. The contributions of this paper include a lightweight yet powerful solution for glyph-aware Chinese text classification and empirical evidence of the significance of glyph features for hieroglyphic language representation. Moreover, this paper emphasizes the importance of assembling shallow neural networks with proper ensemble strategies to reduce the computational workload of predictive tasks.
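
The base-learner and meta-learner arrangement described in the abstract is a form of stacking, and its data flow can be illustrated with a minimal sketch. The abstract does not say which shallow networks or classifiers LEGACT uses, so everything below is an assumption made for illustration: `CentroidLearner` is a stand-in for a shallow base learner, `SoftmaxMeta` is a stand-in meta-learner, and `char_view` and `glyph_view` are hypothetical toy feature vectors, not the paper's data or architecture.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class CentroidLearner:
    """Hypothetical stand-in for one shallow base learner (nearest centroid)."""

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.centroids = []
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            self.centroids.append([sum(col) / len(rows) for col in zip(*rows)])
        return self

    def predict_proba(self, x):
        # Negative squared distance to each centroid acts as the class score.
        scores = [-sum((a - b) ** 2 for a, b in zip(x, c)) for c in self.centroids]
        return softmax(scores)


class SoftmaxMeta:
    """Hypothetical meta-learner: multinomial logistic regression trained by SGD."""

    def fit(self, Z, y, n_classes, lr=0.5, epochs=200):
        dim = len(Z[0])
        self.W = [[0.0] * dim for _ in range(n_classes)]
        self.b = [0.0] * n_classes
        for _ in range(epochs):
            for z, label in zip(Z, y):
                p = self.predict_proba(z)
                for k in range(n_classes):
                    grad = p[k] - (1.0 if k == label else 0.0)
                    self.b[k] -= lr * grad
                    for j in range(dim):
                        self.W[k][j] -= lr * grad * z[j]
        return self

    def predict_proba(self, z):
        return softmax(
            [sum(w * v for w, v in zip(row, z)) + b for row, b in zip(self.W, self.b)]
        )

    def predict(self, z):
        p = self.predict_proba(z)
        return p.index(max(p))


def meta_features(learners, views):
    """Concatenate each base learner's class probabilities for one sample."""
    feats = []
    for learner, x in zip(learners, views):
        feats.extend(learner.predict_proba(x))
    return feats


# Toy data (assumed): one feature view per base learner, labels are 0 or 1.
char_view = [[0.0, 0.1], [0.2, 0.0], [1.0, 0.9], [0.9, 1.1]]
glyph_view = [[1.0, 0.0], [0.9, 0.2], [0.1, 1.0], [0.0, 0.8]]
labels = [0, 0, 1, 1]

# Level 0: fit one base learner per view.
base = [
    CentroidLearner().fit(char_view, labels),
    CentroidLearner().fit(glyph_view, labels),
]

# Level 1: the meta-learner sees only the base learners' probability outputs.
Z = [meta_features(base, views) for views in zip(char_view, glyph_view)]
meta = SoftmaxMeta().fit(Z, labels, n_classes=2)
predictions = [meta.predict(z) for z in Z]
```

For brevity, the sketch trains the meta-learner on predictions for the same samples the base learners were fitted on. A practical stacking setup would normally use out-of-fold predictions for that step to limit overfitting.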

Figures 1 to 7 (image files):
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0074/10381045/b3a88992c9fd/pone.0289204.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0074/10381045/80d6a19b5242/pone.0289204.g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0074/10381045/ec911abbf534/pone.0289204.g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0074/10381045/11997da4354e/pone.0289204.g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0074/10381045/1463b316dd2d/pone.0289204.g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0074/10381045/cb19f3b271b0/pone.0289204.g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0074/10381045/4ee8e02396b2/pone.0289204.g007.jpg
