Dynamic decoding and dual synthetic data for automatic correction of grammar in low-resource scenario.

Author information

Musyafa Ahmad, Gao Ying, Solyman Aiman, Khan Siraj, Cai Wentian, Khan Muhammad Faizan

Affiliations

School of Computer Science and Engineering, South China University of Technology, Guangzhou, China.

Department of Informatics Engineering, Pamulang University, South Tangerang, Indonesia.

Publication information

PeerJ Comput Sci. 2024 Jul 5;10:e2122. doi: 10.7717/peerj-cs.2122. eCollection 2024.

DOI:10.7717/peerj-cs.2122
PMID:38983192
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11232608/
Abstract

Grammar error correction (GEC) systems are pivotal in natural language processing (NLP). They identify and correct grammatical errors and so preserve the grammatical integrity of written text, which is crucial for both language learning and formal communication. Recently, neural machine translation (NMT) has emerged as a promising and increasingly popular approach to GEC. However, this approach faces significant challenges, particularly the scarcity of training data and the complexity of GEC itself, especially for low-resource languages such as Indonesian. To address these challenges, we propose InSpelPoS, a confusion method that combines two synthetic data generation methods: the Inverted Spellchecker and Patterns+POS. Furthermore, we introduce an adapted seq2seq framework equipped with a dynamic decoding method and state-of-the-art Transformer-based neural language models to improve the accuracy and efficiency of GEC. The dynamic decoding method handles the complexities of GEC and corrects a wide range of errors, including contextual and grammatical errors. The proposed model uses the contextual information of words and sentences to generate corrected output. To assess the effectiveness of the proposed framework, we conducted experiments using synthetic data and compared its performance with that of existing GEC systems. The results show a significant improvement in the accuracy of Indonesian GEC over existing methods.
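The abstract names the two ingredients of InSpelPoS but does not specify them. The following is a minimal sketch of how such a combined synthetic-error generator could work, built on two assumptions.

1. The Inverted Spellchecker reuses the words a spellchecker would suggest for a correct word (here, vocabulary entries within Levenshtein distance 1) as plausible errors for that word.
2. Patterns+POS swaps a token for another member of a confusion set keyed by its part-of-speech tag.

The function names `spelling_confusions` and `corrupt`, the toy vocabulary, the `pos_patterns` table, the corruption probability `p`, and the extra deletion operation are hypothetical illustrations. They are not the authors' implementation.

```python
import random
from typing import Dict, List, Sequence, Tuple

# (word, POS tag) pairs; the tag names used below are hypothetical examples.
Token = Tuple[str, str]


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (dynamic programming, two rows)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def spelling_confusions(word: str, vocab: Sequence[str], max_dist: int = 1) -> List[str]:
    """Assumed inverted-spellchecker step: vocabulary words close to `word`
    (i.e. what a spellchecker would suggest) become candidate errors for it."""
    return [v for v in vocab if v != word and edit_distance(word, v) <= max_dist]


def corrupt(sentence: Sequence[Token], vocab: Sequence[str],
            pos_patterns: Dict[str, List[str]], p: float = 0.3,
            seed: int = 0) -> List[str]:
    """Return a noisy copy of `sentence`, corrupting each token with probability p.

    Hypothetical operations: spelling confusion, POS-keyed substitution, deletion.
    """
    rng = random.Random(seed)  # seeded so the same input yields the same noise
    noisy: List[str] = []
    for word, tag in sentence:
        if rng.random() >= p:
            noisy.append(word)  # keep the token unchanged
            continue
        options: List[Tuple[str, List[str]]] = [("delete", [])]
        spell = spelling_confusions(word, vocab)
        if spell:
            options.append(("spell", spell))
        pos_alts = [w for w in pos_patterns.get(tag, []) if w != word]
        if pos_alts:
            options.append(("pos", pos_alts))
        kind, candidates = rng.choice(options)
        if kind != "delete":  # deletion simply drops the token
            noisy.append(rng.choice(candidates))
    return noisy


if __name__ == "__main__":
    # Toy Indonesian sentence: "saya pergi ke pasar" ("I am going to the market").
    sentence = [("saya", "PRON"), ("pergi", "VERB"), ("ke", "ADP"), ("pasar", "NOUN")]
    vocab = ["saya", "pergi", "ke", "pasar", "pasir", "di", "dari"]
    pos_patterns = {"ADP": ["di", "ke", "dari"]}  # hypothetical preposition confusion set
    clean = [w for w, _ in sentence]
    noisy = corrupt(sentence, vocab, pos_patterns, p=0.5, seed=1)
    print("source (noisy):", " ".join(noisy))
    print("target (clean):", " ".join(clean))
```

Pairing each corrupted sentence (source) with its clean original (target) yields synthetic parallel data of the kind a seq2seq GEC model is trained on. Seeding the generator keeps the corpus reproducible.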

Similar articles

1. Dynamic decoding and dual synthetic data for automatic correction of grammar in low-resource scenario.
   PeerJ Comput Sci. 2024 Jul 5;10:e2122. doi: 10.7717/peerj-cs.2122. eCollection 2024.
2. Semi-supervised learning and bidirectional decoding for effective grammar correction in low-resource scenarios.
   PeerJ Comput Sci. 2023 Oct 24;9:e1639. doi: 10.7717/peerj-cs.1639. eCollection 2023.
3. Korean Grammatical Error Correction Based on Transformer with Copying Mechanisms and Grammatical Noise Implantation Methods.
   Sensors (Basel). 2021 Apr 10;21(8):2658. doi: 10.3390/s21082658.
4. A Computational Neural Network Model for College English Grammar Correction.
   Comput Intell Neurosci. 2022 Sep 5;2022:9592200. doi: 10.1155/2022/9592200. eCollection 2022.
5. NmTHC: a hybrid error correction method based on a generative neural machine translation model with transfer learning.
   BMC Genomics. 2024 Jun 7;25(1):573. doi: 10.1186/s12864-024-10446-4.
6. A7׳ta: Data on a monolingual Arabic parallel corpus for grammar checking.
   Data Brief. 2018 Dec 4;22:237-240. doi: 10.1016/j.dib.2018.11.146. eCollection 2019 Feb.
7. Research on Automatic Error Correction Method in English Writing Based on Deep Neural Network.
   Comput Intell Neurosci. 2022 Mar 10;2022:2709255. doi: 10.1155/2022/2709255. eCollection 2022.
8. Design of Chinese Grammar Recognition and Error Correction Model Based on the Deep Neural Network.
   J Environ Public Health. 2022 Aug 24;2022:2614899. doi: 10.1155/2022/2614899. eCollection 2022.
9. The neural machine translation models for the low-resource Kazakh-English language pair.
   PeerJ Comput Sci. 2023 Feb 8;9:e1224. doi: 10.7717/peerj-cs.1224. eCollection 2023.
10. Automatic Correction of Real-Word Errors in Spanish Clinical Texts.
   Sensors (Basel). 2021 Apr 21;21(9):2893. doi: 10.3390/s21092893.
