
Research of text paraphrase generation based on self-contrastive learning.

Author Information

Yuan Ling, Yu Hai Ping, Ren Junlin, Sun Ping

Affiliations

School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei, People's Republic of China.

Wuhan Vocational College of Software and Engineering (Wuhan Open University), Wuhan, Hubei, People's Republic of China.

Publication Information

PLoS One. 2025 Sep 2;20(9):e0327613. doi: 10.1371/journal.pone.0327613. eCollection 2025.

Abstract

The goal of this study is to improve the quality and diversity of text paraphrase generation, a critical task in Natural Language Generation (NLG) that requires producing semantically equivalent sentences with varied structures and expressions. Existing approaches often fail to generate paraphrases that are both high-quality and diverse, limiting their applicability in tasks such as machine translation, dialogue systems, and automated content rewriting. To address this gap, we introduce two self-contrastive learning models designed to enhance paraphrase generation: the Contrastive Generative Adversarial Network (ContraGAN) for supervised learning and the Contrastive Model with Metrics (ContraMetrics) for unsupervised learning. ContraGAN leverages a learnable discriminator within an adversarial framework to refine the quality of generated paraphrases, while ContraMetrics incorporates multi-metric filtering and keyword-guided prompts to improve unsupervised generation diversity. Experiments on benchmark datasets demonstrate that both models achieve significant improvements over state-of-the-art methods. ContraGAN enhances semantic fidelity with a 0.46 gain in BERTScore and improves fluency with a 1.57 reduction in perplexity. In addition, ContraMetrics achieves gains of 0.37 and 3.34 in iBLEU and P-BLEU, respectively, reflecting greater diversity and lexical richness. These results validate the effectiveness of our models in addressing key challenges in paraphrase generation, offering practical solutions for diverse NLG applications.
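The abstract describes ContraGAN's adversarial setup only at a high level. As a rough illustration, the following is a minimal PyTorch sketch of the core idea: a learnable discriminator scores (source, paraphrase) embedding pairs and is trained to separate reference paraphrases from generator outputs, while the generator is rewarded for fooling it. The class name, embedding dimension, and binary cross-entropy loss form are illustrative assumptions, not the paper's actual architecture.

```python
# Hypothetical sketch of an adversarial paraphrase-quality signal in the
# spirit of ContraGAN. All names and dimensions are assumptions; the
# abstract does not specify the model's architecture.
import torch
import torch.nn as nn

class ParaphraseDiscriminator(nn.Module):
    """Scores (source, paraphrase) embedding pairs: high logits for
    human-written references, low logits for generated paraphrases."""
    def __init__(self, dim: int = 256):
        super().__init__()
        self.scorer = nn.Sequential(
            nn.Linear(2 * dim, dim),
            nn.ReLU(),
            nn.Linear(dim, 1),
        )

    def forward(self, src: torch.Tensor, par: torch.Tensor) -> torch.Tensor:
        # Concatenate the two sentence embeddings and map to a single logit.
        return self.scorer(torch.cat([src, par], dim=-1)).squeeze(-1)

def adversarial_losses(disc, src, real_par, fake_par):
    """Standard GAN objectives: the discriminator separates references from
    generated paraphrases; the generator loss is high when its outputs are
    easy to detect."""
    bce = nn.BCEWithLogitsLoss()
    real_logits = disc(src, real_par)
    fake_logits = disc(src, fake_par.detach())  # detach: no generator update here
    d_loss = (bce(real_logits, torch.ones_like(real_logits))
              + bce(fake_logits, torch.zeros_like(fake_logits)))
    gen_logits = disc(src, fake_par)            # gradients flow to the generator
    g_loss = bce(gen_logits, torch.ones_like(gen_logits))
    return d_loss, g_loss
```

Because text is discrete, pushing g_loss back into a real paraphrase generator typically requires a Gumbel-softmax relaxation or policy-gradient training; the sketch sidesteps that by operating on continuous embeddings.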

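For ContraMetrics, the abstract reports gains in iBLEU and P-BLEU but does not detail the multi-metric filtering stage. Below is a minimal sketch of one plausible ingredient: ranking candidate paraphrases by the standard iBLEU score (Sun and Zhou, 2012), which rewards n-gram overlap with a reference while penalizing copying from the source. The function names, the alpha weight, and the top-k cutoff are illustrative assumptions, and a fully unsupervised pipeline would need source-only proxies where no reference exists.

```python
# Hypothetical multi-metric filtering sketch in the spirit of ContraMetrics.
# Requires nltk; the paper's exact metrics and thresholds are not given
# in the abstract.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

_smooth = SmoothingFunction().method1

def bleu(hyp: str, ref: str) -> float:
    """Sentence-level BLEU with smoothing, whitespace-tokenized."""
    return sentence_bleu([ref.split()], hyp.split(), smoothing_function=_smooth)

def ibleu(cand: str, source: str, reference: str, alpha: float = 0.8) -> float:
    """iBLEU: alpha * BLEU(cand, reference) - (1 - alpha) * BLEU(cand, source).
    Verbatim copies of the source are penalized, favoring diverse rewrites."""
    return alpha * bleu(cand, reference) - (1 - alpha) * bleu(cand, source)

def filter_candidates(source, reference, candidates, keep=2):
    """Keep the top-scoring paraphrases; a stand-in for the paper's
    multi-metric filtering stage."""
    return sorted(candidates,
                  key=lambda c: ibleu(c, source, reference),
                  reverse=True)[:keep]

if __name__ == "__main__":
    src = "the quick brown fox jumps over the lazy dog"
    ref = "a fast brown fox leaps over a sleepy dog"
    cands = [
        "the quick brown fox jumps over the lazy dog",  # copy: penalized
        "a swift brown fox leaps over a drowsy dog",
        "the fox is quick and the dog is lazy",
    ]
    for c in filter_candidates(src, ref, cands):
        print(f"{ibleu(c, src, ref):.3f}  {c}")
```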

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8b5d/12404545/27860d92f6ab/pone.0327613.g001.jpg
