Table-to-text generation with accurate content copying.

Affiliation

State Key Laboratory of Media Convergence and Communication, Communication University of China, Beijing, 100024, China.

Publication information

Sci Rep. 2021 Nov 23;11(1):22750. doi: 10.1038/s41598-021-00813-6.

Abstract

Table-to-text generation is the task of producing fluent, coherent, and informative text from structured data. Copying words from the table is a common way to address the "out-of-vocabulary" problem, but copying accurately is difficult. To overcome this, we propose an auto-regressive, transformer-based framework that combines a copying mechanism with language modeling to generate target texts. First, to help the model learn the semantic relevance between table and text, we apply a word transformation method that incorporates field and position information into the target text, so that the model learns which table position to copy from. We then propose two auxiliary learning objectives: a table-text constraint loss, used to model table inputs effectively, and a copy loss, used to copy word fragments from the table precisely. Furthermore, we improve the text search strategy to reduce the probability of generating incoherent and repetitive sentences. Experiments on two datasets show better results than the baseline model. On WIKIBIO, BLEU improves from 45.47 to 46.87 and ROUGE from 41.54 to 42.28. On ROTOWIRE, the CO metric increases by 4.29% and BLEU by 1.93 points.
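To make the idea of combining a copying mechanism with language modeling concrete, here is a minimal sketch of a generic pointer-generator-style mixture. It is an illustration under stated assumptions, not the formulation used in the paper: the function name `mix_copy_and_generate`, the gate `p_gen`, and the toy probabilities are all hypothetical. The point it shows is how a table token that is missing from the vocabulary (here `walter`) can still receive probability mass through the copy distribution.

```python
from collections import defaultdict


def mix_copy_and_generate(vocab_probs, copy_attention, table_tokens, p_gen):
    """Mix a vocabulary distribution with a copy distribution over table tokens.

    Assumed (hypothetical) inputs:
      vocab_probs    -- dict mapping token -> language-model probability
      copy_attention -- attention weights over table tokens (sum to 1)
      table_tokens   -- tokens of the linearized table, same length as copy_attention
      p_gen          -- probability of generating from the vocabulary, in [0, 1]
    """
    if len(copy_attention) != len(table_tokens):
        raise ValueError("copy_attention and table_tokens must have equal length")
    if not 0.0 <= p_gen <= 1.0:
        raise ValueError("p_gen must lie in [0, 1]")

    final = defaultdict(float)
    # Generation path: scale every vocabulary probability by p_gen.
    for token, prob in vocab_probs.items():
        final[token] += p_gen * prob
    # Copy path: scale attention by (1 - p_gen); repeated tokens accumulate.
    for token, weight in zip(table_tokens, copy_attention):
        final[token] += (1.0 - p_gen) * weight
    return dict(final)


# Toy example: "walter" and "extra" are not in the vocabulary.
vocab_probs = {"was": 0.5, "born": 0.3, "<unk>": 0.2}
table_tokens = ["walter", "extra", "born"]
copy_attention = [0.7, 0.2, 0.1]

final = mix_copy_and_generate(vocab_probs, copy_attention, table_tokens, p_gen=0.4)
best = max(final, key=final.get)
print(best, round(final[best], 2))
```

In this toy setting the out-of-vocabulary table token wins because the copy path contributes 0.6 × 0.7 of its mass, while a token present in both sources (`born`) accumulates probability from both paths. In a trained model, the gate and the attention weights would be learned rather than fixed by hand.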

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5715/8611016/866abd851055/41598_2021_813_Fig1_HTML.jpg
