College of Computer Science, Huaiyin Institute of Technology, Huaian 223003, China.
Comput Intell Neurosci. 2022 May 23;2022:1787369. doi: 10.1155/2022/1787369. eCollection 2022.
Keywords are usually one or more words or phrases that describe the subject of a document. Traditional automatic keyword extraction methods cannot produce keywords that do not appear in the document, and they do not consider semantic information during extraction. In this paper, we introduce a novel Keyword Generation Model based on Topic-aware and Title-guide (KGM-TT). In KGM-TT, a neural topic model identifies the latent topic words, and a hierarchical encoder with an attention mechanism encodes the title and the content separately. Keywords are generated by a recurrent neural network with attention and a copy (replication) mechanism. The model can not only generate keywords that do not appear in the source document but also use the topic information and the highly summarizing meaning of the words in the title to assist keyword generation. The experimental results show that the F1 value of this model is about 10% higher than that of CopyRNN and CopyCNN.
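To illustrate how a decoder with a copy mechanism can emit words from outside its fixed vocabulary, the following is a minimal sketch of one common formulation, in which a generation distribution over the vocabulary is mixed with an attention distribution over the source tokens. The abstract does not specify the exact formulation used in KGM-TT, so the function names, the mixing weight `p_gen`, and all scores here are illustrative assumptions, not the authors' implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def copy_augmented_distribution(vocab, vocab_scores,
                                source_tokens, attention_scores, p_gen):
    """Mix a generation distribution with a copy distribution.

    Assumed (hypothetical) formulation:
        P(w) = p_gen * P_vocab(w) + (1 - p_gen) * sum of attention on w
    where the sum runs over source positions whose token equals w.
    """
    p_vocab = softmax(vocab_scores)
    attention = softmax(attention_scores)

    # Generation part: probability mass over the fixed vocabulary.
    final = {word: p_gen * p for word, p in zip(vocab, p_vocab)}

    # Copy part: attention mass is added to the source tokens, including
    # tokens that are not in the fixed vocabulary.
    for token, a in zip(source_tokens, attention):
        final[token] = final.get(token, 0.0) + (1.0 - p_gen) * a
    return final


# Toy inputs; every number below is made up for illustration.
vocab = ["model", "topic", "keyword", "<unk>"]
source_tokens = ["neural", "topic", "model", "for", "keyphrase", "generation"]

dist = copy_augmented_distribution(
    vocab,
    vocab_scores=[1.0, 0.5, 0.2, 0.0],
    source_tokens=source_tokens,
    attention_scores=[0.1, 2.0, 1.5, -1.0, 2.5, 0.3],
    p_gen=0.6,
)
best = max(dist, key=dist.get)
```

In this sketch the generation term can produce vocabulary words that are absent from the source text, while the copy term lets the decoder output source words such as "keyphrase" that the vocabulary does not contain.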