Hierarchical Theme and Topic Modeling.

Publication information

IEEE Trans Neural Netw Learn Syst. 2016 Mar;27(3):565-78. doi: 10.1109/TNNLS.2015.2414658. Epub 2015 Mar 30.

Abstract

Considering the hierarchical data groupings in a text corpus, e.g., words, sentences, and documents, we conduct structural learning and infer the latent themes and topics for sentences and words, respectively, from a collection of documents. The relation between themes and topics under different data groupings is explored through an unsupervised procedure without limiting the number of clusters. A tree stick-breaking process is presented to draw theme proportions for different sentences. We build a hierarchical theme and topic model, which flexibly represents heterogeneous documents using Bayesian nonparametrics. Thematic sentences and topical words are extracted. In the experiments, the proposed method is shown to be effective in building a semantic tree structure for sentences and their corresponding words. The superiority of the tree model in selecting expressive sentences for document summarization is illustrated.
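
The abstract does not spell out the tree stick-breaking process, so the sketch below is not the paper's construction. As background only, it shows the standard (flat) truncated stick-breaking draw that such processes generalize: a unit-length stick is broken repeatedly with Beta(1, gamma) fractions, and each piece becomes one proportion. The function name `stick_breaking`, the concentration parameter `gamma`, and the truncation level are all assumptions made for illustration.

```python
import random


def stick_breaking(gamma, truncation, rng):
    """Draw mixture proportions by truncated stick-breaking.

    Illustrative assumption: this is the standard flat construction,
    not the tree stick-breaking process proposed in the paper.
    """
    weights = []
    remaining = 1.0  # length of the stick not yet assigned
    for _ in range(truncation - 1):
        # Break off a Beta(1, gamma) fraction of what is left.
        v = rng.betavariate(1.0, gamma)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    # Give the leftover mass to the last component so the weights sum to 1.
    weights.append(remaining)
    return weights


rng = random.Random(0)  # fixed seed for a reproducible draw
proportions = stick_breaking(gamma=2.0, truncation=10, rng=rng)
print(proportions)
```

A larger `gamma` tends to spread the mass over more components, which is how such priors avoid fixing the number of clusters in advance. The truncation here is only a practical approximation to the infinite sequence of breaks.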
