Department of Computer Science, National University of Computer and Emerging Sciences, Peshawar Campus, Peshawar, Pakistan.
Department of Computer Engineering, Bahria University, Islamabad, Pakistan.
PLoS One. 2022 Mar 3;17(3):e0264481. doi: 10.1371/journal.pone.0264481. eCollection 2022.
Topic models extract latent concepts from texts in the form of topics. Lifelong topic models extend topic models by learning topics continuously from knowledge accumulated in the past, which is updated as new information becomes available. Hierarchical topic models extend topic models by extracting topics and organizing them into a hierarchical structure. In this study, we combine the two and introduce hierarchical lifelong topic models. Hierarchical lifelong topic models not only allow topics to be examined at different levels of granularity but also allow that granularity to be adjusted continuously as more information becomes available. A fundamental issue in hierarchical lifelong topic modeling is extracting rules that preserve the hierarchical structural information among them and that are continuously updated as new information arrives. To address this issue, we introduce a network-communities-based rule mining approach for hierarchical lifelong topic models (NHLTM). The proposed approach extracts hierarchical structural information among the rules by representing textual documents as graphs and analyzing the underlying communities in those graphs. Experimental results indicate improved hierarchical topic structures, with topic coherence increasing from general to specific topics.
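The idea of representing documents as a graph and reading a hierarchy off its communities can be illustrated with a minimal sketch. This is not the NHLTM algorithm: the abstract does not specify the graph construction or the community detection method, so the sketch assumes a word co-occurrence graph and substitutes a deliberately simple stand-in (connected components at increasing edge-weight thresholds). All function names and the toy documents are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_graph(docs):
    """Build an undirected word graph (assumed representation).

    Edge weight = number of documents in which both words appear.
    """
    weights = defaultdict(int)
    for doc in docs:
        words = sorted(set(doc.lower().split()))
        for u, v in combinations(words, 2):
            weights[(u, v)] += 1
    return dict(weights)


def communities(weights, min_weight):
    """Connected components over edges with weight >= min_weight.

    A simple stand-in for community detection, not the paper's method.
    """
    adj = defaultdict(set)
    for (u, v), w in weights.items():
        if w >= min_weight:
            adj[u].add(v)
            adj[v].add(u)
    seen, result = set(), []
    for start in sorted(adj):
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            stack.extend(adj[node] - comp)
        seen |= comp
        result.append(frozenset(comp))
    return result


def hierarchy(weights, thresholds):
    """One community list per threshold, from general (low) to specific (high)."""
    return [communities(weights, t) for t in sorted(thresholds)]


# Toy corpus (hypothetical data, for illustration only).
docs = [
    "camera lens zoom",
    "camera lens battery",
    "phone screen battery",
    "phone screen apps",
]
levels = hierarchy(cooccurrence_graph(docs), thresholds=[1, 2])
for depth, level in enumerate(levels):
    print(depth, [sorted(c) for c in level])
```

On this toy corpus the low threshold yields a single broad community covering all seven words, while the higher threshold splits it into a camera-related group and a phone-related group. Word groups of this kind could then serve as candidate rules at different levels of a topic hierarchy.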