Austin Eric, Makwana Shraddha, Trabelsi Amine, Largeron Christine, Zaïane Osmar R
University of Alberta, Edmonton, AB T6G 2R3 Canada.
Alberta Machine Intelligence Institute, Edmonton, AB T5J 3B1 Canada.
Data Sci Eng. 2024;9(1):41-61. doi: 10.1007/s41019-023-00239-2. Epub 2024 Mar 13.
Topic modeling aims to discover latent themes in collections of text documents. It has applications across fields such as sociology, opinion analysis, and media studies, where topics must be easily interpretable, diverse, and coherent. An efficient topic modeling technique should accurately identify both flat and hierarchical topics; the latter are especially useful in disciplines where topics can be logically arranged into a tree. In this paper, we propose Community Topic, a novel algorithm that mines communities in word co-occurrence networks to produce topics. We evaluate the proposed approach using several metrics and compare it with standard baselines, confirming its good performance. Community Topic enables quick identification of flat topics and topic hierarchies, facilitating the on-demand exploration of sub- and super-topics. It also obtains good results on datasets in different languages.
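The general idea of mining topics from a word co-occurrence network can be illustrated with a minimal sketch. This is not the Community Topic implementation: the abstract does not specify how co-occurrence is counted or which community detection method is used, so the sketch assumes document-level co-occurrence counts and substitutes a simple deterministic label propagation. All function names (`build_cooccurrence_graph`, `label_propagation`, `topics_from_labels`) and the toy corpus are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_cooccurrence_graph(docs):
    """Build a weighted undirected graph over words.

    Assumption: the edge weight is the number of documents in which
    both words appear. Other windowing schemes are equally plausible.
    """
    graph = defaultdict(Counter)
    for tokens in docs:
        for u, v in combinations(sorted(set(tokens)), 2):
            graph[u][v] += 1
            graph[v][u] += 1
    return graph


def label_propagation(graph, max_iters=20):
    """Stand-in community detection: weighted label propagation.

    Nodes are visited in sorted order and ties are broken
    alphabetically, so the result is deterministic.
    """
    labels = {word: word for word in graph}
    for _ in range(max_iters):
        changed = False
        for node in sorted(graph):
            scores = Counter()
            for neighbor, weight in graph[node].items():
                scores[labels[neighbor]] += weight
            # Highest total weight wins; alphabetical order breaks ties.
            best = min(scores, key=lambda lab: (-scores[lab], lab))
            if best != labels[node]:
                labels[node] = best
                changed = True
        if not changed:
            break
    return labels


def internal_weight(graph, word, members):
    """Sum of edge weights from `word` to other words in its community."""
    return sum(w for nbr, w in graph[word].items() if nbr in members)


def topics_from_labels(graph, labels):
    """Group words by community label; rank each topic's words by
    their weighted degree inside the community."""
    groups = defaultdict(set)
    for word, label in labels.items():
        groups[label].add(word)
    topics = [
        sorted(members, key=lambda w: (-internal_weight(graph, w, members), w))
        for members in groups.values()
    ]
    return sorted(topics, key=lambda t: (-len(t), t))


if __name__ == "__main__":
    # Toy corpus (hypothetical), already tokenized.
    docs = [
        ["goal", "match", "team"],
        ["team", "match", "coach"],
        ["bank", "loan", "rate"],
        ["rate", "bank", "market"],
    ]
    graph = build_cooccurrence_graph(docs)
    labels = label_propagation(graph)
    for topic in topics_from_labels(graph, labels):
        print(topic)
```

On this toy corpus the two word groups never co-occur, so each forms its own community and hence its own flat topic. One plausible way to obtain sub-topics in such a scheme would be to rerun community detection on the subgraph induced by a single topic's words, though whether the paper proceeds this way is not stated in the abstract.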