Uncovering Flat and Hierarchical Topics by Community Discovery on Word Co-occurrence Network.

Authors

Austin Eric, Makwana Shraddha, Trabelsi Amine, Largeron Christine, Zaïane Osmar R

Affiliations

University of Alberta, Edmonton, AB T6G 2R3 Canada.

Alberta Machine Intelligence Institute, Edmonton, AB T5J 3B1 Canada.

Publication information

Data Sci Eng. 2024;9(1):41-61. doi: 10.1007/s41019-023-00239-2. Epub 2024 Mar 13.

Abstract

Topic modeling aims to discover latent themes in collections of text documents. It has applications in fields such as sociology, opinion analysis, and media studies, where topics must be easily interpretable, diverse, and coherent. An efficient topic modeling technique should accurately identify both flat and hierarchical topics; the latter are especially useful in disciplines where topics can be logically arranged into a tree. In this paper, we propose Community Topic, a novel algorithm that mines communities in word co-occurrence networks to produce topics. We evaluate the proposed approach using several metrics and compare it with standard baselines, confirming its good performance. Community Topic enables quick identification of flat topics and topic hierarchies, facilitating on-demand exploration of sub- and super-topics. It also obtains good results on datasets in different languages.
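
To make the general idea concrete, the following is a minimal sketch of topic extraction by community discovery on a word co-occurrence network. It is not the Community Topic algorithm itself: the abstract does not state how co-occurrence is counted or which community-detection method is used, so this sketch assumes document-level co-occurrence counts and substitutes a simple deterministic weighted label propagation. The function names (`build_cooccurrence`, `label_propagation`, `extract_topics`) and the toy corpus `DOCS` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Toy corpus (hypothetical): each document is a list of already-tokenized words.
DOCS = [
    ["goal", "match", "team"],
    ["team", "goal", "coach"],
    ["match", "coach", "team"],
    ["vote", "election", "party"],
    ["party", "vote", "senate"],
    ["election", "senate", "party"],
    ["team", "vote"],  # weak bridge between the two themes
]


def build_cooccurrence(docs):
    """Return {word: {neighbor: weight}}; weight = number of documents
    in which both words appear (an assumed co-occurrence definition)."""
    graph = defaultdict(lambda: defaultdict(int))
    for doc in docs:
        for u, v in combinations(sorted(set(doc)), 2):
            graph[u][v] += 1
            graph[v][u] += 1
    return graph


def label_propagation(graph, max_iters=100):
    """Deterministic weighted label propagation (a stand-in for the
    community-detection step). Each node repeatedly adopts the label
    with the largest total edge weight among its neighbors; ties go
    to the alphabetically smallest label."""
    labels = {node: node for node in graph}
    for _ in range(max_iters):
        changed = False
        for node in sorted(graph):
            scores = defaultdict(int)
            for nbr, weight in graph[node].items():
                scores[labels[nbr]] += weight
            best = min(scores, key=lambda lab: (-scores[lab], lab))
            if best != labels[node]:
                labels[node] = best
                changed = True
        if not changed:
            break
    return labels


def extract_topics(docs):
    """Group words by community and rank each group by weighted degree."""
    graph = build_cooccurrence(docs)
    labels = label_propagation(graph)
    groups = defaultdict(list)
    for word, lab in labels.items():
        groups[lab].append(word)
    strength = {w: sum(graph[w].values()) for w in graph}
    topics = [sorted(ws, key=lambda w: (-strength[w], w)) for ws in groups.values()]
    return sorted(topics)


topics = extract_topics(DOCS)
for topic in topics:
    print(topic)
```

On this toy corpus the sports and politics vocabularies end up in separate communities despite the bridging document, and each community is read as one flat topic with its most strongly connected words first. One plausible way to obtain a hierarchy in this setting, again an assumption rather than the paper's stated procedure, would be to rerun the same steps on the subgraph induced by a single community to split it into sub-topics.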

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0689/10980674/df10e109e384/41019_2023_239_Figa_HTML.jpg
