Muthusami R, Mani Kandan N, Saritha K, Narenthiran B, Nagaprasad N, Ramaswamy Krishnaraj
Department of Computer Applications, Dr. Mahalingam College of Engineering and Technology, Pollachi, Tamil Nadu, India.
Department of Mechanical Engineering, P.A. College of Engineering and Technology, Pollachi, 642002, Tamil Nadu, India.
Sci Rep. 2024 May 25;14(1):12003. doi: 10.1038/s41598-024-61738-4.
Online channels have affected many facets of life, including individual identity, commerce, social policy, and culture. Discovering the topics that the short texts posted on these channels focus on, and examining the qualities of those texts, is therefore critical. A second key issue is evaluating the quality of newly discovered topics, which includes topic separation and topic coherence. Topic modeling has been shown to be an effective aid in the linguistic interpretation of very short texts. Based on their underlying strategy, topic models fall into two categories: probabilistic and non-probabilistic methods. In this research, short texts are analyzed using one model from each category: latent Dirichlet allocation (LDA) for probabilistic topic modeling and non-negative matrix factorization (NMF) for non-probabilistic topic modeling. A novel approach to topic evaluation, based on clustering methods and silhouette analysis, is applied to both models to assess topic quality. The experimental results indicate that the proposed evaluation method performs favorably for both LDA and NMF.
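The evaluation idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the clustering algorithm, the features, or the corpus, so the use of scikit-learn, k-means, the function names `silhouette_of_topics` and `evaluate_topic_models`, and the made-up `TOY_DOCS` corpus are all assumptions chosen for illustration. The sketch fits LDA and NMF on the same short texts, clusters the resulting document-topic vectors, and reports a silhouette coefficient for each model as a rough proxy for topic separation.

```python
from sklearn.cluster import KMeans
from sklearn.decomposition import NMF, LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.metrics import silhouette_score

# Hypothetical toy corpus of short texts (not data from the paper).
TOY_DOCS = [
    "cheap flights and hotel deals for summer travel",
    "book hotel and flights early for travel discounts",
    "travel deals on flights this summer",
    "hotel booking discounts for summer travel",
    "new phone has a great camera and battery",
    "phone battery life and camera quality review",
    "camera review of the new phone",
    "battery and screen quality of this phone",
]


def silhouette_of_topics(doc_topic, n_clusters, seed=0):
    """Cluster document-topic vectors with k-means and return the silhouette score."""
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(doc_topic)
    return float(silhouette_score(doc_topic, labels))


def evaluate_topic_models(docs, n_topics=2, seed=0):
    """Fit LDA and NMF on the same documents and score each with silhouette analysis."""
    # LDA is a probabilistic model over word counts.
    counts = CountVectorizer(stop_words="english").fit_transform(docs)
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=seed)
    lda_doc_topic = lda.fit_transform(counts)

    # NMF is a non-probabilistic factorization; TF-IDF weights are a common input choice.
    tfidf = TfidfVectorizer(stop_words="english").fit_transform(docs)
    nmf = NMF(n_components=n_topics, init="nndsvda", random_state=seed, max_iter=500)
    nmf_doc_topic = nmf.fit_transform(tfidf)

    return {
        "LDA": silhouette_of_topics(lda_doc_topic, n_topics, seed),
        "NMF": silhouette_of_topics(nmf_doc_topic, n_topics, seed),
    }


if __name__ == "__main__":
    for model, score in evaluate_topic_models(TOY_DOCS).items():
        print(f"{model}: silhouette = {score:.3f}")
```

A silhouette coefficient lies between -1 and 1, and values closer to 1 indicate that documents sit well inside their own cluster and far from the others. In this sketch the number of clusters is set equal to the number of topics, which is one plausible choice rather than a requirement.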