HGATT_LR: transforming review text classification with hypergraphs attention layer and logistic regression.

Authors

Pradeepa S, Jomy Elizabeth, Vimal S, Hassan Md Mehedi, Dhiman Gaurav, Karim Asif, Kang Dongwann

Affiliations

Department of Information Technology, School of Computing, SASTRA Deemed University, Thanjavur, Tamilnadu, 613401, India.

Department of Computer Science and Engineering, School of Computing, SASTRA Deemed University, Thanjavur, Tamilnadu, 613401, India.

Publication information

Sci Rep. 2024 Aug 23;14(1):19614. doi: 10.1038/s41598-024-70565-6.

DOI:10.1038/s41598-024-70565-6

PMID:39179733

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11343862/

Abstract

Text classification plays a major role in research areas such as sentiment analysis, opinion mining, and customer feedback analysis. Text classification using hypergraph algorithms is effective in capturing the intricate relationships between words and phrases in documents. The method comprises text preprocessing, keyword extraction, feature selection, text classification, and evaluation of performance metrics. We propose a Hypergraph Attention Layer with Logistic Regression (HGATT_LR) for text classification on the Amazon review data set. Essential keywords are extracted using Latent Dirichlet Allocation (LDA). To build the hypergraph attention layer, features are selected using node-level and edge-level attention. The resulting features are passed as input to logistic regression for text classification. Performance metrics are assessed through a comparative analysis of different text classifiers on the Amazon data set. Text classification using the hypergraph attention network achieves 88% accuracy, which is better than the other state-of-the-art algorithms compared. The proposed model is scalable and can readily be improved with more training data. The solution highlights the utility of hypergraph approaches for text classification and their applicability to real-world data sets with complicated interactions between parts of the text. This type of analysis can help businesses improve the quality of their products.
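The abstract does not give the attention formulas, so the following is only a minimal sketch of how node-level and edge-level attention over a hypergraph might feed a logistic scoring step. It assumes words are nodes and each review is a hyperedge, uses dot-product softmax attention, and takes fixed rather than learned parameters. All names (`hypergraph_attention`, `logistic_score`, `query`, the toy features and weights) are hypothetical and are not taken from the paper. LDA keyword extraction and model training are omitted.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def weighted_sum(weights, vectors):
    """Attention-weighted combination of equally sized vectors."""
    dim = len(vectors[0])
    return [sum(w * vec[i] for w, vec in zip(weights, vectors)) for i in range(dim)]


def hypergraph_attention(node_feats, hyperedges, query):
    """One round of node-level, then edge-level attention (illustrative only).

    node_feats: dict mapping node (word) -> feature vector
    hyperedges: dict mapping hyperedge (review) -> list of member nodes
    query: attention vector; learned in a real model, fixed here (assumption)
    """
    # Node-level attention: each hyperedge aggregates its member nodes.
    edge_feats = {}
    for edge, members in hyperedges.items():
        vecs = [node_feats[n] for n in members]
        alpha = softmax([dot(query, v) for v in vecs])
        edge_feats[edge] = weighted_sum(alpha, vecs)

    # Edge-level attention: each node aggregates the hyperedges that contain it.
    new_node_feats = {}
    for node, feat in node_feats.items():
        incident = [e for e, members in hyperedges.items() if node in members]
        if not incident:
            new_node_feats[node] = list(feat)  # isolated node: keep as is
            continue
        vecs = [edge_feats[e] for e in incident]
        beta = softmax([dot(feat, v) for v in vecs])
        new_node_feats[node] = weighted_sum(beta, vecs)
    return edge_feats, new_node_feats


def logistic_score(features, weights, bias):
    """Logistic-regression probability for one feature vector."""
    return 1.0 / (1.0 + math.exp(-(dot(weights, features) + bias)))


# Toy data (hypothetical): 2-d word features and two reviews as hyperedges.
node_feats = {"good": [1.0, 0.0], "bad": [0.0, 1.0], "battery": [0.5, 0.5]}
hyperedges = {"review_1": ["good", "battery"], "review_2": ["bad", "battery"]}

edge_feats, updated_nodes = hypergraph_attention(node_feats, hyperedges, query=[1.0, -1.0])
for review, feat in edge_feats.items():
    # Hypothetical, untrained weights; a real model would fit these on labels.
    print(review, logistic_score(feat, weights=[2.0, -2.0], bias=0.0))
```

In this sketch the per-review hyperedge features play the role of the "resultant features" passed to logistic regression; in practice the attention parameters and the classifier weights would be fitted on labeled reviews rather than set by hand.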
