Ensemble hybrid model for Hindi COVID-19 text classification with metaheuristic optimization algorithm.

Author information

Jain Vipin, Kashyap Kanchan Lata

Affiliation

SCSE, VIT University Bhopal, 466114 Madhya Pradesh, India.

Publication information

Multimed Tools Appl. 2023;82(11):16839-16859. doi: 10.1007/s11042-022-13937-2. Epub 2022 Oct 24.

Abstract

The SARS-CoV-2 virus has spread around the globe since March 2020, and millions of people worldwide have been infected. People in every country have expressed their sentiments about the coronavirus on social media. The aim of this work is to determine the general opinion of Indian Twitter users about the coronavirus. Hindi tweets posted about COVID-19 are used as input data for sentiment analysis. Natural language processing is applied to the input data for feature extraction. The optimal features are then selected from the pre-processed data using the metaheuristic Grey Wolf Optimization technique. Finally, a hybrid pairing of a convolutional neural network (CNN) and a long short-term memory (LSTM) model is employed to categorize the sentiments as positive, negative, or neutral. The results of the proposed model are compared with those of other machine learning techniques, namely Random Forest, Decision Tree, K-Nearest Neighbor, Naive Bayes, support vector machine (SVM), CNN, LSTM, LSTM-CNN, and CNN-LSTM. The highest accuracies obtained by Random Forest, Decision Tree, K-Nearest Neighbor, Naive Bayes, SVM, CNN, LSTM, LSTM-CNN, and CNN-LSTM are 87.75%, 88.41%, 87.89%, 85.54%, 89.11%, 91.46%, 88.72%, 91.54%, and 92.34%, respectively. The proposed ensemble hybrid model achieves the highest classification accuracy, precision, recall, and F-score, at 95.54%, 91.44%, 89.63%, and 90.87%, respectively.
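
The abstract does not describe how the Grey Wolf Optimization step is configured, so the following is only a minimal sketch of how wolf-pack search can be used for feature selection. It assumes a simplified binary variant (continuous positions thresholded at 0.5) and a hypothetical toy objective, `toy_fitness`, standing in for a classifier's validation error. The function names, parameters, and the set of "informative" features are illustrative assumptions, not the authors' implementation.

```python
import random

def gwo_feature_selection(fitness, n_features, n_wolves=8, n_iters=30, seed=0):
    """Simplified binary Grey Wolf Optimizer (illustrative, lower fitness is better).

    Returns (best_mask, best_score, history of best-so-far scores).
    """
    rng = random.Random(seed)
    # Continuous positions in [0, 1]; a feature is selected when its coordinate > 0.5.
    wolves = [[rng.random() for _ in range(n_features)] for _ in range(n_wolves)]

    def to_mask(position):
        return [1 if x > 0.5 else 0 for x in position]

    best_mask, best_score, history = None, float("inf"), []
    for t in range(n_iters):
        # Rank the pack; the three best wolves (alpha, beta, delta) guide the rest.
        ranked = sorted(wolves, key=lambda w: fitness(to_mask(w)))
        leaders = [list(w) for w in ranked[:3]]
        score = fitness(to_mask(leaders[0]))
        if score < best_score:
            best_mask, best_score = to_mask(leaders[0]), score
        history.append(best_score)

        a = 2.0 * (1 - t / n_iters)  # exploration factor, shrinks linearly from 2 towards 0
        for w in wolves:
            for j in range(n_features):
                estimate = 0.0
                for leader in leaders:
                    A = 2 * a * rng.random() - a
                    C = 2 * rng.random()
                    D = abs(C * leader[j] - w[j])
                    estimate += leader[j] - A * D
                # Average the three leader-guided estimates and clip to [0, 1].
                w[j] = min(1.0, max(0.0, estimate / 3))

    # Also evaluate the positions produced by the final update.
    last = min(wolves, key=lambda w: fitness(to_mask(w)))
    if fitness(to_mask(last)) < best_score:
        best_mask, best_score = to_mask(last), fitness(to_mask(last))
    history.append(best_score)
    return best_mask, best_score, history

# Hypothetical toy objective: features 0, 2 and 5 are treated as "informative".
# Missing one costs 1.0 (a stand-in for classification error); each selected
# feature costs 0.1 (a stand-in for a feature-count penalty).
INFORMATIVE = {0, 2, 5}

def toy_fitness(mask):
    missed = sum(1 for i in INFORMATIVE if mask[i] == 0)
    return missed + 0.1 * sum(mask)

mask, score, history = gwo_feature_selection(toy_fitness, n_features=10)
print("selected features:", [i for i, bit in enumerate(mask) if bit])
print("fitness:", score)
```

In a real pipeline the fitness function would typically train or evaluate a classifier on the features selected by the mask, which is far more expensive than this toy objective; the search loop itself would stay the same.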

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2350/9589711/21a744724408/11042_2022_13937_Fig1_HTML.jpg
