A Hybrid Feature Extraction Method for Nepali COVID-19-Related Tweets Classification.

Author information

Central Department of Computer Science and Information Technology, Tribhuvan University, 44600 Kathmandu, Nepal.

School of Engineering and Technology, Central Queensland University, Rockhampton 4701, QLD, Australia.

Publication information

Comput Intell Neurosci. 2022 Mar 9;2022:5681574. doi: 10.1155/2022/5681574. eCollection 2022.

Abstract

COVID-19 is one of the deadliest viral diseases and has killed millions of people around the world to date. These deaths are linked not only to infection but also to people's mental states and sentiments triggered by fear of the virus. People's sentiments, which are predominantly available as posts/tweets on social media, can be interpreted using two kinds of information: syntactic and semantic. Herein, we propose to analyze people's sentiment using both kinds of information on a COVID-19-related Twitter dataset in the Nepali language. First, we use two widely used text representation methods, TF-IDF and FastText, and then combine them into hybrid features that capture highly discriminating information. Second, we implement nine widely used machine learning classifiers (Logistic Regression, Support Vector Machine, Naive Bayes, K-Nearest Neighbor, Decision Trees, Random Forest, Extreme Tree classifier, AdaBoost, and Multilayer Perceptron) on each of the three feature representations: TF-IDF, FastText, and Hybrid. To evaluate our methods, we use a publicly available Nepali COVID-19 tweets dataset, NepCOV19Tweets, which consists of Nepali tweets categorized into three classes (Positive, Negative, and Neutral). The evaluation results on NepCOV19Tweets show that the hybrid feature extraction method not only outperforms the two individual feature extraction methods across the nine machine learning algorithms but also provides excellent performance when compared with state-of-the-art methods.
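The hybrid representation described in the abstract can be pictured as joining, for each tweet, a sparse TF-IDF vector with a dense embedding vector. The sketch below is only an illustration under stated assumptions, not the authors' implementation: it assumes the combination is a plain concatenation, uses one common smoothed IDF variant, and substitutes a small hypothetical word-vector table (`word_vectors`) for a trained FastText model. Real FastText builds vectors from subword n-grams and so can also embed unseen words, which this toy lookup cannot. The function names are invented for this example.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocab, vectors): one TF-IDF vector per tokenized document."""
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    # Document frequency: in how many documents each token appears.
    df = Counter(tok for doc in docs for tok in set(doc))
    # Smoothed IDF (an assumed variant; the abstract does not give the weighting).
    idf = {tok: math.log((1 + n_docs) / (1 + df[tok])) + 1.0 for tok in vocab}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1  # avoid division by zero for empty documents
        vectors.append([counts[tok] / total * idf[tok] for tok in vocab])
    return vocab, vectors


def mean_embedding(doc, word_vectors, dim):
    """Average the vectors of tokens found in the (hypothetical) lookup table."""
    known = [word_vectors[tok] for tok in doc if tok in word_vectors]
    if not known:
        return [0.0] * dim
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


def hybrid_features(docs, word_vectors, dim):
    """Concatenate each document's TF-IDF vector with its mean embedding."""
    _, tfidf = tfidf_vectors(docs)
    return [
        tfidf_vec + mean_embedding(doc, word_vectors, dim)
        for tfidf_vec, doc in zip(tfidf, docs)
    ]


if __name__ == "__main__":
    # Placeholder English tokens stand in for tokenized Nepali tweets.
    toy_docs = [["good", "news"], ["bad", "news"]]
    toy_vectors = {"good": [1.0, 0.0], "bad": [0.0, 1.0], "news": [0.5, 0.5]}
    for row in hybrid_features(toy_docs, toy_vectors, dim=2):
        print(row)
```

Each resulting row has one entry per vocabulary term followed by the embedding dimensions, and could be passed to any of the nine classifiers listed above. In practice the two blocks differ in scale and sparsity, so some normalization before concatenation may be worth considering; the abstract does not say whether the authors applied any.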
