Jamil Ramish, Ashraf Imran, Rustam Furqan, Saad Eysha, Mehmood Arif, Choi Gyu Sang
Khwaja Fareed University of Engineering and Information Technology, Rahim Yar Khan, Pakistan.
Information and Communication Engineering, Yeungnam University, Gyeongsan si, Daegu, South Korea.
PeerJ Comput Sci. 2021 Aug 25;7:e645. doi: 10.7717/peerj-cs.645. eCollection 2021.
Sarcasm is common on social networking sites, where people often express negative thoughts, hatred, and opinions using positive vocabulary, which makes sarcasm difficult to detect. Although various studies have investigated sarcasm detection on baseline datasets, this work is the first to detect sarcasm in a multi-domain dataset constructed by combining Twitter and News Headlines datasets. This study proposes a hybrid approach in which a convolutional neural network (CNN) is used for feature extraction and a long short-term memory (LSTM) network is trained and tested on those features. For performance comparison, several machine learning algorithms such as random forest, support vector classifier, extra tree classifier, and decision tree are used. The performance of both the proposed model and the machine learning algorithms is analyzed using term frequency-inverse document frequency (TF-IDF), bag of words (BoW), and global vectors for word representation (GloVe). Experimental results indicate that the proposed model, with an accuracy of 91.60%, surpasses the traditional machine learning algorithms. Several state-of-the-art approaches for sarcasm detection are also compared with the proposed model, and the results suggest that it outperforms them in terms of precision, recall, and F1 score. The proposed model is accurate and robust, and it performs sarcasm detection on a multi-domain dataset.
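The term frequency-inverse document frequency weighting named in the abstract can be illustrated with a minimal sketch. It assumes the textbook form tf(t, d) * ln(N / df(t)); the abstract does not state which variant or library the authors used, so the function name `tf_idf`, the whitespace tokenizer, and the toy sentences are all hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Assumption: textbook weighting tf(t, d) * ln(N / df(t)), where tf is the
    term count divided by the document length. Library implementations often
    smooth or normalize differently.
    """
    # Naive lowercase whitespace tokenizer (an assumption for illustration).
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: number of documents that contain each term.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy inputs, not taken from the paper's dataset.
docs = [
    "great another monday morning",
    "great weather this morning",
    "council approves new budget",
]
weights = tf_idf(docs)
```

Under this weighting, a term that occurs in fewer documents receives a larger weight than one that is spread across many, and a term present in every document receives zero. Vectors of this kind are the sort of input a classical classifier such as a random forest would consume.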