Singh Gopendra Vikram, Ghosh Soumitra, Firdaus Mauajama, Ekbal Asif, Bhattacharyya Pushpak
Department of Computer Science and Engineering, Indian Institute of Technology Patna, Patna, 801103, India.
University of Alberta, Edmonton, Alberta, Canada.
Sci Rep. 2024 May 28;14(1):12204. doi: 10.1038/s41598-024-58944-5.
In the era of social media, emojis and code-mixed language have become essential to online communication. However, selecting an emoji that matches the sentiment or emotion of a code-mixed text can be difficult. This paper presents a novel task of predicting multiple emojis in English-Hindi code-mixed sentences and proposes a new dataset called SENTIMOJI, which extends the SemEval 2020 Task 9 SentiMix dataset. Our approach exploits the relationship between emotion, sentiment, and emojis to build an end-to-end framework. We replace the self-attention sublayers in the transformer encoder with simple linear transformations and use RMS layer normalization instead of standard layer normalization. Moreover, we employ Gated Linear Unit and fully connected layers to predict emojis and identify the emotion and sentiment of a tweet. Our experimental results on the SENTIMOJI dataset demonstrate that the proposed multi-task framework outperforms the single-task framework. We also show that emojis are strongly linked to sentiment and emotion, and that identifying sentiment and emotion can aid in accurately predicting the most suitable emoji. Our work contributes to the field of natural language processing and can help in the development of more effective tools for sentiment analysis and emotion recognition in code-mixed languages. The code and data will be available at https://www.iitp.ac.in/~ai-nlp-ml/resources.html#SENTIMOJI to facilitate research.
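The architectural ideas named in the abstract (a linear transformation in place of self-attention, RMS layer normalization, a Gated Linear Unit, and fully connected heads for the three tasks) can be illustrated with a minimal forward-pass sketch. This is not the authors' implementation. The abstract does not specify the form of the linear transformation, so the sketch assumes a learned token-mixing matrix over sequence positions. It also assumes residual connections, mean pooling, a sigmoid per emoji for multi-label output, and softmax heads for emotion and sentiment. All function names, parameter names, and dimensions are hypothetical.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add(a, b):
    """Element-wise sum of two equally shaped matrices (residual connection)."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def softmax(logits):
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def rms_norm(x, gain, eps=1e-8):
    """RMS layer norm: divide each row by its root mean square, then rescale."""
    out = []
    for row in x:
        rms = math.sqrt(sum(v * v for v in row) / len(row) + eps)
        out.append([g * v / rms for g, v in zip(gain, row)])
    return out


def glu(x, w_value, w_gate):
    """Gated Linear Unit: (x @ w_value) * sigmoid(x @ w_gate)."""
    values = matmul(x, w_value)
    gates = matmul(x, w_gate)
    return [[v * sigmoid(g) for v, g in zip(vr, gr)] for vr, gr in zip(values, gates)]


def init_params(seq_len, d_model, n_emoji, n_emotion, n_sentiment, seed=0):
    """Random parameters; all sizes are illustrative, not taken from the paper."""
    rng = random.Random(seed)

    def rand(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {
        "mix": rand(seq_len, seq_len),      # assumed stand-in for self-attention
        "gain1": [1.0] * d_model,
        "gain2": [1.0] * d_model,
        "w_value": rand(d_model, d_model),
        "w_gate": rand(d_model, d_model),
        "w_emoji": rand(d_model, n_emoji),
        "w_emotion": rand(d_model, n_emotion),
        "w_sentiment": rand(d_model, n_sentiment),
    }


def encoder_block(x, p):
    """One encoder block: linear token mixing, then a GLU, each with a residual."""
    x = add(x, matmul(p["mix"], rms_norm(x, p["gain1"])))
    return add(x, glu(rms_norm(x, p["gain2"]), p["w_value"], p["w_gate"]))


def predict(x, p):
    """Shared encoder followed by three fully connected task heads."""
    hidden = encoder_block(x, p)
    pooled = [[sum(col) / len(col) for col in zip(*hidden)]]  # mean over tokens
    return {
        "emoji": [sigmoid(z) for z in matmul(pooled, p["w_emoji"])[0]],
        "emotion": softmax(matmul(pooled, p["w_emotion"])[0]),
        "sentiment": softmax(matmul(pooled, p["w_sentiment"])[0]),
    }
```

Calling `predict(x, init_params(seq_len, d_model, ...))` on a `seq_len` by `d_model` matrix of token embeddings returns one independent probability per emoji and one distribution each for emotion and sentiment. Sharing the encoder across the three heads is what makes the setup multi-task: in training, the losses of the three heads would be combined so that sentiment and emotion supervision also shapes the representation used for emoji prediction.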