Saddozai Furqan Khan, Badri Sahar K, Alghazzawi Daniyal, Khattak Asad, Asghar Muhammad Zubair
Gomal Research Institute of Computing, Faculty of Computing, Gomal University, D.I.Khan, KP, Pakistan.
Information Systems Department, Faculty of Computing and Information Technology, King Abdul Aziz University, Jeddah, Saudi Arabia.
PeerJ Comput Sci. 2025 Apr 16;11:e2801. doi: 10.7717/peerj-cs.2801. eCollection 2025.
The rapid proliferation of social media platforms has facilitated the expression of opinions but has also enabled the spread of hate speech. Detecting multimodal hate speech in low-resource multilingual contexts poses significant challenges. This study presents a deep learning framework that integrates bidirectional long short-term memory (BiLSTM) and EfficientNetB1 to classify hate speech in Urdu-English tweets, leveraging both text and image modalities. We introduce multimodal multilingual hate speech (MMHS11K), a manually annotated dataset comprising 11,000 multimodal tweets. Text and image features are combined for classification using an early fusion strategy. Experimental results demonstrate that the BiLSTM+EfficientNetB1 model outperforms unimodal and baseline multimodal approaches, achieving an F1-score of 81.2% for Urdu tweets and 75.5% for English tweets. This research addresses critical gaps in multilingual and multimodal hate speech detection, offering a foundation for future advancements.
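The early fusion strategy named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give feature dimensions, the fusion operator, or the classifier head, so the code assumes that fusion means concatenating the two feature vectors and that a single logistic unit does the classification. The function names, vector sizes, and random values are hypothetical placeholders standing in for real BiLSTM and EfficientNetB1 outputs.

```python
import math
import random


def early_fusion(text_features, image_features):
    """Join per-modality feature vectors into one vector before classification.

    Assumption: fusion is plain concatenation; the abstract does not specify it.
    """
    return list(text_features) + list(image_features)


def predict_hate_probability(fused, weights, bias):
    """Hypothetical classifier head: one logistic unit over the fused vector."""
    z = sum(w * x for w, x in zip(weights, fused)) + bias
    return 1.0 / (1.0 + math.exp(-z))


rng = random.Random(0)

# Placeholder vectors; in a real pipeline these would come from the encoders.
text_features = [rng.uniform(-1, 1) for _ in range(8)]   # stand-in for BiLSTM output
image_features = [rng.uniform(-1, 1) for _ in range(4)]  # stand-in for EfficientNetB1 output

fused = early_fusion(text_features, image_features)

# Untrained, randomly initialised weights, used only to show the data flow.
weights = [rng.uniform(-0.5, 0.5) for _ in fused]
prob = predict_hate_probability(fused, weights, bias=0.0)

print(len(fused), round(prob, 3))
```

The point of the sketch is the ordering: both modalities are merged into one representation first, and a single classifier then sees the joint vector, in contrast to late fusion, where each modality is classified separately and the predictions are combined afterwards.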