Nitya Harshitha T, Prabu M, Suganya E, Sountharrajan S, Bavirisetti Durga Prasad, Gadde Navya, Uppu Lakshmi Sahithi
Department of Computer Science and Engineering, Amrita School of Computing, Amrita Vishwa Vidyapeetham, Chennai, India.
Department of Information Technology, Sri Sivasubramaniya Nadar College of Engineering, Chennai, India.
Front Artif Intell. 2024 Mar 6;7:1269366. doi: 10.3389/frai.2024.1269366. eCollection 2024.
The emergence of social media has created many opportunities for networking and communication, but it has also fueled cyberbullying, which continues to rise. Researchers have long addressed cyberbullying by applying machine learning (ML) and deep learning (DL) techniques. Although these algorithms perform well on artificial datasets, they do not deliver comparable results on real-time datasets with high levels of noise and imbalance. Finding generic algorithms that work on dynamic data from several platforms is therefore critical. This study used a unique hybrid random forest (RF)-based convolutional neural network (CNN) model for text classification, combining the strengths of both approaches. Real-time datasets from Twitter and Instagram were collected and annotated to demonstrate the effectiveness of the proposed technique. The performance of various ML and DL algorithms was compared, and the RF-based CNN model outperformed them in both accuracy and execution speed, which is particularly important for detecting bullying episodes promptly and assisting victims. The model achieved an accuracy of 96% and delivered results 3.4 seconds faster than standard CNN models.
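The abstract does not specify how the random forest and the CNN are combined. One plausible reading is that a convolutional stage turns each post into a fixed-length feature vector and a random forest classifies that vector. The toy sketch below illustrates only that general idea using the Python standard library. Everything in it is an assumption made for illustration rather than the authors' implementation: the function names, the hash-seeded embeddings, the untrained random filters, the depth-1 trees, and the four example posts (which are invented and not taken from the study's datasets).

```python
import random
from collections import Counter

DIM, WIDTH = 8, 2  # embedding size and convolution window (assumed values)

def embed(token):
    """Deterministic pseudo-embedding; a stand-in for learned word vectors."""
    rng = random.Random(token)
    return [rng.uniform(-1, 1) for _ in range(DIM)]

def make_filters(n, seed=0):
    """Random, untrained filters; a real CNN would learn these weights."""
    rng = random.Random(seed)
    return [[rng.uniform(-1, 1) for _ in range(WIDTH * DIM)] for _ in range(n)]

def conv_features(text, filters):
    """1D convolution over token embeddings, ReLU, then global max pooling."""
    emb = [embed(t) for t in text.lower().split()]
    while len(emb) < WIDTH:            # zero-pad very short posts
        emb.append([0.0] * DIM)
    feats = []
    for f in filters:
        best = 0.0                     # starting at 0 acts as the ReLU floor
        for i in range(len(emb) - WIDTH + 1):
            window = [x for vec in emb[i:i + WIDTH] for x in vec]
            best = max(best, sum(w * x for w, x in zip(f, window)))
        feats.append(best)
    return feats

def fit_stump(X, y, feature_ids):
    """Depth-1 tree: choose the (feature, threshold) with the fewest errors."""
    best = None
    for j in feature_ids:
        for t in sorted({row[j] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[j] <= t]
            right = [lab for row, lab in zip(X, y) if row[j] > t]
            if not left or not right:
                continue
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0]
            err = sum(v != l_lab for v in left) + sum(v != r_lab for v in right)
            if best is None or err < best[0]:
                best = (err, j, t, l_lab, r_lab)
    if best is None:                   # no valid split: predict the majority
        maj = Counter(y).most_common(1)[0][0]
        return (0, float("inf"), maj, maj)
    return best[1:]

def fit_forest(X, y, n_trees=25, seed=0):
    """Bagging plus a random feature subset per tree, as in a random forest."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    k = max(1, int(d ** 0.5))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # bootstrap sample
        feats = rng.sample(range(d), k)
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest

def predict(forest, x):
    """Majority vote over all trees."""
    votes = Counter(l if x[j] <= t else r for j, t, l, r in forest)
    return votes.most_common(1)[0][0]

# Invented example posts: 1 = bullying, 0 = not bullying.
texts = ["you are so stupid and ugly", "nobody likes you loser",
         "great game last night", "happy birthday my friend"]
labels = [1, 1, 0, 0]

filters = make_filters(6)
X = [conv_features(t, filters) for t in texts]
forest = fit_forest(X, labels)
prediction = predict(forest, conv_features("you are a loser", filters))
```

With four posts and untrained filters, the prediction here carries no meaning. The sketch only shows the data flow from text to convolutional features to a forest vote, and a practical version would train the convolutional layers and grow full decision trees on annotated data.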