Department of Computer Science and Engineering, Hajee Mohammad Danesh Science and Technology University, Dinajpur, Bangladesh.
Department of Computer Science and Engineering, Varendra University, Rajshahi, Bangladesh.
PLoS One. 2024 Jul 15;19(7):e0307027. doi: 10.1371/journal.pone.0307027. eCollection 2024.
The rise of social media has changed how people view connections. Machine learning (ML)-based sentiment analysis and news categorization help readers understand emotions and access news. However, most studies focus on complex models that demand heavy computational resources and have slow inference, which makes deployment difficult in resource-limited environments. In this paper, we process both structured and unstructured data and use the TextBlob scheme to determine the polarity, and hence the sentiment, of news headlines. We propose SGDR, a Stochastic Gradient Descent (SGD)-based Ridge classifier (RC), and combine it with an advanced string processing technique to classify news articles effectively. Additionally, we evaluate existing supervised and unsupervised ML algorithms to gauge the effectiveness of our SGDR classifier. The scalability and generalization capability of SGD, together with the L2 regularization used in RCs to handle overfitting and balance bias and variance, give the proposed SGDR better classification capability. Experimental results show that our string processing pipeline significantly boosts the performance of all ML models. Notably, our ensemble SGDR classifier surpasses all state-of-the-art ML algorithms we evaluated, achieving 98.12% accuracy. McNemar's significance tests show that the improvement of our SGDR classifier is significant at the 1% level over K-Nearest Neighbor, Decision Tree, and AdaBoost, and at the 5% level over the other algorithms. These findings underscore the superior proficiency of linear models in news categorization compared to tree-based and nonlinear counterparts. This study provides insight into the efficacy of the proposed methodology and its potential for news categorization and sentiment analysis.
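To make the McNemar's significance test named in the abstract concrete, here is a minimal sketch of the exact (binomial) form of the test for comparing two classifiers on the same test set. The abstract does not say which variant the authors used, so the exact form, the function name `mcnemar_exact`, and the toy labels are all assumptions chosen for illustration, not the paper's implementation.

```python
from math import comb


def mcnemar_exact(y_true, pred_a, pred_b):
    """Exact two-sided McNemar test for two classifiers on the same samples.

    Returns (b, c, p_value), where
      b = samples classifier A gets right and classifier B gets wrong,
      c = samples classifier A gets wrong and classifier B gets right.
    Hypothetical helper for illustration; not taken from the paper.
    """
    b = c = 0
    for truth, a, other in zip(y_true, pred_a, pred_b):
        if a == truth and other != truth:
            b += 1
        elif a != truth and other == truth:
            c += 1

    n = b + c  # only discordant pairs carry information
    if n == 0:
        return b, c, 1.0  # the classifiers never disagree on correctness

    # Under H0, b ~ Binomial(n, 0.5); double the smaller tail for a two-sided test.
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)


# Toy data (made up): A is right on all 10 samples, B is wrong on the first 8.
y_true = [1] * 10
pred_a = [1] * 10
pred_b = [0] * 8 + [1] * 2
print(mcnemar_exact(y_true, pred_a, pred_b))  # (8, 0, 0.0078125)
```

In this toy case the p-value is below 0.01, which is what "significant at the 1% level" means in the abstract: the difference in errors between the two classifiers would be unlikely if they were equally accurate.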