Cam Handan, Cam Alper Veli, Demirel Ugur, Ahmed Sana
Department of Management Information Systems, Faculty of Economic and Administrative Science, Gumushane University, 29000, Gumushane, Turkey.
Department of Health Care Management, Faculty of Health Sciences, Gumushane University, 29000, Gumushane, Turkey.
Heliyon. 2023 Dec 17;10(1):e23784. doi: 10.1016/j.heliyon.2023.e23784. eCollection 2024 Jan 15.
This paper presents a sentiment analysis of Turkish text that combines lexicon-based and machine learning (ML)-based approaches to investigate public mood for predicting stock market behavior in BIST30, Borsa Istanbul. Our main motivation is to apply sentiment analysis to finance-related tweets in Turkish. We import 17189 tweets posted with the hashtags "#Borsaistanbul, #Bist, #Bist30, #Bist100" on Twitter between November 7, 2022, and November 15, 2022, via MAXQDA 2020, a qualitative data analysis program. On the lexicon-based side, we use the multilingual sentiment analysis offered by the Orange program to label the polarity of each of the 17189 samples as positive, negative, or neutral. Samples labeled neutral are discarded for the machine learning experiments. On the machine learning side, we select the 9076 samples labeled positive or negative and treat the task as a classification problem, using six different supervised machine learning classifiers implemented in Python 3.6 with the sklearn library. In the experiments, 80% of the selected data is used for training and the remaining 20% is used for testing and validation. The results show that the Support Vector Machine and Multilayer Perceptron classifiers perform better than the other classifiers, with accuracies of 0.89 and 0.88 and AUC values of 0.8729 and 0.8647, respectively. The other classifiers reach an accuracy of approximately 78.5%. Sentiment analysis accuracy could be increased through parameter optimization, changes to the pre-processing steps, and a larger, cleaner, and more balanced dataset. This work can be extended in the future to develop better sentiment analysis using deep learning approaches.
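The machine learning stage described above (an 80/20 split, supervised classifiers from sklearn, accuracy and AUC as metrics) can be sketched roughly as follows. This is a minimal illustration and not the authors' pipeline: the abstract does not state the feature representation, kernels, or hyperparameters, so the TF-IDF features, the linear kernel, the hidden layer size, and the random seeds are all assumptions. The tweets are synthetic stand-ins generated from made-up word lists, so the scores this sketch produces say nothing about the reported results.

```python
import random

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

# Synthetic stand-in data (assumption): the real study used 9076 Turkish
# tweets labeled positive or negative by a lexicon-based method.
rng = random.Random(0)
POS_WORDS = ["yükseliş", "kazanç", "rekor", "güçlü", "alım"]
NEG_WORDS = ["düşüş", "kayıp", "zarar", "zayıf", "satış"]
TAGS = ["#bist30", "#bist100", "#borsaistanbul"]


def make_tweet(words):
    """Build a toy tweet: one hashtag plus two sentiment words."""
    return " ".join([rng.choice(TAGS)] + rng.sample(words, 2))


texts = [make_tweet(POS_WORDS) for _ in range(100)]
texts += [make_tweet(NEG_WORDS) for _ in range(100)]
labels = [1] * 100 + [0] * 100  # 1 = positive, 0 = negative

# 80% for training, 20% held out, as stated in the abstract.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.2, random_state=0, stratify=labels
)

# TF-IDF features are an assumption; fit on the training split only
# so that no information leaks from the held-out tweets.
vectorizer = TfidfVectorizer()
X_train_vec = vectorizer.fit_transform(X_train)
X_test_vec = vectorizer.transform(X_test)

# Two of the six classifier families named in the abstract;
# hyperparameters here are illustrative guesses.
classifiers = {
    "SVM": SVC(kernel="linear"),
    "MLP": MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=0),
}

results = {}
for name, clf in classifiers.items():
    clf.fit(X_train_vec, y_train)
    predictions = clf.predict(X_test_vec)
    # AUC needs a continuous score: the SVM margin, or the MLP's
    # predicted probability of the positive class.
    if hasattr(clf, "decision_function"):
        scores = clf.decision_function(X_test_vec)
    else:
        scores = clf.predict_proba(X_test_vec)[:, 1]
    results[name] = {
        "accuracy": accuracy_score(y_test, predictions),
        "auc": roc_auc_score(y_test, scores),
    }

for name, metrics in results.items():
    print(f"{name}: accuracy={metrics['accuracy']:.2f}, auc={metrics['auc']:.4f}")
```

Fitting the vectorizer on the training split alone keeps the held-out 20% unseen during feature extraction, which matters when comparing classifiers on accuracy and AUC.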