Ding Ting, Qu Tonghui, Zou Zongliang, Ding Cheng
School of Earth Science, East China University of Technology, Nanchang, Jiangxi, China.
Urumqi Comprehensive Survey Center on Natural Resources, China Geological Survey, Urumqi, Xinjiang, China.
PeerJ Comput Sci. 2024 Oct 28;10:e2301. doi: 10.7717/peerj-cs.2301. eCollection 2024.
Automated expert systems (AES) that analyze depression-related content on social media have attracted growing research interest. Depression is often linked to suicide, so predicting it early can enable potentially life-saving interventions. Conventionally, psychologists assess depression levels by interviewing patients or administering questionnaires. This approach has limitations: patients may be reluctant to disclose their true feelings to psychologists, and counselors may struggle to make accurate predictions from limited data. Social media offers a potentially valuable alternative source. Because social media is part of daily life, individuals often reveal their personality and mental state through their posts, and AES can efficiently analyze large volumes of such content to detect depression in individuals at an early stage. This study contributes to that effort by proposing an approach for predicting suicide risk from social media content using machine learning. A novel multi-model feature generation technique is used to improve the performance of machine learning models. The technique combines a feature extraction method, term frequency-inverse document frequency (TF-IDF), with two machine learning models: logistic regression (LR) and support vector machine (SVM). It calculates probabilities for each sample in the dataset, yielding a new feature set called the probability-based feature set (ProBFS). ProBFS is compact yet highly correlated with the target classes in the dataset, and using such concise, correlated features produces strong results: the SVM model reaches an accuracy of 0.96 with ProBFS while keeping computation time low, at 5.63 seconds, even on large datasets. Finally, the proposed method is compared with state-of-the-art approaches to demonstrate its significance.
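The abstract gives no implementation details, so the following is only a minimal sketch of one plausible reading of the ProBFS pipeline, written in plain Python so the mechanics are visible. Everything concrete in it is an assumption: binary labels (1 = at-risk post, 0 = not), whitespace tokenization, a smoothed TF-IDF variant, logistic regression trained by stochastic gradient descent, a linear SVM trained by hinge-loss subgradient descent with a simplified Platt scaling step to turn margins into probabilities, plus the hypothetical function names (`fit_tfidf`, `fit_base_models`, `probfs_out_of_fold`) and toy posts. Each sample's ProBFS row is just two numbers, [P_LR, P_SVM], and a final linear SVM is trained on those rows. Its output says nothing about the 0.96 accuracy reported in the abstract.

```python
# Hypothetical sketch of the ProBFS idea (not the authors' code); stdlib only.
import math
from collections import Counter

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def fit_tfidf(docs):
    """Smoothed TF-IDF with L2-normalized rows; returns a text -> vector function."""
    tokenized = [d.lower().split() for d in docs]  # assumption: whitespace tokens
    vocab = sorted({w for toks in tokenized for w in toks})
    df = Counter(w for toks in tokenized for w in set(toks))
    idf = [math.log((1 + len(docs)) / (1 + df[w])) + 1 for w in vocab]

    def transform(doc):
        tf = Counter(doc.lower().split())  # out-of-vocabulary words are ignored
        vec = [tf[w] * weight for w, weight in zip(vocab, idf)]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        return [v / norm for v in vec]
    return transform

def train_logreg(X, y, epochs=300, lr=0.5, l2=1e-3):
    """Binary logistic regression by SGD; returns x -> P(y=1 | x)."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for x, t in zip(X, y):
            g = sigmoid(dot(w, x) + b) - t  # d(log-loss)/d(logit)
            w = [wi - lr * (g * xi + l2 * wi) for wi, xi in zip(w, x)]
            b -= lr * g
    return lambda x: sigmoid(dot(w, x) + b)

def train_linear_svm(X, y, epochs=300, lr=0.1, lam=1e-3):
    """Linear SVM by hinge-loss subgradient descent; returns x -> signed margin."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for x, t in zip(X, y):
            s = 1 if t == 1 else -1
            if s * (dot(w, x) + b) < 1:  # hinge loss is active for this sample
                w = [wi - lr * (lam * wi - s * xi) for wi, xi in zip(w, x)]
                b += lr * s
            else:  # only the L2 regularizer contributes
                w = [wi * (1 - lr * lam) for wi in w]
    return lambda x: dot(w, x) + b

def train_svm_proba(X, y):
    # Simplified Platt scaling: a 1-D logistic regression on the SVM margins,
    # fit on the same training rows for brevity.
    score = train_linear_svm(X, y)
    platt = train_logreg([[score(x)] for x in X], y)
    return lambda x: platt([score(x)])

def fit_base_models(X, y):
    p_lr, p_svm = train_logreg(X, y), train_svm_proba(X, y)
    return lambda x: [p_lr(x), p_svm(x)]  # one ProBFS row: [P_LR, P_SVM]

def probfs_out_of_fold(X, y, k=3):
    """ProBFS for training rows, each scored by base models that never saw it."""
    rows = [None] * len(X)
    for fold in range(k):
        train_idx = [i for i in range(len(X)) if i % k != fold]
        to_probs = fit_base_models([X[i] for i in train_idx], [y[i] for i in train_idx])
        for i in range(fold, len(X), k):
            rows[i] = to_probs(X[i])
    return rows

# Hypothetical toy posts; label 1 = at-risk, 0 = not.
train_docs = [
    "i feel hopeless and alone every night", "great day at the beach with friends",
    "nothing matters anymore i feel empty", "excited to start my new job tomorrow",
    "i cannot stop feeling worthless and alone", "cooked a lovely dinner with family tonight",
    "so tired of everything i feel empty inside", "loved the concert last night with friends",
]
train_y = [1, 0, 1, 0, 1, 0, 1, 0]
test_docs = ["i feel alone and empty", "fun weekend with friends and family"]

vectorize = fit_tfidf(train_docs)  # fit on training texts only
X_train = [vectorize(d) for d in train_docs]
X_test = [vectorize(d) for d in test_docs]

probfs_train = probfs_out_of_fold(X_train, train_y)
to_probs = fit_base_models(X_train, train_y)  # refit base models on all training rows
probfs_test = [to_probs(x) for x in X_test]

final_svm = train_linear_svm(probfs_train, train_y)  # final SVM sees only 2 features
predictions = [1 if final_svm(row) > 0 else 0 for row in probfs_test]
print(probfs_test, predictions)
```

Scoring each training row with base models fitted on the other folds is a design choice of this sketch, not something the abstract specifies: if LR and SVM scored the rows they were trained on, their probabilities would sit near 0 or 1 and the final SVM would learn from overconfident features. In practice, library components such as scikit-learn's `TfidfVectorizer`, `LogisticRegression`, `SVC(probability=True)` and `cross_val_predict(..., method="predict_proba")` would replace the hand-written pieces.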