Text-Based Depression Prediction on Social Media Using Machine Learning: Systematic Review and Meta-Analysis.

Authors

Phiri Doreen, Makowa Frank, Amelia Vivi Leona, Phiri Yohane Vincent Abero, Dlamini Lindelwa Portia, Chung Min-Huey

Affiliations

School of Nursing, College of Nursing, Taipei Medical University, Taipei, Taiwan.

Department of Information and Communication Technology, University of North Carolina Project, Lilongwe, Malawi.

Publication information

J Med Internet Res. 2025 Apr 11;27:e59002. doi: 10.2196/59002.

Abstract

BACKGROUND

Depression affects more than 350 million people globally, and traditional diagnostic methods have limitations. Analyzing textual data from social media with machine learning offers a new way to predict depression. However, comprehensive reviews of this area are lacking.

OBJECTIVE

This review aims to assess how effectively user-generated social media texts predict depression and to evaluate how demographic, language, social media activity, and temporal features influence machine learning-based prediction of depression from those texts.

METHODS

We searched 11 databases (CINAHL [through EBSCOhost], PubMed, Scopus, Ovid MEDLINE, Embase, PubPsych, Cochrane Library, Web of Science, ProQuest, IEEE Xplore, and ACM Digital Library) for studies published from January 2008 to August 2023. We included studies that used social media texts and machine learning to predict depression and that reported area under the curve, Pearson r, and specificity and sensitivity (or data from which these could be calculated). Protocol papers and studies not written in English were excluded. We extracted study characteristics, population characteristics, outcome measures, and prediction factors from each study. A random effects model was used to estimate pooled effect sizes with 95% CIs. Study heterogeneity was evaluated using forest plots and P values in the Cochran Q test. Moderator analysis was performed to identify the sources of heterogeneity.
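The pooling step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses the Fisher z transformation and the DerSimonian-Laird estimator of between-study variance, which are common choices for pooling correlations, but the abstract does not say which estimator or software the authors used. The function name `pool_correlations` and the input values are hypothetical and are not taken from the included studies.

```python
import math

def pool_correlations(rs, ns, z_crit=1.96):
    """Random-effects pooling of Pearson r (Fisher z, DerSimonian-Laird)."""
    if len(rs) != len(ns) or len(rs) < 2:
        raise ValueError("need at least two studies with matching r and n")
    # Fisher z transform; the sampling variance of z is 1 / (n - 3).
    zs = [math.atanh(r) for r in rs]
    vs = [1.0 / (n - 3) for n in ns]

    # Fixed-effect weights and estimate, needed for Cochran's Q.
    w = [1.0 / v for v in vs]
    sum_w = sum(w)
    z_fixed = sum(wi * zi for wi, zi in zip(w, zs)) / sum_w
    q = sum(wi * (zi - z_fixed) ** 2 for wi, zi in zip(w, zs))
    df = len(rs) - 1

    # DerSimonian-Laird estimate of the between-study variance tau^2.
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

    # Random-effects weights, pooled z, and CI, back-transformed to r.
    w_re = [1.0 / (v + tau2) for v in vs]
    z_re = sum(wi * zi for wi, zi in zip(w_re, zs)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return {
        "r": math.tanh(z_re),
        "ci": (math.tanh(z_re - z_crit * se), math.tanh(z_re + z_crit * se)),
        "q": q, "df": df, "tau2": tau2, "i2": i2,
    }

# Made-up illustrative inputs, NOT values from the included studies.
example = pool_correlations(rs=[0.50, 0.62, 0.71, 0.58], ns=[120, 300, 85, 210])
print(example)
```

The P value for the Cochran Q test would come from a chi-square distribution with `df` degrees of freedom, for example via `scipy.stats.chi2.sf(q, df)` if SciPy is available. It is left out here to keep the sketch dependency-free.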

RESULTS

A total of 36 studies were included. We observed a significant overall correlation between social media texts and depression, with a large effect size (r=0.630, 95% CI 0.565-0.686). Each feature group also showed a significant correlation with a large effect size: demographic (largest effect size; r=0.642, 95% CI 0.489-0.757), social media activity (r=0.552, 95% CI 0.418-0.663), language (r=0.545, 95% CI 0.441-0.649), and temporal features (r=0.531, 95% CI 0.320-0.693). The social media platform type (public or private; P<.001), machine learning approach (shallow or deep; P=.048), and use of outcome measures (yes or no; P<.001) were significant moderators. Sensitivity analysis revealed no change in the results, indicating result stability. The Begg-Mazumdar rank correlation (Kendall τ=0.22063; P=.058) and the Egger test (2-tailed t=1.28696; P=.207) were both nonsignificant, indicating no evidence of publication bias.
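For readers unfamiliar with the Egger test mentioned above, it is commonly computed as an ordinary least squares regression of each study's standardized effect (effect divided by its standard error) on its precision (1 divided by the standard error). An intercept far from zero suggests funnel plot asymmetry. The sketch below is a minimal illustration of that regression. The abstract does not describe how the authors ran the test, and the function name `egger_test` and all input values are hypothetical.

```python
import math

def egger_test(effects, ses):
    """Egger regression: standardized effect on precision; intercept stats."""
    k = len(effects)
    if k != len(ses) or k < 3:
        raise ValueError("need at least three studies with matching inputs")
    x = [1.0 / s for s in ses]                  # precision
    y = [e / s for e, s in zip(effects, ses)]   # standardized effect
    x_bar, y_bar = sum(x) / k, sum(y) / k
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("standard errors must not all be identical")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual variance and standard error of the intercept.
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se_int = math.sqrt(sse / (k - 2) * (1.0 / k + x_bar ** 2 / sxx))
    t = intercept / se_int if se_int > 0 else float("nan")
    return {"intercept": intercept, "slope": slope,
            "se": se_int, "t": t, "df": k - 2}

# Made-up illustrative inputs, NOT values from the included studies.
# Effects are Fisher z values with standard error 1 / sqrt(n - 3).
rs = [0.50, 0.62, 0.71, 0.58, 0.66]
ns = [120, 300, 85, 210, 60]
result = egger_test([math.atanh(r) for r in rs],
                    [1.0 / math.sqrt(n - 3) for n in ns])
print(result)
```

The 2-tailed P value is obtained by comparing `t` against a t distribution with `df` degrees of freedom. With few studies, the test has low power, so a nonsignificant result is weak evidence rather than proof that bias is absent.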

CONCLUSIONS

Social media textual content can be a useful tool for predicting depression. Demographics, language, social media activity, and temporal features should be considered to maximize the accuracy of depression prediction models. Additionally, the effects of social media platform type, machine learning approach, and use of outcome measures in depression prediction models need attention. Analyzing social media texts for depression prediction is challenging, and findings may not apply to a broader population. Nevertheless, our findings offer valuable insights for future research.

TRIAL REGISTRATION

PROSPERO CRD42023427707; https://www.crd.york.ac.uk/PROSPERO/view/CRD42023427707.
