Public Health Discussions on Social Media: Evaluating Automated Sentiment Analysis Methods.

Authors

Gandy Lisa M, Ivanitskaya Lana V, Bacon Leeza L, Bizri-Baryak Rodina

Affiliations

Department of Computer Science, College of Sciences and Liberal Arts, Kettering University, Flint, MI, United States.

Department of Health Administration, The College of Health Professions, Central Michigan University, Mt Pleasant, MI, United States.

Publication information

JMIR Form Res. 2025 Jan 8;9:e57395. doi: 10.2196/57395.

DOI:10.2196/57395
PMID:39773420
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11784633/
Abstract

BACKGROUND

Sentiment analysis is one of the most widely used methods for mining and examining text. Social media researchers need guidance on choosing between manual and automated sentiment analysis methods.

OBJECTIVE

Popular sentiment analysis tools based on natural language processing (NLP; VADER [Valence Aware Dictionary for Sentiment Reasoning], TEXT2DATA [T2D], and Linguistic Inquiry and Word Count [LIWC-22]), and a large language model (ChatGPT 4.0) were compared with manually coded sentiment scores, as applied to the analysis of YouTube comments on videos discussing the opioid epidemic. Sentiment analysis methods were also examined regarding ease of programming, monetary cost, and other practical considerations.

METHODS

Evaluation methods included descriptive statistics, receiver operating characteristic (ROC) curve analysis, confusion matrices, Cohen κ, accuracy, specificity, precision, sensitivity (recall), F-score harmonic mean, and the Matthews correlation coefficient. An inductive, iterative approach to content analysis of the data was used to obtain manual sentiment codes.
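
The metrics listed above can all be derived from the four cells of a binary confusion matrix. The following is a minimal sketch, not the authors' code: the function name `binary_metrics` and the example counts are hypothetical, and the "positive" class is assumed to be whichever sentiment label is being detected.

```python
import math


def binary_metrics(tp, fp, fn, tn):
    """Compute common binary-classification metrics from confusion-matrix counts.

    tp/fp/fn/tn are counts relative to the target class (an assumption here:
    the target class is the sentiment label the analyst wants to detect).
    """
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0   # recall
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0

    # F-score: harmonic mean of precision and sensitivity
    pr_sum = precision + sensitivity
    f1 = 2 * precision * sensitivity / pr_sum if pr_sum else 0.0

    # Cohen kappa: observed agreement corrected for chance agreement
    p_observed = accuracy
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n**2
    kappa = (p_observed - p_chance) / (1 - p_chance) if p_chance != 1 else 0.0

    # Matthews correlation coefficient; defined as 0 when a marginal is empty
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
        "kappa": kappa,
        "mcc": mcc,
    }


# Hypothetical counts, chosen only to illustrate the calculation
m = binary_metrics(tp=40, fp=10, fn=20, tn=30)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

Computing every metric from the same four counts makes it easy to see why they can disagree: accuracy and the F-score ignore parts of the matrix that κ and the MCC take into account.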

RESULTS

A subset of comments were analyzed by a second coder, producing good agreement between the 2 coders' judgments (κ=0.734). YouTube social media about the opioid crisis had many more negative comments (4286/4871, 88%) than positive comments (79/662, 12%), making it possible to evaluate the performance of sentiment analysis models in an unbalanced dataset. The tone summary measure from LIWC-22 performed better than other tools for estimating the prevalence of negative versus positive sentiment. According to the ROC curve analysis, VADER was best at classifying manually coded negative comments. A comparison of Cohen κ values indicated that NLP tools (VADER, followed by LIWC's tone and T2D) showed only fair agreement with manual coding. In contrast, ChatGPT 4.0 had poor agreement and failed to generate binary sentiment scores in 2 out of 3 attempts. Variations in accuracy, specificity, precision, sensitivity, F-score, and MCC did not reveal a single superior model. F-score harmonic means were 0.34-0.38 (SD 0.02) for NLP tools and very low (0.13) for ChatGPT 4.0. None of the MCCs reached a strong correlation level.
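
The ROC curve analysis mentioned above summarizes how well a continuous score separates the two manually coded classes. As an illustrative sketch only, the area under the ROC curve can be computed as the share of (target, non-target) pairs in which the target comment receives the higher score. The function name `roc_auc`, the labels, and the scores below are hypothetical; the scores stand in for a continuous sentiment score (such as a compound score) that has been negated so that higher values mean "more negative".

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for the target class (here assumed: manually coded negative), 0 otherwise.
    scores: higher means the tool considers the comment more likely to be the target class.
    Ties between a target and a non-target comment count as half a win.
    """
    target = [s for y, s in zip(labels, scores) if y == 1]
    other = [s for y, s in zip(labels, scores) if y == 0]
    if not target or not other:
        raise ValueError("both classes must be present")

    wins = 0.0
    for t in target:
        for o in other:
            if t > o:
                wins += 1.0
            elif t == o:
                wins += 0.5
    return wins / (len(target) * len(other))


# Hypothetical data: five comments with manual codes and tool scores
manual_negative = [1, 1, 1, 0, 0]
sentiment_score = [-0.8, -0.3, 0.1, 0.4, -0.5]       # lower = more negative
negativity = [-s for s in sentiment_score]           # flip so higher = more negative

auc = roc_auc(manual_negative, negativity)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation. The pairwise form is quadratic in the number of comments, which is acceptable for a sketch but slow for large datasets.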

CONCLUSIONS

Researchers studying negative emotions, public worries, or dissatisfaction with social media face unique challenges in selecting models suitable for unbalanced datasets. We recommend VADER, the only cost-free tool we evaluated, due to its excellent discrimination, which can be further improved when the comments are at least 100 characters long. If estimating the prevalence of negative comments in an unbalanced dataset is important, we recommend the tone summary measure from LIWC-22. Researchers using T2D must know that it may only score some data and, compared with other methods, be more time-consuming and cost-prohibitive. A general-purpose large language model, ChatGPT 4.0, has yet to surpass the performance of NLP models, at least for unbalanced datasets with highly prevalent (7:1) negative comments.
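
To illustrate why an unbalanced dataset complicates model selection, the sketch below compares two hypothetical classifiers on a made-up dataset with a 7:1 ratio of negative to positive comments. All counts and the helper name `accuracy_and_mcc` are assumptions for illustration and are not taken from the study.

```python
import math


def accuracy_and_mcc(tp, fp, fn, tn):
    """Return (accuracy, MCC) for a binary confusion matrix.

    The target class is assumed to be negative sentiment.
    MCC is reported as 0 when any row or column of the matrix is empty.
    """
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, mcc


# Hypothetical dataset: 700 negative and 100 positive comments (7:1)

# Classifier A labels every comment as negative
baseline = accuracy_and_mcc(tp=700, fp=100, fn=0, tn=0)

# Classifier B makes real distinctions but also makes more mistakes overall
informative = accuracy_and_mcc(tp=560, fp=30, fn=140, tn=70)

print(f"always-negative: accuracy={baseline[0]:.3f}, MCC={baseline[1]:.3f}")
print(f"informative:     accuracy={informative[0]:.3f}, MCC={informative[1]:.3f}")
```

In this made-up case the always-negative classifier has the higher accuracy while carrying no information about sentiment, which is why chance-corrected measures such as the MCC or Cohen κ are reported alongside accuracy for skewed data.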

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/604f/11784633/9fad2d448c42/formative_v9i1e57395_fig4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/604f/11784633/b5690d00774f/formative_v9i1e57395_fig1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/604f/11784633/7f3181b86ea0/formative_v9i1e57395_fig2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/604f/11784633/809db9b93c18/formative_v9i1e57395_fig3.jpg


