Short text classification with machine learning in the social sciences: The case of climate change on Twitter.

Affiliation

Department of Economic History and International Relations, Stockholm University, Stockholm, Sweden.

Publication information

PLoS One. 2023 Sep 29;18(9):e0290762. doi: 10.1371/journal.pone.0290762. eCollection 2023.

DOI:10.1371/journal.pone.0290762
PMID:37773969
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10540966/
Abstract

To analyse large numbers of texts, social science researchers are increasingly confronting the challenge of text classification. When manual labeling is not possible and researchers have to find automatized ways to classify texts, computer science provides a useful toolbox of machine-learning methods whose performance remains understudied in the social sciences. In this article, we compare the performance of the most widely used text classifiers by applying them to a typical research scenario in social science research: a relatively small labeled dataset with infrequent occurrence of categories of interest, which is a part of a large unlabeled dataset. As an example case, we look at Twitter communication regarding climate change, a topic of increasing scholarly interest in interdisciplinary social science research. Using a novel dataset including 5,750 tweets from various international organizations regarding the highly ambiguous concept of climate change, we evaluate the performance of methods in automatically classifying tweets based on whether they are about climate change or not. In this context, we highlight two main findings. First, supervised machine-learning methods perform better than state-of-the-art lexicons, in particular as class balance increases. Second, traditional machine-learning methods, such as logistic regression and random forest, perform similarly to sophisticated deep-learning methods, whilst requiring much less training time and computational resources. The results have important implications for the analysis of short texts in social science research.
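
The comparison described in the abstract (a keyword lexicon versus a supervised classifier, scored on a small labeled set where the category of interest is rare) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the tweets are invented, the four-word `LEXICON` is hypothetical, and the classifier is a bare bag-of-words logistic regression without an intercept, trained by plain gradient descent. It is not the authors' pipeline, data, or lexicon, and the scores it prints say nothing about the paper's results.

```python
import math

# Toy tweets invented for this sketch (NOT the paper's data).
# Label 1 = about climate change, 0 = not. Positives are the minority class.
TRAIN = [
    ("global warming accelerates sea level rise", 1),
    ("carbon emissions reach record high", 1),
    ("greenhouse gas emissions drive climate change", 1),
    ("annual meeting starts tomorrow in geneva", 0),
    ("join our webinar about trade policy", 0),
    ("vaccination campaign reaches rural communities", 0),
    ("refugee support programme expands this year", 0),
    ("member states sign trade agreement", 0),
    ("health ministers discuss hospital funding", 0),
]
TEST = [
    ("shipping emissions keep rising", 1),
    ("record sea level threatens islands", 1),
    ("global warming breaks another threshold", 1),
    ("trade ministers meeting starts in geneva", 0),
    ("hospital funding expands this year", 0),
    ("webinar about rural health", 0),
]

# Hypothetical keyword lexicon, chosen only for illustration.
LEXICON = {"climate", "warming", "greenhouse", "carbon"}


def tokenize(text):
    """Lowercase whitespace tokenization; real tweets need more cleaning."""
    return text.lower().split()


def lexicon_predict(text):
    """Label a tweet 1 if it contains at least one lexicon keyword."""
    return int(any(tok in LEXICON for tok in tokenize(text)))


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logreg(data, epochs=200, lr=0.1):
    """Fit binary bag-of-words logistic regression (no intercept, no
    regularization) with full-batch gradient descent."""
    weights = {}
    for _ in range(epochs):
        grad = {}
        for text, label in data:
            toks = set(tokenize(text))
            p = sigmoid(sum(weights.get(t, 0.0) for t in toks))
            for t in toks:
                grad[t] = grad.get(t, 0.0) + (p - label)
        for t, g in grad.items():
            weights[t] = weights.get(t, 0.0) - lr * g
    return weights


def logreg_predict(weights, text):
    """Predict 1 when the modeled probability exceeds 0.5; unseen words count as 0."""
    score = sum(weights.get(t, 0.0) for t in set(tokenize(text)))
    return int(sigmoid(score) > 0.5)


def f1_score(y_true, y_pred):
    """F1 for the positive class (harmonic mean of precision and recall)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


weights = train_logreg(TRAIN)
y_true = [label for _, label in TEST]
lexicon_f1 = f1_score(y_true, [lexicon_predict(text) for text, _ in TEST])
logreg_f1 = f1_score(y_true, [logreg_predict(weights, text) for text, _ in TEST])
print(f"lexicon F1 = {lexicon_f1:.2f}, logistic regression F1 = {logreg_f1:.2f}")
```

On this deliberately easy toy set the lexicon finds only the one positive tweet that contains a listed keyword, while the trained model also picks up words such as "emissions" that it saw only in positive training tweets. The gap is built into the toy data; it shows the mechanism, not its size in practice.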

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9e5b/10540966/125014a4a40f/pone.0290762.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9e5b/10540966/35ed3e51a63a/pone.0290762.g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9e5b/10540966/6be0c3726caa/pone.0290762.g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9e5b/10540966/1162c58f2497/pone.0290762.g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9e5b/10540966/8f4b7b2b9a2e/pone.0290762.g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9e5b/10540966/95c1ff45bc15/pone.0290762.g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9e5b/10540966/f47a702529ed/pone.0290762.g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9e5b/10540966/979d0237d280/pone.0290762.g008.jpg
