Classification of Twitter Vaping Discourse Using BERTweet: Comparative Deep Learning Study.

Authors

Baker William, Colditz Jason B, Dobbs Page D, Mai Huy, Visweswaran Shyam, Zhan Justin, Primack Brian A

Affiliations

Department of Computer Science and Computer Engineering, University of Arkansas, Fayetteville, AR, United States.

Division of General Internal Medicine, University of Pittsburgh School of Medicine, Pittsburgh, PA, United States.

Publication information

JMIR Med Inform. 2022 Jul 21;10(7):e33678. doi: 10.2196/33678.

DOI: 10.2196/33678
PMID: 35862172
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9353682/

Abstract

BACKGROUND

Twitter provides a valuable platform for the surveillance and monitoring of public health topics; however, manually categorizing large quantities of Twitter data is labor intensive and presents barriers to identifying major trends and sentiments. Additionally, while machine and deep learning approaches have been proposed with high accuracy, they require large, annotated data sets. Public pretrained deep learning classification models, such as BERTweet, produce higher-quality models while using smaller annotated training sets.

OBJECTIVE

This study aims to derive and evaluate a pretrained deep learning model based on BERTweet that can identify tweets relevant to vaping, tweets (related to vaping) of commercial nature, and tweets with provape sentiment. Additionally, the performance of the BERTweet classifier will be compared against a long short-term memory (LSTM) model to show the improvements a pretrained model has over traditional deep learning approaches.

METHODS

Twitter data were collected from August to October 2019 using vaping-related search terms. From this set, a random subsample of 2401 English tweets was manually annotated for relevance (vaping related or not), commercial nature (commercial or not), and sentiment (positive, negative, or neutral). Using the annotated data, 3 separate classifiers were built using BERTweet with the default parameters defined by the Simple Transformer application programming interface (API). Each model was trained for 20 iterations and evaluated with a random split of the annotated tweets, reserving 10% (n=165) of tweets for evaluations.
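
The random hold-out evaluation described above can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the authors' code: the function name `holdout_split`, the seed, and the stand-in tweet data are all assumptions made for the example, and the BERTweet training step itself is not shown.

```python
import random


def holdout_split(items, eval_fraction=0.10, seed=0):
    """Shuffle a copy of items and reserve eval_fraction of them for evaluation."""
    if not 0.0 < eval_fraction < 1.0:
        raise ValueError("eval_fraction must be strictly between 0 and 1")
    shuffled = list(items)
    # A seeded generator keeps the split reproducible across runs.
    random.Random(seed).shuffle(shuffled)
    n_eval = round(len(shuffled) * eval_fraction)
    return shuffled[n_eval:], shuffled[:n_eval]


# Hypothetical stand-in data: (tweet_text, label) pairs, not the study's tweets.
tweets = [(f"tweet {i}", i % 2) for i in range(100)]
train, evaluation = holdout_split(tweets)
print(len(train), len(evaluation))
```

Each of the three classifiers (relevance, commercial nature, sentiment) would be trained on the first list and scored only on the second, so that reported metrics reflect tweets the model never saw during training.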

RESULTS

The relevance, commercial, and sentiment classifiers achieved an area under the receiver operating characteristic curve (AUROC) of 94.5%, 99.3%, and 81.7%, respectively. Additionally, the weighted F1 scores of each were 97.6%, 99.0%, and 86.1%, respectively. We found that BERTweet outperformed the LSTM model in the classification of all categories.
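
For readers unfamiliar with the two reported metrics, the sketch below computes them from scratch on a toy example. It is an illustration only: the function names and the toy labels are assumptions, the AUROC shown is the binary form (the abstract does not state how AUROC was averaged for the three-class sentiment task), and the authors' actual evaluation code is not given in the source.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Per-class F1, averaged with weights equal to each class's true count."""
    support = Counter(y_true)
    total = 0.0
    for label, count in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = count - tp
        # F1 = 2*TP / (2*TP + FP + FN); the denominator is >= 1 because count >= 1.
        total += count * (2 * tp / (2 * tp + fp + fn))
    return total / len(y_true)


def binary_auroc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical labels and scores, not the study's data).
y_true = [1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
print(weighted_f1(y_true, y_pred), binary_auroc(y_true, scores))
```

The two metrics answer different questions: AUROC measures how well the model's scores rank positives above negatives across all thresholds, while weighted F1 scores the final predicted labels and weights each class by its frequency, which is one reason the two numbers can diverge for the same classifier.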

CONCLUSIONS

Large, open-source deep learning classifiers, such as BERTweet, can provide researchers the ability to reliably determine if tweets are relevant to vaping; include commercial content; and include positive, negative, or neutral content about vaping with a higher accuracy than traditional natural language processing deep learning models. Such enhancement to the utilization of Twitter data can allow for faster exploration and dissemination of time-sensitive data than traditional methodologies (eg, surveys, polling research).

Similar articles

1. Classification of Twitter Vaping Discourse Using BERTweet: Comparative Deep Learning Study. JMIR Med Inform. 2022 Jul 21;10(7):e33678. doi: 10.2196/33678.
2. Machine Learning Classifiers for Twitter Surveillance of Vaping: Comparative Machine Learning Study. J Med Internet Res. 2020 Aug 12;22(8):e17478. doi: 10.2196/17478.
3. Identifying Potential Lyme Disease Cases Using Self-Reported Worldwide Tweets: Deep Learning Modeling Approach Enhanced With Sentimental Words Through Emojis. J Med Internet Res. 2023 Oct 16;25:e47014. doi: 10.2196/47014.
4. Comparison of pretrained transformer-based models for influenza and COVID-19 detection using social media text data in Saskatchewan, Canada. Front Digit Health. 2023 Jun 28;5:1203874. doi: 10.3389/fdgth.2023.1203874. eCollection 2023.
5. Assessing Electronic Cigarette-Related Tweets for Sentiment and Content Using Supervised Machine Learning. J Med Internet Res. 2015 Aug 25;17(8):e208. doi: 10.2196/jmir.4392.
6. "When 'Bad' is 'Good'": Identifying Personal Communication and Sentiment in Drug-Related Tweets. JMIR Public Health Surveill. 2016 Oct 24;2(2):e162. doi: 10.2196/publichealth.6327.
7. Exploring Eating Disorder Topics on Twitter: Machine Learning Approach. JMIR Med Inform. 2020 Oct 30;8(10):e18273. doi: 10.2196/18273.
8. Automated Detection of Vaping-Related Tweets on Twitter During the 2019 EVALI Outbreak Using Machine Learning Classification. Front Big Data. 2022 Feb 10;5:770585. doi: 10.3389/fdata.2022.770585. eCollection 2022.
9. Identifying Key Topics Bearing Negative Sentiment on Twitter: Insights Concerning the 2015-2016 Zika Epidemic. JMIR Public Health Surveill. 2019 Jun 4;5(2):e11036. doi: 10.2196/11036.
10. Using #ActuallyAutistic on Twitter for Precision Diagnosis of Autism Spectrum Disorder: Machine Learning Study. JMIR Form Res. 2024 Feb 14;8:e52660. doi: 10.2196/52660.

Cited by

1. Public perception and changing attitudes toward antidepressants over a decade in social media: Lessons learned from online discussion using artificial intelligence. PLoS One. 2025 Sep 4;20(9):e0318464. doi: 10.1371/journal.pone.0318464. eCollection 2025.
2. Assessment of beliefs and attitudes towards benzodiazepines using machine learning based on social media posts: an observational study. BMC Psychiatry. 2024 Oct 8;24(1):659. doi: 10.1186/s12888-024-06111-5.
3. Twitter Sentiment About the US Federal Tobacco 21 Law: Mixed Methods Analysis. JMIR Form Res. 2023 Aug 31;7:e50346. doi: 10.2196/50346.
4. Comparison of pretrained transformer-based models for influenza and COVID-19 detection using social media text data in Saskatchewan, Canada. Front Digit Health. 2023 Jun 28;5:1203874. doi: 10.3389/fdgth.2023.1203874. eCollection 2023.
5. Artificial Intelligence-Enabled Analysis of Statin-Related Topics and Sentiments on Social Media. JAMA Netw Open. 2023 Apr 3;6(4):e239747. doi: 10.1001/jamanetworkopen.2023.9747.
