Text classification models for the automatic detection of nonmedical prescription medication use from social media.

Affiliations

Department of Biomedical Informatics, School of Medicine, Emory University, 101 Woodruff Circle, Atlanta, GA, 30322, USA.

Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA.

Publication information

BMC Med Inform Decis Mak. 2021 Jan 26;21(1):27. doi: 10.1186/s12911-021-01394-0.

DOI: 10.1186/s12911-021-01394-0
PMID: 33499852
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7835447/

Abstract

BACKGROUND

Prescription medication (PM) misuse/abuse has emerged as a national crisis in the United States, and social media has been suggested as a potential resource for performing active monitoring. However, automating a social media-based monitoring system is challenging, requiring advanced natural language processing (NLP) and machine learning methods. In this paper, we describe the development and evaluation of automatic text classification models for detecting self-reports of PM abuse from Twitter.

METHODS

We experimented with state-of-the-art bi-directional transformer-based language models (e.g., BERT, RoBERTa, XLNet, ALBERT, and DistilBERT), which utilize tweet-level representations that enable transfer learning; proposed fusion-based approaches; and compared the developed models with several traditional machine learning approaches, including deep learning approaches. Using a public dataset, we evaluated the classifiers on their ability to classify the non-majority "abuse/misuse" class.
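The abstract does not say how the fusion-based approach combines the individual models. As a minimal sketch, assuming late fusion by averaging the class probabilities of several already fine-tuned classifiers (a common scheme, not necessarily the authors' method), the combination step and the single-class F-score used to evaluate the non-majority "abuse/misuse" class could look like the following. The label indices `ABUSE`/`OTHER`, the helper names, and all probability values are hypothetical.

```python
from typing import List, Sequence

# Hypothetical label indices: 0 = "abuse/misuse" (non-majority class), 1 = other.
ABUSE, OTHER = 0, 1


def fuse_probabilities(model_probs: Sequence[Sequence[Sequence[float]]]) -> List[List[float]]:
    """Late fusion (assumed): average class probabilities across models.

    model_probs[m][i][c] is model m's probability that tweet i belongs to class c.
    """
    n_models = len(model_probs)
    n_tweets = len(model_probs[0])
    n_classes = len(model_probs[0][0])
    return [
        [sum(model_probs[m][i][c] for m in range(n_models)) / n_models
         for c in range(n_classes)]
        for i in range(n_tweets)
    ]


def predict(probs: Sequence[Sequence[float]]) -> List[int]:
    """Pick the highest-probability class per tweet (the first class wins ties)."""
    return [max(range(len(row)), key=lambda c: row[c]) for row in probs]


def class_f1(gold: Sequence[int], pred: Sequence[int], target: int) -> float:
    """F-score for a single class, e.g. the non-majority abuse/misuse class."""
    tp = sum(1 for g, p in zip(gold, pred) if g == target and p == target)
    fp = sum(1 for g, p in zip(gold, pred) if g != target and p == target)
    fn = sum(1 for g, p in zip(gold, pred) if g == target and p != target)
    if tp == 0:
        return 0.0  # convention: no true positives gives an F-score of 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Usage with made-up probabilities from two hypothetical fine-tuned models over four tweets.
bert_probs = [[0.7, 0.3], [0.4, 0.6], [0.2, 0.8], [0.45, 0.55]]
roberta_probs = [[0.5, 0.5], [0.7, 0.3], [0.1, 0.9], [0.65, 0.35]]
fused = fuse_probabilities([bert_probs, roberta_probs])
preds = predict(fused)
gold = [ABUSE, ABUSE, OTHER, OTHER]
print(preds, class_f1(gold, preds, ABUSE))
```

Probability averaging is only one fusion option; weighted averaging or a meta-classifier stacked on the models' outputs are alternatives, and the abstract does not indicate which variant, if any, was used. Scoring only the abuse/misuse class keeps the metric from being dominated by the majority class.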

RESULTS

Our proposed fusion-based model performs significantly better than the best traditional model (F-score [95% CI]: 0.67 [0.64-0.69] vs. 0.45 [0.42-0.48]). We illustrate, via experimentation using varying training set sizes, that the transformer-based models are more stable and require less annotated data compared to the other models. The significant improvements achieved by our best-performing classification model over past approaches make it suitable for automated continuous monitoring of nonmedical PM use from Twitter.
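The results are reported as an F-score with a 95% confidence interval, but the abstract does not state how the intervals were computed. One common approach, assumed here purely for illustration, is a percentile bootstrap over the test set: resample tweets with replacement (keeping each gold label paired with its prediction), recompute the abuse/misuse F-score each time, and read off the 2.5th and 97.5th percentiles. The function names, the 1,000 resamples, the fixed seed, and the toy labels are all hypothetical; the labels are not drawn from the paper's dataset.

```python
import random
from typing import Sequence, Tuple


def class_f1(gold: Sequence[int], pred: Sequence[int], target: int) -> float:
    """F-score for one class; 0.0 when there are no true positives."""
    tp = sum(1 for g, p in zip(gold, pred) if g == target and p == target)
    fp = sum(1 for g, p in zip(gold, pred) if g != target and p == target)
    fn = sum(1 for g, p in zip(gold, pred) if g == target and p != target)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def bootstrap_f1_ci(gold: Sequence[int], pred: Sequence[int], target: int,
                    n_resamples: int = 1000, alpha: float = 0.05,
                    seed: int = 0) -> Tuple[float, float, float]:
    """Point F-score plus a percentile-bootstrap (1 - alpha) confidence interval.

    Test-set tweets are resampled with replacement; gold labels and predictions
    stay paired, so the classifier itself is never retrained.
    """
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    n = len(gold)
    scores = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(class_f1([gold[i] for i in idx], [pred[i] for i in idx], target))
    scores.sort()
    lower = scores[int(n_resamples * alpha / 2)]
    upper = scores[min(n_resamples - 1, int(n_resamples * (1 - alpha / 2)))]
    return class_f1(gold, pred, target), lower, upper


# Usage with made-up labels (0 = abuse/misuse, 1 = other) for 12 test tweets.
gold = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1]
pred = [0, 0, 0, 1, 0, 1, 1, 1, 1, 1, 0, 1]
f1, lo, hi = bootstrap_f1_ci(gold, pred, target=0)
print(f"F-score [95% CI]: {f1:.2f} [{lo:.2f}-{hi:.2f}]")
```

The printed line mirrors the "F-score [95% CI]" format used above. With a toy set this small the interval is very wide; a realistic test set narrows it considerably.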

CONCLUSIONS

BERT, BERT-like and fusion-based models outperform traditional machine learning and deep learning models, achieving substantial improvements over many years of past research on the topic of prescription medication misuse/abuse classification from social media, which had been shown to be a complex task due to the unique ways in which information about nonmedical use is presented. Several challenges associated with the lack of context and the nature of social media language need to be overcome to further improve BERT and BERT-like models. These experimentally driven challenges are presented as potential future research directions.

Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7ab8/7836501/3f0b7bbbecd9/12911_2021_1394_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7ab8/7836501/7305291fe733/12911_2021_1394_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7ab8/7836501/c70e304ab7de/12911_2021_1394_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7ab8/7836501/c6920b0b8339/12911_2021_1394_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7ab8/7836501/6466f1e94695/12911_2021_1394_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7ab8/7836501/c2df11ffa393/12911_2021_1394_Fig6_HTML.jpg
