Discovering Cohorts of Pregnant Women From Social Media for Safety Surveillance and Analysis.

Authors

Sarker Abeed, Chandrashekar Pramod, Magge Arjun, Cai Haitao, Klein Ari, Gonzalez Graciela

Affiliations

Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States.

Department of Biomedical Informatics, Arizona State University, Scottsdale, AZ, United States.

Publication

J Med Internet Res. 2017 Oct 30;19(10):e361. doi: 10.2196/jmir.8164.

DOI: 10.2196/jmir.8164
PMID: 29084707
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5684515/
Abstract

BACKGROUND

Pregnancy exposure registries are the primary source of information about the safety of medication use during pregnancy. Such registries enroll pregnant women voluntarily early in pregnancy and follow them until the end of pregnancy or longer to systematically collect information on specific pregnancy outcomes. Although the registry model has distinct advantages over other study designs, registries face numerous challenges and limitations, such as low enrollment rates, high cost, and selection bias.

OBJECTIVE

The primary objectives of this study were to systematically assess whether social media (Twitter) can be used to discover cohorts of pregnant women and to develop and deploy a natural language processing and machine learning pipeline for the automatic collection of cohort information. We also attempted to ascertain, in a preliminary fashion, what types of longitudinal information could be mined from the collected cohort information.

METHODS

Our discovery of pregnant women relies on detecting pregnancy-indicating tweets (PITs), which are statements posted by pregnant women regarding their pregnancies. We used a set of 14 patterns to first detect potential PITs. We manually annotated a sample of 14,156 of the retrieved user posts to distinguish real PITs from false positives and trained a supervised classification system to detect real PITs. We optimized the classification system via cross validation, with features and settings targeted toward optimizing precision for the positive class. For users identified to be posting real PITs via automatic classification, our pipeline collected all their available past and future posts from which other information (eg, medication usage and fetal outcomes) may be mined.
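The first stage described above, pattern-based retrieval of candidate PITs, can be sketched roughly as follows. This is a minimal illustration only: the three regular expressions, the function names, and the sample posts are hypothetical and are not the 14 patterns or the data used in the study.

```python
import re

# Hypothetical first-person pregnancy patterns, for illustration only.
# These are NOT the 14 patterns used in the study.
CANDIDATE_PATTERNS = [
    re.compile(r"\bi(?:'m| am)\s+(?:\d+\s+(?:weeks?|months?)\s+)?pregnant\b",
               re.IGNORECASE),
    re.compile(r"\bmy\s+(?:baby\s+bump|due\s+date|pregnancy)\b", re.IGNORECASE),
    re.compile(r"\bwe(?:'re| are)\s+expecting\b", re.IGNORECASE),
]


def is_candidate_pit(text: str) -> bool:
    """Return True if any pattern matches the post text."""
    return any(pattern.search(text) for pattern in CANDIDATE_PATTERNS)


def collect_candidates(posts):
    """Keep the (user_id, text) pairs whose text matches at least one pattern."""
    return [(user, text) for user, text in posts if is_candidate_pit(text)]


# Made-up sample posts as (user_id, text) pairs.
posts = [
    ("u1", "I'm 12 weeks pregnant and so tired"),
    ("u2", "My sister is pregnant again!"),
    ("u3", "Can't wait for my due date"),
    ("u4", "I am pregnant with ideas today lol"),
]
candidates = collect_candidates(posts)
for user, text in candidates:
    print(user, text)
```

In this toy run the figurative post from `u4` is retrieved alongside the genuine announcements. False positives of this kind are what the manual annotation and the supervised classifier in the second stage are meant to filter out.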

RESULTS

Our rule-based PIT detection approach retrieved over 200,000 posts over a period of 18 months. Manual annotation agreement for three annotators was very high at kappa (κ)=0.79. On a blind test set, the implemented classifier obtained an overall F score of 0.84 (0.88 for the pregnancy class and 0.68 for the nonpregnancy class). Precision for the pregnancy class was 0.93, and recall was 0.84. Feature analysis showed that the combination of dense and sparse vectors for classification achieved optimal performance. Employing the trained classifier resulted in the identification of 71,954 users from the collected posts. Over 250 million posts were retrieved for these users, which provided a multitude of longitudinal information about them.
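As a quick consistency check, the pregnancy-class F score can be recomputed from the reported precision and recall. The sketch below assumes the F score is the balanced F1 measure (the harmonic mean of precision and recall), which the abstract does not state explicitly; the function name is illustrative.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F1: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall reported for the pregnancy class.
pregnancy_f = f_score(0.93, 0.84)
print(round(pregnancy_f, 2))  # 0.88
```

Under that assumption the result matches the reported 0.88 for the pregnancy class. The overall score of 0.84 cannot be reproduced from the abstract alone, because the averaging scheme across the two classes is not given.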

CONCLUSIONS

Social media sources such as Twitter can be used to identify large cohorts of pregnant women and to gather longitudinal information via automated processing of their postings. Considering the many drawbacks and limitations of pregnancy registries, social media mining may provide beneficial complementary information. Although the cohort sizes identified over social media are large, future research will have to assess the completeness of the information available through them.
