Automatic gender detection in Twitter profiles for health-related cohort studies.

Authors

Yang Yuan-Chi, Al-Garadi Mohammed Ali, Love Jennifer S, Perrone Jeanmarie, Sarker Abeed

Affiliations

Department of Biomedical Informatics, School of Medicine, Emory University, Atlanta, Georgia, USA.

Department of Emergency Medicine, School of Medicine, Oregon Health & Science University, Portland, Oregon, USA.

Publication

JAMIA Open. 2021 Jun 23;4(2):ooab042. doi: 10.1093/jamiaopen/ooab042. eCollection 2021 Apr.

DOI: 10.1093/jamiaopen/ooab042
PMID: 34169232
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8220305/
Abstract

OBJECTIVE

Biomedical research involving social media data is gradually moving from population-level to targeted, cohort-level data analysis. Though crucial for biomedical studies, social media users' demographic information (eg, gender) is often not explicitly stated in their profiles. Here, we present an automatic gender classification system for social media and illustrate how gender information can be incorporated into a social media-based health-related study.

MATERIALS AND METHODS

We used a large Twitter dataset composed of public, gender-labeled users (Dataset-1) to train and evaluate the gender detection pipeline. We experimented with machine learning algorithms, including support vector machines (SVMs) and deep-learning models, and with public packages, including M3. For classification, we considered user information including profile information and tweets. We also developed a meta-classifier ensemble that combines the predicted scores from the individual classifiers. We then applied the best-performing pipeline to Twitter users who have self-reported nonmedical use of prescription medications (Dataset-2) to assess the system's utility.
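The abstract does not say how the meta-classifier combines the base classifiers' scores. As a minimal sketch, assuming a stacking setup in which each user is represented by two base-classifier scores (a hypothetical SVM decision value and a hypothetical M3 probability of "female") and the meta-classifier is a plain logistic regression, the idea looks like this. The feature layout, the label encoding (1 = female), the toy data, and the training hyperparameters are illustrative assumptions, not the authors' configuration.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical toy data for illustration only (not from the study).
# Each row: [svm_decision_value, m3_prob_female]; label 1 = female, 0 = male.
TOY_SCORES: List[List[float]] = [
    [1.2, 0.9], [0.8, 0.7], [0.3, 0.8], [-0.2, 0.6],
    [-1.1, 0.1], [-0.7, 0.3], [0.2, 0.2], [-0.4, 0.4],
]
TOY_LABELS: List[int] = [1, 1, 1, 1, 0, 0, 0, 0]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_meta_classifier(
    scores: Sequence[Sequence[float]],
    labels: Sequence[int],
    lr: float = 0.5,
    epochs: int = 2000,
) -> Tuple[List[float], float]:
    """Fit logistic-regression weights over base-classifier scores with batch gradient descent."""
    n, d = len(scores), len(scores[0])
    weights, bias = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(scores, labels):
            logit = sum(w * xi for w, xi in zip(weights, x)) + bias
            err = sigmoid(logit) - y  # derivative of the log-loss w.r.t. the logit
            for j in range(d):
                grad_w[j] += err * x[j]
            grad_b += err
        # Average the gradients over all users, then take one descent step.
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict(weights: Sequence[float], bias: float, x: Sequence[float]) -> int:
    """Return 1 (female) if the meta-classifier probability is at least 0.5, else 0 (male)."""
    logit = sum(w * xi for w, xi in zip(weights, x)) + bias
    return int(sigmoid(logit) >= 0.5)


if __name__ == "__main__":
    w, b = train_meta_classifier(TOY_SCORES, TOY_LABELS)
    print("weights:", w, "bias:", b)
    print("prediction for [1.5, 0.95]:", predict(w, b, [1.5, 0.95]))
```

In a real stacking setup, the base-classifier scores used to fit the meta-classifier should come from held-out (out-of-fold) predictions, so the meta-classifier does not learn from scores the base models produced on their own training data.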

RESULTS AND DISCUSSION

We collected 67 181 and 176 683 users for Dataset-1 and Dataset-2, respectively. A meta-classifier combining SVM and M3 performed best (Dataset-1 accuracy: 94.4% [95% confidence interval: 94.0-94.8%]; Dataset-2: 94.4% [95% confidence interval: 92.0-96.6%]). Including the automatically classified gender information in the analyses of Dataset-2 revealed gender-specific trends: the proportions of females closely resembled data from the 2018 National Survey on Drug Use and Health (tranquilizers: 0.50 vs 0.50; stimulants: 0.50 vs 0.45) and data on emergency room visits for opioid overdose from the Nationwide Emergency Department Sample (pain relievers: 0.38 vs 0.37).
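The results report accuracies with 95% confidence intervals and female proportions per medication class, but the abstract does not state how the intervals were computed. A minimal sketch, assuming a Wilson score interval for a binomial proportion (one common choice; the authors may have used another method, such as bootstrapping), shows how both quantities can be derived from per-user predictions. The "F"/"M" label strings and the function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion; z = 1.96 gives about 95% coverage."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = (z / denom) * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return max(0.0, center - half_width), min(1.0, center + half_width)


def accuracy_with_ci(
    y_true: Sequence[str], y_pred: Sequence[str], z: float = 1.96
) -> Tuple[float, Tuple[float, float]]:
    """Accuracy (share of users whose predicted gender matches the label) plus its Wilson interval."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    n = len(y_true)
    return correct / n, wilson_interval(correct, n, z)


def female_proportion(predicted: Sequence[str]) -> float:
    """Share of users in one cohort (e.g., one medication class) predicted as "F"."""
    if not predicted:
        raise ValueError("cohort is empty")
    return sum(g == "F" for g in predicted) / len(predicted)


if __name__ == "__main__":
    # Hypothetical toy labels, not study data.
    acc, (lo, hi) = accuracy_with_ci(["F", "M", "F", "M"], ["F", "M", "M", "M"])
    print(f"accuracy={acc:.3f}, 95% CI=({lo:.3f}, {hi:.3f})")
    print("female proportion:", female_proportion(["F", "M", "F", "F"]))
```

The Wilson interval is used here instead of the simpler normal-approximation (Wald) interval because it behaves better when the proportion is close to 0 or 1, as an accuracy near 94% is.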

CONCLUSION

Our publicly available, automated gender detection pipeline may aid cohort-specific social media data analyses (https://bitbucket.org/sarkerlab/gender-detection-for-public).

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/97d5/8220305/f665d3e819b3/ooab042f1.jpg

Similar articles

1
Automatic gender detection in Twitter profiles for health-related cohort studies.
JAMIA Open. 2021 Jun 23;4(2):ooab042. doi: 10.1093/jamiaopen/ooab042. eCollection 2021 Apr.
2
Promoting Reproducible Research for Characterizing Nonmedical Use of Medications Through Data Annotation: Description of a Twitter Corpus and Guidelines.
J Med Internet Res. 2020 Feb 26;22(2):e15861. doi: 10.2196/15861.
3
A natural language processing pipeline to advance the use of Twitter data for digital epidemiology of adverse pregnancy outcomes.
J Biomed Inform. 2020;112S:100076. doi: 10.1016/j.yjbinx.2020.100076. Epub 2020 Aug 8.
4
Twitter Analysis of the Nonmedical Use and Side Effects of Methylphenidate: Machine Learning Study.
J Med Internet Res. 2020 Feb 24;22(2):e16466. doi: 10.2196/16466.
5
Machine Learning and Natural Language Processing for Geolocation-Centric Monitoring and Characterization of Opioid-Related Social Media Chatter.
JAMA Netw Open. 2019 Nov 1;2(11):e1914672. doi: 10.1001/jamanetworkopen.2019.14672.
6
Towards scaling Twitter for digital epidemiology of birth defects.
NPJ Digit Med. 2019 Oct 1;2:96. doi: 10.1038/s41746-019-0170-5. eCollection 2019.
7
Developing an Automatic System for Classifying Chatter About Health Services on Twitter: Case Study for Medicaid.
J Med Internet Res. 2021 May 3;23(5):e26616. doi: 10.2196/26616.
8
Text classification models for the automatic detection of nonmedical prescription medication use from social media.
BMC Med Inform Decis Mak. 2021 Jan 26;21(1):27. doi: 10.1186/s12911-021-01394-0.
9
Can accurate demographic information about people who use prescription medications nonmedically be derived from Twitter?
Proc Natl Acad Sci U S A. 2023 Feb 21;120(8):e2207391120. doi: 10.1073/pnas.2207391120. Epub 2023 Feb 14.
10
How Do You #relax When You're #stressed? A Content Analysis and Infodemiology Study of Stress-Related Tweets.
JMIR Public Health Surveill. 2017 Jun 13;3(2):e35. doi: 10.2196/publichealth.5939.

Cited by

1
"I Been Taking Adderall Mixing it With Lean, Hope I Don't Wake Up Out My Sleep": Harnessing Twitter to Understand Nonmedical Prescription Stimulant Use among Black Women and Men Subscribers.
medRxiv. 2024 Dec 5:2024.12.03.24318408. doi: 10.1101/2024.12.03.24318408.
2
Methods and Annotated Data Sets Used to Predict the Gender and Age of Twitter Users: Scoping Review.
J Med Internet Res. 2024 Mar 15;26:e47923. doi: 10.2196/47923.
3
Large-Scale Social Media Analysis Reveals Emotions Associated with Nonmedical Prescription Drug Use.
Health Data Sci. 2022;2022. doi: 10.34133/2022/9851989. Epub 2022 Apr 27.
4
Barriers to opioid use disorder treatment: A comparison of self-reported information from social media with barriers found in literature.
Front Public Health. 2023 Apr 20;11:1141093. doi: 10.3389/fpubh.2023.1141093. eCollection 2023.
5
Can accurate demographic information about people who use prescription medications nonmedically be derived from Twitter?
Proc Natl Acad Sci U S A. 2023 Feb 21;120(8):e2207391120. doi: 10.1073/pnas.2207391120. Epub 2023 Feb 14.
6
Automatic Detection of Twitter Users Who Express Chronic Stress Experiences via Supervised Machine Learning and Natural Language Processing.
Comput Inform Nurs. 2023 Sep 1;41(9):717-724. doi: 10.1097/CIN.0000000000000985.
7
Demographics and topics impact on the co-spread of COVID-19 misinformation and fact-checks on Twitter.
Inf Process Manag. 2021 Nov;58(6):102732. doi: 10.1016/j.ipm.2021.102732. Epub 2021 Aug 30.