# #ChronicPain: Automated Building of a Chronic Pain Cohort from Twitter Using Machine Learning

Authors

Sarker Abeed, Lakamana Sahithi, Guo Yuting, Ge Yao, Leslie Abimbola, Okunromade Omolola, Gonzalez-Polledo Elena, Perrone Jeanmarie, McKenzie-Brown Anne Marie

Affiliations

Department of Biomedical Informatics, School of Medicine, Emory University, Atlanta, GA, USA.

Department of Radiology, Robert Larner College of Medicine, University of Vermont, Burlington, VT, USA.

Publication

Health Data Sci. 2023;3. doi: 10.34133/hds.0078. Epub 2023 Jul 4.

Abstract

BACKGROUND

Due to the high burden of chronic pain, and the detrimental public health consequences of its treatment with opioids, there is a high-priority need to identify effective alternative therapies. Social media is a potentially valuable resource for knowledge about self-reported therapies by chronic pain sufferers.

METHODS

We attempted to (a) verify the presence of large-scale chronic pain-related chatter on Twitter, (b) develop natural language processing and machine learning methods for automatically detecting self-disclosures of chronic pain, (c) collect longitudinal data posted by the users who made these self-disclosures, and (d) semiautomatically analyze the types of chronic pain-related information these users reported. We collected data using chronic pain-related hashtags and keywords and manually annotated 4,998 posts to indicate whether they were self-reports of chronic pain experiences. We trained and evaluated several state-of-the-art supervised text classification models and deployed the best-performing classifier. We collected all publicly available posts from detected cohort members and conducted manual and natural language processing-driven descriptive analyses.
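The keyword-based collection step can be illustrated with a minimal sketch. The abstract does not list the hashtags and keywords that were used, so `SEED_TERMS` below is a hypothetical placeholder, and the helper names `build_matcher` and `filter_posts` are likewise assumptions made for illustration rather than the authors' pipeline.

```python
import re

# Hypothetical seed terms: the actual hashtag/keyword list is not given in the abstract.
SEED_TERMS = ["#chronicpain", "chronic pain", "fibromyalgia"]


def build_matcher(terms):
    """Compile one case-insensitive pattern that matches any seed term."""
    # Longer terms first so the alternation prefers the most specific match.
    escaped = sorted((re.escape(t) for t in terms), key=len, reverse=True)
    # (?<!\w) and (?!\w) stop a term from matching inside a longer word or hashtag.
    return re.compile(r"(?<!\w)(?:" + "|".join(escaped) + r")(?!\w)", re.IGNORECASE)


def filter_posts(posts, terms=SEED_TERMS):
    """Return the posts that mention at least one seed term."""
    matcher = build_matcher(terms)
    return [p for p in posts if matcher.search(p)]
```

A filter like this only selects candidate posts. Deciding whether a candidate is a genuine self-report is the job of the manual annotation and the supervised classifier described above.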

RESULTS

Interannotator agreement for the binary annotation was 0.82 (Cohen's kappa). The RoBERTa model performed best (F score: 0.84; 95% confidence interval: 0.80 to 0.89), and we used this model to classify all collected unlabeled posts. We discovered 22,795 self-reported chronic pain sufferers and collected over 3 million of their past posts. Further analyses revealed information about, but not limited to, alternative treatments, patient sentiments about treatments, side effects, and self-management strategies.
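The two metrics reported here can be sketched in plain Python. Cohen's kappa follows its standard definition. The abstract does not say how the 95% confidence interval was obtained or which class the F score refers to, so the percentile bootstrap and the positive-class F1 below are assumptions, and all function names are illustrative.

```python
import random
from collections import Counter


def cohen_kappa(a, b):
    """Cohen's kappa for two annotators' labels over the same items."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    # Chance agreement: product of the annotators' marginal label rates.
    expected = sum(ca[k] * cb[k] for k in ca) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


def f1_score(gold, pred, positive=1):
    """F1 for the positive class (assumed here to be 'self-report')."""
    tp = sum(g == positive and p == positive for g, p in zip(gold, pred))
    fp = sum(g != positive and p == positive for g, p in zip(gold, pred))
    fn = sum(g == positive and p != positive for g, p in zip(gold, pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def bootstrap_f1_ci(gold, pred, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for F1 (an assumed method, not the paper's)."""
    rng = random.Random(seed)
    n = len(gold)
    scores = []
    for _ in range(n_resamples):
        # Resample test items with replacement, keeping gold/pred pairs aligned.
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(f1_score([gold[i] for i in idx], [pred[i] for i in idx]))
    scores.sort()
    lo = scores[int((alpha / 2) * n_resamples)]
    hi = scores[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return lo, hi
```

For example, with annotator labels `[1, 1, 0, 0]` and `[1, 0, 0, 0]`, observed agreement is 0.75 and chance agreement is 0.5, giving a kappa of 0.5.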

CONCLUSION

Our social media-based approach will result in a large cohort that grows automatically over time, and the data can be leveraged to identify effective opioid-alternative therapies for diverse chronic pain types.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cbe2/10880168/ece884aa0de2/hds.0078.fig.001.jpg
