Kim ShinYe, Yang Winson Fu Zun, Jiwani Zishan, Hamm Emily, Singh Shreya
Department of Counseling Psychology, University of Wisconsin-Madison, Madison, WI, United States.
Department of Psychiatry, Massachusetts General Hospital, Cambridge, MA, United States.
J Med Internet Res. 2025 May 13;27:e67506. doi: 10.2196/67506.
The opioid epidemic in the United States remains a major public health concern, with opioid-related deaths increasing more than 8-fold since 1999. Chronic pain, affecting 1 in 5 US adults, is a key contributor to opioid use and misuse. While previous research has explored clinical and behavioral predictors of opioid risk, less attention has been given to large-scale linguistic patterns in public discussions of pain. Social media platforms such as X (formerly Twitter) offer real-time, population-level insights into how individuals express pain, distress, and coping strategies. Understanding these linguistic markers matters because they can reveal underlying psychological states, perceptions of health care access, and community-level opioid risk factors, offering new opportunities for early detection and targeted public health response.
This study aimed to examine linguistic markers of pain communication on the social media platform X and assess whether language patterns differ among US states with high and low opioid mortality rates. We also evaluated the predictive power of these linguistic features using machine learning and identified key thematic structures through semantic network analysis.
We collected 1,438,644 pain-related tweets posted between January and December 2021 using tweepy and snscrape. Tweets from 2 states with high opioid mortality (Ohio and Florida) and 2 states with low opioid mortality (North Dakota and South Dakota) were selected, resulting in 31,994 tweets from high-death states (HDS) and 750 tweets from low-death states (LDS). Six machine learning algorithms (random forest, k-nearest neighbor, decision tree, naive Bayes, logistic regression, and support vector machine) were applied to predict state-level opioid mortality risk based on linguistic features derived from Linguistic Inquiry and Word Count (LIWC). The Synthetic Minority Oversampling Technique (SMOTE) was used to address class imbalance. Semantic network analysis was conducted to visualize co-occurrence patterns and conceptual clustering.
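To illustrate the oversampling step, the following is a minimal sketch of the core idea behind the Synthetic Minority Oversampling Technique: each synthetic sample is placed at a random position on the line segment between a minority-class sample and one of its nearest minority-class neighbors. It is not the authors' implementation, which the abstract does not describe. The function name `smote`, its parameters, and the two-feature vectors in `lds_features` are hypothetical and chosen only for illustration.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority samples.

    minority: list of equal-length numeric tuples (needs at least 2 points).
    k: number of nearest minority neighbours considered for each base point.
    """
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point, excluding itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


# Hypothetical feature vectors for the minority (LDS) class
lds_features = [(0.10, 0.80), (0.20, 0.70), (0.15, 0.90), (0.30, 0.75)]
new_points = smote(lds_features, n_new=6, k=2)
print(len(new_points))
```

Because every synthetic point is an interpolation of two real minority samples, the oversampled class stays inside the region already covered by the original minority data. In a study like this one, oversampling would normally be applied only to the training split so that synthetic points do not leak into evaluation; whether that was done here is not stated in the abstract.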
The random forest model demonstrated the strongest predictive performance, with an accuracy of 94.69%, balanced accuracy of 94.69%, κ of 0.89, and an area under the curve of 0.95 (P<.001). Tweets from HDS contained significantly more affective pain words (t=10.84; P<.001; Cohen d=0.12), references to health care access, and expressions of distress. LDS tweets showed greater use of authenticity markers (t=-10.04; P<.001) and proactive health-seeking language. Semantic network analysis revealed denser discourse in HDS (density=0.28) focused on distress and barriers to care, while LDS discourse emphasized recovery and optimism.
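As a reading aid for the classification metrics above, the sketch below computes accuracy, balanced accuracy, and Cohen κ from a 2x2 confusion matrix. The counts are hypothetical and are not the study's data. The function name `binary_metrics` is likewise an illustrative choice.

```python
def binary_metrics(tp, fn, fp, tn):
    """Accuracy, balanced accuracy, and Cohen's kappa for a 2x2 confusion matrix."""
    n = tp + fn + fp + tn
    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn)  # recall on the positive class
    specificity = tn / (tn + fp)  # recall on the negative class
    balanced_accuracy = (sensitivity + specificity) / 2
    # Agreement expected by chance, from the true and predicted class margins
    expected = ((tp + fn) / n) * ((tp + fp) / n) + ((tn + fp) / n) * ((tn + fn) / n)
    kappa = (accuracy - expected) / (1 - expected)
    return accuracy, balanced_accuracy, kappa


# Hypothetical counts for a balanced test set of 2000 tweets
acc, bal_acc, kappa = binary_metrics(tp=950, fn=50, fp=50, tn=950)
print(f"{acc:.2f} {bal_acc:.2f} {kappa:.2f}")  # prints "0.95 0.95 0.90"
```

When the two true classes are the same size, chance agreement is 0.5, so accuracy equals balanced accuracy and κ reduces to 2 × accuracy − 1. Under that assumption, which the abstract does not confirm, the reported accuracy of 94.69% would correspond to κ ≈ 0.89, matching the reported value.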
Our findings demonstrated that linguistic markers in publicly shared pain-related discourse show distinct and predictable differences across regions with varying opioid mortality risks. These linguistic patterns reflect underlying psychological, social, and structural factors that contribute to opioid vulnerability. Importantly, they offer a scalable, real-time resource for identifying at-risk communities. Harnessing social media language analytics can strengthen early detection systems, guide geographically targeted public health messaging, and inform policy efforts aimed at reducing opioid-related harm and improving pain management equity.