Combining Topic Modeling, Sentiment Analysis, and Corpus Linguistics to Analyze Unstructured Web-Based Patient Experience Data: Case Study of Modafinil Experiences.

Authors

Walsh Julia, Cave Jonathan, Griffiths Frances

Affiliations

Warwick Medical School, University of Warwick, Coventry, United Kingdom.

Department of Economics, University of Warwick, Coventry, United Kingdom.

Publication

J Med Internet Res. 2024 Dec 11;26:e54321. doi: 10.2196/54321.

DOI:10.2196/54321
PMID:39662896
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11669883/
Abstract

BACKGROUND

Patient experience data from social media offer patient-centered perspectives on disease, treatments, and health service delivery. Current guidelines typically rely on systematic reviews, while qualitative health studies are often seen as anecdotal and nongeneralizable. This study explores combining personal health experiences from multiple sources to create generalizable evidence.

OBJECTIVE

The study aims to (1) investigate how combining unsupervised natural language processing (NLP) and corpus linguistics can explore patient perspectives from a large unstructured dataset of modafinil experiences, (2) compare findings with Cochrane meta-analyses on modafinil's effectiveness, and (3) develop a methodology for analyzing such data.

METHODS

We analyzed 69,022 posts from 790 sources with a range of NLP and corpus techniques: data cleaning designed to maximize post context, Python for the NLP steps, and Sketch Engine for linguistic analysis. We applied multiple topic mining approaches, including latent Dirichlet allocation, nonnegative matrix factorization, and word-embedding methods. Sentiment analysis used TextBlob and the Valence Aware Dictionary and Sentiment Reasoner (VADER), while corpus methods included collocation, concordance, and n-gram generation. Previous work had mapped topic mining to themes such as health conditions, reasons for taking modafinil, symptom impacts, dosage, side effects, effectiveness, and treatment comparisons.
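The three corpus methods named above (n-gram generation, concordance, and collocation) can be illustrated with a minimal standard-library sketch. This is not the study's pipeline, which used Sketch Engine; the function names, the regex tokenizer, the choice of pointwise mutual information (PMI) as the collocation score, and the example posts are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a post and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def concordance(tokens, keyword, width=3):
    """Keyword-in-context lines: up to `width` tokens on each side of a hit."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines


def collocations(posts, min_count=2):
    """Rank adjacent word pairs by PMI, ignoring pairs seen fewer than min_count times."""
    unigrams, bigrams = Counter(), Counter()
    for post in posts:
        toks = tokenize(post)
        unigrams.update(toks)
        bigrams.update(ngrams(toks, 2))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    scored = []
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue
        p_pair = count / n_bi
        p_w1 = unigrams[w1] / n_uni
        p_w2 = unigrams[w2] / n_uni
        scored.append(((w1, w2), math.log2(p_pair / (p_w1 * p_w2))))
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Invented example posts, not taken from the study's dataset.
posts = [
    "Modafinil helps my sleep apnea fatigue",
    "Sleep apnea left me tired until modafinil",
    "My fatigue improved on modafinil",
]

kwic = concordance(tokenize(posts[0]), "sleep", width=2)
top_pairs = collocations(posts, min_count=2)
```

On these invented posts, `kwic` holds the single context line for "sleep" in the first post, and `top_pairs` contains only the pair ("sleep", "apnea"), the one adjacent pair that occurs at least twice. PMI favors rare pairs, which is why a minimum count is applied before ranking.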

RESULTS

Key findings included modafinil use across 166 health conditions, most frequently narcolepsy, multiple sclerosis, attention-deficit disorder, anxiety, sleep apnea, depression, bipolar disorder, chronic fatigue syndrome, fibromyalgia, and chronic disease. Word-embedding topic modeling mapped 70% of posts to predefined themes, while sentiment analysis revealed 65% positive responses, 6% neutral responses, and 28% negative responses. Notably, the perceived effectiveness of modafinil for various conditions contrasts strongly with the findings of existing randomized controlled trials and systematic reviews, which conclude that evidence of effectiveness is insufficient or of low quality.
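As a rough illustration of how per-post polarity scores turn into class percentages like those reported above, the sketch below buckets scores and computes shares. The abstract does not state which cutoffs were used; the ±0.05 threshold is the convention suggested in VADER's documentation for its compound score and is assumed here. The function names and the example scores are hypothetical.

```python
from collections import Counter


def label(score, threshold=0.05):
    """Map a polarity score in [-1, 1] to a sentiment label (assumed cutoffs)."""
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"


def sentiment_shares(scores, threshold=0.05):
    """Return the percentage of posts in each sentiment class."""
    if not scores:
        raise ValueError("scores must not be empty")
    counts = Counter(label(s, threshold) for s in scores)
    total = len(scores)
    return {
        name: round(100 * counts[name] / total, 1)
        for name in ("positive", "neutral", "negative")
    }


# Hypothetical per-post polarity scores, not values from the study.
scores = [0.62, 0.31, -0.44, 0.0, 0.12, -0.08, 0.75, 0.02]
shares = sentiment_shares(scores)
```

Because each class share is rounded separately, reported percentages need not sum to exactly 100, which may explain why the figures above total 99%.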

CONCLUSIONS

This study demonstrated the value of combining NLP with linguistic techniques for analyzing large unstructured text datasets. Despite varying opinions, findings were methodologically consistent and challenged existing clinical evidence. This suggests that patient-generated data could provide valuable insights into treatment outcomes, potentially improving clinical understanding and patient care.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/51c9/11669883/1f48dfdc4a46/jmir_v26i1e54321_fig1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/51c9/11669883/0375b446a94f/jmir_v26i1e54321_fig2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/51c9/11669883/aa0fa5052e85/jmir_v26i1e54321_fig3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/51c9/11669883/352bed4915f7/jmir_v26i1e54321_fig4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/51c9/11669883/4df61a231d42/jmir_v26i1e54321_fig5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/51c9/11669883/25ef0101a639/jmir_v26i1e54321_fig6.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/51c9/11669883/b0c54b624d51/jmir_v26i1e54321_fig7.jpg
