Combining Topic Modeling, Sentiment Analysis, and Corpus Linguistics to Analyze Unstructured Web-Based Patient Experience Data: Case Study of Modafinil Experiences.

Author Information

Walsh Julia, Cave Jonathan, Griffiths Frances

Affiliations

Warwick Medical School, University of Warwick, Coventry, United Kingdom.

Department of Economics, University of Warwick, Coventry, United Kingdom.

Publication Information

J Med Internet Res. 2024 Dec 11;26:e54321. doi: 10.2196/54321.

Abstract

BACKGROUND

Patient experience data from social media offer patient-centered perspectives on disease, treatments, and health service delivery. Current guidelines typically rely on systematic reviews, while qualitative health studies are often seen as anecdotal and nongeneralizable. This study explores combining personal health experiences from multiple sources to create generalizable evidence.

OBJECTIVE

The study aims to (1) investigate how combining unsupervised natural language processing (NLP) and corpus linguistics can explore patient perspectives from a large unstructured dataset of modafinil experiences, (2) compare findings with Cochrane meta-analyses on modafinil's effectiveness, and (3) develop a methodology for analyzing such data.

METHODS

Using 69,022 posts from 790 sources, we applied a range of NLP and corpus techniques: data cleaning techniques to maximize post context, Python for the NLP techniques, and Sketch Engine for linguistic analysis. We used multiple topic mining approaches, including latent Dirichlet allocation, nonnegative matrix factorization, and word-embedding methods. Sentiment analysis used TextBlob and Valence Aware Dictionary and Sentiment Reasoner (VADER), while corpus methods included collocation, concordance, and n-gram generation. Previous work had mapped the mined topics to themes such as health conditions, reasons for taking modafinil, symptom impacts, dosage, side effects, effectiveness, and treatment comparisons.
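
The corpus methods named above can be illustrated with a minimal sketch. The study ran them in Sketch Engine; the Python below is neither that tool nor the authors' pipeline, only an illustration of what n-gram generation, concordance (key word in context), and collocation counting compute. The tokenizer regex, window sizes, sample posts, and all function names are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a post and keep runs of letters, digits and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())


def ngrams(tokens, n):
    """All contiguous n-token sequences, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def concordance(tokens, node, width=3):
    """Key-word-in-context lines with `width` tokens either side of each hit."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines


def collocates(tokens, node, window=2):
    """Raw counts of words within `window` tokens of `node` (no association score)."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            counts.update(tokens[max(0, i - window):i])
            counts.update(tokens[i + 1:i + 1 + window])
    return counts


if __name__ == "__main__":
    # Hypothetical example posts, not data from the study.
    posts = [
        "Modafinil helps my narcolepsy a lot.",
        "I take modafinil for MS fatigue, and modafinil helps.",
    ]
    # Tokenize per post so n-grams and context windows never cross post boundaries.
    tokenized = [tokenize(p) for p in posts]

    bigrams = Counter(g for toks in tokenized for g in ngrams(toks, 2))
    print(bigrams.most_common(1))  # [(('modafinil', 'helps'), 2)]

    for toks in tokenized:
        for line in concordance(toks, "modafinil", width=2):
            print(line)

    total = Counter()
    for toks in tokenized:
        total += collocates(toks, "modafinil", window=1)
    print(total.most_common(1))  # [('helps', 2)]
```

Sketch Engine ranks collocates with association measures such as logDice rather than raw counts, so the count-based `collocates` above is a simplification.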

RESULTS

Key findings included modafinil use across 166 health conditions, most frequently narcolepsy, multiple sclerosis, attention-deficit disorder, anxiety, sleep apnea, depression, bipolar disorder, chronic fatigue syndrome, fibromyalgia, and chronic disease. Word-embedding topic modeling mapped 70% of posts to predefined themes, while sentiment analysis classified 65% of responses as positive, 6% as neutral, and 28% as negative. Notably, the perceived effectiveness of modafinil for various conditions contrasts strongly with the findings of existing randomized controlled trials and systematic reviews, which conclude that evidence of effectiveness is insufficient or of low quality.
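
A positive/neutral/negative split like the one reported above presupposes a rule for turning continuous sentiment scores into labels. The abstract does not state that rule; the sketch below assumes the ±0.05 cut-offs on the VADER compound score suggested in the VADER documentation, and takes precomputed scores as input (for example, the "compound" value returned by `SentimentIntensityAnalyzer().polarity_scores(text)` in the vaderSentiment package). The function names and sample scores are hypothetical.

```python
from collections import Counter

LABELS = ("positive", "neutral", "negative")


def polarity_label(compound, threshold=0.05):
    """Map a VADER-style compound score in [-1, 1] to a label.

    Assumption: the +/-0.05 cut-offs suggested in the VADER documentation;
    the study's actual rule is not given in the abstract.
    """
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def polarity_shares(scores):
    """Percentage of posts per label, each rounded to a whole number."""
    if not scores:
        return {label: 0 for label in LABELS}
    counts = Counter(polarity_label(s) for s in scores)
    return {label: round(100 * counts[label] / len(scores)) for label in LABELS}


if __name__ == "__main__":
    # Hypothetical compound scores for five posts.
    print(polarity_shares([0.8, 0.3, -0.6, 0.0, 0.02]))
    # {'positive': 40, 'neutral': 40, 'negative': 20}

    # Independently rounded shares need not total 100.
    shares = polarity_shares([0.5, 0.0, -0.5])
    print(shares, sum(shares.values()))
    # {'positive': 33, 'neutral': 33, 'negative': 33} 99
```

Because each share is rounded on its own, the percentages can total slightly more or less than 100, which may account for the reported figures summing to 99%.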

CONCLUSIONS

This study demonstrated the value of combining NLP with linguistic techniques for analyzing large unstructured text datasets. Despite varying opinions, findings were consistent across methods and challenged existing clinical evidence. This suggests that patient-generated data could provide valuable insights into treatment outcomes, potentially improving clinical understanding and patient care.


