Marwah Rohit, Mishra Subrat, Gross Benjamin, Couturiaux Sandra, Calara Rico, Sabate Estrella Eduardo Jose, Hogea Cosmina
Definitive Healthcare, 492 Old Connecticut Path Suite 401, Framingham, MA, 01701, United States, 1 5087204224.
Gilead Sciences Inc, Foster City, CA, United States.
J Med Internet Res. 2025 Aug 22;27:e65460. doi: 10.2196/65460.
Social media platforms offer valuable insights into patients' experience, revealing organic conversations that reflect their immediate concerns and needs. Through active listening to lived experiences, we can identify unmet needs and discover the real-world challenges that patients and caregivers face.
The aim of our study is to develop a reusable framework to collect and analyze evolving social media data, capturing insights into the experiences of individuals with myelodysplastic syndromes (MDS), including higher-risk MDS (HR-MDS), and their caregivers. The findings can inform the development of appropriate patient support interventions.
We conducted a structured Google search of English-language websites relevant to MDS from January 1, 2008, to December 31, 2022, using validated URLs and keywords. Data were sourced from MDS-specific platforms to ensure clinical relevance. Contextual embeddings (rather than simple keyword matching) were applied to detect semantically meaningful mentions of "MDS." Scraping algorithms collected, cleaned, and standardized the data. Posts were classified as originating from patients or caregivers using decision-tree tagging based on contextual summaries. Users were categorized as HR-MDS if their posts explicitly mentioned "high-risk" or referenced criteria aligned with National Comprehensive Cancer Network guidelines (eg, blast count, transplant, chemotherapy use). Each post was analyzed for major themes and sentiment using a supervised machine learning classifier, while latent topics were identified through a semisupervised model.
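The HR-MDS categorization step described above can be illustrated with a minimal rule-based sketch. The abstract names the criteria (explicit "high-risk" mentions, blast count, transplant, chemotherapy use) but not the exact rules, so the regular expressions and the function names `is_hr_mds_post` and `tag_hr_mds_users` below are assumptions for illustration only, not the authors' implementation.

```python
import re

# Hypothetical patterns: the abstract lists the criteria but not how
# they were matched, so these expressions are illustrative assumptions.
HR_PATTERNS = [
    re.compile(r"\bhigh(?:er)?[- ]risk\b", re.IGNORECASE),  # explicit risk mention
    re.compile(r"\bblasts?\b", re.IGNORECASE),              # blast count
    re.compile(r"\btransplant", re.IGNORECASE),             # transplant(ation)
    re.compile(r"\bchemo(?:therapy)?\b", re.IGNORECASE),    # chemotherapy use
]


def is_hr_mds_post(text: str) -> bool:
    """Return True if the post matches any of the illustrative criteria."""
    return any(pattern.search(text) for pattern in HR_PATTERNS)


def tag_hr_mds_users(posts):
    """Collect the IDs of users with at least one matching post.

    `posts` is an iterable of (user_id, text) pairs.
    """
    flagged = set()
    for user_id, text in posts:
        if is_hr_mds_post(text):
            flagged.add(user_id)
    return flagged
```

Plain keyword rules like these would over-match (for example, a post asking whether a transplant is ever needed), which is presumably one reason the study relied on contextual embeddings and summaries rather than simple keyword matching.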
We analyzed ~5.5 million words from 42,000 posts across 5500 threads by ~4000 users from the United States, United Kingdom, and Canada. Of the 1249 HR-MDS users identified, 587 (47%) were patients and 662 (53%) were caregivers. Dominant sentiments among HR-MDS users included concern (n=974, 78%), anxiety (n=749, 60%), frustration (n=724, 58%), fear (n=724, 58%), and confusion (n=612, 49%). Concern was the top sentiment among caregivers (n=390, 59%), while anxiety led among patients (n=323, 55%). Key topics included blood counts (n=674, 54%), disease burden (n=537, 43%), quality of life (n=450, 36%), treatment options (n=387, 31%), and disease progression (n=387, 31%). Anxiety was frequently tied to health (n=600, 48%), treatment (n=325, 26%), and the diagnostic process (n=250, 20%). Fear stemmed from complications (n=237, 19%) and progression (n=240, 19%). Confusion about diagnosis and disease understanding was reported by 300 users (24%). Information-seeking behaviors revealed user interest in treatment interventions (n=238, 19%) and ongoing research (n=212, 17%).
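The percentages above use different denominators: most are shares of all 1249 HR-MDS users, while the caregiver and patient figures are shares of their own subgroups (662 and 587). A short sketch of that arithmetic, using only counts quoted in the abstract, is shown here; the helper name `pct` and the dictionary layout are illustrative choices, not part of the study.

```python
# Denominators reported in the abstract.
HR_USERS = 1249
PATIENTS = 587
CAREGIVERS = 662

# Sentiment counts among all HR-MDS users, as reported.
SENTIMENT_COUNTS = {
    "concern": 974,
    "anxiety": 749,
    "frustration": 724,
    "fear": 724,
    "confusion": 612,
}


def pct(count: int, total: int) -> int:
    """Share of `total` as a whole-number percentage."""
    return round(100 * count / total)


# Shares of all HR-MDS users.
sentiment_pct = {name: pct(n, HR_USERS) for name, n in SENTIMENT_COUNTS.items()}

# Subgroup shares use the subgroup size as the denominator.
caregiver_concern_pct = pct(390, CAREGIVERS)
patient_anxiety_pct = pct(323, PATIENTS)

print(sentiment_pct)
print(caregiver_concern_pct, patient_anxiety_pct)
```

Recomputed this way, the values agree with the percentages reported in the abstract.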
Natural language processing techniques show promise for identifying the emerging, complex themes and sentiments expressed by HR-MDS users, thereby highlighting the unmet needs, barriers, and facilitators associated with the disease.