Leveraging social media data to study disease and treatment characteristics of Hodgkin's lymphoma using natural language processing methods.

Author information

Siddiqui Zasim Azhar, Pathan Maryam, Nduaguba Sabina, LeMasters Traci, Scott Virginia G, Sambamoorthi Usha, Patel Jay S

Affiliations

Department of Pharmaceutical Systems and Policy, School of Pharmacy, West Virginia University, Morgantown, West Virginia, United States of America.

Real World Evidence, OPEN Health Evidence & Access, United States of America.

Publication information

PLOS Digit Health. 2025 Mar 19;4(3):e0000765. doi: 10.1371/journal.pdig.0000765. eCollection 2025 Mar.

Abstract

BACKGROUND

The use of social media platforms in health research is increasing, yet their application in studying rare diseases is limited. Hodgkin's lymphoma (HL) is a rare malignancy with a high incidence in young adults. This study evaluates the feasibility of using social media data to study the disease and treatment characteristics of HL.

METHODS

We used the X (formerly Twitter) API v2 developer portal to download posts (formerly tweets) published from January 2010 to October 2022. Annotation guidelines were developed from the literature, and a limited number of posts were manually reviewed to identify the classes and attributes (characteristics) of HL discussed on X and to create a gold standard dataset. This dataset was then used to train, test, and validate a named entity recognition (NER) natural language processing (NLP) application.

RESULTS

After data preparation, 80,811 posts were collected: 500 for annotation guideline development, 2,000 for NLP application development, and the remaining 78,311 for deploying the application. We identified nine classes related to HL, such as HL classification, etiopathology, stages and progression, and treatment. The treatment class and HL stages and progression were the most frequently discussed, with 20,013 (25.56%) posts mentioning HL's treatments and 17,177 (21.93%) mentioning HL stages and progression. The model exhibited robust performance, achieving 86% accuracy and an 87% F1 score. The etiopathology class demonstrated excellent performance, with 93% accuracy and a 95% F1 score.
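The reported counts can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the abstract does not state which denominator the percentages use, so it assumes they are computed over the 78,311 deployment posts, which reproduces the reported figures. The helper name `share` is hypothetical.

```python
# Post counts as reported in the abstract.
TOTAL_POSTS = 80_811
GUIDELINE_POSTS = 500
DEVELOPMENT_POSTS = 2_000
DEPLOYMENT_POSTS = 78_311


def share(count, denominator):
    """Return count as a percentage of denominator, rounded to two decimals."""
    return round(100 * count / denominator, 2)


# The three subsets should add up to the total number of collected posts.
split_is_consistent = (
    GUIDELINE_POSTS + DEVELOPMENT_POSTS + DEPLOYMENT_POSTS == TOTAL_POSTS
)

# Assumption: percentages are taken over the deployment set, not all posts.
treatment_share = share(20_013, DEPLOYMENT_POSTS)
staging_share = share(17_177, DEPLOYMENT_POSTS)

print(split_is_consistent, treatment_share, staging_share)
```

Under that assumption the split is consistent and the two shares round to 25.56% and 21.93%, matching the abstract.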

DISCUSSION

The NLP application displayed high efficacy in extracting and characterizing HL-related information from social media posts, as evidenced by the high F1 score. Nonetheless, the data presented limitations in distinguishing between patients, providers, and caregivers and in establishing the temporal relationships between classes and attributes. Further research is necessary to bridge these gaps.
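For readers unfamiliar with the metric, the F1 score is the harmonic mean of precision and recall. The abstract does not say whether the score was computed at the token or entity level, or how classes were averaged, so the sketch below shows only the generic definition. The function name and the counts in the usage line are hypothetical and are not taken from the study.

```python
def precision_recall_f1(true_pos, false_pos, false_neg):
    """Compute precision, recall, and F1 from raw prediction counts."""
    predicted = true_pos + false_pos   # everything the model labeled
    actual = true_pos + false_neg      # everything in the gold standard
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    denominator = 2 * true_pos + false_pos + false_neg
    # Equivalent to 2 * precision * recall / (precision + recall).
    f1 = 2 * true_pos / denominator if denominator else 0.0
    return precision, recall, f1


# Hypothetical counts for illustration only.
precision, recall, f1 = precision_recall_f1(true_pos=90, false_pos=10, false_neg=10)
print(precision, recall, f1)
```

Because F1 penalizes both missed and spurious extractions, it can differ from accuracy, as it does in the results reported here.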

CONCLUSION

Our study demonstrated the potential of social media as a valuable preliminary research source for understanding the characteristics of rare diseases such as Hodgkin's lymphoma.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/948d/11922232/7555b4ca7424/pdig.0000765.g001.jpg
