Clustering and topic modeling over tweets: A comparison over a health dataset.

Author information

Lossio-Ventura Juan Antonio, Morzan Juandiego, Alatrista-Salas Hugo, Hernandez-Boussard Tina, Bian Jiang

Affiliations

Department of Medicine, Biomedical Informatics, Stanford University, USA.

School of Engineering, Universidad del Pacífico, Lima, Peru.

Publication information

Proceedings (IEEE Int Conf Bioinformatics Biomed). 2019 Nov;2019:1544-1547. doi: 10.1109/bibm47256.2019.8983167. Epub 2020 Feb 6.

DOI: 10.1109/bibm47256.2019.8983167

PMID: 35463811

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC9028681/

Abstract

Twitter has become the most popular form of social interaction in the healthcare domain. Various teams have therefore evaluated Twitter as an additional source where patients share information about their healthcare, with the potential goal of improving their outcomes. Several existing topic modeling and document clustering applications have been adapted to assess tweets, showing that the performance of these applications is negatively affected by the nature and characteristics of tweets. Moreover, progress in Twitter health research has become difficult to measure because the existing applications have not been compared against one another. In this paper, we evaluate different topic modeling and document clustering applications over two health-related Twitter datasets, using internal indexes. Our results show that Online Twitter LDA and Gibbs LDA perform better at extracting topics and grouping tweets. We aim to provide health practitioners with this comparison so they can select the most suitable application for their tasks.
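To make "evaluation based on internal indexes" concrete, here is a minimal sketch of one widely used internal index, the silhouette coefficient, computed over cosine distances between bag-of-words vectors. The abstract does not name the indexes the authors used, so the choice of silhouette is an assumption. The four sample tweets, the whitespace tokenizer, and the two label assignments are also hypothetical. The labels stand in for the output of any clustering or topic modeling application (for example, each tweet assigned to its dominant topic).

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lowercase whitespace tokenization into term counts (a simplification)."""
    return Counter(text.lower().split())


def cosine_distance(a, b):
    """1 - cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def silhouette_score(vectors, labels):
    """Mean silhouette coefficient; points in singleton clusters score 0."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    total = 0.0
    for i, vec in enumerate(vectors):
        own = clusters[labels[i]]
        if len(own) == 1:
            continue  # contributes 0 to the sum
        # a: mean distance to the other members of the point's own cluster
        a = sum(cosine_distance(vec, vectors[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(cosine_distance(vec, vectors[j]) for j in members) / len(members)
            for lab, members in clusters.items()
            if lab != labels[i]
        )
        denom = max(a, b)
        total += 0.0 if denom == 0 else (b - a) / denom
    return total / len(vectors)


# Hypothetical toy data, not taken from the paper's datasets.
tweets = [
    "flu shot today at the clinic",
    "got my flu shot this morning",
    "chemo session went well today",
    "second chemo session tomorrow",
]
vectors = [bag_of_words(t) for t in tweets]

good_labels = [0, 0, 1, 1]  # groups the two flu tweets and the two chemo tweets
bad_labels = [0, 1, 0, 1]   # mixes flu and chemo tweets in each group

good_score = silhouette_score(vectors, good_labels)
bad_score = silhouette_score(vectors, bad_labels)
print(f"coherent grouping: {good_score:.3f}")
print(f"mixed grouping:    {bad_score:.3f}")
```

Because an internal index needs no gold-standard labels, the same function can score the output of any application on the same tweets. A higher value indicates tighter and better separated groups under the chosen distance.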


