Lossio-Ventura Juan Antonio, Morzan Juandiego, Alatrista-Salas Hugo, Hernandez-Boussard Tina, Bian Jiang
Department of Medicine, Biomedical Informatics, Stanford University, USA.
School of Engineering, Universidad del Pacífico, Lima, Peru.
Proceedings (IEEE Int Conf Bioinformatics Biomed). 2019 Nov;2019:1544-1547. doi: 10.1109/bibm47256.2019.8983167. Epub 2020 Feb 6.
Twitter has become the most popular form of social interaction in the healthcare domain. Consequently, various teams have evaluated Twitter as an additional source where patients share information about their healthcare, with the potential goal of improving their outcomes. Several existing topic modeling and document clustering applications have been adapted to analyze tweets, and these adaptations show that the applications' performance suffers because of the nature and characteristics of tweets. Moreover, progress in Twitter health research has become difficult to measure because the existing applications have not been compared with one another. In this paper, we evaluate different topic modeling and document clustering applications on two health-related Twitter datasets, using internal indexes. Our results show that Online Twitter LDA and Gibbs LDA perform better at extracting topics and grouping tweets. We aim to provide health practitioners with this comparison so that they can select the most suitable application for their tasks.