Twitter sentiment classification for measuring public health concerns.

Author information

Ji Xiang, Chun Soon Ae, Wei Zhi, Geller James

Affiliations

1. Department of Computer Science, New Jersey Institute of Technology, Martin Luther King Blvd, Newark, NJ 07102 USA.

2. City University of New York, College of Staten Island, 2800 Victory Blvd, Staten Island, NY 10314 USA.

Publication information

Soc Netw Anal Min. 2015;5(1):13. doi: 10.1007/s13278-015-0253-5. Epub 2015 May 12.

Abstract

An important task of public health officials is to keep track of health issues, such as spreading epidemics, which are the focus of this paper. Public concern about a communicable disease can be seen as a problem in its own right. Tracking trends in public health concern and identifying peaks of that concern are therefore crucial tasks. However, monitoring public health concerns with traditional surveillance systems is not only expensive but also suffers from limited coverage and significant delays. To address these problems, we use Twitter messages, which are available free of cost, are generated worldwide, and are posted in real time. We measure public concern using a two-step sentiment classification approach. In the first step, we distinguish Personal tweets from News (i.e., Non-Personal) tweets. In the second step, we further separate Personal Negative from Personal Non-Negative tweets. Each of these steps itself consists of two sub-steps. In the first sub-step, our programs automatically generate training data using an emotion-oriented, clue-based method. In the second sub-step, we train and test three different Machine Learning (ML) models with the training data from the first sub-step; this allows us to determine the best ML model for different datasets. Furthermore, we test the already trained ML models on a human-annotated, disjoint dataset. Based on the number of tweets classified as Personal Negative, we compute a Measure of Concern (MOC) and a timeline of the MOC. We attempt to correlate peaks of the MOC timeline with peaks of the News (Non-Personal) timeline. Our best accuracy results are achieved using the two-step method with a Naïve Bayes classifier for the Epidemic domain (six datasets) and the Mental Health domain (three datasets).
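The two-step approach described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. The clue lists (`PERSONAL_CLUES`, `NEGATIVE_CLUES`), the toy tweets, and all function names are hypothetical, and the classifier is a generic multinomial Naive Bayes with add-one smoothing. The abstract does not give the MOC formula, so the sketch assumes it is the share of Personal Negative tweets among all tweets in a time window.

```python
import math
from collections import Counter, defaultdict

# Hypothetical clue lists; the abstract does not enumerate the real clues.
PERSONAL_CLUES = {"i", "my", "me"}
NEGATIVE_CLUES = {"scared", "worried", "afraid", "sick"}

def tokenize(text):
    return text.lower().split()

def clue_label(text, clues, hit, miss):
    """Sub-step 1: weakly label a tweet by whether it contains any clue word."""
    return hit if clues & set(tokenize(text)) else miss

class NaiveBayes:
    """Generic multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        total = sum(self.class_counts.values())
        best, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n / total)
            for word in tokenize(text):
                if word in self.vocab:  # words unseen in training are skipped
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best, best_score = label, score
        return best

def train_two_step(tweets):
    """Sub-step 2 for both steps: train one classifier per step on clue labels."""
    labels1 = [clue_label(t, PERSONAL_CLUES, "Personal", "News") for t in tweets]
    step1 = NaiveBayes().fit(tweets, labels1)
    personal = [t for t, l in zip(tweets, labels1) if l == "Personal"]
    labels2 = [clue_label(t, NEGATIVE_CLUES, "Negative", "Non-Negative")
               for t in personal]
    step2 = NaiveBayes().fit(personal, labels2)
    return step1, step2

def classify(tweet, step1, step2):
    """Step 1: Personal vs. News; step 2: Negative vs. Non-Negative."""
    if step1.predict(tweet) == "News":
        return "News"
    return "Personal " + step2.predict(tweet)

def measure_of_concern(labels):
    """Assumed MOC: fraction of Personal Negative tweets in one time window."""
    if not labels:
        return 0.0
    return sum(l == "Personal Negative" for l in labels) / len(labels)

# Toy training tweets (invented for illustration only).
TRAIN = [
    "I am so scared of the flu",
    "my kids are sick and I am worried",
    "I got my flu shot today",
    "my doctor says the flu season is mild",
    "CDC reports new flu cases",
    "Health officials confirm measles outbreak",
]
step1, step2 = train_two_step(TRAIN)
day = ["I am worried about the flu", "CDC reports new measles cases"]
day_labels = [classify(t, step1, step2) for t in day]
print(day_labels, measure_of_concern(day_labels))
```

Computing `measure_of_concern` once per day would give an MOC timeline, whose peaks could then be compared with a daily count of tweets labelled News.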


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b6f6/7096866/181945c89f4a/13278_2015_253_Fig1_HTML.jpg
