Gupta Ravish, Gupta Matrika, Calix Ricardo A, Bernard Gordon R
Annu Int Conf IEEE Eng Med Biol Soc. 2017 Jul;2017:1174-1177. doi: 10.1109/EMBC.2017.8037039.
Twitter, as a social media platform, has become an increasingly useful data source for health surveillance studies, and personal health experiences shared on Twitter provide valuable information for surveillance. Twitter data are known for their irregular language use and informal short texts, owing to the 140-character limit, and for their noisiness: the majority of posts are irrelevant to any particular health surveillance task. These factors make it challenging to identify personal health experience tweets in Twitter data. In this study, we designed deep neural networks with 3 different architectural configurations, trained them on a corpus of 8,770 annotated tweets, and used them to predict personal experience tweets in a set of 821 annotated tweets. Our results demonstrated a significant improvement in predicting personal health experience tweets with deep neural networks over conventional classifiers: 37.5% in accuracy, 31.1% in precision, and 53.6% in recall. We believe that our method can be utilized in various health surveillance studies that use Twitter as a data source.
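The three reported measures (accuracy, precision, and recall) come from comparing predicted labels against the annotated labels. The sketch below shows how they are computed for a binary task. It assumes label 1 means "personal health experience tweet" and 0 means "other". The function name `binary_metrics` is hypothetical, and the sketch is not the authors' evaluation code.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, and recall for binary labels.

    Assumption (illustrative only): 1 = personal health experience tweet, 0 = other.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count the four confusion-matrix cells.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    total = tp + fp + fn + tn

    # Guard against division by zero when a denominator is empty.
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall}


if __name__ == "__main__":
    # Toy labels, not data from the study.
    print(binary_metrics([1, 0, 1, 1, 0], [1, 0, 0, 1, 1]))
```

On the toy labels, 2 true positives, 1 false positive, 1 false negative, and 1 true negative give accuracy 0.6 and precision and recall of 2/3 each. Recall is the measure most directly tied to how many true personal-experience tweets a classifier recovers.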