Identifying tweets of personal health experience through word embedding and LSTM neural network.

Author affiliations

Department of Computer Information Technology and Graphics, Purdue University Northwest, Hammond, IN, USA.

Department of Medicine, Vanderbilt University, Nashville, TN, USA.

Publication information

BMC Bioinformatics. 2018 Jun 13;19(Suppl 8):210. doi: 10.1186/s12859-018-2198-y.

Abstract

BACKGROUND

As Twitter has become an active data source for health surveillance research, efficient and effective methods are needed to identify tweets related to personal health experience. Conventional classification algorithms rely on features engineered by human domain experts, which is a challenging task that demands considerable human effort. The resulting features may not be optimal for the classification problem, and because personal experience can be expressed and described in tweets in many different ways, conventional classifiers can struggle to correctly predict personal experience tweets (PETs). In this study, we developed a method that combines word embedding with a long short-term memory (LSTM) model and requires no task-specific feature engineering. Through word embedding, tweet texts were represented as dense vectors, which in turn were fed to the LSTM neural network as sequences.
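The pipeline described above (tokens mapped to dense vectors, then read in order by an LSTM) can be illustrated with a minimal forward-pass sketch. This is not the authors' implementation: the toy vocabulary, the dimensions, the randomly initialised and untrained weights, and the function names `embed`, `lstm_last_hidden`, and `pet_probability` are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy vocabulary; a real system would build one from a tweet corpus.
vocab = {"<unk>": 0, "i": 1, "took": 2, "ibuprofen": 3, "and": 4, "felt": 5, "dizzy": 6}
embed_dim, hidden_dim = 8, 16  # assumed sizes, not taken from the paper

# Untrained random parameters; in practice these would be learned from labelled tweets.
E = rng.normal(0.0, 0.1, (len(vocab), embed_dim))                  # embedding table
W = rng.normal(0.0, 0.1, (4 * hidden_dim, embed_dim + hidden_dim))  # stacked gate weights
b = np.zeros(4 * hidden_dim)                                        # stacked gate biases
w_out = rng.normal(0.0, 0.1, hidden_dim)                            # output layer weights
b_out = 0.0


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def embed(tweet):
    """Map a whitespace-tokenised tweet to a (length, embed_dim) array of dense vectors."""
    ids = np.array([vocab.get(tok, 0) for tok in tweet.lower().split()], dtype=int)
    return E[ids]


def lstm_last_hidden(xs):
    """Run a single-layer LSTM over the vector sequence and return the final hidden state."""
    h = np.zeros(hidden_dim)
    c = np.zeros(hidden_dim)
    for x in xs:
        z = W @ np.concatenate([x, h]) + b
        i, f, o, g = np.split(z, 4)              # input, forget, output gates; candidate
        i, f, o, g = sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)
        c = f * c + i * g                        # update the cell state
        h = o * np.tanh(c)                       # expose the gated hidden state
    return h


def pet_probability(tweet):
    """Score a tweet with a sigmoid output unit on the last hidden state."""
    return float(sigmoid(w_out @ lstm_last_hidden(embed(tweet)) + b_out))


print(pet_probability("I took ibuprofen and felt dizzy"))
```

Because the weights are untrained, the printed score is meaningless; the sketch only shows how a tweet flows from tokens to embeddings to a sequence model without any hand-engineered features.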

RESULTS

Statistical analyses of 10-fold cross-validation results for our method and for conventional methods show significant differences (p < 0.01) in accuracy, precision, recall, F1-score, and ROC/AUC, demonstrating that our approach outperforms the conventional methods in identifying PETs.
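The abstract does not name the statistical test used. As one common way to compare two classifiers scored on the same 10 folds, the sketch below computes a paired t statistic and compares it with the two-sided critical value for p < 0.01 at 9 degrees of freedom (3.250). The choice of test is an assumption, and the per-fold F1 values are hypothetical placeholders, not results from the paper.

```python
import math
import statistics


def paired_t_statistic(scores_a, scores_b):
    """Return (t, degrees_of_freedom) for paired per-fold scores of two methods."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation of the differences
    return mean_diff / (sd_diff / math.sqrt(n)), n - 1


# Hypothetical placeholder F1-scores for 10 folds (NOT taken from the paper).
lstm_f1 = [0.80, 0.82, 0.79, 0.81, 0.83, 0.80, 0.78, 0.82, 0.81, 0.80]
baseline_f1 = [0.72, 0.75, 0.70, 0.74, 0.73, 0.71, 0.72, 0.74, 0.73, 0.70]

T_CRIT_TWO_SIDED_01_DF9 = 3.250  # critical t value for alpha = 0.01, df = 9

t, df = paired_t_statistic(lstm_f1, baseline_f1)
significant = abs(t) > T_CRIT_TWO_SIDED_01_DF9
print(f"t = {t:.2f}, df = {df}, significant at p < 0.01: {significant}")
```

The same comparison would be repeated for each performance measure. Scores from cross-validation folds are not fully independent, so a test like this is best read as a rough indication.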

CONCLUSION

We presented an efficient and effective method of identifying health-related personal experience tweets by combining word embedding and an LSTM neural network. Because it avoids the laborious and costly process of feature engineering, our method could help accelerate and scale up the analysis of social media text for health surveillance purposes.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/51fe/5998756/9c67fa952170/12859_2018_2198_Fig1_HTML.jpg
