

Mining e-cigarette adverse events in social media using Bi-LSTM recurrent neural network with word embedding representation.

Affiliations

Department of Management Information Systems, University of Arizona, Tucson, AZ, USA.

Department of Operation and Information Systems, University of Utah, Salt Lake City, UT, USA.

Publication information

J Am Med Inform Assoc. 2018 Jan 1;25(1):72-80. doi: 10.1093/jamia/ocx045.

Abstract

OBJECTIVE

Recent years have seen increased worldwide popularity of e-cigarette use. However, the risks of e-cigarettes are underexamined. Most e-cigarette adverse event studies have achieved low detection rates because of the limited sample sizes in their experiments and surveys. Social media provides a large data repository of consumers' e-cigarette feedback and experiences, which is useful for e-cigarette safety surveillance. However, it is difficult to automatically interpret the informal and nontechnical consumer vocabulary about e-cigarettes in social media. This issue hinders the use of social media content for e-cigarette safety surveillance. Recent developments in deep neural network methods have shown promise for named entity extraction from noisy text. Motivated by these observations, we aimed to design a deep neural network approach to extract e-cigarette safety information from social media.

METHODS

Our deep neural language model uses word embeddings to represent the input text and recognizes named entity types with a state-of-the-art Bidirectional Long Short-Term Memory (Bi-LSTM) Recurrent Neural Network.
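
A minimal sketch of the architecture named above, in plain Python so it runs without a deep learning library: each token is mapped to an embedding vector, one LSTM reads the sentence left to right and a second reads it right to left, and the two hidden states at each position are concatenated and scored against a tag set. Everything beyond that outline is an assumption: the class names `LSTMCell` and `BiLSTMTagger`, the BIO tag labels, the dimensions, the random untrained weights standing in for learned word embeddings and LSTM parameters, and the per-token argmax output layer (the abstract does not say which output layer the authors used).

```python
import math
import random
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[Vector]


def rand_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    """Small uniform random weights (placeholders for trained parameters)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


class LSTMCell:
    """One LSTM direction with input (i), forget (f), output (o) and candidate (g) gates."""

    def __init__(self, input_dim: int, hidden_dim: int, rng: random.Random):
        self.hidden_dim = hidden_dim
        # One weight matrix per gate, applied to the concatenated [x; h_prev] vector.
        self.W = {gate: rand_matrix(hidden_dim, input_dim + hidden_dim, rng) for gate in "ifog"}
        self.b = {gate: [0.0] * hidden_dim for gate in "ifog"}

    def step(self, x: Vector, h: Vector, c: Vector) -> Tuple[Vector, Vector]:
        xh = x + h  # list concatenation: [x; h_prev]
        pre = {g: [a + b for a, b in zip(matvec(self.W[g], xh), self.b[g])] for g in "ifog"}
        i = [sigmoid(z) for z in pre["i"]]
        f = [sigmoid(z) for z in pre["f"]]
        o = [sigmoid(z) for z in pre["o"]]
        g = [math.tanh(z) for z in pre["g"]]
        c_new = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]  # cell state update
        h_new = [ov * math.tanh(cv) for ov, cv in zip(o, c_new)]  # hidden state output
        return h_new, c_new

    def run(self, inputs: List[Vector]) -> List[Vector]:
        h, c = [0.0] * self.hidden_dim, [0.0] * self.hidden_dim
        outputs = []
        for x in inputs:
            h, c = self.step(x, h, c)
            outputs.append(h)
        return outputs


class BiLSTMTagger:
    def __init__(self, vocab: List[str], tags: List[str], emb_dim: int = 8,
                 hidden_dim: int = 6, seed: int = 0):
        rng = random.Random(seed)
        self.tags = tags
        # Hypothetical random embeddings; a real model would use learned word embeddings.
        self.embeddings: Dict[str, Vector] = {
            w: [rng.gauss(0, 0.1) for _ in range(emb_dim)] for w in vocab + ["<unk>"]
        }
        self.fwd = LSTMCell(emb_dim, hidden_dim, rng)
        self.bwd = LSTMCell(emb_dim, hidden_dim, rng)
        self.out = rand_matrix(len(tags), 2 * hidden_dim, rng)

    def predict(self, tokens: List[str]) -> List[str]:
        xs = [self.embeddings.get(t.lower(), self.embeddings["<unk>"]) for t in tokens]
        h_fwd = self.fwd.run(xs)
        h_bwd = self.bwd.run(xs[::-1])[::-1]  # re-align backward states with token order
        labels = []
        for hf, hb in zip(h_fwd, h_bwd):
            scores = matvec(self.out, hf + hb)  # score the concatenated [h_fwd; h_bwd]
            labels.append(self.tags[max(range(len(scores)), key=scores.__getitem__)])
        return labels


TAGS = ["O", "B-AE", "I-AE", "B-COMP", "I-COMP"]  # hypothetical BIO labels
tagger = BiLSTMTagger(vocab=["mango", "pods", "gave", "me", "a", "headache"], tags=TAGS)
print(tagger.predict("Mango pods gave me a headache".split()))  # untrained, so labels are arbitrary
```

The point of the bidirectional design is visible in `predict`: the label for "pods" is scored from a state that has already seen "Mango" (forward pass) and one that has already seen "gave me a headache" (backward pass), which helps on informal text where the cue for an entity may come after it.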

RESULTS

Our Bi-LSTM model achieved the best performance compared to 3 baseline models, with a precision of 94.10%, a recall of 91.80%, and an F-measure of 92.94%. We identified 1591 unique adverse events and 9930 unique e-cigarette components (ie, chemicals, flavors, and devices) from our research testbed.
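
The three reported scores are related by F = 2PR/(P + R); plugging in P = 94.10% and R = 91.80% reproduces the 92.94% F-measure. The sketch below also shows one common way such scores, and a count of unique extracted entities, can be computed for entity extraction: decode (start, end, type) spans from BIO tags and compare them by exact match. This is an illustration under assumptions: the abstract does not state the tagging scheme, the matching criterion, or how unique entities were deduplicated, and the helper names and tag labels (`AE`, `COMP`) are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

Span = Tuple[int, int, str]  # (start token, end token exclusive, entity type)


def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the F1 score)."""
    return 2 * precision * recall / (precision + recall)


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Decode BIO tags into entity spans; a stray I- tag is treated as a new entity."""
    spans: Set[Span] = set()
    start: Optional[int] = None
    etype: Optional[str] = None
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" closes a final entity
        if tag == "O" or tag.startswith("B-") or tag[2:] != etype:
            if start is not None:
                spans.add((start, i, etype))
            start, etype = (None, None) if tag == "O" else (i, tag[2:])
    return spans


def entity_prf(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Exact-match, entity-level precision, recall and F1 over tagged sentences."""
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        g_spans, p_spans = bio_to_spans(g), bio_to_spans(p)
        tp += len(g_spans & p_spans)
        fp += len(p_spans - g_spans)
        fn += len(g_spans - p_spans)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = f_measure(precision, recall) if precision + recall else 0.0
    return precision, recall, f1


def unique_entities(sentences: List[List[str]], tag_seqs: List[List[str]]) -> Dict[str, Set[str]]:
    """Group extracted surface strings by type, deduplicated after lowercasing (an assumption)."""
    found: Dict[str, Set[str]] = {}
    for tokens, tags in zip(sentences, tag_seqs):
        for start, end, etype in bio_to_spans(tags):
            found.setdefault(etype, set()).add(" ".join(tokens[start:end]).lower())
    return found


print(round(f_measure(94.10, 91.80), 2))  # 92.94, matching the reported F-measure
```

Because F1 is a harmonic mean, it is pulled toward the lower of the two scores, which is why a model with slightly lower precision but clearly higher recall can still come out ahead on F-measure.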

CONCLUSION

Although the conditional random field baseline model had slightly better precision than our approach, our Bi-LSTM model achieved much higher recall, resulting in the best F-measure. Our method can be generalized to extract medical concepts from social media for other medical applications.



