Mining e-cigarette adverse events in social media using Bi-LSTM recurrent neural network with word embedding representation.

Author affiliations

Department of Management Information Systems, University of Arizona, Tucson, AZ, USA.

Department of Operation and Information Systems, University of Utah, Salt Lake City, UT, USA.

Publication information

J Am Med Inform Assoc. 2018 Jan 1;25(1):72-80. doi: 10.1093/jamia/ocx045.

DOI:10.1093/jamia/ocx045

PMID:28505280

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6455898/

Abstract

OBJECTIVE

Recent years have seen increased worldwide popularity of e-cigarette use. However, the risks of e-cigarettes are underexamined. Most e-cigarette adverse event studies have achieved low detection rates due to limited subject sample sizes in the experiments and surveys. Social media provides a large data repository of consumers' e-cigarette feedback and experiences, which are useful for e-cigarette safety surveillance. However, it is difficult to automatically interpret the informal and nontechnical consumer vocabulary about e-cigarettes in social media. This issue hinders the use of social media content for e-cigarette safety surveillance. Recent developments in deep neural network methods have shown promise for named entity extraction from noisy text. Motivated by these observations, we aimed to design a deep neural network approach to extract e-cigarette safety information in social media.

METHODS

Our deep neural language model utilizes word embedding as the representation of text input and recognizes named entity types with the state-of-the-art Bidirectional Long Short-Term Memory (Bi-LSTM) Recurrent Neural Network.
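
To make the architecture concrete, the following is a minimal forward-pass sketch of a Bi-LSTM tagger over word embeddings, written in plain Python. It is not the authors' implementation: the toy vocabulary, the tag names (`ADVERSE_EVENT`, `COMPONENT`), the layer sizes, and the random untrained weights are all assumptions made for illustration, and bias terms are omitted for brevity. It only shows how each token's embedding is read left-to-right and right-to-left, and how the two hidden states are concatenated before a tag is scored.

```python
import math
import random

random.seed(0)  # fixed seed so the untrained sketch is reproducible

# Hypothetical toy vocabulary and tag set (assumptions, not from the paper).
VOCAB = {"<unk>": 0, "my": 1, "vape": 2, "gives": 3, "me": 4, "headache": 5}
TAGS = ["O", "ADVERSE_EVENT", "COMPONENT"]
EMB_DIM, HID_DIM = 4, 3


def rand_matrix(rows, cols):
    """Small random weights standing in for trained parameters."""
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class LSTMCell:
    """One LSTM direction; each gate sees [input; previous hidden state]."""

    def __init__(self, in_dim, hid_dim):
        self.hid_dim = hid_dim
        # One weight matrix per gate: input (i), forget (f), output (o), candidate (g).
        self.w = {g: rand_matrix(hid_dim, in_dim + hid_dim) for g in "ifog"}

    def run(self, inputs):
        h = [0.0] * self.hid_dim  # hidden state
        c = [0.0] * self.hid_dim  # cell state
        outputs = []
        for x in inputs:
            z = x + h  # list concatenation of input and previous hidden state
            i = [sigmoid(v) for v in matvec(self.w["i"], z)]
            f = [sigmoid(v) for v in matvec(self.w["f"], z)]
            o = [sigmoid(v) for v in matvec(self.w["o"], z)]
            g = [math.tanh(v) for v in matvec(self.w["g"], z)]
            c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
            h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
            outputs.append(h)
        return outputs


embeddings = rand_matrix(len(VOCAB), EMB_DIM)      # word embedding table
forward_cell = LSTMCell(EMB_DIM, HID_DIM)          # reads left to right
backward_cell = LSTMCell(EMB_DIM, HID_DIM)         # reads right to left
tag_weights = rand_matrix(len(TAGS), 2 * HID_DIM)  # scores tags from both directions


def tag_sentence(tokens):
    """Return one (token, tag) pair per token from the untrained Bi-LSTM."""
    vectors = [embeddings[VOCAB.get(t.lower(), 0)] for t in tokens]
    fwd = forward_cell.run(vectors)
    bwd = backward_cell.run(vectors[::-1])[::-1]  # re-align to token order
    tagged = []
    for token, hf, hb in zip(tokens, fwd, bwd):
        scores = matvec(tag_weights, hf + hb)  # concatenated hidden states
        tagged.append((token, TAGS[scores.index(max(scores))]))
    return tagged


result = tag_sentence("My vape gives me headache".split())
print(result)
```

Because the weights are random, the printed tags are meaningless; in a real system the embeddings and LSTM parameters would be learned from annotated posts, typically with a deep learning library rather than hand-written loops.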

RESULTS

Our Bi-LSTM model achieved the best performance compared to 3 baseline models, with a precision of 94.10%, a recall of 91.80%, and an F-measure of 92.94%. We identified 1591 unique adverse events and 9930 unique e-cigarette components (ie, chemicals, flavors, and devices) from our research testbed.
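
As a quick consistency check on these figures, the short sketch below recomputes the F-measure from the reported precision and recall. It assumes the reported F-measure is the balanced F1 score (the harmonic mean of precision and recall); the function name `f_measure` and its `beta` parameter are illustrative, not taken from the paper.

```python
def f_measure(precision, recall, beta=1.0):
    """F-beta score; beta=1 gives the harmonic mean of precision and recall."""
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Precision and recall as reported in the abstract (percentages).
f1 = f_measure(94.10, 91.80)
print(round(f1, 2))  # 92.94
```

The result matches the reported 92.94%, which supports the assumption that the paper's F-measure is F1.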

CONCLUSION

Although the conditional random field baseline model had slightly better precision than our approach, our Bi-LSTM model achieved much higher recall, resulting in the best F-measure. Our method can be generalized to extract medical concepts from social media for other medical applications.
