Department of Management Information Systems, University of Arizona, Tucson, AZ, USA.
Department of Operations and Information Systems, University of Utah, Salt Lake City, UT, USA.
J Am Med Inform Assoc. 2018 Jan 1;25(1):72-80. doi: 10.1093/jamia/ocx045.
OBJECTIVE: E-cigarette use has grown in popularity worldwide in recent years. However, the risks of e-cigarettes remain underexamined. Most e-cigarette adverse event studies have achieved low detection rates because of the limited sample sizes of their experiments and surveys. Social media provides a large repository of consumers' e-cigarette feedback and experiences, which is useful for e-cigarette safety surveillance. However, the informal, nontechnical vocabulary that consumers use about e-cigarettes on social media is difficult to interpret automatically, and this hinders the use of social media content for safety surveillance. Recent developments in deep neural network methods have shown promise for extracting named entities from noisy text. Motivated by these observations, we aimed to design a deep neural network approach to extract e-cigarette safety information from social media.
METHODS: Our deep neural language model uses word embeddings to represent the input text and recognizes named entity types with a state-of-the-art Bidirectional Long Short-Term Memory (Bi-LSTM) recurrent neural network.
RESULTS: Compared with 3 baseline models, our Bi-LSTM model achieved the best performance, with a precision of 94.10%, a recall of 91.80%, and an F-measure of 92.94%. We identified 1591 unique adverse events and 9930 unique e-cigarette components (ie, chemicals, flavors, and devices) in our research testbed.
CONCLUSION: Although the conditional random field baseline model had slightly better precision than our approach, our Bi-LSTM model achieved much higher recall, resulting in the best F-measure. Our method can be generalized to extract medical concepts from social media for other medical applications.
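The F-measure reported above is the harmonic mean of precision and recall, F = 2PR / (P + R). With the abstract's figures, 2 × 94.10 × 91.80 / (94.10 + 91.80) ≈ 92.94, which matches the reported value.

The sketch below shows one common way to compute entity-level precision, recall, and F-measure for named entity recognition output. It is not the authors' evaluation code. The following are illustrative assumptions:

- The BIO tagging scheme and exact-span matching.
- The hypothetical tag names `ADR` (adverse event) and `CHEM` (chemical).
- The helper names `bio_to_spans`, `evaluate`, and `f_measure`.

```python
from typing import List, Optional, Set, Tuple

# A typed entity span: (entity type, start token index, end token index exclusive).
Span = Tuple[str, int, int]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Turn one BIO tag sequence into a set of (type, start, end) spans."""
    spans: Set[Span] = set()
    etype: Optional[str] = None
    start = 0
    # Append a sentinel "O" so an entity at the end of the sentence is closed.
    for i, tag in enumerate(tags + ["O"]):
        if tag.startswith("I-") and tag[2:] == etype:
            continue  # the current entity keeps growing
        if etype is not None:
            spans.add((etype, start, i))  # close the open entity
        if tag == "O":
            etype = None
        else:
            # B-X starts an entity; a stray I-X (type change or no open
            # entity) is also treated as a start, a lenient convention.
            etype, start = tag[2:], i
    return spans


def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1)."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def evaluate(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Entity-level precision, recall, and F-measure with exact span matching."""
    tp = fp = fn = 0
    for gold_tags, pred_tags in zip(gold, pred):
        g, p = bio_to_spans(gold_tags), bio_to_spans(pred_tags)
        tp += len(g & p)  # predicted entities that exactly match a gold entity
        fp += len(p - g)  # predicted entities with no exact gold match
        fn += len(g - p)  # gold entities the model missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f_measure(precision, recall)


if __name__ == "__main__":
    # Toy sentence with hypothetical ADR (adverse event) and CHEM (chemical) tags.
    gold = [["O", "B-ADR", "I-ADR", "O", "B-CHEM"]]
    pred = [["O", "B-ADR", "I-ADR", "O", "O"]]
    print(evaluate(gold, pred))  # precision 1.0, recall 0.5, F ~0.667
    # Harmonic mean of the precision and recall reported in the abstract.
    print(round(f_measure(94.10, 91.80), 2))  # 92.94
```

Exact-span matching is strict: a prediction that overlaps a gold entity but has different boundaries counts as both a false positive and a false negative. This is one reason a model with high precision can still trail on recall and therefore on F-measure.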