Identifying health related occupations of Twitter users through word embedding and deep neural networks.

Affiliations

Department of Computer Science, Lakehead University, Oliver Road, Thunder Bay, ON, Canada.

Dept of Math and Computer Science, Brandon University, 270 18th Street, R7A 6A9, Brandon, Canada.

Publication information

BMC Bioinformatics. 2022 Sep 28;22(Suppl 10):630. doi: 10.1186/s12859-022-04933-2.


DOI:10.1186/s12859-022-04933-2
PMID:36171569
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9520792/
Abstract

BACKGROUND: Twitter is a popular social networking site whose short messages, or "tweets", have been used extensively for research. However, little research has addressed mining medical professions, such as detecting users' occupations from their biographical content. Mining such professions can help build efficient recommender systems for cost-effective targeted advertising. Effective methods for identifying users' occupations are also needed because conventional classification methods rely on hand-crafted features. Although such features may give favorable results on the classification problem, it remains extremely challenging for traditional classifiers to predict medical occupations accurately, since the task involves predicting multiple occupations. This study therefore focuses on predicting the medical occupational class of users from their public biographical ("Bio") content. We conducted our analysis by annotating the bio content of Twitter users. We propose a method that combines word embedding with state-of-the-art neural network models: Long Short-Term Memory (LSTM), Bidirectional LSTM, Gated Recurrent Unit (GRU), Bidirectional Encoder Representations from Transformers (BERT), and A Lite BERT (ALBERT). We also observed that combining word embedding with the neural network models removes the need to construct any particular attribute or feature. With word embedding, the bio contents are represented as dense vectors, which are fed into the neural network models as a sequence of vectors.

RESULT: Accuracy, precision, recall, and F1-score all show a significant difference between our method of combining word embedding with neural network models and the traditional methods. The scores show that our proposed approach outperforms traditional machine learning techniques for detecting medical occupations among users. ALBERT performed best among the deep learning networks, with an F1 score of 0.90.

CONCLUSION: In this study, we have presented a novel method of detecting the occupations of Twitter users engaged in the medical domain by merging word embedding with state-of-the-art neural networks. The outcomes demonstrate that our method can further advance the analysis of social media corpora without the trouble of developing computationally expensive features.
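The abstract says that bio text is turned into dense vectors and passed to the network as a sequence of vectors. The sketch below illustrates only that input step, using the Python standard library. It is not the authors' pipeline: the tokenizer, the example bios, the vector dimension, and the randomly initialised table (a stand-in for a trained embedding) are all assumptions made for illustration.

```python
import random
import re

PAD, UNK = "<pad>", "<unk>"

def tokenize(bio):
    """Lowercase a bio and split it into alphanumeric tokens (assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", bio.lower())

def build_vocab(bios):
    """Map each distinct token to an integer id; ids 0 and 1 are reserved."""
    vocab = {PAD: 0, UNK: 1}
    for bio in bios:
        for token in tokenize(bio):
            vocab.setdefault(token, len(vocab))
    return vocab

def make_embedding_table(vocab_size, dim, seed=0):
    """Random stand-in for a trained embedding matrix, one row per token id."""
    rng = random.Random(seed)
    table = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(vocab_size)]
    table[0] = [0.0] * dim  # the padding row is all zeros
    return table

def embed(bio, vocab, table, max_len):
    """Turn one bio into a fixed-length sequence of dense vectors."""
    ids = [vocab.get(token, vocab[UNK]) for token in tokenize(bio)][:max_len]
    ids += [vocab[PAD]] * (max_len - len(ids))  # pad short bios to max_len
    return [table[i] for i in ids]

# Hypothetical bios, invented for this example.
bios = ["Registered nurse, ICU night shift", "Pediatric dentist and coffee lover"]
vocab = build_vocab(bios)
table = make_embedding_table(len(vocab), dim=4)
sequence = embed("ICU nurse", vocab, table, max_len=6)
print(len(sequence), len(sequence[0]))  # 6 vectors of 4 numbers each
```

A recurrent or transformer model would consume `sequence` one vector per time step. In practice the table would be learned or loaded from a pretrained embedding rather than sampled at random.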

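The reported metrics (accuracy, precision, recall, F1-score) can be computed for a multi-class occupation task roughly as follows. This is a minimal sketch under stated assumptions: the abstract does not say how per-class scores were averaged, so macro averaging is assumed here, and the occupation labels and predictions are invented toy data, not results from the paper.

```python
def per_class_scores(y_true, y_pred, label):
    """Precision, recall and F1 for one class, treated as one-vs-rest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

def macro_report(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall and F1 (assumed averaging)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = [per_class_scores(y_true, y_pred, label) for label in labels]
    n = len(labels)
    return {
        "accuracy": sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true),
        "precision": sum(s[0] for s in scores) / n,
        "recall": sum(s[1] for s in scores) / n,
        "f1": sum(s[2] for s in scores) / n,
    }

# Toy labels, invented for this example.
y_true = ["nurse", "nurse", "dentist", "doctor"]
y_pred = ["nurse", "doctor", "dentist", "doctor"]
report = macro_report(y_true, y_pred)
print(report)
```

Macro averaging weights every occupation class equally, which matters when some occupations are rare. A weighted or micro average would give different numbers on imbalanced data.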