Department of Computer Science, Lakehead University, Oliver Road, Thunder Bay, ON, Canada.
Dept of Math and Computer Science, Brandon University, 270 18th Street, R7A 6A9, Brandon, Canada.
BMC Bioinformatics. 2022 Sep 28;22(Suppl 10):630. doi: 10.1186/s12859-022-04933-2.
BACKGROUND: Twitter is a popular social networking site whose short messages, or "tweets", have been used extensively for research. However, little research has been done on mining medical professions, such as detecting users' occupations from their biographical content. Mining such professions can support efficient recommender systems for cost-effective targeted advertising. Effective methods for identifying users' occupations are also needed because conventional classification methods rely on hand-crafted features. Such methods may give favorable results on general classification problems. Even so, it remains extremely challenging for traditional classifiers to predict medical occupations accurately, since the task involves predicting multiple occupations. This study therefore focuses on predicting the medical occupational class of users from their public biographical ("Bio") content. We conducted our analysis by annotating the Bio content of Twitter users. In this paper, we propose a method that combines word embedding with state-of-the-art neural network models: Long Short-Term Memory (LSTM), Bidirectional LSTM, Gated Recurrent Unit (GRU), Bidirectional Encoder Representations from Transformers (BERT), and A Lite BERT (ALBERT). We also observed that combining word embedding with these neural network models removes the need to construct any particular attribute or feature. With word embedding, the Bio contents are represented as dense vectors, which are fed into the neural network models as a sequence of vectors.

RESULT: Performance metrics, including accuracy, precision, recall, and F1-score, showed a significant difference between our method of combining word embedding with neural network models and the traditional methods. The scores show that our proposed approach outperformed traditional machine learning techniques for detecting medical occupations among users. ALBERT performed best among the deep learning networks, with an F1-score of 0.90.

CONCLUSION: In this study, we presented a novel method for detecting the occupations of Twitter users engaged in the medical domain by combining word embedding with state-of-the-art neural networks. Our results demonstrate that the method can further advance the analysis of social media corpora without the need to develop computationally expensive features.
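The abstract describes the following pipeline:

* Each Bio is turned into a sequence of dense word vectors.
* That sequence is fed to a recurrent network, which predicts an occupational class.
* Predictions are scored with accuracy, precision, recall, and F1.

What follows is a minimal NumPy sketch of that pipeline under stated assumptions. None of the following come from the study; they are hypothetical choices made for illustration:

* the toy vocabulary `VOCAB` and the two labels in `CLASSES`;
* whitespace tokenization;
* randomly initialized, untrained weights;
* macro-averaging of the metrics.

It covers only the LSTM variant. The BERT and ALBERT models rely on pretrained subword tokenizers and Transformer encoders, which are out of scope here.

```python
import numpy as np

# Hypothetical toy vocabulary and labels, for illustration only; the study's
# real vocabulary, annotation scheme and trained embeddings are not given here.
VOCAB = {"<unk>": 0, "nurse": 1, "physician": 2, "cardiology": 3, "rn": 4, "md": 5}
CLASSES = ["nurse", "physician"]


def tokenize(bio, max_len=32):
    """Lowercase, split on whitespace, map words to ids, truncate to max_len."""
    return [VOCAB.get(w, VOCAB["<unk>"]) for w in bio.lower().split()][:max_len]


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class LSTMClassifier:
    """Embedding lookup -> single-layer LSTM -> softmax over occupation classes.

    Weights are random (untrained) in this sketch; a real model would learn
    them from annotated bios.
    """

    def __init__(self, vocab_size, emb_dim, hidden, n_classes, seed=0):
        rng = np.random.default_rng(seed)
        self.hidden = hidden
        self.E = rng.normal(0.0, 0.1, (vocab_size, emb_dim))           # word embedding table
        self.W = rng.normal(0.0, 0.1, (4 * hidden, emb_dim + hidden))  # stacked input/forget/output/candidate gates
        self.b = np.zeros(4 * hidden)
        self.V = rng.normal(0.0, 0.1, (n_classes, hidden))             # classification layer
        self.c = np.zeros(n_classes)

    def forward(self, ids):
        H = self.hidden
        h, cell = np.zeros(H), np.zeros(H)
        for t in ids:  # each word becomes a dense vector, consumed in order
            z = self.W @ np.concatenate([self.E[t], h]) + self.b
            i, f, o = sigmoid(z[:H]), sigmoid(z[H:2 * H]), sigmoid(z[2 * H:3 * H])
            g = np.tanh(z[3 * H:])
            cell = f * cell + i * g
            h = o * np.tanh(cell)
        logits = self.V @ h + self.c       # classify from the final hidden state
        p = np.exp(logits - logits.max())  # numerically stable softmax
        return p / p.sum()


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_prf(y_true, y_pred, n_classes):
    """Per-class precision, recall and F1, averaged with equal weight per class (assumed)."""
    ps, rs, fs = [], [], []
    for k in range(n_classes):
        tp = sum(t == k and p == k for t, p in zip(y_true, y_pred))
        fp = sum(t != k and p == k for t, p in zip(y_true, y_pred))
        fn = sum(t == k and p != k for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        ps.append(prec)
        rs.append(rec)
        fs.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(ps) / n_classes, sum(rs) / n_classes, sum(fs) / n_classes


if __name__ == "__main__":
    model = LSTMClassifier(len(VOCAB), emb_dim=16, hidden=8, n_classes=len(CLASSES))
    probs = model.forward(tokenize("Registered nurse RN cardiology"))
    print(CLASSES[int(probs.argmax())])  # arbitrary, since the weights are untrained
    print(accuracy([0, 0, 1, 1], [0, 1, 1, 1]))   # 0.75
    print(macro_prf([0, 0, 1, 1], [0, 1, 1, 1], 2))  # approx. (0.833, 0.75, 0.733)
```

In practice, a framework layer such as PyTorch's `nn.LSTM` would replace the hand-written loop. Writing the gates out here simply makes explicit that the Bio arrives as a sequence of vectors.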