A Novel Text Mining Approach for Mental Health Prediction Using Bi-LSTM and BERT Model.

Affiliations

Department of Artificial Intelligence, Ajou University, Suwon, Republic of Korea.

Department of Software, Sejong University, Republic of Korea.

Publication information

Comput Intell Neurosci. 2022 Mar 3;2022:7893775. doi: 10.1155/2022/7893775. eCollection 2022.

Abstract

With the continued growth of the Internet, there is increasing demand for intelligent systems that can efficiently detect health-related problems, such as depression and anxiety, on social media. These systems, which depend mainly on machine learning techniques, must be able to capture the semantic and syntactic meaning of texts posted by users. The data that users generate on social media is unstructured and unpredictable. Several systems based on machine learning and social media platforms have recently been introduced to identify health-related problems. However, the text representations and deep learning techniques they employ provide only limited information and knowledge about the texts posted by users. This is because they do not capture long-term dependencies between words across the entire text and do not properly exploit recent deep learning schemes. In this paper, we propose a novel framework that efficiently and effectively identifies depression- and anxiety-related posts while preserving the contextual and semantic meaning of the words used in the whole corpus by applying bidirectional encoder representations from transformers (BERT). In addition, we propose using knowledge distillation, a recent technique for transferring knowledge from a large pretrained model (BERT) to a smaller model, to boost performance and accuracy. We also devised our own framework for collecting data from Reddit and Twitter, two of the most common social media sites. Finally, we employed word2vec and BERT with Bi-LSTM to analyze and detect signs of depression and anxiety in social media posts. Our system surpasses other state-of-the-art methods and achieves an accuracy of 98% using the knowledge distillation technique.
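The abstract does not spell out the distillation objective, so the following is only a minimal sketch of the commonly used soft-target formulation, in which a student is trained on a weighted mix of the teacher's softened predictions and the true labels. The function names, the temperature of 2.0, and the weight `alpha` of 0.5 are illustrative assumptions, not values taken from the paper.

```python
import numpy as np


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over the last axis, with temperature scaling."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Hypothetical helper: weighted sum of a soft-target and a hard-label loss.

    Soft term: KL(teacher || student) on temperature-softened distributions,
    scaled by temperature**2 as is conventional.
    Hard term: cross-entropy of the student against the true labels.
    temperature and alpha are assumed values, not the paper's settings.
    """
    eps = 1e-12
    labels = np.asarray(labels)

    p_teacher = softmax(teacher_logits, temperature)
    p_student_soft = softmax(student_logits, temperature)
    kl = np.sum(
        p_teacher * (np.log(p_teacher + eps) - np.log(p_student_soft + eps)),
        axis=-1,
    )

    p_student = softmax(student_logits)
    ce = -np.log(p_student[np.arange(len(labels)), labels] + eps)

    return float(np.mean(alpha * temperature ** 2 * kl + (1.0 - alpha) * ce))


# Toy two-class example (e.g. "depression/anxiety-related" vs. "other").
teacher = np.array([[2.0, -1.0], [-0.5, 1.5]])   # stand-in for BERT logits
student = np.array([[1.0, 0.0], [0.2, 0.9]])     # stand-in for small-model logits
labels = np.array([0, 1])
loss = distillation_loss(student, teacher, labels)
```

In a real training loop the teacher logits would come from a fine-tuned BERT classifier and the student logits from the smaller model (for example a Bi-LSTM), with the loss minimized by gradient descent in a deep learning framework rather than evaluated in NumPy.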

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3c24/8913054/e842c4de1f7b/CIN2022-7893775.001.jpg
