Word2vec convolutional neural networks for classification of news articles and tweets.

Affiliation

Department of Computer Science, Sangmyung University, Seoul, South Korea.

Publication information

PLoS One. 2019 Aug 22;14(8):e0220976. doi: 10.1371/journal.pone.0220976. eCollection 2019.

Abstract

Big web data from sources such as online news and Twitter are good resources for investigating deep learning. However, collected news articles and tweets almost certainly contain data that are unnecessary for learning, which interferes with accurate learning. This paper explores the performance of word2vec Convolutional Neural Networks (CNNs) in classifying news articles and tweets as related or unrelated. Using the two word embedding algorithms of word2vec, Continuous Bag-of-Words (CBOW) and Skip-gram, we constructed a CNN with the CBOW model and a CNN with the Skip-gram model. We measured the classification accuracy of the CNN with CBOW, the CNN with Skip-gram, and a CNN without word2vec on real news articles and tweets. The experimental results indicated that word2vec significantly improved the accuracy of the classification model. The accuracy of the CBOW model was higher and more stable than that of the Skip-gram model. The CBOW model performed better on news articles, and the Skip-gram model performed better on tweets. Specifically, CNNs with word2vec models were more effective on news articles than on tweets because news articles are typically more uniform than tweets.
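The abstract does not describe implementation details, so the following is only a minimal sketch of how the two word2vec training schemes differ, not the authors' code. CBOW predicts a center word from its surrounding context, while Skip-gram predicts each context word from the center word. The function names `cbow_pairs` and `skipgram_pairs`, the toy sentence, and the window size are all assumptions made for illustration.

```python
from typing import List, Tuple


def cbow_pairs(tokens: List[str], window: int = 2) -> List[Tuple[List[str], str]]:
    """CBOW-style examples: (context words, center word to predict)."""
    pairs = []
    for i, target in enumerate(tokens):
        # Up to `window` words on each side of position i, excluding i itself.
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        if context:
            pairs.append((context, target))
    return pairs


def skipgram_pairs(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Skip-gram-style examples: (center word, one context word to predict)."""
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


# Hypothetical toy input; real inputs would be tokenized articles or tweets.
tokens = ["news", "articles", "and", "tweets"]
cbow = cbow_pairs(tokens, window=1)
skip = skipgram_pairs(tokens, window=1)

for context, target in cbow:
    print("CBOW      ", context, "->", target)
for center, context_word in skip:
    print("Skip-gram ", center, "->", context_word)
```

In this sketch CBOW yields one training example per position, while Skip-gram yields one per (center, context) pair, so Skip-gram produces more examples from the same text. In a full pipeline, the embeddings learned from either scheme would then serve as the input layer of the CNN classifier.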

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/65f2/6705863/ba0fb5bd8423/pone.0220976.g001.jpg
