Word embedding empowered topic recognition in news articles.

Authors

Kaleem Sidrah, Jalil Zakia, Nasir Muhammad, Alazab Moutaz

Affiliations

Department of Computer Science, International Islamic University, Islamabad, Islamabad, Islamabad, Pakistan.

Department of Data Science & Artificial Intelligence, International Islamic University, Islamabad, Islamabad Capital Territory, Pakistan.

Publication information

PeerJ Comput Sci. 2024 Dec 11;10:e2300. doi: 10.7717/peerj-cs.2300. eCollection 2024.

DOI:10.7717/peerj-cs.2300
PMID:39896382
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11784532/
Abstract

Advances in technology have placed global news at our fingertips, anytime and anywhere, through social media and online news sources, and these extensive electronic text collections urgently need analysis. Prior work suggests that combining topic models with word embedding models can improve text representation and help with downstream natural language processing tasks. However, the field of news topic recognition lacks a standardized approach to integrating the two, and existing algorithms tend to be overly complex and miss the potential benefits of fusion. To overcome these limitations, this research proposes a new technique, word embedding latent Dirichlet allocation, that combines topic models and word embeddings for better news topic recognition. The framework integrates probabilistic topic modeling using latent Dirichlet allocation with Gibbs sampling, semantic information from Word2Vec embeddings, and syntactic relationships to extract comprehensive text representations. Popular classifiers then use these representations to identify news topics automatically and precisely. By integrating document-topic relationships with contextual information, the framework enables superior performance, enhanced expressiveness, and efficient dimensionality reduction. The proposed word embedding method significantly outperforms existing approaches in news topic recognition, reaching 88% accuracy on 20NewsGroup and 97% on BBC News.
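The fusion idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the topic and embedding features are combined, so simple concatenation is assumed here, the syntactic component is omitted, and the hand-written `TOY_VECTORS` table is a hypothetical stand-in for trained Word2Vec embeddings. The function names `lda_gibbs`, `mean_embedding`, and `fuse` are likewise illustrative.

```python
import random

def lda_gibbs(docs, n_topics, n_iters=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA; returns one topic distribution per document."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)
    doc_topic = [[0] * n_topics for _ in docs]               # topic counts per document
    topic_word = [[0] * n_words for _ in range(n_topics)]    # word counts per topic
    topic_total = [0] * n_topics                             # token counts per topic
    assign = []                                              # topic of every token
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(n_topics)                      # random initial topic
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assign.append(zs)
    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid, k = word_id[w], assign[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][wid] -= 1
                topic_total[k] -= 1
                # Unnormalized conditional p(z = t | all other assignments).
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][wid] + beta) / (topic_total[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][wid] += 1
                topic_total[k] += 1
    theta = []
    for d, doc in enumerate(docs):
        denom = len(doc) + n_topics * alpha
        theta.append([(doc_topic[d][t] + alpha) / denom for t in range(n_topics)])
    return theta

def mean_embedding(doc, vectors, dim):
    """Average the vectors of the words that have an embedding."""
    known = [vectors[w] for w in doc if w in vectors]
    if not known:
        return [0.0] * dim
    return [sum(v[j] for v in known) / len(known) for j in range(dim)]

def fuse(docs, vectors, dim, n_topics):
    """Concatenate document-topic proportions with the mean word vector (assumed fusion)."""
    theta = lda_gibbs(docs, n_topics)
    return [theta[d] + mean_embedding(doc, vectors, dim) for d, doc in enumerate(docs)]

# Toy corpus and hypothetical 2-d vectors; real use would load trained embeddings.
DOCS = [
    ["match", "goal", "team", "goal"],
    ["team", "coach", "match"],
    ["market", "stock", "bank"],
    ["bank", "stock", "market", "stock"],
]
TOY_VECTORS = {
    "match": [1.0, 0.0], "goal": [0.9, 0.1], "team": [0.8, 0.2], "coach": [0.7, 0.3],
    "market": [0.1, 0.9], "stock": [0.0, 1.0], "bank": [0.2, 0.8],
}

if __name__ == "__main__":
    for doc, feats in zip(DOCS, fuse(DOCS, TOY_VECTORS, dim=2, n_topics=2)):
        print(doc, [round(x, 3) for x in feats])
```

Each resulting feature vector has `n_topics + dim` entries and could be passed to any standard classifier. Concatenation keeps the two views separate so a classifier can weight them independently; other fusion schemes are equally plausible readings of the abstract.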


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b8c8/11784532/834ef0f8643d/peerj-cs-10-2300-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b8c8/11784532/10f3d1a8ff36/peerj-cs-10-2300-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b8c8/11784532/737af40ee68e/peerj-cs-10-2300-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b8c8/11784532/7fdd57543fae/peerj-cs-10-2300-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b8c8/11784532/b5de3eb2b096/peerj-cs-10-2300-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b8c8/11784532/c541d50f0f15/peerj-cs-10-2300-g006.jpg

Similar articles

1
Word embedding empowered topic recognition in news articles.
PeerJ Comput Sci. 2024 Dec 11;10:e2300. doi: 10.7717/peerj-cs.2300. eCollection 2024.
2
A Topic Recognition Method of News Text Based on Word Embedding Enhancement.
Comput Intell Neurosci. 2022 Feb 16;2022:4582480. doi: 10.1155/2022/4582480. eCollection 2022.
3
Investigating the Efficient Use of Word Embedding with Neural-Topic Models for Interpretable Topics from Short Texts.
Sensors (Basel). 2022 Jan 23;22(3):852. doi: 10.3390/s22030852.
4
Short text topic modelling using local and global word-context semantic correlation.
Multimed Tools Appl. 2023 Feb 2:1-23. doi: 10.1007/s11042-023-14352-x.
5
Impact of word embedding models on text analytics in deep learning environment: a review.
Artif Intell Rev. 2023 Feb 22:1-81. doi: 10.1007/s10462-023-10419-1.
6
Projection Word Embedding Model With Hybrid Sampling Training for Classifying ICD-10-CM Codes: Longitudinal Observational Study.
JMIR Med Inform. 2019 Jul 23;7(3):e14499. doi: 10.2196/14499.
7
Explainable hybrid word representations for sentiment analysis of financial news.
Neural Netw. 2023 Jul;164:115-123. doi: 10.1016/j.neunet.2023.04.011. Epub 2023 Apr 21.
8
Accurate disaster entity recognition based on contextual embeddings in self-attentive BiLSTM-CRF.
PLoS One. 2025 Mar 26;20(3):e0318262. doi: 10.1371/journal.pone.0318262. eCollection 2025.
9
A Method of Short Text Representation Based on the Feature Probability Embedded Vector.
Sensors (Basel). 2019 Aug 28;19(17):3728. doi: 10.3390/s19173728.
10
Evaluating keyphrase extraction algorithms for finding similar news articles using lexical similarity calculation and semantic relatedness measurement by word embedding.
PeerJ Comput Sci. 2022 Jul 7;8:e1024. doi: 10.7717/peerj-cs.1024. eCollection 2022.
