Learning From Short Text Streams With Topic Drifts.

Publication information

IEEE Trans Cybern. 2018 Sep;48(9):2697-2711. doi: 10.1109/TCYB.2017.2748598. Epub 2017 Sep 18.

DOI:10.1109/TCYB.2017.2748598

Abstract

Short text streams such as search snippets and microblogs have become popular on the Web with the emergence of social media. Unlike traditional text streams, these data are characterized by short length, weak signal, high volume, high velocity, topic drift, etc. Short text stream classification is hence a very challenging and significant task. However, this challenge has received little attention from the research community. Therefore, a new feature extension approach is proposed for short text stream classification with the help of a large-scale semantic network obtained from a Web corpus. It is built on an incremental ensemble classification model for efficiency. First, more semantic contexts based on the senses of terms in short texts are introduced to make up for the data sparsity using the open semantic network, in which all terms are disambiguated by their semantics to reduce the impact of noise. Second, a concept cluster-based topic drift detection method is proposed to effectively track hidden topic drifts. Finally, extensive studies demonstrate that, compared to several well-known concept drift detection methods for data streams, our approach can detect topic drifts effectively, and it handles short text streams effectively while maintaining efficiency compared to several state-of-the-art short text classification approaches.
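The two steps named in the abstract (extending short texts with concepts from a semantic network, then watching for topic drift between chunks of the stream) can be illustrated with a small sketch. This is not the authors' algorithm: the abstract does not give its details. The term-to-concept dictionary `TOY_NETWORK`, the function names, the cosine-distance measure, and the `threshold` value are all assumptions made for illustration, and the sense disambiguation step mentioned in the abstract is omitted.

```python
from collections import Counter
from math import sqrt

# Hypothetical stand-in for the large-scale semantic network: term -> concepts.
TOY_NETWORK = {
    "apple": ["fruit", "company"],
    "iphone": ["company", "phone"],
    "banana": ["fruit"],
    "goal": ["sport"],
    "match": ["sport"],
}


def extend(tokens, network=TOY_NETWORK):
    """Feature extension: append the concepts of every known term."""
    extra = [c for t in tokens for c in network.get(t, [])]
    return tokens + extra


def concept_profile(chunk, network=TOY_NETWORK):
    """Count concepts over all short texts (token lists) in one chunk."""
    counts = Counter()
    for tokens in chunk:
        for t in tokens:
            counts.update(network.get(t, []))
    return counts


def cosine_distance(a, b):
    """1 - cosine similarity of two concept-count profiles."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # treat an empty profile as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


def drift_detected(prev_chunk, cur_chunk, threshold=0.5):
    """Flag a drift when consecutive chunks differ by more than threshold.

    The threshold is an arbitrary illustrative value, not from the paper.
    """
    distance = cosine_distance(concept_profile(prev_chunk),
                               concept_profile(cur_chunk))
    return distance > threshold


fruit_chunk = [["apple", "banana"], ["banana"]]
sport_chunk = [["goal", "match"]]
more_fruit = [["banana", "apple"]]

print(extend(["apple"]))
print(drift_detected(fruit_chunk, sport_chunk))
print(drift_detected(fruit_chunk, more_fruit))
```

In an incremental ensemble setting, a detected drift would typically trigger training a new base classifier on the current chunk and down-weighting or discarding older ones; how the paper handles this is not stated in the abstract.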
