Improving the text classification using clustering and a novel HMM to reduce the dimensionality.

Authors

Seara Vieira A, Borrajo L, Iglesias E L

Affiliation

Department of Computer Science, Higher Technical School of Computer Engineering, University of Vigo, 32004 Ourense, Spain.

Publication details

Comput Methods Programs Biomed. 2016 Nov;136:119-30. doi: 10.1016/j.cmpb.2016.08.018. Epub 2016 Aug 26.

DOI:10.1016/j.cmpb.2016.08.018
PMID:27686709
Abstract

In text classification problems, the representation of a document has a strong impact on the performance of learning systems. The high dimensionality of the classical structured representations can lead to burdensome computations due to the great size of real-world data. Consequently, there is a need for reducing the quantity of handled information to improve the classification process. In this paper, we propose a method to reduce the dimensionality of a classical text representation based on a clustering technique to group documents, and a previously developed Hidden Markov Model to represent them. We have applied tests with the k-NN and SVM classifiers on the OHSUMED and TREC benchmark text corpora using the proposed dimensionality reduction technique. The experimental results obtained are very satisfactory compared to commonly used techniques like InfoGain and the statistical tests performed demonstrate the suitability of the proposed technique for the preprocessing step in a text classification task.
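The abstract names InfoGain as one of the commonly used techniques the proposed method is compared against. As a rough illustration of that baseline only (not of the authors' clustering and HMM method, whose details are not given here), the sketch below ranks terms by information gain over a tiny made-up corpus and keeps the top k. The function names, the set-of-tokens document representation, and the toy documents and labels are all assumptions made for this example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels).values()
    return -sum((n / total) * math.log2(n / total) for n in counts)


def information_gain(docs, labels, term):
    """Drop in class entropy once we know whether `term` occurs in a document.

    `docs` is a list of token sets (a hypothetical, simplified representation).
    """
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = (
        (len(with_term) / n) * entropy(with_term)
        + (len(without_term) / n) * entropy(without_term)
    )
    return entropy(labels) - conditional


def select_terms(docs, labels, k):
    """Keep the k terms with the highest information gain."""
    vocabulary = sorted(set().union(*docs))
    ranked = sorted(
        vocabulary,
        key=lambda t: information_gain(docs, labels, t),
        reverse=True,
    )
    return ranked[:k]


# Toy corpus, invented for illustration only.
docs = [
    {"heart", "attack", "patient"},
    {"heart", "surgery", "patient"},
    {"tumor", "growth", "patient"},
    {"tumor", "biopsy", "patient"},
]
labels = ["cardio", "cardio", "onco", "onco"]

selected = select_terms(docs, labels, k=2)
print(selected)
```

In this toy corpus, a term that appears in every document ("patient") carries no information about the class and is discarded, while terms that separate the two classes are kept. Reducing the vocabulary this way shrinks the document vectors passed to a classifier such as k-NN or SVM.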


Similar articles

1
Improving the text classification using clustering and a novel HMM to reduce the dimensionality.
Comput Methods Programs Biomed. 2016 Nov;136:119-30. doi: 10.1016/j.cmpb.2016.08.018. Epub 2016 Aug 26.
2
Accelerating Information Retrieval from Profile Hidden Markov Model Databases.
PLoS One. 2016 Nov 22;11(11):e0166358. doi: 10.1371/journal.pone.0166358. eCollection 2016.
3
Creating Discriminative Models for Time Series Classification and Clustering by HMM Ensembles.
IEEE Trans Cybern. 2016 Dec;46(12):2899-2910. doi: 10.1109/TCYB.2015.2492920. Epub 2015 Oct 30.
4
LDA filter: A Latent Dirichlet Allocation preprocess method for Weka.
PLoS One. 2020 Nov 9;15(11):e0241701. doi: 10.1371/journal.pone.0241701. eCollection 2020.
5
Protein classification based on text document classification techniques.
Proteins. 2005 Mar 1;58(4):955-70. doi: 10.1002/prot.20373.
6
[Application of support vector machines to classification of blood cells].
Sheng Wu Yi Xue Gong Cheng Xue Za Zhi. 2003 Sep;20(3):484-7.
7
Self-Organizing Hidden Markov Model Map (SOHMMM).
Neural Netw. 2013 Dec;48:133-47. doi: 10.1016/j.neunet.2013.07.011. Epub 2013 Aug 13.
8
Bayesian supervised dimensionality reduction.
IEEE Trans Cybern. 2013 Dec;43(6):2179-89. doi: 10.1109/TCYB.2013.2245321.
9
DCT-Based Preprocessing Approach for ICA in Hyperspectral Data Analysis.
Sensors (Basel). 2018 Apr 8;18(4):1138. doi: 10.3390/s18041138.
10
Vicinal support vector classifier using supervised kernel-based clustering.
Artif Intell Med. 2014 Mar;60(3):189-96. doi: 10.1016/j.artmed.2014.01.003. Epub 2014 Feb 7.

Cited by

1
"I will never go to Hong Kong again!" How the secondary crisis communication of "Occupy Central" on Weibo shifted to a tourism boycott.
Tour Manag. 2017 Oct;62:159-172. doi: 10.1016/j.tourman.2017.04.007. Epub 2017 May 3.