Locally Embedding Autoencoders: A Semi-Supervised Manifold Learning Approach of Document Representation.

Author information

Wei Chao, Luo Senlin, Ma Xincheng, Ren Hao, Zhang Ji, Pan Limin

Affiliation

Beijing Institute of Technology, Beijing, 10081, China.

Publication information

PLoS One. 2016 Jan 19;11(1):e0146672. doi: 10.1371/journal.pone.0146672. eCollection 2016.

DOI:10.1371/journal.pone.0146672

PMID:26784692

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC4718658/

Abstract

Topic models and neural networks can discover meaningful low-dimensional latent representations of text corpora, and have therefore become key technologies for document representation. However, such models treat all documents indiscriminately, so each latent representation depends on all other documents and the models cannot provide discriminative document representations. To address this problem, we propose a semi-supervised, manifold-inspired autoencoder that extracts meaningful latent representations of documents, taking the local perspective that the latent representations of nearby documents should be correlated. We first determine the set of discriminative neighbors using Euclidean distance in the observation space. The autoencoder is then trained by jointly minimizing the Bernoulli cross-entropy error between input and output and the sum of squared errors between the neighbors of the input and the output. Results on two widely used corpora show that our method yields at least a 15% improvement in document clustering and a nearly 7% improvement in classification compared with the other methods evaluated. The evidence demonstrates that our method can readily capture more discriminative latent representations of new documents. Moreover, some meaningful combinations of words can be efficiently discovered by activating features that promote the comprehensibility of the latent representation.
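
The training objective described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the single hidden layer, the tied encoder/decoder weights, the weighting factor `lam`, the fixed neighbor count `k`, and the function names `nearest_neighbors` and `joint_loss` are all assumptions made for illustration. The sketch also ignores class labels, so it does not cover the semi-supervised selection of discriminative neighbors, and it reads "error between neighbors of input and output" as the squared difference between each document's reconstruction and the input vectors of its neighbors.

```python
import numpy as np

def nearest_neighbors(X, k):
    """Return, for each row of X, the indices of its k nearest rows
    under Euclidean distance in the observation space (self excluded)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)  # squared distances
    np.fill_diagonal(d2, np.inf)                      # never pick the row itself
    return np.argsort(d2, axis=1)[:, :k]

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def joint_loss(X, W, b, c, neighbors, lam=1.0, eps=1e-9):
    """Bernoulli cross-entropy between input and output, plus lam times the
    summed squared error between each output and the neighbors of its input.
    Tied weights and the factor lam are assumptions of this sketch."""
    H = sigmoid(X @ W + b)            # latent representation, shape (n, h)
    Y = sigmoid(H @ W.T + c)          # reconstruction, shape (n, d)
    Yc = np.clip(Y, eps, 1.0 - eps)   # avoid log(0)
    cross_entropy = -np.sum(X * np.log(Yc) + (1.0 - X) * np.log(1.0 - Yc))
    diff = X[neighbors] - Y[:, None, :]   # shape (n, k, d)
    neighbor_error = np.sum(diff ** 2)
    return (cross_entropy + lam * neighbor_error) / X.shape[0]

# Toy binary bag-of-words matrix: 4 documents, 4 vocabulary terms.
X = np.array([[1, 0, 0, 1],
              [1, 0, 0, 0],
              [0, 1, 1, 0],
              [0, 1, 1, 1]], dtype=float)
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(4, 2))   # 2 latent features
b, c = np.zeros(2), np.zeros(4)
nbrs = nearest_neighbors(X, k=1)
loss = joint_loss(X, W, b, c, nbrs, lam=1.0)
```

In practice the loss would be minimized with respect to `W`, `b`, and `c` by gradient descent; the sketch only evaluates it, which is enough to show how the reconstruction term and the neighbor term combine.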

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b9b1/4718658/1aeb9ba6e8c1/pone.0146672.g001.jpg

Similar articles

Semi-supervised distributed representations of documents for sentiment analysis.

Neural Netw. 2019 Nov;119:139-150. doi: 10.1016/j.neunet.2019.08.001. Epub 2019 Aug 6.

Improving the utility of MeSH® terms using the TopicalMeSH representation.

J Biomed Inform. 2016 Jun;61:77-86. doi: 10.1016/j.jbi.2016.03.013. Epub 2016 Mar 19.

A Manifold Learning Perspective on Representation Learning: Learning Decoder and Representations without an Encoder.

Entropy (Basel). 2021 Oct 25;23(11):1403. doi: 10.3390/e23111403.

Discriminative clustering on manifold for adaptive transductive classification.

Neural Netw. 2017 Oct;94:260-273. doi: 10.1016/j.neunet.2017.07.013. Epub 2017 Aug 1.

Manifold adversarial training for supervised and semi-supervised learning.

Neural Netw. 2021 Aug;140:282-293. doi: 10.1016/j.neunet.2021.03.031. Epub 2021 Mar 26.

Vector representation based on a supervised codebook for Nepali documents classification.

PeerJ Comput Sci. 2021 Mar 3;7:e412. doi: 10.7717/peerj-cs.412. eCollection 2021.

Geometry Regularized Autoencoders.

IEEE Trans Pattern Anal Mach Intell. 2023 Jun;45(6):7381-7394. doi: 10.1109/TPAMI.2022.3222104. Epub 2023 May 5.

Cardiology record multi-label classification using latent Dirichlet allocation.

Comput Methods Programs Biomed. 2018 Oct;164:111-119. doi: 10.1016/j.cmpb.2018.07.002. Epub 2018 Jul 17.

Enhanced manifold regularization for semi-supervised classification.

J Opt Soc Am A Opt Image Sci Vis. 2016 Jun 1;33(6):1207-13. doi: 10.1364/JOSAA.33.001207.

Cited by

Enhancing web search result clustering model based on multiview multirepresentation consensus cluster ensemble (mmcc) approach.

PLoS One. 2021 Jan 15;16(1):e0245264. doi: 10.1371/journal.pone.0245264. eCollection 2021.

A shared synapse architecture for efficient FPGA implementation of autoencoders.

PLoS One. 2018 Mar 15;13(3):e0194049. doi: 10.1371/journal.pone.0194049. eCollection 2018.

References

A regularized approach for geodesic-based semisupervised multimanifold learning.

IEEE Trans Image Process. 2014 May;23(5):2133-47. doi: 10.1109/TIP.2014.2312643.

A topic clustering approach to finding similar questions from large question and answer archives.

PLoS One. 2014 Mar 4;9(3):e71511. doi: 10.1371/journal.pone.0071511. eCollection 2014.

Defining and evaluating classification algorithm for high-dimensional data based on latent topics.

PLoS One. 2014 Jan 9;9(1):e82119. doi: 10.1371/journal.pone.0082119. eCollection 2014.

Representation learning: a review and new perspectives.

IEEE Trans Pattern Anal Mach Intell. 2013 Aug;35(8):1798-828. doi: 10.1109/TPAMI.2013.50.

Learning topic models by belief propagation.

IEEE Trans Pattern Anal Mach Intell. 2013 May;35(5):1121-34. doi: 10.1109/TPAMI.2012.185.

Nonlinear dimensionality reduction by locally linear embedding.

Science. 2000 Dec 22;290(5500):2323-6. doi: 10.1126/science.290.5500.2323.

A global geometric framework for nonlinear dimensionality reduction.

Science. 2000 Dec 22;290(5500):2319-23. doi: 10.1126/science.290.5500.2319.
