Yan Yan, Yin Xu-Cheng, Li Sujian, Yang Mingyuan, Hao Hong-Wei
Department of Computer Science and Technology, School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083, China.
Key Laboratory of Computational Linguistics, Peking University, Ministry of Education, Beijing 100871, China.
Comput Intell Neurosci. 2015;2015:650527. doi: 10.1155/2015/650527. Epub 2015 Mar 23.
High-level abstraction, for example, semantic representation, is vital for document classification and retrieval. However, how to learn document semantic representations remains an open topic in information retrieval and natural language processing. In this paper, we propose a new Hybrid Deep Belief Network (HDBN), which uses a Deep Boltzmann Machine (DBM) on the lower layers together with a Deep Belief Network (DBN) on the upper layers. The advantage of the DBM is that it uses undirected connections when training the weight parameters, which allows the states of the nodes in each layer to be sampled more reliably and also provides an effective way to remove noise from the different document representation types; the DBN then extracts deeper abstractions of the document, enabling the model to learn a sufficiently rich semantic representation. In addition, we explore different input strategies for semantic distributed representation. Experimental results show that our model performs better when using word embeddings instead of single words as input.
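The stacked structure described in the abstract can be illustrated with a greedy layer-wise RBM pretraining sketch. This is only an approximation under stated assumptions: a true DBM is trained jointly with undirected connections rather than greedily, and the layer sizes, learning rate, and CD-1 updates below are illustrative choices, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    """Restricted Boltzmann Machine trained with 1-step contrastive divergence."""
    def __init__(self, n_visible, n_hidden, lr=0.05):
        self.W = rng.normal(0.0, 0.01, (n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)  # visible biases
        self.b_h = np.zeros(n_hidden)   # hidden biases
        self.lr = lr

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.b_h)

    def visible_probs(self, h):
        return sigmoid(h @ self.W.T + self.b_v)

    def cd1_step(self, v0):
        # positive phase: hidden activations given data
        h0 = self.hidden_probs(v0)
        # negative phase: sample hidden states, reconstruct, re-infer
        h_sample = (rng.random(h0.shape) < h0).astype(float)
        v1 = self.visible_probs(h_sample)
        h1 = self.hidden_probs(v1)
        # CD-1 parameter updates, averaged over the batch
        batch = v0.shape[0]
        self.W += self.lr * (v0.T @ h0 - v1.T @ h1) / batch
        self.b_v += self.lr * (v0 - v1).mean(axis=0)
        self.b_h += self.lr * (h0 - h1).mean(axis=0)

def pretrain_stack(data, layer_sizes, epochs=5):
    """Greedy layer-wise pretraining; each layer's activations feed the next."""
    rbms, x = [], data
    for n_hidden in layer_sizes:
        rbm = RBM(x.shape[1], n_hidden)
        for _ in range(epochs):
            rbm.cd1_step(x)
        x = rbm.hidden_probs(x)  # propagate features upward
        rbms.append(rbm)
    return rbms, x

# Toy "document" vectors standing in for word-embedding inputs.
docs = rng.random((20, 50))
rbms, features = pretrain_stack(docs, [32, 16, 8])
print(features.shape)  # (20, 8): top-layer semantic features
```

In the HDBN framing, the lower RBMs would correspond to the DBM part and the upper ones to the DBN part; the top-layer activations serve as the learned document representation for classification or retrieval.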