Multilayer SOM with tree-structured data for efficient document retrieval and plagiarism detection.

Authors

Chow Tommy W S, Rahman M K M

Affiliation

Department of Electronic Engineering, City University of Hong Kong, Kowloon, Hong Kong.

Publication information

IEEE Trans Neural Netw. 2009 Sep;20(9):1385-402. doi: 10.1109/TNN.2009.2023394. Epub 2009 Jul 28.

Abstract

This paper proposes a new document retrieval (DR) and plagiarism detection (PD) system using a multilayer self-organizing map (MLSOM). A document is modeled by a rich tree-structured representation, and a SOM-based system is used as a computationally effective solution. Instead of relying on keywords/lines, the proposed scheme uses a full document as the query for retrieval and PD. The tree-structured representation organizes document features hierarchically at the document, page, and paragraph levels. Thus, it can reflect underlying context that is difficult to capture from the word-frequency information currently in use. We show that the tree-structured data is effective for DR and PD. To handle the tree-structured representation efficiently, we use an MLSOM algorithm, which the authors previously developed for image retrieval. In this study, it serves as an effective clustering algorithm. Using the MLSOM, local matching techniques are developed for comparing text documents. Two novel MLSOM-based PD methods are proposed. Detailed simulations are conducted, and the experimental results corroborate that the proposed approach is computationally efficient and accurate for DR and PD.
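The document, page, and paragraph hierarchy described in the abstract can be pictured with a minimal Python sketch. The abstract does not specify how features are computed at each level, so everything here is an assumption made for illustration: the names `Node`, `term_counts`, and `build_tree` are hypothetical, and raw whitespace-split term counts stand in for whatever features the paper actually uses. The MLSOM training and matching steps are not shown.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    """One node of the (assumed) three-level document tree."""
    level: str                 # "document", "page", or "paragraph"
    features: Dict[str, int]   # raw term counts; a stand-in feature type
    children: List["Node"] = field(default_factory=list)


def term_counts(text: str) -> Counter:
    """Count lowercase, whitespace-separated terms (a deliberately naive tokenizer)."""
    return Counter(text.lower().split())


def build_tree(pages: List[List[str]]) -> Node:
    """Build a document -> pages -> paragraphs tree.

    `pages` is a list of pages, each page being a list of paragraph strings.
    Page and document features are the sums of their children's counts.
    """
    doc_counts: Counter = Counter()
    page_nodes: List[Node] = []
    for paragraphs in pages:
        page_counts: Counter = Counter()
        para_nodes: List[Node] = []
        for paragraph in paragraphs:
            counts = term_counts(paragraph)
            para_nodes.append(Node("paragraph", dict(counts)))
            page_counts += counts
        page_nodes.append(Node("page", dict(page_counts), para_nodes))
        doc_counts += page_counts
    return Node("document", dict(doc_counts), page_nodes)


# Hypothetical two-page document used only to exercise the sketch.
pages = [["the cat sat", "the dog ran"], ["a cat ran"]]
tree = build_tree(pages)
```

In this sketch the root node summarizes the whole document, while each paragraph node keeps only its own counts, which is the kind of local detail a flat word-frequency vector would lose.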
