Multilayer SOM with tree-structured data for efficient document retrieval and plagiarism detection.

Authors

Chow Tommy W S, Rahman M K M

Affiliation

Department of Electronic Engineering, City University of Hong Kong, Kowloon, Hong Kong.

Publication information

IEEE Trans Neural Netw. 2009 Sep;20(9):1385-402. doi: 10.1109/TNN.2009.2023394. Epub 2009 Jul 28.

Abstract

This paper proposes a new document retrieval (DR) and plagiarism detection (PD) system using a multilayer self-organizing map (MLSOM). A document is modeled by a rich tree-structured representation, and a SOM-based system is used as a computationally effective solution. Instead of relying on keywords or lines, the proposed scheme uses a full document as the query for retrieval and PD. The tree-structured representation organizes document features hierarchically at the document, page, and paragraph levels. Thus, it can reflect underlying context that is difficult to acquire from the word-frequency information currently in use. We show that the tree-structured data is effective for DR and PD. To handle the tree-structured representation efficiently, we use an MLSOM algorithm, which was previously developed by the authors for image retrieval. In this study, it serves as an effective clustering algorithm. Using the MLSOM, local matching techniques are developed for comparing text documents. Two novel MLSOM-based PD methods are proposed. Detailed simulations are conducted, and the experimental results corroborate that the proposed approach is computationally efficient and accurate for DR and PD.
