Department of Computer Science & Engineering, Karpagam University, Coimbatore, India.
Department of Electronics and Communication Engineering, Karpagam University, Coimbatore, India.
J Med Syst. 2019 May 1;43(6):164. doi: 10.1007/s10916-019-1294-5.
The information age has made it simple to store huge amounts of data. In fact, a considerable portion of existing information is held in text databases, which contain large collections of documents from different sources such as research articles, news articles, books, e-mail messages, web pages and digital libraries. In many text databases the stored data are semi-structured, in that they are neither entirely structured nor entirely unstructured. The field of Information Retrieval (IR) has been developing in parallel with database systems for many years. In contrast to database systems, which have concentrated mainly on transaction and query processing of structured data, IR is concerned with the organization and retrieval of information from a large number of text-based documents. Thus, IR deals with unstructured and/or semi-structured databases. Information security requirements within organizations have undergone major changes over the past several decades. With the introduction of the computer, the need for automated tools to protect files and other information stored on the computer became evident. This is particularly the case for information resources shared over a public network. This is the origin of the need for computer security. One way to achieve computer security is through Intrusion Detection Systems. In this paper, we address these issues by applying Similarity Search in two diverse fields: Digital Libraries and Computer Security. The paper discusses a fast and efficient similarity search technique for approximate retrieval of book metadata in Digital Libraries. In DLI, book retrieval relies solely on metadata such as the title, year, edition, author and publisher of a book. However, if the metadata is missing, incorrect or incomplete, the library retrieval system becomes inefficient and inaccurate, which causes considerable confusion for the user.
In this context, the related information should be retrieved whether the user's query matches a stored pattern fully or only partially. The paper describes a fast, effective, language-independent and flexible library retrieval system built on signature-based similarity search. The system retrieves not only metadata that exactly matches the query but also approximately matching metadata, tolerating missing words, jumbled words and spelling mistakes. Fundamentally, a signature file approach is used here. The signature file approach appears to be the most promising for huge databases because it has good text retrieval properties and requires little storage overhead.
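To make the signature idea concrete, the following is a minimal sketch of superimposed-coding signatures over character trigrams. It is not the authors' implementation: the trigram tokenization, the 256-bit signature width, the MD5-based hash, the overlap score and the 0.6 threshold are all assumptions chosen for illustration, as are the function names `trigrams`, `signature`, `similarity` and `search`.

```python
import hashlib

SIG_BITS = 256  # assumed signature width; the abstract does not specify one


def trigrams(text):
    """Collect character trigrams per word, so that word order does not matter."""
    grams = set()
    for word in text.lower().split():
        padded = f" {word} "  # pad so that word boundaries produce their own trigrams
        for i in range(len(padded) - 2):
            grams.add(padded[i:i + 3])
    return grams


def signature(text):
    """Superimpose (bitwise OR) one hashed bit per trigram into a single integer."""
    sig = 0
    for gram in trigrams(text):
        digest = hashlib.md5(gram.encode("utf-8")).digest()
        position = int.from_bytes(digest[:4], "big") % SIG_BITS
        sig |= 1 << position
    return sig


def similarity(query_sig, record_sig):
    """Fraction of the query's set bits that are also set in the record signature."""
    query_bits = bin(query_sig).count("1")
    if query_bits == 0:
        return 0.0
    return bin(query_sig & record_sig).count("1") / query_bits


def search(query, records, threshold=0.6):
    """Return (score, record) pairs scoring at least `threshold`, best first."""
    query_sig = signature(query)
    scored = [(similarity(query_sig, signature(rec)), rec) for rec in records]
    hits = [pair for pair in scored if pair[0] >= threshold]
    return sorted(hits, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    # Hypothetical book titles used only to exercise the sketch.
    titles = ["Data Mining Concepts and Techniques", "Operating System Principles"]
    for score, title in search("minning data concepts", titles):
        print(f"{score:.2f}  {title}")
```

Because trigrams are taken per word, a query with jumbled words yields the same signature as the ordered one, a query with missing words sets only a subset of the record's bits, and a misspelling changes only a few trigrams. A production system would precompute and store the record signatures rather than hashing every record at query time, and, since hash collisions can cause false matches, would typically verify candidates against the actual metadata.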