Deep learning-based information retrieval with normalized dominant feature subset and weighted vector model.

Author information

Eswaraiah Poluru, Syed Hussain

Affiliation

School of Computer Science and Engineering, VIT-AP University, Amaravati, Andhra Pradesh, India.

Publication information

PeerJ Comput Sci. 2024 Jan 22;10:e1805. doi: 10.7717/peerj-cs.1805. eCollection 2024.

Abstract

Multimedia data, which includes textual information, is used in a variety of practical computer vision applications. More than a million new records are added to social media and news sites every day, and the text they contain has become increasingly complex. Finding a meaningful text record in an archive can be challenging for computer vision researchers. Most image searches still rely on the tried-and-true language-based techniques of query text and metadata. Substantial work has been done over the past two decades on content-based text retrieval and analysis, but it still has limitations. The importance of feature extraction in search engines is often overlooked. Web and product search engines, recommendation systems, and question-answering tasks frequently leverage these features. Extracting high-quality machine learning features from large volumes of text is a challenge for many open-source software packages. Creating an effective feature set manually is time-consuming, whereas deep learning derives new feature representations from training data. As a novel feature extraction method, deep learning has made great strides in text mining. Automatically training a deep learning model on the most pertinent text attributes requires massive datasets with millions of variables. This research proposes a Normalized Dominant Feature Subset with Weighted Vector Model (NDFS-WVM) for feature extraction and selection in information retrieval from big data using natural language processing models. The proposed model outperforms conventional models in text retrieval, achieving 98.6% accuracy in information retrieval.
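
The abstract does not describe how NDFS-WVM is computed, so the following is not the authors' method. It is a minimal, generic sketch of the broader idea the abstract names: select a subset of high-weight ("dominant") terms, represent each document as a length-normalized weighted vector over that subset, and retrieve by cosine similarity. The TF-IDF weighting, the smoothed IDF formula, the scoring rule for picking dominant terms, and all function names (`build_index`, `vectorize`, `search`) are assumptions introduced for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: whitespace tokenization on lowercased text.
    return text.lower().split()


def vectorize(tokens, idf, dominant):
    """TF-IDF vector restricted to the dominant terms, scaled to unit length."""
    counts = Counter(t for t in tokens if t in dominant)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def build_index(docs, k):
    """Keep the k terms with the largest total TF-IDF mass (an assumed
    selection rule) and return (dominant terms, idf table, document vectors)."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(t for toks in tokenized for t in set(toks))
    # Smoothed inverse document frequency.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    score = Counter()
    for toks in tokenized:
        for t, c in Counter(toks).items():
            score[t] += c * idf[t]
    ranked = sorted(score.items(), key=lambda kv: (-kv[1], kv[0]))
    dominant = {t for t, _ in ranked[:k]}
    vectors = [vectorize(toks, idf, dominant) for toks in tokenized]
    return dominant, idf, vectors


def search(query, dominant, idf, vectors):
    """Return document indices ordered by cosine similarity to the query."""
    q = vectorize(tokenize(query), idf, dominant)
    sims = [sum(w * v.get(t, 0.0) for t, w in q.items()) for v in vectors]
    return sorted(range(len(vectors)), key=lambda i: (-sims[i], i))
```

Because every vector is scaled to unit length, the dot product in `search` equals cosine similarity. The parameter `k` controls how aggressive the feature selection is: a small `k` keeps only terms that carry the most weight across the corpus, at the cost of leaving some documents with empty vectors.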

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1f3c/11636692/9e35d19d2ac4/peerj-cs-10-1805-g001.jpg
