An index-based algorithm for fast on-line query processing of latent semantic analysis.

Authors

Zhang Mingxi, Li Pohan, Wang Wei

Affiliations

College of Communication and Art Design, University of Shanghai for Science and Technology, Shanghai, China.

School of Computer Science, Fudan University, Shanghai, China.

Publication information

PLoS One. 2017 May 16;12(5):e0177523. doi: 10.1371/journal.pone.0177523. eCollection 2017.

DOI:10.1371/journal.pone.0177523
PMID:28520747
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5433746/
Abstract

Latent Semantic Analysis (LSA) is widely used to find documents that are semantically similar to a keyword query. Although LSA yields promising similarity results, existing LSA algorithms perform many unnecessary operations in similarity computation and candidate checking during on-line query processing. This is expensive in terms of time and prevents them from responding to queries efficiently, especially when the dataset is large. In this paper, we study the efficiency of on-line query processing for LSA, with the goal of efficiently searching for documents similar to a given query. We rewrite the LSA similarity equation in terms of an intermediate value called partial similarity, which is stored in a purpose-built index called the partial index. To reduce the search space, we give an approximate form of the similarity equation and then develop an efficient algorithm for building the partial index, which skips partial similarities lower than a given threshold θ. Based on the partial index, we develop an efficient algorithm called ILSA to support fast on-line query processing. The given query is transformed into a pseudo-document vector, and the similarities between the query and candidate documents are computed by accumulating the partial similarities obtained from the index nodes that correspond to non-zero entries in the pseudo-document vector. Compared to the LSA algorithm, ILSA reduces the time cost of on-line query processing by pruning unpromising candidate documents and skipping operations that contribute little to the similarity scores. Extensive experiments comparing ILSA with LSA demonstrate the efficiency and effectiveness of the proposed algorithm.
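
The abstract does not give the equations, so the following is only a minimal sketch of the general idea it describes: precompute per-entry partial similarities, drop those below θ, and answer a query by accumulating the surviving values. It assumes that the similarity is a plain dot product in the latent space and that each non-zero query entry is a term weight selecting one index node. The names `build_partial_index`, `ilsa_like_query`, `exact_scores`, and the toy vectors are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def dot(u, v):
    """Dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def build_partial_index(term_vecs, doc_vecs, theta):
    """Map each term to {doc: partial similarity}, skipping values below theta.

    Assumption: the partial similarity of a (term, document) pair is the dot
    product of their latent vectors. The paper's exact definition may differ.
    """
    index = {}
    for term, t_vec in term_vecs.items():
        node = {}
        for doc, d_vec in doc_vecs.items():
            partial = dot(t_vec, d_vec)
            if partial >= theta:  # pairs below the threshold are never stored
                node[doc] = partial
        index[term] = node
    return index


def ilsa_like_query(index, query_weights, top_k=3):
    """Accumulate partial similarities for the non-zero query entries only."""
    scores = defaultdict(float)
    for term, weight in query_weights.items():
        if weight == 0:
            continue
        # Documents absent from this node were pruned at build time,
        # so they are never touched here.
        for doc, partial in index.get(term, {}).items():
            scores[doc] += weight * partial
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


def exact_scores(term_vecs, doc_vecs, query_weights):
    """Baseline: build the pseudo-document vector, then score every document."""
    k = len(next(iter(doc_vecs.values())))
    pseudo = [0.0] * k
    for term, weight in query_weights.items():
        for i, x in enumerate(term_vecs[term]):
            pseudo[i] += weight * x
    return {doc: dot(pseudo, d_vec) for doc, d_vec in doc_vecs.items()}


# Toy 2-dimensional latent vectors (illustrative values, not from the paper).
term_vecs = {"gene": [0.9, 0.1], "protein": [0.8, 0.2], "index": [0.1, 0.9]}
doc_vecs = {"d1": [1.0, 0.0], "d2": [0.0, 1.0], "d3": [0.5, 0.5]}

index = build_partial_index(term_vecs, doc_vecs, theta=0.3)
query = {"gene": 1.0, "protein": 1.0}
print(ilsa_like_query(index, query))             # pruned, index-based ranking
print(exact_scores(term_vecs, doc_vecs, query))  # unpruned baseline
```

In this toy setup, document `d2` has partial similarities below θ for both query terms, so the index-based query never visits it, while the baseline still scores every document. Setting θ to negative infinity keeps every pair, and the two functions then produce the same scores, which illustrates that the speed-up comes from an approximation controlled by θ.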



