Rapid similarity searches of nucleic acid and protein data banks.

Authors

Wilbur W J, Lipman D J

Publication details

Proc Natl Acad Sci U S A. 1983 Feb;80(3):726-30. doi: 10.1073/pnas.80.3.726.

DOI: 10.1073/pnas.80.3.726
PMID: 6572363
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC393452/
Abstract

With the development of large data banks of protein and nucleic acid sequences, the need for efficient methods of searching such banks for sequences similar to a given sequence has become evident. We present an algorithm for the global comparison of sequences based on matching k-tuples of sequence elements for a fixed k. The method results in substantial reduction in the time required to search a data bank when compared with prior techniques of similarity analysis, with minimal loss in sensitivity. The algorithm has also been adapted, in a separate implementation, to produce rigorous sequence alignments. Currently, using the DEC KL-10 system, we can compare all sequences in the entire Protein Data Bank of the National Biomedical Research Foundation with a 350-residue query sequence in less than 3 min and carry out a similar analysis with a 500-base query sequence against all eukaryotic sequences in the Los Alamos Nucleic Acid Data Base in less than 2 min.
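The abstract describes the search method only as "matching k-tuples of sequence elements for a fixed k". As a rough illustration, the Python sketch below builds a lookup table of every k-tuple in the query, scans each bank sequence for those k-tuples, and tallies the hits by diagonal (the offset between the matching positions in the two sequences). Two sequences that share a long common stretch pile many hits onto a single diagonal. Scoring each bank entry by its single densest diagonal is a simplifying assumption of this sketch, not the authors' published scoring or windowing, and the function names (`kmer_index`, `diagonal_hits`, `best_diagonal_score`, `rank_bank`) are hypothetical.

```python
from collections import defaultdict


def kmer_index(seq: str, k: int) -> dict[str, list[int]]:
    """Map each k-tuple of seq to the start positions where it occurs."""
    index: dict[str, list[int]] = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def diagonal_hits(query: str, target: str, k: int) -> dict[int, int]:
    """Count shared k-tuples per diagonal, keyed by target_pos - query_pos."""
    index = kmer_index(query, k)
    counts: dict[int, int] = defaultdict(int)
    for j in range(len(target) - k + 1):
        # .get() avoids inserting empty entries for k-tuples absent from the query.
        for i in index.get(target[j:j + k], ()):
            counts[j - i] += 1
    return dict(counts)


def best_diagonal_score(query: str, target: str, k: int) -> int:
    """Simplified score (assumption of this sketch): hits on the densest diagonal."""
    return max(diagonal_hits(query, target, k).values(), default=0)


def rank_bank(query: str, bank: dict[str, str], k: int) -> list[tuple[str, int]]:
    """Score every bank entry against the query and sort best-first."""
    scores = {name: best_diagonal_score(query, seq, k) for name, seq in bank.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy data (hypothetical): "hit" contains the query at offset 2.
    bank = {"hit": "TTACGTACGTAA", "miss": "GGGGGGGG"}
    print(diagonal_hits("ACGTACGT", bank["hit"], k=3))  # {-2: 3, 2: 6, 6: 3}
    print(rank_bank("ACGTACGT", bank, k=3))  # [('hit', 6), ('miss', 0)]
```

The lookup table is what avoids comparing every residue of the query against every residue of each bank entry: per entry, the work grows with the entry's length plus the number of k-tuple hits. A larger k gives fewer, more specific hits (faster, less sensitive); a smaller k does the reverse.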

Similar articles

1
Rapid similarity searches of nucleic acid and protein data banks.
Proc Natl Acad Sci U S A. 1983 Feb;80(3):726-30. doi: 10.1073/pnas.80.3.726.
2
A novel sequence similarity searching and visualization method based on overlappingly translated nucleic acids: the blastNP.
Med Hypotheses. 2004;62(4):568-74. doi: 10.1016/j.mehy.2003.11.020.
3
A rapid access motif database (RAMdb) with a search algorithm for the retrieval patterns in nucleic acids or protein databanks.
Comput Appl Biosci. 1995 Jun;11(3):273-9. doi: 10.1093/bioinformatics/11.3.273.
4
Los Alamos sequence analysis package for nucleic acids and proteins.
Nucleic Acids Res. 1982 Jan 11;10(1):183-96. doi: 10.1093/nar/10.1.183.
5
Principle of codification for quick comparisons with the entire biomolecule databanks and associated programs in FORTRAN 77.
Nucleic Acids Res. 1986 Jan 10;14(1):197-204. doi: 10.1093/nar/14.1.197.
6
Rapid and sensitive sequence comparison with FASTP and FASTA.
Methods Enzymol. 1990;183:63-98. doi: 10.1016/0076-6879(90)83007-v.
7
Improved sensitivity of biological sequence database searches.
Comput Appl Biosci. 1990 Jul;6(3):237-45. doi: 10.1093/bioinformatics/6.3.237.
8
Sequence search on a supercomputer.
Nucleic Acids Res. 1986 Jan 10;14(1):57-64. doi: 10.1093/nar/14.1.57.
9
Database similarity searches.
Methods Mol Biol. 2008;484:361-78. doi: 10.1007/978-1-59745-398-1_24.
10
Improving the efficiency of dot-matrix similarity searches through use of an oligomer table.
Nucleic Acids Res. 1986 Jan 10;14(1):597-610. doi: 10.1093/nar/14.1.597.

Articles citing this paper

1
The Historical Evolution and Significance of Multiple Sequence Alignment in Molecular Structure and Function Prediction.
Biomolecules. 2024 Nov 29;14(12):1531. doi: 10.3390/biom14121531.
2
SpanSeq: similarity-based sequence data splitting method for improved development and assessment of deep learning projects.
NAR Genom Bioinform. 2024 Aug 16;6(3):lqae106. doi: 10.1093/nargab/lqae106. eCollection 2024 Sep.
3
Characterization of a MHYT domain-coupled transcriptional regulator that responds to carbon monoxide.
Nucleic Acids Res. 2024 Aug 27;52(15):8849-8860. doi: 10.1093/nar/gkae575.
4
Phage display sequencing reveals that genetic, environmental, and intrinsic factors influence variation of human antibody epitope repertoire.
Immunity. 2023 Jun 13;56(6):1376-1392.e8. doi: 10.1016/j.immuni.2023.04.003. Epub 2023 May 9.
5
PASS: Protein Annotation Surveillance Site for Protein Annotation Using Homologous Clusters, NLP, and Sequence Similarity Networks.
Front Bioinform. 2021 Sep 29;1:749008. doi: 10.3389/fbinf.2021.749008. eCollection 2021.
6
Global, highly specific and fast filtering of alignment seeds.
BMC Bioinformatics. 2022 Jun 10;23(1):225. doi: 10.1186/s12859-022-04745-4.
7
Don Lindberg and the creation of the National Center for Biotechnology Information.
Inf Serv Use. 2022 May 10;42(1):107-115. doi: 10.3233/ISU-210139. eCollection 2022.
8
Methodology-Centered Review of Molecular Modeling, Simulation, and Prediction of SARS-CoV-2.
Chem Rev. 2022 Jul 13;122(13):11287-11368. doi: 10.1021/acs.chemrev.1c00965. Epub 2022 May 20.
9
Conserved Motifs and Domains in Members of ..
Cells. 2022 Jan 11;11(2):230. doi: 10.3390/cells11020230.
10
Electrochemical DNA synthesis and sequencing on a single electrode with scalability for integrated data storage.
Sci Adv. 2021 Nov 12;7(46):eabk0100. doi: 10.1126/sciadv.abk0100.
