Rapid similarity searches of nucleic acid and protein data banks.

Author information

Wilbur W J, Lipman D J

Publication information

Proc Natl Acad Sci U S A. 1983 Feb;80(3):726-30. doi: 10.1073/pnas.80.3.726.

Abstract

With the development of large data banks of protein and nucleic acid sequences, the need for efficient methods of searching such banks for sequences similar to a given sequence has become evident. We present an algorithm for the global comparison of sequences based on matching k-tuples of sequence elements for a fixed k. The method results in substantial reduction in the time required to search a data bank when compared with prior techniques of similarity analysis, with minimal loss in sensitivity. The algorithm has also been adapted, in a separate implementation, to produce rigorous sequence alignments. Currently, using the DEC KL-10 system, we can compare all sequences in the entire Protein Data Bank of the National Biomedical Research Foundation with a 350-residue query sequence in less than 3 min and carry out a similar analysis with a 500-base query sequence against all eukaryotic sequences in the Los Alamos Nucleic Acid Data Base in less than 2 min.
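The abstract describes the comparison only as "matching k-tuples of sequence elements for a fixed k". As a rough illustration of that general idea, and not the authors' implementation, the sketch below indexes the k-tuples of a query, looks up each k-tuple of a target sequence, and counts matches per diagonal (target offset minus query offset). The function names, the diagonal-counting score, and the example sequences are all assumptions made for illustration; the abstract does not specify the scoring or the alignment step.

```python
from collections import defaultdict


def ktuple_index(seq, k):
    """Map each k-tuple in seq to the list of positions where it starts."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def diagonal_hits(query, target, k):
    """Count shared k-tuples per diagonal (target offset minus query offset).

    Hypothetical scoring for illustration only; the published method's
    actual scoring is not given in the abstract.
    """
    index = ktuple_index(query, k)
    hits = defaultdict(int)
    for j in range(len(target) - k + 1):
        # Every query position holding the same k-tuple contributes one hit.
        for i in index.get(target[j:j + k], ()):
            hits[j - i] += 1
    return dict(hits)


def best_diagonal(query, target, k):
    """Return (diagonal, hit_count) for the diagonal with the most matches."""
    hits = diagonal_hits(query, target, k)
    if not hits:
        return None, 0
    diag = max(hits, key=hits.get)
    return diag, hits[diag]


if __name__ == "__main__":
    # The query occurs in the target starting at offset 2.
    print(best_diagonal("ACGTACGT", "TTACGTACGTTT", 3))
```

Because each target k-tuple is resolved with a single table lookup, the work grows with the number of shared k-tuples rather than with the full product of the two sequence lengths. That is the intuition behind the speedup the abstract reports, though the measured timings there refer to the authors' own program.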
