FASTA-SWAP and FASTA-PAT: pattern database searches using combinations of aligned amino acids, and a novel scoring theory.

Authors

Ladunga I, Wiese B A, Smith R F

Affiliation

Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA.

Publication

J Mol Biol. 1996 Jun 21;259(4):840-54. doi: 10.1006/jmbi.1996.0362.

Abstract

We introduce two new pattern database search tools that utilize statistical significance and information theory to improve protein function identification. Both the general pattern scoring theory with the specific matrices introduced here and the low redundancy of pattern databases increase search sensitivity and selectivity. Pattern scoring preferentially rewards matches at conserved positions in a pattern with higher scores than matches at variable positions, and assigns more negative scores to mismatches at conserved positions than to mismatches at variable positions. The theory of pattern scoring can be used to create log-odds pattern scores for patterns derived from any set of multiple alignments. This theoretical framework can be used to adapt existing sequence database search tools to pattern analysis. Our FASTA-SWAP and FASTA-PAT tools are extensions of the FASTA program that search a sequence query against a pattern database. In the first step, FASTA-SWAP searches the diagonals of the query sequence and the library pattern for high-scoring segments, while FASTA-PAT performs an extended version of hashing. In the second step, both methods refine the alignments and the scores using dynamic programming. The tools utilize an extremely compact binary representation of all possible combinations of amino acid residues in aligned positions. Our FASTA-SWAP and FASTA-PAT tools are well suited for functional identification of distant relatives that may be missed by sequence database search methods. FASTA-SWAP and FASTA-PAT searches can be performed using our World-Wide Web Server (http://dot.imgen.bcm.tmc.edu:9331/seq-search/Options/fastapat.html).
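The two ideas the abstract describes, a compact binary encoding of the residues observed at an aligned position and a log-odds score that depends on how conserved that position is, can be illustrated with a minimal sketch. This is not the authors' implementation or their scoring matrices: the names `column_mask` and `log_odds`, the 20-bit layout, the uniform model over allowed residues, and the `pseudo` parameter are all assumptions chosen only to reproduce the qualitative behaviour stated in the abstract.

```python
import math

# Hypothetical layout: one bit per standard amino acid (not the paper's encoding).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
BIT = {aa: 1 << i for i, aa in enumerate(AMINO_ACIDS)}


def column_mask(residues):
    """Pack the set of residues seen in one alignment column into an integer."""
    mask = 0
    for aa in residues:
        mask |= BIT[aa]
    return mask


def log_odds(residue, mask, background, pseudo=0.05):
    """Toy log-odds score (in bits) of a query residue against a pattern position.

    Assumed model: probability mass (1 - pseudo) is spread evenly over the
    residues allowed by the mask, and mass `pseudo` over all other residues.
    """
    allowed_count = bin(mask).count("1")
    if mask & BIT[residue]:
        p_pattern = (1.0 - pseudo) / allowed_count
    else:
        p_pattern = pseudo / (len(AMINO_ACIDS) - allowed_count)
    return math.log2(p_pattern / background[residue])


# Uniform background frequencies, assumed purely for illustration.
background = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}

conserved = column_mask("G")      # only glycine observed in this column
variable = column_mask("AGST")    # four residues observed in this column

match_conserved = log_odds("G", conserved, background)
match_variable = log_odds("G", variable, background)
mismatch_conserved = log_odds("W", conserved, background)
mismatch_variable = log_odds("W", variable, background)
```

Under these assumptions a match at the conserved position scores higher than a match at the variable position, and a mismatch at the conserved position scores lower than a mismatch at the variable position, which is the ordering the abstract describes. Each pattern position fits in a single integer, so membership tests reduce to one bitwise AND.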


