Performance evaluation of a new algorithm for the detection of remote homologs with sequence comparison.

Author information

Kann Maricel G, Goldstein Richard A

Affiliation

Department of Chemistry, University of Michigan, Ann Arbor, Michigan 48109-1055, USA.

Publication information

Proteins. 2002 Aug 1;48(2):367-76. doi: 10.1002/prot.10117.

Abstract

A detailed analysis of the performance of hybrid, a new sequence alignment algorithm developed by Yu and coworkers that combines Smith-Waterman local dynamic programming with a local version of the maximum-likelihood approach, was made to assess the applicability of this algorithm to the detection of distant homologs by sequence comparison. We analyzed the statistics of hybrid with a set of nonhomologous protein sequences from the SCOP database and found that the scores from the hybrid algorithm follow an extreme value distribution with lambda approximately 1, as previously shown by Yu et al. for the case of artificially generated sequences. Local dynamic programming was compared to the hybrid algorithm by using two different test data sets of distant homologs from the PFAM and COGs protein sequence databases. The studies were made with several score functions in current use, including OPTIMA, a new score function originally developed to detect remote homologs with the Smith-Waterman algorithm. We found OPTIMA to be the best score function for both the dynamic programming and the hybrid algorithms. The ability of dynamic programming to discriminate between homologs and nonhomologs in the two sets of distantly related sequences is slightly better than that of the hybrid algorithm. The advantage of producing accurate score statistics with only a few simulations may outweigh the small differences in performance and make this new algorithm suitable for detection of homologs in conjunction with a wide range of score functions and gap penalties.
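
The practical value of an extreme value distribution with a known lambda is that a raw alignment score can be converted directly into a significance estimate. The Python sketch below illustrates this with the standard Gumbel tail formula, P(S >= x) = 1 - exp(-K m n e^(-lambda x)). It is not code from the paper. The function name `evd_pvalue` is hypothetical, and the default `K = 0.1` is a placeholder assumption; only lambda of approximately 1 comes from the abstract.

```python
import math


def evd_pvalue(score, m, n, lam=1.0, K=0.1):
    """Tail probability P(S >= score) under a Gumbel extreme value distribution.

    score : raw alignment score
    m, n  : lengths of the two compared sequences
    lam   : scale parameter lambda (about 1 for hybrid, per the abstract)
    K     : prefactor; 0.1 is an illustrative placeholder, not a fitted value
    """
    # Expected number of chance alignments scoring at least `score` (E-value).
    expected_hits = K * m * n * math.exp(-lam * score)
    # 1 - exp(-E), computed with expm1 to stay accurate when E is tiny.
    return -math.expm1(-expected_hits)


if __name__ == "__main__":
    # Two sequences of 300 residues each, over a range of hypothetical scores.
    for s in (10, 15, 20, 25):
        print(s, evd_pvalue(s, 300, 300))
```

With lambda fixed near 1, only K would remain to be estimated for a given score function and gap penalty. This is one way to read the abstract's remark that accurate statistics need only a few simulations.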
