Distant homology detection using a LEngth and STructure-based sequence Alignment Tool (LESTAT).

Author information

Lee Marianne M, Bundschuh Ralf, Chan Michael K

Affiliations

The Ohio State Biophysics Program, The Ohio State University, Columbus, Ohio 43210, USA.

Publication information

Proteins. 2008 May 15;71(3):1409-19. doi: 10.1002/prot.21830.

Abstract

A new machine learning algorithm, LESTAT (LEngth and STructure-based sequence Alignment Tool), has been developed for detecting protein homologs with low sequence identity. LESTAT is an iterative profile-based method that does not rely on a predefined library and incorporates several novel features that enhance its ability to identify remotely related sequences. To overcome the inherent bias associated with a single starting model, LESTAT uses three structural homologs to create a profile consisting of structurally conserved positions and block separation distances. Subsequent profiles are refined iteratively using sequence information obtained from previous cycles. The refinement process also incorporates a "lock-in" feature, which retains the high-scoring sequences from previous alignments for subsequent model building, and an enhancement factor that complements the weighting scheme used to build the position-specific scoring matrix. A comparison of LESTAT against PSI-BLAST on seven systems shows that LESTAT exhibits higher sensitivity and specificity than PSI-BLAST in six of them, based on the number of true homologs detected and the number of families these homologs cover. Notably, many of the hits are unique to one method or the other, presumably because the two approaches differ substantially. Taken together, these findings suggest that LESTAT is a useful complement to PSI-BLAST for detecting distant homologs.
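
The abstract gives no implementation details, so the following is only a toy sketch of the general idea of iterative profile refinement with retained ("locked-in") hits. It is not LESTAT's actual algorithm. Everything here is an assumption made for illustration: the function names (`build_pssm`, `score`, `iterative_search`), the uniform background frequencies, the pseudocount, the gap-free equal-length segments, and in particular the use of `enhancement` as a simple weight multiplier on the starting sequences. Block separation distances are not modeled.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
BACKGROUND = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background frequencies


def build_pssm(aligned, weights, pseudocount=1.0):
    """Build a log-odds scoring matrix from equal-length, gap-free segments."""
    pssm = []
    for pos in range(len(aligned[0])):
        # Weighted residue counts for this column.
        counts = Counter()
        for seq, weight in zip(aligned, weights):
            counts[seq[pos]] += weight
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        column = {}
        for aa in AMINO_ACIDS:
            freq = (counts[aa] + pseudocount) / total
            column[aa] = math.log(freq / BACKGROUND)
        pssm.append(column)
    return pssm


def score(pssm, seq):
    """Sum the per-position log-odds scores of a candidate segment."""
    return sum(column[aa] for column, aa in zip(pssm, seq))


def iterative_search(seeds, database, threshold, enhancement=2.0, max_iter=5):
    """Refine the profile until no new sequence scores above the threshold."""
    locked = list(seeds)  # accepted sequences are never dropped in later rounds
    for _ in range(max_iter):
        # Hypothetical enhancement: up-weight the starting sequences.
        weights = [enhancement if s in seeds else 1.0 for s in locked]
        pssm = build_pssm(locked, weights)
        hits = [s for s in database
                if s not in locked and score(pssm, s) >= threshold]
        if not hits:
            break
        locked.extend(hits)
    return locked


if __name__ == "__main__":
    seeds = ["ACDE", "ACDE", "ACDF"]      # stand-ins for three starting homologs
    database = ["ACDY", "WWWW", "ACEE"]   # made-up candidate segments
    print(iterative_search(seeds, database, threshold=3.0))
```

In this sketch, a sequence accepted in one round keeps contributing to every later profile, which is one simple reading of the "lock-in" idea; the published method may define both the lock-in step and the enhancement factor quite differently.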

Similar articles

2
SVM-HUSTLE--an iterative semi-supervised machine learning approach for pairwise protein remote homology detection.
Bioinformatics. 2008 Mar 15;24(6):783-90. doi: 10.1093/bioinformatics/btn028. Epub 2008 Feb 1.
3
Fast model-based protein homology detection without alignment.
Bioinformatics. 2007 Jul 15;23(14):1728-36. doi: 10.1093/bioinformatics/btm247. Epub 2007 May 8.
4
Within the twilight zone: a sensitive profile-profile comparison tool based on information theory.
J Mol Biol. 2002 Feb 1;315(5):1257-75. doi: 10.1006/jmbi.2001.5293.
5
Incremental window-based protein sequence alignment algorithms.
Bioinformatics. 2007 Jan 15;23(2):e17-23. doi: 10.1093/bioinformatics/btl297.
7
A comparison of scoring functions for protein sequence profile alignment.
Bioinformatics. 2004 May 22;20(8):1301-8. doi: 10.1093/bioinformatics/bth090. Epub 2004 Feb 12.
8
SPEM: improving multiple sequence alignment with sequence profiles and predicted secondary structures.
Bioinformatics. 2005 Sep 15;21(18):3615-21. doi: 10.1093/bioinformatics/bti582. Epub 2005 Jul 14.
10
Sequence comparison and protein structure prediction.
Curr Opin Struct Biol. 2006 Jun;16(3):374-84. doi: 10.1016/j.sbi.2006.05.006. Epub 2006 May 19.

Cited by

1
Using amino acid physicochemical distance transformation for fast protein remote homology detection.
PLoS One. 2012;7(9):e46633. doi: 10.1371/journal.pone.0046633. Epub 2012 Sep 28.
