MTRAP: pairwise sequence alignment algorithm by a new measure based on transition probability between two consecutive pairs of residues.

Author information

Department of Information Sciences, Tokyo University of Science, 2641 Yamazaki, Noda City, Chiba, Japan.

Publication information

BMC Bioinformatics. 2010 May 8;11:235. doi: 10.1186/1471-2105-11-235.

Abstract

BACKGROUND

Sequence alignment is one of the most important techniques for analyzing biological systems. Alignment methods are still imperfect, however, and more accurate ones are needed. In particular, alignments of homologous sequences with low sequence similarity are not yet satisfactory. Most recent methods for aligning protein sequences use an empirically determined measure. Typically, the measure combines two quantities: (1) the sum of substitution scores between two residue segments, and (2) the sum of gap penalties in insertion/deletion regions. Such a measure assumes that there is no intersite correlation in the sequences. In this paper, we improve the alignment by taking the correlation between consecutive residues into account.
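The conventional measure described above can be sketched in a few lines. This is a minimal illustration, not code from the paper: the match/mismatch values stand in for a real substitution matrix, the affine gap parameters are hypothetical, and the function name `conventional_score` is invented here. It shows that each column is scored independently of its neighbours, apart from the gap open/extend distinction.

```python
# Toy parameters: hypothetical values chosen for illustration only.
MATCH, MISMATCH = 2, -1          # stand-in for a real substitution matrix
GAP_OPEN, GAP_EXTEND = -5, -1    # affine gap penalty: first column, later columns


def conventional_score(row_a: str, row_b: str) -> int:
    """Score two aligned rows of equal length; '-' marks a gap."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    score = 0
    gap_row = None  # which row the currently open gap is in, if any
    for a, b in zip(row_a, row_b):
        if a == "-" and b == "-":
            raise ValueError("a column cannot be a gap in both rows")
        if a == "-" or b == "-":
            current = "a" if a == "-" else "b"
            # (2) gap penalty: opening a gap costs more than extending it
            score += GAP_EXTEND if gap_row == current else GAP_OPEN
            gap_row = current
        else:
            # (1) substitution score, independent of neighbouring columns
            score += MATCH if a == b else MISMATCH
            gap_row = None
    return score
```

For example, `conventional_score("AC-GT", "ACTGT")` adds four matches and one gap opening. Nothing in the sum depends on which residue pair came before a given column, which is the independence assumption the paper sets out to relax.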

RESULTS

We introduce a new alignment method, called MTRAP, based on a metric defined on compound systems of two sequences. In benchmark tests on PREFAB 4.0 and HOMSTRAD, our pairwise alignment method gives higher accuracy than other methods such as ClustalW2, TCoffee, and MAFFT. For sequences with less than 15% sequence identity in particular, our method improves alignment accuracy significantly. We also show, by modifying TCoffee to use our measure, that our algorithm works well with consistency-based progressive multiple alignment.
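To make the idea of a transition between two consecutive pairs of residues concrete, the sketch below treats each alignment column as a residue pair, estimates P(next pair | current pair) by counting over a few reference alignments, and sums log-probabilities along a candidate alignment. It is an assumption-laden illustration only: the abstract does not give the MTRAP metric, so this is not the authors' formula, and the function names, the toy reference data, and the `floor` probability for unseen transitions are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def estimate_transitions(reference_alignments):
    """Estimate P(next column | current column) by counting.

    reference_alignments: iterable of (row_a, row_b) aligned string pairs.
    A column is the tuple (residue_in_a, residue_in_b).
    """
    counts = defaultdict(Counter)
    for row_a, row_b in reference_alignments:
        cols = list(zip(row_a, row_b))
        for cur, nxt in zip(cols, cols[1:]):
            counts[cur][nxt] += 1
    probs = {}
    for cur, nxt_counts in counts.items():
        total = sum(nxt_counts.values())
        probs[cur] = {nxt: n / total for nxt, n in nxt_counts.items()}
    return probs


def transition_log_score(row_a, row_b, probs, floor=1e-3):
    """Sum log transition probabilities over consecutive columns.

    Transitions never seen in the reference data get the (hypothetical)
    floor probability instead of zero.
    """
    cols = list(zip(row_a, row_b))
    return sum(
        math.log(probs.get(cur, {}).get(nxt, floor))
        for cur, nxt in zip(cols, cols[1:])
    )


# Toy reference alignments, invented for illustration.
reference = [("AC", "AC"), ("AG", "AG"), ("AC", "AC")]
probs = estimate_transitions(reference)
```

Unlike a per-column substitution sum, this score depends on the order of the columns, so the same residue pairs arranged differently can score differently.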

CONCLUSIONS

Our method gives a significant increase in alignment accuracy compared with other methods. The improvement is especially clear in the low-identity range. The source code is available at our web page, whose address is given in the section "Availability and requirements".
