MTRAP: pairwise sequence alignment algorithm by a new measure based on transition probability between two consecutive pairs of residues.

Affiliation

Department of Information Sciences, Tokyo University of Science, 2641 Yamazaki, Noda City, Chiba, Japan.

Publication information

BMC Bioinformatics. 2010 May 8;11:235. doi: 10.1186/1471-2105-11-235.

DOI:10.1186/1471-2105-11-235
PMID:20459682
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2875243/
Abstract

BACKGROUND

Sequence alignment is one of the most important techniques for analyzing biological systems. Current alignment methods are nonetheless imperfect, and more accurate ones are still needed. In particular, the alignment of homologous sequences with low sequence similarity remains unsatisfactory. Recent methods for aligning protein sequences typically use an empirically determined measure, usually defined as a combination of two quantities: (1) the sum of substitution scores between two residue segments, and (2) the sum of gap penalties in insertion/deletion regions. Such a measure rests on the assumption that there is no intersite correlation within the sequences. In this paper, we improve the alignment by taking the correlation between consecutive residues into account.
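To make the conventional measure concrete, here is a minimal sketch of a score built from quantities (1) and (2). The function names, the toy match/mismatch values, and the affine gap parameters are illustrative assumptions rather than values from the paper. Each column is scored independently, which is the no-intersite-correlation assumption the abstract refers to.

```python
def substitution_score(a: str, b: str) -> int:
    # Toy values for illustration only; a real aligner would look the pair up
    # in an empirically derived substitution matrix.
    return 2 if a == b else -1


def conventional_score(row_a: str, row_b: str,
                       gap_open: int = 5, gap_extend: int = 1) -> int:
    """Sum of per-column substitution scores minus affine gap penalties.

    row_a and row_b are the two rows of a gapped pairwise alignment,
    with '-' marking a gap. The penalty values are hypothetical.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    score = 0
    prev_gap_row = None  # row that carried a gap in the previous column
    for a, b in zip(row_a, row_b):
        if a == "-" and b == "-":
            raise ValueError("a column cannot consist of two gaps")
        if a == "-" or b == "-":
            gap_row = "a" if a == "-" else "b"
            # The first column of a gap run pays gap_open, later ones gap_extend.
            score -= gap_extend if prev_gap_row == gap_row else gap_open
            prev_gap_row = gap_row
        else:
            # Each residue pair is scored without reference to its neighbours.
            score += substitution_score(a, b)
            prev_gap_row = None
    return score
```

For example, `conventional_score("AC--T", "ACGGT")` adds three matches (6) and subtracts one gap opening (5) and one gap extension (1).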

RESULTS

We introduce a new alignment method, called MTRAP, based on a metric defined on compound systems of two sequences. In benchmark tests on PREFAB 4.0 and HOMSTRAD, our pairwise alignment method gives higher accuracy than other methods such as ClustalW2, TCoffee, and MAFFT. For sequences with less than 15% sequence identity in particular, our method improves alignment accuracy significantly. We also show that our algorithm works well with consistency-based progressive multiple alignment, by modifying TCoffee to use our measure.
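The abstract does not define the MTRAP measure itself, so the following is not the authors' formulation. It is only a sketch of the general idea named in the title: estimating how often one aligned residue pair is followed by another. The function name and the plain frequency estimate (with no pseudocounts and with gap columns treated like any other pair) are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def transition_probabilities(alignments):
    """Estimate P(next column pair | current column pair) by counting.

    `alignments` is an iterable of (row_a, row_b) tuples, each a gapped
    pairwise alignment with rows of equal length. Returns a nested dict
    mapping a current pair to {next pair: relative frequency}.
    """
    counts = defaultdict(Counter)
    for row_a, row_b in alignments:
        if len(row_a) != len(row_b):
            raise ValueError("aligned rows must have equal length")
        columns = list(zip(row_a, row_b))
        # Count every pair of consecutive alignment columns.
        for current, following in zip(columns, columns[1:]):
            counts[current][following] += 1
    probabilities = {}
    for current, following_counts in counts.items():
        total = sum(following_counts.values())
        probabilities[current] = {
            following: n / total for following, n in following_counts.items()
        }
    return probabilities
```

A measure built on such quantities would score a column in the context of its neighbour, in contrast to the site-independent sum used by conventional methods.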

CONCLUSIONS

Our method yields a significant increase in alignment accuracy compared with other methods, and the improvement is especially clear in the low-identity range. The source code is available at our web page, whose address is given in the section "Availability and requirements".
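Sequence identity, used above to characterize the low-identity range, can be computed from an alignment in several ways. The sketch below uses one common convention, identical residues divided by the number of columns without gaps; this choice of denominator is an assumption, and the benchmarks cited in the paper may define identity differently.

```python
def percent_identity(row_a: str, row_b: str) -> float:
    """Percentage of ungapped alignment columns holding identical residues."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    # Keep only columns where both rows have a residue ('-' marks a gap).
    aligned = [(a, b) for a, b in zip(row_a, row_b) if a != "-" and b != "-"]
    if not aligned:
        return 0.0
    matches = sum(1 for a, b in aligned if a == b)
    return 100.0 * matches / len(aligned)
```

Under this convention, a pair of aligned rows would fall in the range highlighted by the abstract when `percent_identity` returns a value below 15.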

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2204/2875243/dcebe3f2def2/1471-2105-11-235-3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2204/2875243/cbe0e24b881f/1471-2105-11-235-1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2204/2875243/aeae43c86566/1471-2105-11-235-2.jpg

