Improving the alignment quality of consistency based aligners with an evaluation function using synonymous protein words.

Affiliation

Bioinformatics Lab, Institute of Information Science, Academia Sinica, Taipei, Taiwan.

Publication information

PLoS One. 2011;6(12):e27872. doi: 10.1371/journal.pone.0027872. Epub 2011 Dec 2.

Abstract

Most sequence alignment tools can successfully align protein sequences that share high levels of sequence identity. The accuracy of the corresponding structure alignment, however, decreases rapidly for distantly related sequences (<20% identity). In this range of identity, alignments optimized to maximize sequence similarity are often inaccurate from a structural point of view. Over the last two decades, most multiple protein aligners have been optimized for their capacity to reproduce structure-based alignments while using sequence information. Currently available methods differ essentially in how they measure similarity between aligned residues: substitution matrices, Fourier transforms, sophisticated profile-profile functions or, more recently, consistency-based approaches.

In this paper, we present a flexible similarity measure for residue pairs to improve the quality of protein sequence alignment. Our approach, called SymAlign, relies on the identification of conserved words that are found across a sizeable fraction of the considered dataset and are supported by evolutionary analysis. These words are then used to define a position-specific substitution matrix that better reflects the biological significance of local similarity. The experimental results show that the SymAlign scoring scheme can be incorporated within T-Coffee to improve sequence alignment accuracy. We also demonstrate that SymAlign is less sensitive to the presence of structurally non-similar proteins. In the analysis of the relationship between sequence identity and structure similarity, SymAlign can better differentiate structurally similar proteins from non-similar proteins.

We show that protein sequence alignments can be significantly improved using a similarity estimation based on weighted n-grams. In our analysis of the alignments thus produced, sequence conservation becomes a better indicator of structural similarity. SymAlign also provides alignment visualization that can display sub-optimal alignments on dot-matrices. The visualization makes it easy to identify well-supported alternative alignments that may not have been identified by dynamic programming. SymAlign is available at http://bio-cluster.iis.sinica.edu.tw/SymAlign/.
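
The idea of scoring residue pairs from weighted n-grams can be illustrated with a minimal sketch. This is not the SymAlign implementation: it assumes exact word matches (no synonymous words and no evolutionary analysis), a fixed word length, and a weight equal to the fraction of sequences containing the word. The function names `conserved_words` and `pair_scores` and the parameters `n` and `min_fraction` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def conserved_words(seqs, n=3, min_fraction=0.5):
    """Map each n-gram to the set of sequence indices containing it,
    keeping only n-grams present in at least min_fraction of the sequences."""
    support = defaultdict(set)
    for idx, seq in enumerate(seqs):
        for i in range(len(seq) - n + 1):
            support[seq[i:i + n]].add(idx)
    cutoff = min_fraction * len(seqs)
    return {w: s for w, s in support.items() if len(s) >= cutoff}


def pair_scores(seq_a, seq_b, words, n_seqs, n=3):
    """Score residue pairs (i, j) between two sequences.

    Every conserved word shared by seq_a (starting at i) and seq_b
    (starting at j) adds its weight to each residue pair it covers.
    The weight is an assumed one: the fraction of sequences with the word.
    """
    # Index the start positions of every n-gram in seq_b.
    starts_b = defaultdict(list)
    for j in range(len(seq_b) - n + 1):
        starts_b[seq_b[j:j + n]].append(j)

    scores = defaultdict(float)
    for i in range(len(seq_a) - n + 1):
        word = seq_a[i:i + n]
        if word not in words:
            continue
        weight = len(words[word]) / n_seqs
        for j in starts_b.get(word, []):
            for k in range(n):
                scores[(i + k, j + k)] += weight
    return dict(scores)


if __name__ == "__main__":
    # Toy sequences, invented for this example.
    seqs = ["MKVLAAG", "MKVLSAG", "TTKVLAA"]
    words = conserved_words(seqs, n=3, min_fraction=0.6)
    scores = pair_scores(seqs[0], seqs[1], words, n_seqs=len(seqs))
    for (i, j), s in sorted(scores.items()):
        print(i, j, round(s, 3))
```

The resulting pair scores play the role of a position-specific similarity table that a dynamic-programming or consistency-based aligner could consume in place of a fixed substitution matrix. Residue pairs covered by several overlapping conserved words accumulate higher scores than pairs covered by only one.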

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e7cc/3229492/2a1deea8045d/pone.0027872.g001.jpg
