Deng Xin, Cheng Jianlin
Computer Science Department, University of Missouri, Columbia, MO, USA.
Methods Mol Biol. 2014;1079:273-83. doi: 10.1007/978-1-62703-646-7_18.
Multiple sequence alignment (MSA) is an essential tool in protein structure modeling, gene and protein function prediction, DNA motif recognition, phylogenetic analysis, and many other bioinformatics tasks. Improving its accuracy is therefore an important long-term objective in bioinformatics. We designed and developed a new method, MSACompro, that incorporates predicted secondary structure, relative solvent accessibility, and residue-residue contact information into the currently most accurate posterior probability-based MSA methods in order to improve alignment accuracy. Unlike MSA methods that use the tertiary structure information of some sequences, our method uses structural information predicted purely from the sequences. In this chapter, we first introduce some background and related techniques in the field of multiple sequence alignment. We then describe the MSACompro algorithm in detail. Finally, we show that integrating predicted protein structural information improves multiple sequence alignment accuracy.