Ranwez Vincent, Chantret Nathalie, Delsuc Frédéric
AGAP, University of Montpellier, CIRAD, INRA, Montpellier SupAgro, Montpellier, France.
Institut des Sciences de l'Evolution de Montpellier (ISEM), CNRS, IRD, EPHE, Université de Montpellier, Montpellier, France.
Methods Mol Biol. 2021;2231:51-70. doi: 10.1007/978-1-0716-1036-7_4.
Most comparative genomic and evolutionary analyses rely on accurate multiple sequence alignments. Because of their underlying codon structure, protein-coding nucleotide sequences pose a specific challenge for multiple sequence alignment. Multiple Alignment of Coding Sequences (MACSE) is a multiple sequence alignment program that provided the first automatic solution for aligning protein-coding gene datasets containing both functional and nonfunctional sequences (pseudogenes). Its unique features make it possible to build reliable codon alignments in the presence of frameshifts and stop codons; these alignments are suitable for subsequent analyses of selection based on the ratio of nonsynonymous to synonymous substitutions. Here we offer a practical overview of MACSE v2 and guidelines for its use. This major update of the initial algorithm now comes with a graphical interface that gives user-friendly access to the different subprograms for handling multiple alignments of protein-coding sequences. We also present new pipelines that are built on MACSE v2 subprograms, designed to handle large datasets, and distributed as Singularity containers. MACSE and the associated pipelines are available at https://bioweb.supagro.inra.fr/macse/.