Department of Biology, University of Ottawa, Marie-Curie Private, Ottawa, ON K1N 9A7, Canada.
Ottawa Institute of Systems Biology, University of Ottawa, Ottawa, ON K1H 8M5, Canada.
Genes (Basel). 2021 Nov 18;12(11):1809. doi: 10.3390/genes12111809.
Multiple sequence alignment (MSA) is the basis for almost all sequence comparison and molecular phylogenetic inference. Large-scale genomic analyses typically rely on automated progressive MSA without subsequent manual adjustment, and manual adjustment is itself often error-prone because it lacks a consistent and explicit criterion. Here, I outline several commonly encountered alignment errors that progressive MSA cannot avoid for nucleotide, amino acid, and codon sequences. I then present methods that could be automated to fix such alignment errors. I emphasize the utility of the position weight matrix (PWM) as a new tool for MSA refinement and illustrate its usage by refining MSAs of nucleotide and amino acid sequences. The main advantages of the PWM approach are (1) its use of information from all sequences, in contrast to other commonly used methods based on pairwise alignment scores and inconsistency measures, and (2) its fast computation, which makes it suitable for large numbers of long viral genomic sequences.
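To make the idea of using a position weight matrix for refinement concrete, the following is a minimal sketch and not the author's implementation. It assumes nucleotide sequences, log2-odds scores against pooled background frequencies, and a pseudocount of 0.5. The function names (`build_pwm`, `score_segment`, `best_placement`) are hypothetical. It shows how every sequence in the alignment contributes to the column scores, and how those scores can rank alternative placements of a short ungapped segment.

```python
import math
from collections import Counter

ALPHABET = "ACGT"  # assumption: nucleotide alignment; gaps and other symbols are ignored


def build_pwm(aligned, pseudocount=0.5):
    """Return one dict of log2-odds scores per alignment column."""
    # Background frequencies pooled over all non-gap residues in the alignment.
    pooled = Counter(ch for seq in aligned for ch in seq if ch in ALPHABET)
    total = sum(pooled.values())
    denom_bg = total + pseudocount * len(ALPHABET)
    background = {b: (pooled[b] + pseudocount) / denom_bg for b in ALPHABET}

    pwm = []
    for column in zip(*aligned):
        counts = Counter(ch for ch in column if ch in ALPHABET)
        denom = sum(counts.values()) + pseudocount * len(ALPHABET)
        pwm.append({
            b: math.log2(((counts[b] + pseudocount) / denom) / background[b])
            for b in ALPHABET
        })
    return pwm


def score_segment(pwm, segment, start):
    """Sum the PWM scores of an ungapped segment placed at column `start`."""
    return sum(pwm[start + i][ch] for i, ch in enumerate(segment))


def best_placement(pwm, segment, lo, hi):
    """Try every start column in [lo, hi] and return the highest-scoring one."""
    return max(range(lo, hi + 1), key=lambda s: score_segment(pwm, segment, s))
```

In a refinement setting, one would presumably build the matrix from the other sequences in the alignment and then test whether sliding a gap-flanked segment to a neighbouring position raises its score. Building the matrix takes one pass over the alignment, which is consistent with the speed advantage claimed above.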