Department of Biological Sciences, Rowan University, Glassboro, NJ, 08028, USA.
Department of Molecular and Cellular Biosciences, Rowan University, Glassboro, NJ, 08028, USA.
BMC Ecol Evol. 2021 Nov 29;21(1):214. doi: 10.1186/s12862-021-01931-5.
Multiple sequence alignments (MSAs) represent the fundamental unit of data input to most comparative sequence analyses. In phylogenetic analyses in particular, errors in MSA construction have the potential to induce further errors in downstream analyses such as phylogenetic reconstruction itself, ancestral state reconstruction, and divergence time estimation. In addition to providing phylogenetic methods with an MSA to analyze, researchers must also specify a suitable evolutionary model for the given analysis. Most commonly, researchers apply relative model selection to select a model from a candidate set and then provide both the MSA and the selected model as input to subsequent analyses. While the influence of MSA errors has been explored for most stages of phylogenetics pipelines, the potential effects of MSA uncertainty on the relative model selection procedure itself have not been explored.
We assessed the consistency of relative model selection when presented with multiple perturbed versions of a given MSA. We find that while relative model selection is mostly robust to MSA uncertainty, in a substantial proportion of circumstances it identifies distinct best-fitting models from different MSAs created from the same set of sequences. We find that this issue is more pervasive for nucleotide data than for amino-acid data. However, we also find that it is challenging to predict whether relative model selection will be robust or sensitive to uncertainty in a given MSA.
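As a rough illustration of what checking consistency across perturbed MSAs can look like, the sketch below picks the lowest-AIC model for each alignment variant and reports how often the most frequent winner is chosen. It is a minimal sketch under stated assumptions, not the authors' pipeline: the abstract does not name the selection criterion, so AIC (2k - 2 ln L) is assumed here; the function names, log-likelihoods, and parameter counts are hypothetical; and only substitution-model parameters are counted, since branch-length parameters add the same constant to every model for a fixed set of sequences.

```python
from collections import Counter


def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def best_model(fits):
    """Return the model with the lowest AIC.

    fits maps a model name to (log_likelihood, n_params).
    """
    return min(fits, key=lambda name: aic(*fits[name]))


def selection_consistency(fits_per_msa):
    """Select a model for each MSA variant and summarize agreement.

    Returns the list of winners, the most frequently selected model,
    and the fraction of variants that selected it.
    """
    winners = [best_model(fits) for fits in fits_per_msa]
    top_model, top_count = Counter(winners).most_common(1)[0]
    return winners, top_model, top_count / len(winners)


# Hypothetical fits for three perturbed alignments of the same sequences.
# Values are made up for illustration; they are not results from the paper.
msa_variants = [
    {"JC69": (-5230.0, 0), "HKY85": (-5105.0, 4), "GTR": (-5103.0, 8)},
    {"JC69": (-5228.0, 0), "HKY85": (-5101.0, 4), "GTR": (-5099.0, 8)},
    {"JC69": (-5240.0, 0), "HKY85": (-5112.0, 4), "GTR": (-5104.0, 8)},
]

winners, top_model, agreement = selection_consistency(msa_variants)
print(winners, top_model, round(agreement, 2))
```

With these made-up numbers, two variants favor HKY85 and one favors GTR, which is the kind of disagreement between alignments of the same sequences that the results describe. Swapping in BIC or AICc would only require replacing the `aic` function.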
We find that MSA uncertainty can affect virtually all steps of phylogenetic analysis pipelines, including relative model selection, to a greater extent than has previously been recognized.