Relative model selection of evolutionary substitution models can be sensitive to multiple sequence alignment uncertainty.

Affiliations

Department of Biological Sciences, Rowan University, Glassboro, NJ, 08028, USA.

Department of Molecular and Cellular Biosciences, Rowan University, Glassboro, NJ, 08028, USA.

Publication details

BMC Ecol Evol. 2021 Nov 29;21(1):214. doi: 10.1186/s12862-021-01931-5.

DOI:10.1186/s12862-021-01931-5
PMID:34844571
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8628390/
Abstract

BACKGROUND

Multiple sequence alignments (MSAs) represent the fundamental unit of input data for most comparative sequence analyses. In phylogenetic analyses in particular, errors in MSA construction can propagate into downstream analyses such as phylogenetic reconstruction itself, ancestral state reconstruction, and divergence time estimation. In addition to providing phylogenetic methods with an MSA to analyze, researchers must also specify a suitable evolutionary model for the given analysis. Most commonly, researchers apply relative model selection to choose a model from a candidate set and then provide both the MSA and the selected model as input to subsequent analyses. While the influence of MSA errors has been explored for most stages of phylogenetic pipelines, the potential effects of MSA uncertainty on the relative model selection procedure itself have not been explored.
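Relative model selection typically ranks each candidate model by an information criterion computed from its maximized log-likelihood and its number of free parameters, then keeps the lowest-scoring model. The minimal Python sketch that follows shows only that ranking step, using the standard AIC, AICc, and BIC formulas. It assumes the log-likelihoods were already obtained by fitting each model to the MSA (tools such as ModelFinder, listed in the references, perform that fitting). The function names and all numbers in the example are hypothetical placeholders, not values or code from this study.

```python
import math


def information_criteria(lnL: float, k: int, n: int) -> dict:
    """Return AIC, AICc and BIC for one fitted model.

    lnL: maximized log-likelihood of the model on the alignment
    k:   number of free parameters (model parameters + branch lengths)
    n:   sample size, assumed here to be the number of alignment columns
    """
    aic = 2 * k - 2 * lnL
    aicc = aic + (2 * k * (k + 1)) / (n - k - 1)  # small-sample correction
    bic = k * math.log(n) - 2 * lnL
    return {"AIC": aic, "AICc": aicc, "BIC": bic}


def select_model(fits: dict, n: int, criterion: str = "BIC") -> str:
    """Pick the candidate with the lowest score under the chosen criterion.

    fits maps a model name to a (lnL, k) pair.
    """
    scores = {
        name: information_criteria(lnL, k, n)[criterion]
        for name, (lnL, k) in fits.items()
    }
    return min(scores, key=scores.get)


# Hypothetical fits for a 10-taxon, 1200-column nucleotide MSA.
# Each k includes 2*10 - 3 = 17 branch lengths, plus 0 (JC69),
# 4 (HKY85: kappa + 3 base frequencies) or 8 (GTR: 5 rates + 3 frequencies).
candidates = {
    "JC69": (-5210.4, 17),
    "HKY85": (-5102.7, 21),
    "GTR": (-5098.9, 25),
}

for crit in ("AIC", "AICc", "BIC"):
    print(crit, select_model(candidates, n=1200, criterion=crit))
```

With these placeholder numbers all three criteria favour HKY85, but under AIC HKY85 beats GTR by only 0.4 units, the kind of narrow margin where a small change to the alignment could plausibly flip the choice.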

RESULTS

We assessed the consistency of relative model selection when presented with multiple perturbed versions of a given MSA. We find that while relative model selection is mostly robust to MSA uncertainty, in a substantial proportion of circumstances it identifies distinct best-fitting models from different MSAs created from the same set of sequences. This issue is more pervasive for nucleotide data than for amino-acid data. However, we also find that it is challenging to predict whether relative model selection will be robust or sensitive to uncertainty in a given MSA.
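The consistency check described above asks whether relative model selection returns the same best-fitting model across several perturbed MSAs of one sequence set. The Python sketch that follows illustrates one simple way to tally that agreement. It assumes BIC as the criterion and summarizes agreement as the fraction of MSAs whose pick matches the most common pick; both choices, the function names, and every log-likelihood value are hypothetical and are not taken from the paper's methods or data.

```python
import math
from collections import Counter


def bic(lnL: float, k: int, n: int) -> float:
    """Bayesian information criterion; lower is better."""
    return k * math.log(n) - 2 * lnL


def best_model(fits: dict, n: int) -> str:
    """Relative model selection on one MSA: lowest BIC wins."""
    return min(fits, key=lambda name: bic(fits[name][0], fits[name][1], n))


def selection_consistency(alignments: list) -> tuple:
    """Compare the model chosen for each perturbed MSA of the same sequences.

    alignments: list of (n_columns, {model_name: (lnL, k)}) pairs, one per MSA.
    Returns the per-MSA picks and the fraction agreeing with the most common pick.
    """
    picks = [best_model(fits, n) for n, fits in alignments]
    _, modal_count = Counter(picks).most_common(1)[0]
    return picks, modal_count / len(picks)


# Hypothetical fits for four perturbed MSAs built from the same 10 sequences;
# alignment length differs because gap placement differs between MSAs.
perturbed = [
    (1200, {"JC69": (-5210.4, 17), "HKY85": (-5102.7, 21), "GTR": (-5098.9, 25)}),
    (1185, {"JC69": (-5190.2, 17), "HKY85": (-5089.5, 21), "GTR": (-5085.1, 25)}),
    (1230, {"JC69": (-5301.7, 17), "HKY85": (-5188.3, 21), "GTR": (-5160.2, 25)}),
    (1210, {"JC69": (-5240.0, 17), "HKY85": (-5130.6, 21), "GTR": (-5127.0, 25)}),
]

picks, agreement = selection_consistency(perturbed)
print(picks)      # ['HKY85', 'HKY85', 'GTR', 'HKY85']
print(agreement)  # 0.75
```

In this made-up example one of the four alignments switches the selected model from HKY85 to GTR, which is the kind of disagreement the study looks for; any real assessment would depend on how the perturbed MSAs are generated and which criterion is used.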

CONCLUSIONS

We find that MSA uncertainty can affect virtually all steps of phylogenetic analysis pipelines, including relative model selection, to a greater extent than has previously been recognized.

Figures

Fig. 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f764/8628390/b12ed0651ef6/12862_2021_1931_Fig1_HTML.jpg
Fig. 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f764/8628390/b25217bc40db/12862_2021_1931_Fig2_HTML.jpg
Fig. 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f764/8628390/24ad34a4bbc7/12862_2021_1931_Fig3_HTML.jpg
Fig. 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f764/8628390/0be8e6cd7ec3/12862_2021_1931_Fig4_HTML.jpg
Fig. 5: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f764/8628390/a62e0d238c89/12862_2021_1931_Fig5_HTML.jpg
Fig. 6: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f764/8628390/616d5ce280ea/12862_2021_1931_Fig6_HTML.jpg

Similar articles

1
Relative model selection of evolutionary substitution models can be sensitive to multiple sequence alignment uncertainty.
BMC Ecol Evol. 2021 Nov 29;21(1):214. doi: 10.1186/s12862-021-01931-5.
2
Evidence of Statistical Inconsistency of Phylogenetic Methods in the Presence of Multiple Sequence Alignment Uncertainty.
Genome Biol Evol. 2015 Jul 1;7(8):2102-16. doi: 10.1093/gbe/evv127.
3
Class of multiple sequence alignment algorithm affects genomic analysis.
Mol Biol Evol. 2013 Mar;30(3):642-53. doi: 10.1093/molbev/mss256. Epub 2012 Nov 9.
4
LMAP_S: Lightweight Multigene Alignment and Phylogeny eStimation.
BMC Bioinformatics. 2019 Dec 30;20(1):739. doi: 10.1186/s12859-019-3292-5.
5
Characterization of multiple sequence alignment errors using complete-likelihood score and position-shift map.
BMC Bioinformatics. 2016 Mar 18;17:133. doi: 10.1186/s12859-016-0945-5.
6
Evaluating the usefulness of alignment filtering methods to reduce the impact of errors on evolutionary inferences.
BMC Evol Biol. 2019 Jan 11;19(1):21. doi: 10.1186/s12862-019-1350-2.
7
Performance comparison between k-tuple distance and four model-based distances in phylogenetic tree reconstruction.
Nucleic Acids Res. 2008 Mar;36(5):e33. doi: 10.1093/nar/gkn075. Epub 2008 Feb 22.
8
Evaluation measures of multiple sequence alignments.
J Comput Biol. 2000 Feb-Apr;7(1-2):261-76. doi: 10.1089/10665270050081513.
9
Characterization of pairwise and multiple sequence alignment errors.
Gene. 2009 Jul 15;441(1-2):141-7. doi: 10.1016/j.gene.2008.05.016. Epub 2008 Jun 3.
10
Identifying Clusters of High Confidence Homologies in Multiple Sequence Alignments.
Mol Biol Evol. 2019 Oct 1;36(10):2340-2351. doi: 10.1093/molbev/msz142.

Cited by

1
Taming the Selection of Optimal Substitution Models in Phylogenomics by Site Subsampling and Upsampling.
Mol Biol Evol. 2022 Nov 3;39(11). doi: 10.1093/molbev/msac236.
2
Application of the MAHDS Method for Multiple Alignment of Highly Diverged Amino Acid Sequences.
Int J Mol Sci. 2022 Mar 29;23(7):3764. doi: 10.3390/ijms23073764.

References

1
ModelTeller: Model Selection for Optimal Phylogenetic Reconstruction Using Machine Learning.
Mol Biol Evol. 2020 Nov 1;37(11):3338-3352. doi: 10.1093/molbev/msaa154.
2
Relative Model Fit Does Not Predict Topological Accuracy in Single-Gene Protein Phylogenetics.
Mol Biol Evol. 2020 Jul 1;37(7):2110-2123. doi: 10.1093/molbev/msaa075.
3
The effect of alignment uncertainty, substitution models and priors in building and dating the mammal tree of life.
BMC Evol Biol. 2019 Nov 6;19(1):203. doi: 10.1186/s12862-019-1534-9.
4
The Prevalence and Impact of Model Violations in Phylogenetic Analysis.
Genome Biol Evol. 2019 Dec 1;11(12):3341-3352. doi: 10.1093/gbe/evz193.
5
Model selection may not be a mandatory step for phylogeny reconstruction.
Nat Commun. 2019 Feb 25;10(1):934. doi: 10.1038/s41467-019-08822-w.
6
Relative Evolutionary Rates in Proteins Are Largely Insensitive to the Substitution Model.
Mol Biol Evol. 2018 Sep 1;35(9):2307-2317. doi: 10.1093/molbev/msy127.
7
Multiple Sequence Alignment Averaging Improves Phylogeny Reconstruction.
Syst Biol. 2019 Jan 1;68(1):117-130. doi: 10.1093/sysbio/syy036.
8
PhyloMAd: efficient assessment of phylogenomic model adequacy.
Bioinformatics. 2018 Jul 1;34(13):2300-2301. doi: 10.1093/bioinformatics/bty103.
9
ModelFinder: fast model selection for accurate phylogenetic estimates.
Nat Methods. 2017 Jun;14(6):587-589. doi: 10.1038/nmeth.4285. Epub 2017 May 8.
10
bModelTest: Bayesian phylogenetic site model averaging and model comparison.
BMC Evol Biol. 2017 Feb 6;17(1):42. doi: 10.1186/s12862-017-0890-6.