MAST: Phylogenetic Inference with Mixtures Across Sites and Trees.

Affiliations

School of Computing, Australian National University, Canberra, ACT 2601, Australia.

Research School of Biology, Australian National University, Canberra, ACT 2601, Australia.

Publication information

Syst Biol. 2024 Jul 27;73(2):375-391. doi: 10.1093/sysbio/syae008.

Abstract

Hundreds or thousands of loci are now routinely used in modern phylogenomic studies. Concatenation approaches to tree inference assume that there is a single topology for the entire dataset, but different loci may have different evolutionary histories due to incomplete lineage sorting (ILS), introgression, and/or horizontal gene transfer; even single loci may not be treelike due to recombination. To overcome this shortcoming, we introduce an implementation of a multi-tree mixture model that we call mixtures across sites and trees (MAST). This model extends a prior implementation by Boussau et al. (2009) by allowing users to estimate the weight of each of a set of pre-specified bifurcating trees in a single alignment. The MAST model allows each tree to have its own weight, topology, branch lengths, substitution model, nucleotide or amino acid frequencies, and model of rate heterogeneity across sites. We implemented the MAST model in a maximum-likelihood framework in the popular phylogenetic software, IQ-TREE. Simulations show that we can accurately recover the true model parameters, including branch lengths and tree weights for a given set of tree topologies, under a wide range of biologically realistic scenarios. We also show that we can use standard statistical inference approaches to reject a single-tree model when data are simulated under multiple trees (and vice versa). We applied the MAST model to multiple primate datasets and found that it can recover the signal of ILS in the Great Apes, as well as the asymmetry in minor trees caused by introgression among several macaque species. When applied to a dataset of 4 Platyrrhine species for which standard concatenated maximum likelihood (ML) and gene tree approaches disagree, we observe that MAST gives the highest weight (i.e., the largest proportion of sites) to the tree also supported by gene tree approaches. 
These results suggest that the MAST model is able to analyze a concatenated alignment using ML while avoiding some of the biases that come with assuming there is only a single tree. We discuss how the MAST model can be extended in the future.
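To make the idea of a tree weight concrete, the sketch below treats each site's likelihood as a weighted sum over the candidate trees and estimates the weights with a simple expectation-maximization loop. It is an illustration only, not the IQ-TREE implementation: the function names and toy numbers are hypothetical, and the per-site likelihoods under each tree are assumed to be precomputed and held fixed, whereas the MAST model also optimizes branch lengths and substitution parameters.

```python
import math


def mixture_log_likelihood(site_liks, weights):
    """Log-likelihood of an alignment under a mixture across trees.

    site_liks[i][j] is the likelihood of site i under tree j
    (assumed precomputed; hypothetical input format).
    """
    return sum(
        math.log(sum(w * lik for w, lik in zip(weights, row)))
        for row in site_liks
    )


def estimate_tree_weights(site_liks, n_iter=500, tol=1e-12):
    """Estimate tree weights by EM, holding per-site likelihoods fixed."""
    n_sites = len(site_liks)
    n_trees = len(site_liks[0])
    weights = [1.0 / n_trees] * n_trees  # start from equal weights
    prev = mixture_log_likelihood(site_liks, weights)
    for _ in range(n_iter):
        # E-step: posterior probability that each site evolved on each tree.
        totals = [0.0] * n_trees
        for row in site_liks:
            denom = sum(w * lik for w, lik in zip(weights, row))
            for j in range(n_trees):
                totals[j] += weights[j] * row[j] / denom
        # M-step: a tree's weight is its average posterior share of sites.
        weights = [t / n_sites for t in totals]
        cur = mixture_log_likelihood(site_liks, weights)
        if cur - prev < tol:
            break
        prev = cur
    return weights


if __name__ == "__main__":
    # Toy data: three sites favour tree 0, one site favours tree 1.
    toy = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.2, 0.8]]
    print(estimate_tree_weights(toy))
```

Under these assumptions the estimated weight of a tree can be read as the expected proportion of sites assigned to it, which matches how the abstract interprets the highest-weight tree.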

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7ead/11282360/0bfa9f9bc2e4/syae008_fig1.jpg
