Detecting phylogenetic breakpoints and discordance from genome-wide alignments for species tree reconstruction.

Affiliation

Departments of Statistics and Botany, University of Wisconsin-Madison, USA.

Publication information

Genome Biol Evol. 2011;3:246-58. doi: 10.1093/gbe/evr013. Epub 2011 Feb 28.

Abstract

With the easy acquisition of sequence data, it is now possible to obtain and align whole genomes across multiple related species or populations. In this work, I assess the performance of a statistical method to reconstruct the whole distribution of phylogenetic trees along the genome, estimate the proportion of the genome for which a given clade is true, and infer a concordance tree that summarizes the dominant vertical inheritance pattern. There are two main issues when dealing with whole-genome alignments, as opposed to multiple genes: the size of the data and the detection of recombination breakpoints. These breakpoints partition the genomic alignment into phylogenetically homogeneous loci, where sites within a given locus all share the same phylogenetic tree topology. To delimitate these loci, I describe here a method based on the minimum description length (MDL) principle, implemented with dynamic programming for computational efficiency. Simulations show that combining MDL partitioning with Bayesian concordance analysis provides an efficient and robust way to estimate both the vertical inheritance signal and the horizontal phylogenetic signal. The method performed well both in the presence of incomplete lineage sorting and in the presence of horizontal gene transfer. A high level of systematic bias was found here, highlighting the need for good individual tree building methods, which form the basis for more elaborate gene tree/species tree reconciliation methods.
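The abstract names the partitioning technique (an MDL criterion optimized by dynamic programming) but does not give the scoring details, so the following is only a toy sketch of that general idea, not the author's implementation. It assumes a hypothetical input `block_costs[b][t]`, the cost of describing alignment block `b` under candidate topology `t`, and a hypothetical fixed `penalty` charged for each locus. The function name and both parameters are illustrative assumptions.

```python
from itertools import accumulate


def mdl_partition(block_costs, penalty):
    """Toy MDL-style partition of consecutive alignment blocks into loci.

    block_costs[b][t] is an assumed cost of block b under candidate
    topology t; penalty is an assumed fixed cost per locus. Neither is
    taken from the paper. Returns (total_cost, segments), where each
    segment is (start, end, topology) with end exclusive.
    """
    n = len(block_costs)
    n_topo = len(block_costs[0])
    # prefix[t][b] = summed cost of blocks 0..b-1 under topology t,
    # so any segment cost is a difference of two prefix sums.
    prefix = [
        [0] + list(accumulate(row[t] for row in block_costs))
        for t in range(n_topo)
    ]

    def segment_cost(i, j):
        # Cheapest single topology for blocks i..j-1, as (cost, topology).
        return min((prefix[t][j] - prefix[t][i], t) for t in range(n_topo))

    # best[j] = minimal total cost of partitioning blocks 0..j-1.
    best = [0.0] + [float("inf")] * n
    back = [None] * (n + 1)
    for j in range(1, n + 1):
        for i in range(j):
            cost, topo = segment_cost(i, j)
            total = best[i] + cost + penalty
            if total < best[j]:
                best[j] = total
                back[j] = (i, topo)

    # Walk the back-pointers to recover the breakpoints.
    segments = []
    j = n
    while j > 0:
        i, topo = back[j]
        segments.append((i, j, topo))
        j = i
    segments.reverse()
    return best[n], segments


# Six blocks: the first three favor topology 0, the last three topology 1.
toy_costs = [[0, 5], [0, 5], [0, 5], [5, 0], [5, 0], [5, 0]]
total, loci = mdl_partition(toy_costs, penalty=3)
print(total, loci)
```

In this sketch the per-locus penalty plays the role of the model description cost: a larger penalty merges neighboring blocks into fewer loci, while a smaller one allows more breakpoints. The double loop is quadratic in the number of blocks, which is the usual cost of this kind of dynamic program.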
