Elchanan Mossel, Sebastien Roch
U.C. Berkeley, Berkeley, USA.
J Math Biol. 2013 Oct;67(4):767-97. doi: 10.1007/s00285-012-0571-4. Epub 2012 Aug 9.
Mutation rate variation across loci is well known to cause difficulties, notably identifiability issues, in the reconstruction of evolutionary trees from molecular sequences. Here we introduce a new approach for estimating general rates-across-sites models. Our results imply, in particular, that large phylogenies are typically identifiable under rate variation. We also derive sequence-length requirements for high-probability reconstruction. Our main contribution is a novel algorithm that clusters sites according to their mutation rate. Following this site clustering step, standard reconstruction techniques can be used to recover the phylogeny. Our results rely on a basic insight: that, for large trees, certain site statistics experience concentration-of-measure phenomena.
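The site-clustering idea can be illustrated with a toy simulation. The sketch below is not the authors' algorithm. It assumes a two-state symmetric (CFN) model on a complete binary tree with equal branch lengths, exactly two rate classes, and a deliberately simple site statistic: the fraction of sibling leaf pairs (cherries) that share a state. All function names and parameter values are hypothetical and chosen only for illustration.

```python
import math
import random


def simulate_site(depth, rate, branch_length, rng):
    """Leaf states of one binary site on a complete binary tree.

    Assumed model: each edge flips the state independently with
    probability (1 - exp(-2 * rate * branch_length)) / 2.
    """
    flip = (1.0 - math.exp(-2.0 * rate * branch_length)) / 2.0
    states = [rng.randrange(2)]  # uniform root state
    for _ in range(depth):
        # Each node gets two children; siblings end up adjacent in the list.
        states = [s ^ (rng.random() < flip) for s in states for _ in range(2)]
    return states


def cherry_agreement(states):
    """Fraction of sibling leaf pairs that share the same state."""
    pairs = len(states) // 2
    agree = sum(states[2 * i] == states[2 * i + 1] for i in range(pairs))
    return agree / pairs


def expected_agreement(rate, branch_length):
    """Mean of the cherry statistic under the assumed model."""
    return (1.0 + math.exp(-4.0 * rate * branch_length)) / 2.0


def split_by_largest_gap(stats):
    """Two-class clustering: cut the sorted statistics at their largest gap.

    Returns 1 for sites below the cut (lower agreement, faster class)
    and 0 for sites above it.
    """
    ordered = sorted(stats)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    cut = max(range(len(gaps)), key=gaps.__getitem__)
    threshold = (ordered[cut] + ordered[cut + 1]) / 2.0
    return [int(s < threshold) for s in stats]


def run_demo(seed=0, depth=12, n_sites=40, rates=(0.5, 2.0), branch_length=0.1):
    """Simulate sites from two rate classes and try to recover the classes."""
    rng = random.Random(seed)
    truth = [i % 2 for i in range(n_sites)]  # 0 = slow class, 1 = fast class
    stats = [
        cherry_agreement(simulate_site(depth, rates[c], branch_length, rng))
        for c in truth
    ]
    return split_by_largest_gap(stats), truth


if __name__ == "__main__":
    labels, truth = run_demo()
    print("sites recovered:", sum(a == b for a, b in zip(labels, truth)), "of", len(truth))
```

In this toy model, whether the two leaves of a cherry agree depends only on the flips along their two pendant edges, so the statistic is an average of independent Bernoulli variables with mean (1 + exp(-4rt))/2 for rate r and branch length t. Its fluctuations shrink as the number of leaves grows, which is the kind of concentration that lets sites be separated by rate before any tree is reconstructed.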