Computational & Mathematical Biology, Genome Institute of Singapore, Singapore, Singapore.
PLoS One. 2010 Feb 4;5(2):e8985. doi: 10.1371/journal.pone.0008985.
Accurate reconstruction of ancestral character states on a phylogeny is crucial in many genomics studies. We study how to select species to achieve the best reconstruction of ancestral character states on a phylogeny. We first show that the marginal maximum likelihood method has the monotonicity property that more taxa give better reconstruction, but that the Fitch method lacks this property even on an ultrametric phylogeny. We further validate a greedy approach for species selection using simulation. The validation tests indicate that backward greedy selection outperforms forward greedy selection. In addition, by applying our selection strategy, we obtain a set of the ten most informative species for the reconstruction of the genomic sequence of the so-called boreoeutherian ancestor of placental mammals. This study has broad relevance in comparative genomics and paleogenomics, since limited research resources do not allow researchers to sequence the large number of descendant species required to reconstruct an ancestral sequence.
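To make the distinction between forward and backward greedy selection concrete, the following is a minimal sketch, not the authors' implementation. It assumes a caller-supplied function `score` that returns the expected reconstruction accuracy for a given set of species; the names `forward_greedy`, `backward_greedy`, `toy_score`, and the values in `TOY_SCORES` are hypothetical and chosen only for illustration. Forward selection starts from the empty set and adds the species that helps most at each step; backward selection starts from all species and discards the one whose removal costs least.

```python
from typing import Callable, FrozenSet, Iterable

Score = Callable[[FrozenSet[str]], float]


def forward_greedy(species: Iterable[str], k: int, score: Score) -> FrozenSet[str]:
    """Start from the empty set; repeatedly add the species that maximizes the score."""
    chosen: FrozenSet[str] = frozenset()
    remaining = set(species)
    while len(chosen) < k and remaining:
        # sorted() makes tie-breaking deterministic (alphabetical order).
        best = max(sorted(remaining), key=lambda s: score(chosen | {s}))
        chosen = chosen | {best}
        remaining.remove(best)
    return chosen


def backward_greedy(species: Iterable[str], k: int, score: Score) -> FrozenSet[str]:
    """Start from all species; repeatedly drop the one whose removal hurts the score least."""
    chosen: FrozenSet[str] = frozenset(species)
    while len(chosen) > k:
        drop = max(sorted(chosen), key=lambda s: score(chosen - {s}))
        chosen = chosen - {drop}
    return chosen


# Hypothetical toy accuracies (made-up numbers, not results from the study).
# They are monotone: adding a species never lowers the score.
TOY_SCORES = {
    frozenset(): 0.0,
    frozenset("A"): 0.6,
    frozenset("B"): 0.5,
    frozenset("C"): 0.5,
    frozenset("AB"): 0.8,
    frozenset("AC"): 0.8,
    frozenset("BC"): 0.95,
    frozenset("ABC"): 1.0,
}


def toy_score(subset: FrozenSet[str]) -> float:
    return TOY_SCORES[frozenset(subset)]


if __name__ == "__main__":
    print(sorted(forward_greedy("ABC", 2, toy_score)))
    print(sorted(backward_greedy("ABC", 2, toy_score)))
```

In this toy setting the two strategies disagree: forward selection commits early to the single best species `A` and ends with a pair scoring 0.8, while backward selection keeps the complementary pair `B` and `C` scoring 0.95. This only illustrates how the two procedures can diverge; the abstract's claim that backward selection performs better rests on the authors' simulations, not on this example.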