Department of Ecology and Evolutionary Biology, Yale University, 165 Prospect Street, New Haven, CT 06520, USA.
Biomed Res Int. 2013;2013:621604. doi: 10.1155/2013/621604. Epub 2013 Jun 26.
Phylogenetic research is often stymied by selection of a marker that leads to poor phylogenetic resolution despite considerable cost and effort. Profiles of phylogenetic informativeness provide a quantitative measure for prioritizing gene sampling to resolve branching order in a particular epoch. To evaluate the utility of these profiles, we analyzed phylogenomic data sets from metazoans, fungi, and mammals, thus encompassing diverse time scales and taxonomic groups. We also evaluated the utility of profiles created based on simulated data sets. We found that genes selected via their informativeness dramatically outperformed haphazard sampling of markers. Furthermore, our analyses demonstrate that the original phylogenetic informativeness method can be extended to trees with more than four taxa. Thus, although the method currently predicts phylogenetic signal without specifically accounting for the misleading effects of stochastic noise, it is robust to the effects of homoplasy. The phylogenetic informativeness rankings obtained will allow other researchers to select advantageous genes for future studies within these clades, maximizing return on effort and investment. Genes identified might also yield efficient experimental designs for phylogenetic inference for many sister clades and outgroup taxa that are closely related to the diverse groups of organisms analyzed.
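The idea of ranking genes by their informativeness for a particular epoch can be illustrated with a minimal sketch. It assumes the per-site form from the original phylogenetic informativeness method (Townsend 2007), 16 λ² T exp(-4 λ T) for a site evolving at rate λ evaluated at time T, summed over the sites of a gene. That formula is not stated in this abstract. The gene names, site rates, and time units below are hypothetical values chosen for illustration only, and the function names are not taken from any published software. In practice, site rates would be estimated from an alignment and a dated tree.

```python
import math


def site_informativeness(rate, t):
    """Per-site informativeness at time t for a site evolving at `rate`.

    Assumed form (Townsend 2007): 16 * rate^2 * t * exp(-4 * rate * t).
    For a fixed rate, this curve peaks at t = 1 / (4 * rate).
    """
    return 16.0 * rate ** 2 * t * math.exp(-4.0 * rate * t)


def gene_informativeness(site_rates, t):
    """Sum the per-site values over all sites of one gene."""
    return sum(site_informativeness(r, t) for r in site_rates)


def rank_genes(genes, t):
    """Return gene names sorted from most to least informative at time t."""
    return sorted(
        genes,
        key=lambda name: gene_informativeness(genes[name], t),
        reverse=True,
    )


# Hypothetical genes: 300 sites each, one shared rate per gene for simplicity.
# Rates are in substitutions per site per (arbitrary) unit of time.
genes = {
    "slow_gene": [0.01] * 300,
    "medium_gene": [0.25] * 300,
    "fast_gene": [2.0] * 300,
}

if __name__ == "__main__":
    for epoch in (0.1, 1.0, 25.0):
        print(epoch, rank_genes(genes, epoch))
```

Under these made-up rates, the fast gene ranks first for the most recent epoch and the slow gene ranks first for the deepest one, which is the kind of epoch-specific prioritization the profiles are meant to support. The sketch only measures predicted signal. It does not model the stochastic noise that the abstract notes the method does not specifically account for.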