O'Brien John D, Didelot Xavier, Iqbal Zamin, Amenga-Etego Lucas, Ahiska Bartu, Falush Daniel
Department of Mathematics, Bowdoin College, Brunswick, Maine 04011
School of Public Health, Imperial College London, London W2 1PG, United Kingdom.
Genetics. 2014 Jul;197(3):925-37. doi: 10.1534/genetics.114.161299. Epub 2014 May 1.
Metagenomics provides a powerful new tool set for investigating evolutionary interactions with the environment. However, an absence of model-based statistical methods means that researchers are often not able to make full use of this complex information. We present a Bayesian method for inferring the phylogenetic relationship among related organisms found within metagenomic samples. Our approach exploits variation in the frequency of taxa among samples to simultaneously infer each lineage haplotype, the phylogenetic tree connecting them, and their frequency within each sample. Applications of the algorithm to simulated data show that our method can recover a substantial fraction of the phylogenetic structure even in the presence of high rates of migration among sample sites. We provide examples of the method applied to data from green sulfur bacteria recovered from an Antarctic lake, plastids from mixed Plasmodium falciparum infections, and virulent Neisseria meningitidis samples.
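The abstract describes inferring lineage haplotypes and their per-sample frequencies from how allele frequencies vary across samples. The sketch below illustrates that idea only roughly. It scores one candidate set of haplotypes and frequencies under a simple binomial read-count model with a symmetric per-read error rate `eps`. The model, the function names (`site_allele_prob`, `log_binom_pmf`, `log_likelihood`), `eps`, and the toy data are all hypothetical. This is not the paper's published likelihood. The phylogenetic prior that links the haplotypes through a tree, and the MCMC sampler that explores haplotypes, tree, and frequencies jointly, are omitted.

```python
import math
from typing import Sequence

# Hypothetical simplified observation model (an assumption, not the paper's):
# each sample is a mixture of K haplotypes with frequencies w[k], and reads at
# a biallelic site carry allele 1 with a probability set by that mixture.

def site_allele_prob(freqs: Sequence[float], haps_at_site: Sequence[int], eps: float) -> float:
    """Probability that one read at this site shows allele 1 in one sample.

    freqs: frequency of each haplotype in the sample (assumed to sum to 1).
    haps_at_site: allele (0 or 1) carried by each haplotype at this site.
    eps: hypothetical per-read error rate, 0 < eps < 1.
    """
    p = sum(w * h for w, h in zip(freqs, haps_at_site))
    # Allele 1 is observed if the read comes from an allele-1 haplotype and is
    # read correctly, or comes from an allele-0 haplotype and is misread.
    return p * (1.0 - eps) + (1.0 - p) * eps

def log_binom_pmf(k: int, n: int, p: float) -> float:
    """log P(k successes in n trials) for a binomial with success prob p."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + k * math.log(p) + (n - k) * math.log(1.0 - p)

def log_likelihood(haplotypes, freqs, alt_counts, depths, eps=0.01):
    """Sum of per-site, per-sample binomial log-likelihoods.

    haplotypes: K x L matrix of 0/1 alleles.
    freqs: S x K matrix of haplotype frequencies per sample.
    alt_counts, depths: S x L matrices of allele-1 read counts and total reads.
    """
    total = 0.0
    n_sites = len(haplotypes[0])
    for s, w in enumerate(freqs):
        for l in range(n_sites):
            column = [hap[l] for hap in haplotypes]  # alleles of all haplotypes at site l
            p = site_allele_prob(w, column, eps)
            total += log_binom_pmf(alt_counts[s][l], depths[s][l], p)
    return total

# Toy data (made up for illustration): 2 haplotypes, 2 sites, 2 samples.
H = [[1, 0], [0, 1]]
depth = [[100, 100], [100, 100]]
alt = [[89, 11], [21, 79]]
true_w = [[0.9, 0.1], [0.2, 0.8]]
flat_w = [[0.5, 0.5], [0.5, 0.5]]

# Frequencies that vary between samples explain the counts better than equal mixing.
print(log_likelihood(H, true_w, alt, depth) > log_likelihood(H, flat_w, alt, depth))  # True
```

The toy comparison shows why differences in frequency between samples carry information. Because the two samples mix the same haplotypes in different proportions, only frequencies close to the true ones fit both samples' allele counts at once. A full Bayesian treatment would combine a score like this with priors and sample over haplotypes, tree, and frequencies rather than comparing two fixed guesses.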