Khan Mehmood Alam, Mahmudi Owais, Ullah Ikram, Arvestad Lars, Lagergren Jens
KTH Royal Institute of Technology, School of Computer Science and Communication, Box 1031, Solna, 171 21, Sweden.
Science for Life Laboratory, Box 1031, Solna, 171 21, Sweden.
BMC Bioinformatics. 2016 Nov 11;17(Suppl 14):431. doi: 10.1186/s12859-016-1268-2.
Lateral gene transfer (LGT) is an evolutionary process that has an important role in biology. It challenges the traditional binary tree-like model of species evolution and is attracting increasing attention from molecular biologists due to its involvement in antibiotic resistance. A number of attempts have been made to model LGT in the presence of gene duplication and loss, but reliably placing LGT events in the species tree has remained a challenge.
In this paper, we propose probabilistic methods that sample reconciliations of the gene tree with a dated species tree and compute maximum a posteriori probabilities. The MCMC-based method uses the probabilistic model DLTRS, which integrates LGT, gene duplication, gene loss, and sequence evolution under a relaxed molecular clock for substitution rates. We can estimate posterior distributions over gene trees and, in contrast to previous work, over the actual placement of potential LGT events, which can be used, for example, to identify "highways" of LGT.
Based on a simulation study, we conclude that the method is able to infer the true LGT events on the gene tree and reconcile them to the correct edges of the species tree in most cases. Applied to two biological datasets, containing gene families from Cyanobacteria and Mollicutes, we find potential LGT highways that corroborate other studies, as well as previously undetected examples.
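As an illustration of how sampled reconciliations could be summarized into LGT placements and candidate "highways", the following is a minimal sketch, not the authors' implementation. It assumes, hypothetically, that each MCMC sample for a gene family has already been reduced to a list of (donor edge, recipient edge) pairs on the species tree; the function names, the input encoding, and the thresholds `min_support` and `min_families` are all assumptions introduced here.

```python
from collections import Counter
from typing import Iterable

# Hypothetical encoding: (donor species-tree edge, recipient species-tree edge).
Transfer = tuple[str, str]


def transfer_posteriors(samples: Iterable[Iterable[Transfer]]) -> dict[Transfer, float]:
    """Estimate, for one gene family, the fraction of MCMC samples in which
    each (donor, recipient) transfer placement occurs."""
    sample_list = [set(s) for s in samples]  # count a placement once per sample
    if not sample_list:
        raise ValueError("at least one sample is required")
    counts: Counter = Counter()
    for transfers in sample_list:
        counts.update(transfers)
    n = len(sample_list)
    return {pair: c / n for pair, c in counts.items()}


def candidate_highways(
    per_family: Iterable[dict[Transfer, float]],
    min_support: float = 0.5,   # assumed posterior cutoff, not from the paper
    min_families: int = 2,      # assumed minimum number of supporting families
) -> dict[Transfer, int]:
    """Return edge pairs whose transfer is well supported in several families,
    mapped to the number of supporting families."""
    family_counts: Counter = Counter()
    for posteriors in per_family:
        for pair, p in posteriors.items():
            if p >= min_support:
                family_counts[pair] += 1
    return {pair: k for pair, k in family_counts.items() if k >= min_families}


if __name__ == "__main__":
    # Toy samples for two gene families (made-up edge labels).
    family_a = [[("A", "B")], [("A", "B"), ("C", "D")], [], [("A", "B")]]
    family_b = [[("A", "B")], [("A", "B")]]
    posteriors = [transfer_posteriors(family_a), transfer_posteriors(family_b)]
    print(candidate_highways(posteriors))
```

Counting each placement at most once per sample makes the per-family value an estimate of the posterior probability that the transfer is present, rather than an expected number of transfers; a different summary may be more appropriate depending on how highways are defined.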