Bansal Mukul S, Banay Guy, Gogarten J Peter, Shamir Ron
The Blavatnik School of Computer Science, Tel-Aviv University, Tel-Aviv, Israel.
J Comput Biol. 2011 Sep;18(9):1087-114. doi: 10.1089/cmb.2011.0066.
In a horizontal gene transfer (HGT) event, a gene is transferred between two species that do not have an ancestor-descendant relationship. Typically, no more than a few genes are horizontally transferred between any two species. However, several studies identified pairs of species between which many different genes were horizontally transferred. Such a pair is said to be linked by a highway of gene sharing. We present a method for inferring such highways. Our method is based on the fact that the evolutionary histories of horizontally transferred genes disagree with the corresponding species phylogeny. Specifically, given a set of gene trees and a trusted rooted species tree, each gene tree is first decomposed into its constituent quartet trees and the quartets that are inconsistent with the species tree are identified. Our method finds a pair of species such that a highway between them explains the largest (normalized) fraction of inconsistent quartets. For a problem on n species and m input quartet trees, we give an efficient O(m + n^2)-time algorithm for detecting highways, which is optimal with respect to the quartets input size. An application of our method to a dataset of 1128 genes from 11 cyanobacterial species, as well as to simulated datasets, illustrates the efficacy of our method.
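The quartet decomposition step described above can be illustrated with a small sketch. It is not the authors' O(m + n^2) algorithm and does not score highways; it is a brute-force check, assuming rooted binary trees written as nested Python tuples with string leaf labels. The function names (`clades`, `quartet_topology`, `inconsistent_quartets`) and the five-species example are hypothetical, chosen only for illustration.

```python
from itertools import combinations


def clades(tree):
    """Return the leaf set under every internal node of a nested-tuple tree."""
    found = []

    def walk(node):
        if not isinstance(node, tuple):      # a leaf label
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.append(leaves)
        return leaves

    walk(tree)
    return found


def quartet_topology(clade_list, four):
    """Return the split xy|zw that the tree induces on four leaves.

    A clade containing exactly two of the four leaves separates them
    from the other two. Returns None if no such clade exists
    (possible only for unresolved, non-binary trees).
    """
    four = frozenset(four)
    for clade in clade_list:
        inside = clade & four
        if len(inside) == 2:
            return frozenset([inside, four - inside])
    return None


def inconsistent_quartets(gene_tree, species_tree):
    """List leaf quadruples whose gene-tree split differs from the species tree.

    Assumes the gene tree has at least four leaves, all present in the
    species tree. Enumerates all quadruples, so it is O(n^4) per gene tree.
    """
    gene_clades = clades(gene_tree)
    species_clades = clades(species_tree)
    leaves = sorted(max(gene_clades, key=len))   # root clade holds all leaves
    bad = []
    for four in combinations(leaves, 4):
        g = quartet_topology(gene_clades, four)
        s = quartet_topology(species_clades, four)
        if g is not None and s is not None and g != s:
            bad.append(four)
    return bad


# Hypothetical example: in the gene tree, B groups with D instead of A.
species_tree = ((("A", "B"), "C"), ("D", "E"))
gene_tree = (("A", "C"), (("B", "D"), "E"))
conflicts = inconsistent_quartets(gene_tree, species_tree)
```

In this toy example every conflicting quartet contains B, the leaf that changed position, while the quartet on A, C, D, E remains consistent. Conflicts concentrated on particular species are the kind of signal that the highway-scoring step described in the abstract would then aggregate over many gene trees.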