Darby Charlotte A, Stolzer Maureen, Ropp Patrick J, Barker Daniel, Durand Dannie
Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA 15213, USA.
School of Biology, University of St. Andrews, St. Andrews, Fife KY16 9TH, UK.
Bioinformatics. 2017 Mar 1;33(5):640-649. doi: 10.1093/bioinformatics/btw686.
Orthology analysis is a fundamental tool in comparative genomics. Sophisticated methods have been developed to distinguish between orthologs and paralogs and to classify paralogs into subtypes depending on the duplication mechanism and timing, relative to speciation. However, no comparable framework exists for xenologs: gene pairs whose history, since their divergence, includes a horizontal transfer. Further, the diversity of gene pairs that meet this broad definition calls for classification of xenologs with similar properties into subtypes.
We present a xenolog classification that uses phylogenetic reconciliation to assign each pair of genes to a class based on the event responsible for their divergence and the historical association between genes and species. Our classes distinguish between genes related through transfer alone and genes related through duplication and transfer. Further, they separate closely-related genes in distantly-related species from distantly-related genes in closely-related species. We present formal rules that assign gene pairs to specific xenolog classes, given a reconciled gene tree with an arbitrary number of duplications and transfers. These xenology classification rules have been implemented in software and tested on a collection of ∼13 000 prokaryotic gene families. In addition, we present a case study demonstrating the connection between xenolog classification and gene function prediction.
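As a rough illustration of the coarse distinction underlying this classification, the sketch below labels a gene pair as ortholog, paralog, or xenolog from a reconciled gene tree. It is not the Notung implementation and does not reproduce the paper's xenolog subtypes. The `Node` class, the `via_transfer` flag, and the `classify_pair` function are hypothetical names introduced here. The sketch assumes that a pair is xenologous when any edge on the gene-tree path between the two genes is a transfer edge, and that otherwise the event at their last common ancestor decides between ortholog and paralog.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)  # identity-based equality, so nodes compare by object
class Node:
    name: str
    event: str = "leaf"  # "speciation", "duplication", "transfer", or "leaf"
    parent: Optional["Node"] = None
    via_transfer: bool = False  # assumption: marks the edge parent -> this node


def add_child(parent, child, via_transfer=False):
    """Attach child to parent; via_transfer flags a horizontal-transfer edge."""
    child.parent = parent
    child.via_transfer = via_transfer
    return child


def ancestors(node):
    """Return the node followed by all of its ancestors, nearest first."""
    chain = []
    while node is not None:
        chain.append(node)
        node = node.parent
    return chain


def classify_pair(a, b):
    """Coarse label for a gene pair (illustrative rule, not the paper's rules)."""
    if a is b:
        raise ValueError("need two distinct genes")
    chain_a = ancestors(a)
    seen = {id(n) for n in chain_a}
    # Climb from b until we reach a node on a's ancestor chain: that is the LCA.
    path_b, node = [], b
    while node is not None and id(node) not in seen:
        path_b.append(node)
        node = node.parent
    if node is None:
        raise ValueError("genes are not in the same tree")
    lca = node
    path_a = chain_a[:chain_a.index(lca)]  # nodes strictly below the LCA
    # Each node on either path stands for the edge leading into it.
    if any(n.via_transfer for n in path_a + path_b):
        return "xenolog"
    return {"speciation": "ortholog", "duplication": "paralog"}.get(
        lca.event, "undefined")


# Toy reconciled tree: a speciation root above one transfer and one duplication.
root = Node("root", "speciation")
x = add_child(root, Node("x", "transfer"))
d = add_child(root, Node("d", "duplication"))
gA = add_child(x, Node("gene_A"))                      # stays in donor lineage
gC = add_child(x, Node("gene_C"), via_transfer=True)   # transferred copy
gB1 = add_child(d, Node("gene_B1"))
gB2 = add_child(d, Node("gene_B2"))

print(classify_pair(gA, gC))    # xenolog
print(classify_pair(gB1, gB2))  # paralog
print(classify_pair(gA, gB1))   # ortholog
print(classify_pair(gC, gB1))   # xenolog
```

In this toy tree, gene_C and gene_B1 diverged at a speciation node, yet the pair is still labeled xenolog because a transfer edge lies on the path between them. Pairs of this kind are one reason the broad xenolog definition calls for subtypes.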
The xenolog classification rules have been implemented in Notung 2.9, a freely available phylogenetic reconciliation software package (http://www.cs.cmu.edu/~durand/Notung). Gene trees are available at http://dx.doi.org/10.7488/ds/1503.
Supplementary data are available at Bioinformatics online.