Li Lei, Ji Guoli, Ye Congting, Shu Changlong, Zhang Jie, Liang Chun
Department of Automation, Xiamen University, Fujian, 361005, China.
Department of Biology, Miami University, Oxford, OH, 45056, USA.
BMC Plant Biol. 2015 Jun 26;15:161. doi: 10.1186/s12870-015-0531-4.
Genes with different functions originally arise from ancestral genes through gene duplication, mutation and functional recombination. It is widely accepted that orthologs are homologous genes that evolved through speciation events, while paralogs are homologous genes that resulted from gene duplication events. With the rapid increase of genomic data, identifying and distinguishing these genes among different species is becoming an important part of functional genomics research.
Using 35 plant and 6 green algal genomes from Phytozome v9, we clustered 1,291,670 peptide sequences into 49,355 homologous gene families based on sequence similarity. For each gene family, we generated a peptide sequence alignment and a phylogenetic tree, and labeled every node within the tree as a speciation or duplication event. For each node, we also identified and highlighted diagnostic characters that facilitate the appropriate addition of a new query sequence to the existing phylogenetic tree and sequence alignment of its best-matched gene family. Users can selectively view the phylogenetic tree, sequence alignment and diagnostic characters of a given gene family for a chosen species or a subgroup of all species. PlantOrDB not only allows users to identify orthologs or paralogs from phylogenetic trees, but also provides all orthologs built using the Reciprocal Best Hit (RBH) pairwise alignment method. Users can upload their own sequences to find the best-matched gene families, and visualize their query sequences within the relevant phylogenetic trees and sequence alignments.
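The Reciprocal Best Hit idea mentioned above can be illustrated with a minimal sketch. It assumes that pairwise alignment results between two proteomes are already available as (query, subject, score) tuples. The function names, gene identifiers and scores below are hypothetical and chosen only for illustration; this is not the PlantOrDB pipeline itself.

```python
from typing import Dict, Iterable, List, Tuple

# One alignment hit: (query_id, subject_id, score). Higher score = better hit.
Hit = Tuple[str, str, float]


def best_hits(hits: Iterable[Hit]) -> Dict[str, str]:
    """Map each query to its single highest-scoring subject.

    On a tie, the hit seen first is kept (an arbitrary choice for this sketch).
    """
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b: Iterable[Hit], b_vs_a: Iterable[Hit]) -> List[Tuple[str, str]]:
    """Return (a, b) pairs where a's best hit is b AND b's best hit is a."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical hits between species A (At*) and species B (Os*).
a_vs_b = [
    ("At1", "Os1", 250.0),
    ("At1", "Os2", 90.0),
    ("At2", "Os2", 120.0),
    ("At3", "Os1", 200.0),  # At3 prefers Os1, but Os1 prefers At1
]
b_vs_a = [
    ("Os1", "At1", 245.0),
    ("Os1", "At3", 198.0),
    ("Os2", "At2", 118.0),
]

pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
print(pairs)
```

In this toy input, At3 is left without an RBH partner because its best hit Os1 matches At1 more strongly. This shows why RBH tends to miss orthologs in families with lineage-specific duplications, and why tree-based identification is offered alongside it.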
PlantOrDB (http://bioinfolab.miamioh.edu/plantordb) is a genome-wide ortholog database for land plants and green algae. PlantOrDB offers highly interactive visualization, accurate query classification and powerful search functions useful for functional genomic research.