School of Informatics and Computing, Indiana University, 150 S Woodlawn Ave, Bloomington, IN 47405, USA.
BMC Genomics. 2011;12 Suppl 2(Suppl 2):S2. doi: 10.1186/1471-2164-12-S2-S2. Epub 2011 Jul 27.
Proximity-based methods and co-evolution-based phylogenetic profile methods have been used successfully to identify functionally related genes. Proximity-based methods are effective for physically clustered genes, while the phylogenetic profile method is effective for co-occurring gene sets. However, both methods predict many false positives and false negatives. In this paper, we propose the Gene Cluster Profile Vector (GCPV) method, which combines these two methods by using phylogenetic profiles of whole gene clusters. The GCPV method is currently the only genome-comparison-based method that allows relationships between gene clusters to be characterized based on the profiles of the individual genes in those clusters.
The GCPV method groups together reasonably related operons in E. coli about 60% of the time. The method is not sensitive to the choice of reference genome set, and it outperforms the conventional phylogenetic profile method. Finally, we show that the method works well for predicted gene clusters from C. crescentus and can serve as an important tool not only for understanding gene function, but also for elucidating mechanisms of general biological processes.
The GCPV method has been shown to be an effective and robust approach to the prediction of functionally related gene sets from proximity-based gene clusters or operons.
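The idea of profiling whole gene clusters rather than single genes can be illustrated with a minimal Python sketch. The abstract does not specify how per-gene profiles are aggregated or how cluster vectors are compared, so everything below is an assumption made for illustration: the toy presence/absence data, the gene names, the choice of "fraction of cluster genes present per reference genome" as the cluster vector, and cosine similarity as the comparison measure. It should not be read as the published GCPV algorithm.

```python
from math import sqrt

# Hypothetical toy data: presence (1) or absence (0) of each gene
# in four reference genomes. Gene names and values are invented.
GENE_PROFILES = {
    "geneA": [1, 1, 0, 1],
    "geneB": [1, 1, 0, 0],
    "geneC": [1, 0, 0, 1],
    "geneD": [0, 0, 1, 0],
    "geneE": [0, 1, 1, 0],
}


def cluster_profile(genes, gene_profiles):
    """Build one vector for a gene cluster from its genes' profiles.

    Assumed aggregation: for each reference genome, the fraction of
    the cluster's genes that are present in that genome.
    """
    rows = [gene_profiles[g] for g in genes]
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Three hypothetical clusters (for example, operons found by a proximity method).
cluster_1 = cluster_profile(["geneA", "geneB"], GENE_PROFILES)
cluster_2 = cluster_profile(["geneC"], GENE_PROFILES)
cluster_3 = cluster_profile(["geneD", "geneE"], GENE_PROFILES)

# Clusters whose vectors are more similar are candidates for being functionally related.
sim_1_2 = cosine_similarity(cluster_1, cluster_2)
sim_1_3 = cosine_similarity(cluster_1, cluster_3)
```

In this toy data, cluster 1 and cluster 2 tend to occur in the same reference genomes, so their similarity is higher than that of cluster 1 and cluster 3. A real analysis would use many reference genomes and would need a principled threshold or clustering step, neither of which is described in the abstract.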