Kolesov G, Mewes H W, Frishman D
GSF - National Research Center for Environment and Health, Institute for Bioinformatics, Ingolstädter Landstrasse 1, Neuherberg, 85764, Germany
J Mol Biol. 2001 Aug 24;311(4):639-56. doi: 10.1006/jmbi.2001.4701.
We describe a computational approach for finding genes that are functionally related but do not possess any noticeable sequence similarity. Our method, which we call SNAP (similarity-neighborhood approach), reveals the conservation of gene order on bacterial chromosomes based on both cross-genome comparison and context information. The novel feature of this method is that it does not rely on detection of conserved colinear gene strings. Instead, we introduce the notion of a similarity-neighborhood graph (SN-graph), which is constructed from the chains of similarity and neighborhood relationships between orthologous genes in different genomes and adjacent genes in the same genome, respectively. An SN-cycle is defined as a closed path on the SN-graph and is postulated to preferentially join functionally related gene products that participate in the same biochemical or regulatory process. We demonstrate the substantial non-randomness and functional significance of SN-cycles derived from real genome data and estimate the prediction accuracy of SNAP in assigning broad function to uncharacterized proteins. Examples of practical application of SNAP for improving the quality of genome annotation are described.
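As an illustration of the SN-graph idea, the following is a minimal sketch, not the authors' implementation. It assumes each genome is given as an ordered list of gene identifiers, that ortholog pairs are already known, and that the chromosome is treated as linear (no wrap-around adjacency). It only searches for the shortest closed paths, which alternate two neighborhood edges and two similarity edges; the abstract does not specify the cycle constraints SNAP uses, so this restriction is an assumption. The function names `build_sn_graph` and `four_cycles` and the toy gene identifiers are hypothetical.

```python
def build_sn_graph(genomes, ortholog_pairs):
    """Build the two edge sets of a toy SN-graph.

    genomes: dict mapping genome name -> ordered list of gene ids
    ortholog_pairs: iterable of (gene, gene) pairs from different genomes
    Returns (neighbors, similar), each a dict gene -> set of genes.
    """
    neighbors = {}  # neighborhood edges: adjacent genes in the same genome
    similar = {}    # similarity edges: orthologs in different genomes
    for genes in genomes.values():
        for gene in genes:
            neighbors.setdefault(gene, set())
            similar.setdefault(gene, set())
        # Assumption: linear gene order, so only consecutive genes are adjacent.
        for left, right in zip(genes, genes[1:]):
            neighbors[left].add(right)
            neighbors[right].add(left)
    for a, b in ortholog_pairs:
        similar[a].add(b)
        similar[b].add(a)
    return neighbors, similar


def four_cycles(neighbors, similar):
    """Find closed paths a1 -N- a2 -S- b2 -N- b1 -S- a1 (N = neighbor, S = similar)."""
    cycles = set()
    for a1 in neighbors:
        for a2 in neighbors[a1]:
            for b2 in similar[a2]:
                for b1 in neighbors[b2]:
                    if b1 in similar[a1]:
                        members = frozenset((a1, a2, b1, b2))
                        if len(members) == 4:  # keep only cycles of four distinct genes
                            cycles.add(members)
    return cycles


# Toy data (hypothetical): a1/a2 are adjacent in genome A, and their
# orthologs b1/b2 are adjacent in genome B, in the opposite order.
genomes = {"A": ["a1", "a2", "a3"], "B": ["b2", "b1", "b3"]}
orthologs = [("a1", "b1"), ("a2", "b2"), ("a3", "b3")]

neighbors, similar = build_sn_graph(genomes, orthologs)
cycles = four_cycles(neighbors, similar)
print(cycles)
```

In this toy input only the pair a1/a2 closes a cycle with b1/b2; a2/a3 does not, because their orthologs b2 and b3 are not adjacent in genome B. Under the hypothesis stated in the abstract, the genes joined by such a cycle would be candidates for a shared biochemical or regulatory process.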