Mao Fenglou, Su Zhengchang, Olman Victor, Dam Phuongan, Liu Zhijie, Xu Ying
Computational Systems Biology Laboratory, Biochemistry and Molecular Biology Department, University of Georgia, A110 Life Science Building, 120 Green Street, Athens, GA 30602, USA.
Proc Natl Acad Sci U S A. 2006 Jan 3;103(1):129-34. doi: 10.1073/pnas.0509737102. Epub 2005 Dec 22.
Mapping biological pathways across microbial genomes is an important technique in functional studies of biological systems. Existing methods rely mainly on sequence-based orthologous gene mapping, which often yields suboptimal results because sequence similarity alone does not carry enough information to identify orthology relationships accurately. Here we present an algorithm for pathway mapping across microbial genomes that takes into account both sequence similarity and genomic structure information such as operons and regulons. One basic premise of our approach is that a microbial pathway can generally be decomposed into a few operons or regulons. We formulated the pathway-mapping problem as one of mapping genes across genomes so as to maximize their sequence similarity, under the constraint that the mapped genes be grouped into a few operons, preferably coregulated, in the target genome. We developed an integer-programming algorithm for solving this constrained optimization problem and implemented it as a computer program, p-map. We tested p-map on a number of known homologous pathways and conclude that using genomic structure information as a constraint can greatly improve pathway-mapping accuracy over methods that use sequence-similarity information alone.
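To make the constrained formulation concrete, the following is a minimal sketch, not the p-map implementation. It assumes a soft version of the constraint, in which each distinct target operon used by a mapping costs a fixed penalty, and it solves a toy instance by exhaustive enumeration rather than integer programming. The gene names, operon labels, similarity scores, the `map_pathway` function, and the `operon_penalty` parameter are all hypothetical.

```python
from itertools import product

# Hypothetical toy data: similarity scores between each source pathway gene
# and its candidate genes in the target genome (higher is more similar).
similarity = {
    "geneA": {"t1": 0.90, "t5": 0.95},
    "geneB": {"t2": 0.80, "t6": 0.60},
    "geneC": {"t3": 0.85, "t7": 0.70},
}

# Hypothetical operon membership of each target gene.
operon_of = {
    "t1": "op1", "t2": "op1", "t3": "op1",
    "t5": "op2", "t6": "op3", "t7": "op4",
}


def map_pathway(similarity, operon_of, operon_penalty=0.2):
    """Pick one target gene per pathway gene, maximizing total similarity
    minus a penalty for every distinct operon the mapping touches.

    Assumption: the penalty is a stand-in for the "few operons" constraint;
    the brute-force search is only practical for very small instances.
    """
    genes = sorted(similarity)
    best_score, best_mapping = float("-inf"), None
    # Enumerate every combination of one candidate per pathway gene.
    for choice in product(*(sorted(similarity[g]) for g in genes)):
        if len(set(choice)) < len(choice):
            continue  # a target gene may be assigned to only one pathway gene
        total = sum(similarity[g][t] for g, t in zip(genes, choice))
        n_operons = len({operon_of[t] for t in choice})
        score = total - operon_penalty * n_operons
        if score > best_score:
            best_score, best_mapping = score, dict(zip(genes, choice))
    return best_mapping, best_score


if __name__ == "__main__":
    constrained, _ = map_pathway(similarity, operon_of)
    sequence_only, _ = map_pathway(similarity, operon_of, operon_penalty=0.0)
    print("with operon penalty:", constrained)
    print("sequence only:      ", sequence_only)
```

In this toy instance the sequence-only run assigns geneA to its best sequence hit, t5, which sits in an operon of its own, whereas the penalized run prefers t1, whose operon also contains the best hits for geneB and geneC. A realistic instance would need an integer-programming solver, as the abstract describes, rather than enumeration.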