Le Bouder-Langevin Stéphanie, Capron-Montaland Isabelle, De Rosa Renaud, Labedan Bernard
Evolution Moléculaire et Génomique, Institut de Génétique et Microbiologie, Université Paris-Sud, 91405 Orsay Cedex, France.
Genome Res. 2002 Dec;12(12):1961-73. doi: 10.1101/gr.393902.
Protein homology is often limited to long structural segments that we have previously called modules. We describe here a suite of programs used to catalog the whole set of modules present in microbial proteomes. First, the Darwin AllAll program detects homologous segments using thresholds for evolutionary distance and alignment length, and a second program classifies these modules. After assembling the homologous modules into families, we further group families that are related by a chain of neighboring, mutually unrelated homologous modules. Automatic analysis of these groups of families, which share homologous modules in independent multimodular proteins, makes it possible to split many fused modules into their component parts and/or to deduce more distant modules by logic. All detected and inferred modules are then reassembled into refined families. These last two steps are performed by a single program. Finally, the soundness of the data obtained by this experimental approach is checked using independent tests. To illustrate this modular approach, we compared four proteobacterial proteomes (Campylobacter jejuni, Escherichia coli, Haemophilus influenzae, and Helicobacter pylori). The method appears able to retrieve from present-day proteins many of the modules that can help to trace back ancient events of gene duplication and/or fusion.
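The first two grouping steps described in the abstract (threshold-filtered pairwise hits assembled into families, then families linked through proteins that carry modules from more than one family) can be illustrated with a minimal sketch. This is not the authors' software: the function names, the `(protein_id, start, end)` module representation, and the threshold values are all assumptions made for illustration, and the later splitting and inference steps are not modelled.

```python
from collections import defaultdict

# Hypothetical thresholds, for illustration only; the abstract gives no values.
MAX_DISTANCE = 250.0  # assumed evolutionary-distance cutoff
MIN_LENGTH = 80       # assumed minimum alignment length (residues)


class DisjointSet:
    """Minimal union-find used for single-linkage grouping."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            # Path halving keeps the trees shallow.
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def build_families(hits):
    """Cluster modules into families.

    hits: iterable of (module_a, module_b, distance, length), where a
    module is a (protein_id, start, end) tuple (assumed representation).
    Only hits passing both thresholds link two modules.
    """
    ds = DisjointSet()
    for a, b, distance, length in hits:
        if distance <= MAX_DISTANCE and length >= MIN_LENGTH:
            ds.union(a, b)
    families = defaultdict(set)
    for module in list(ds.parent):
        families[ds.find(module)].add(module)
    return list(families.values())


def group_families(families):
    """Link families whose modules occur in the same protein.

    Returns a list of sets of family indices; chains of shared proteins
    merge transitively.
    """
    ds = DisjointSet()
    first_family_in_protein = {}
    for idx, family in enumerate(families):
        ds.find(idx)  # register the family even if it links to nothing
        for protein_id, _start, _end in family:
            if protein_id in first_family_in_protein:
                ds.union(first_family_in_protein[protein_id], idx)
            else:
                first_family_in_protein[protein_id] = idx
    groups = defaultdict(set)
    for idx in range(len(families)):
        groups[ds.find(idx)].add(idx)
    return list(groups.values())


# Toy input: protein P2 carries two modules belonging to different families.
example_hits = [
    (("P1", 1, 100), ("P2", 5, 110), 120.0, 95),
    (("P2", 150, 300), ("P3", 1, 150), 200.0, 140),
    (("P4", 1, 90), ("P5", 1, 90), 300.0, 85),   # rejected: too distant
    (("P6", 1, 50), ("P7", 1, 50), 100.0, 50),   # rejected: too short
]
example_families = build_families(example_hits)
example_groups = group_families(example_families)
```

In the toy input, the two accepted hits yield two families, and because P2 contributes a module to each, both families fall into a single group. Union-find is used here only because it expresses transitive ("chain") linkage compactly; the original programs may work differently.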