Yeats Corin, Bentley Stephen, Bateman Alex
The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, CB10 1SA, UK.
BMC Microbiol. 2003 Feb 6;3:3. doi: 10.1186/1471-2180-3-3.
Streptomyces coelicolor has long been considered a remarkable bacterium with a complex life-cycle, ubiquitous environmental distribution, linear chromosomes and plasmids, and a huge range of pharmaceutically useful secondary metabolites. Completion of the genome sequence demonstrated that this diversity carried through to the genetic level, with over 7000 genes identified. We sought to expand our understanding of this organism at the molecular level through identification and annotation of novel protein domains. Protein domains are the evolutionarily conserved units from which proteins are formed.
Two automated methods were employed to rapidly generate an optimised set of targets, which were subsequently analysed manually. A final set of 37 domains or structural repeats, represented 204 times in the genome, was developed. Using these families enabled us to correlate items of information from many different resources. Several of these families immediately enhance our understanding of both S. coelicolor and general bacterial molecular mechanisms, including cell wall biosynthesis regulation and streptomycete telomere maintenance.
Delineation of protein domain families enables detailed analysis of protein function, as well as identification of likely regions or residues of particular interest. Hence, applying this kind of approach ahead of experimental work can increase the rate of discovery in the laboratory. Furthermore, we demonstrate that this type of in silico method can fairly rapidly generate new biological information from previously uncorrelated data.