Teichmann S A, Park J, Chothia C
Medical Research Council Laboratory of Molecular Biology, Hills Road, Cambridge, CB2 2QH, United Kingdom.
Proc Natl Acad Sci U S A. 1998 Dec 8;95(25):14658-63. doi: 10.1073/pnas.95.25.14658.
The parasitic bacterium Mycoplasma genitalium has a small, reduced genome that comes close to a minimal set of genes. As a first step toward determining the families of protein domains that make up the products of these genes, we used the multiple-sequence search programs PSI-BLAST and GEANFAMMER to match the sequences of the 467 gene products of M. genitalium to the sequences of domains from proteins of known structure [Protein Data Bank (PDB) sequences]. In all, 274 PDB sequences match the full length of 106 M. genitalium sequences and parts of another 85; thus, 41% of its sequences are matched in whole or in part. The evolutionary relationships of the PDB domains that match M. genitalium sequences are described in the Structural Classification of Proteins (SCOP) database. Using this information, we show that the domains in the matched M. genitalium sequences come from 114 superfamilies and that 58% of these domains have arisen by gene duplication. This level of duplication is more than twice that found with pairwise sequence comparisons. The PDB domain matches also reveal the domain structure of the matched sequences: just over a quarter contain a single domain, and the rest combine two or more domains.
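The 41% coverage figure follows from simple arithmetic on the reported counts: (106 + 85) / 467 ≈ 0.409. The Python sketch below reproduces that calculation. For illustration only, it also shows one common way to express a duplication level from superfamily assignments (domains minus distinct superfamilies, divided by domains) and the share of single-domain sequences. The duplication formula and the helper names (`matched_fraction`, `duplication_fraction`, `single_domain_share`) are assumptions, not the authors' stated method, and the superfamily labels and domain counts used below are toy data rather than results from the paper.

```python
# Counts taken from the abstract.
TOTAL_GENE_PRODUCTS = 467
FULLY_MATCHED = 106
PARTLY_MATCHED = 85


def matched_fraction(full: int, partial: int, total: int) -> float:
    """Fraction of sequences matched in whole or in part by PDB domains."""
    return (full + partial) / total


def duplication_fraction(superfamily_of_domain: list[str]) -> float:
    """Hypothetical duplication measure (an assumption, not the paper's formula):
    every domain beyond the first one in its superfamily is counted as a
    duplicate, giving (domains - distinct superfamilies) / domains."""
    n_domains = len(superfamily_of_domain)
    n_superfamilies = len(set(superfamily_of_domain))
    return (n_domains - n_superfamilies) / n_domains


def single_domain_share(domains_per_sequence: list[int]) -> float:
    """Fraction of matched sequences that contain exactly one domain."""
    singles = sum(1 for n in domains_per_sequence if n == 1)
    return singles / len(domains_per_sequence)


if __name__ == "__main__":
    coverage = matched_fraction(FULLY_MATCHED, PARTLY_MATCHED, TOTAL_GENE_PRODUCTS)
    print(f"matched in whole or in part: {100 * coverage:.1f}%")  # 40.9%

    # Toy superfamily labels for six domains (hypothetical data).
    toy_labels = ["P-loop NTPase", "P-loop NTPase", "Rossmann",
                  "Rossmann", "Rossmann", "ferredoxin-like"]
    print(duplication_fraction(toy_labels))  # 0.5

    # Toy domain counts for eight sequences (hypothetical data).
    print(single_domain_share([1, 2, 3, 1, 2, 2, 4, 2]))  # 0.25
```

The exact coverage is 40.9%, which the abstract rounds to 41%. Counting duplicates against superfamilies rather than against pairwise sequence hits is what lets structural classification detect more duplication: two domains in the same SCOP superfamily count as related even when their sequences no longer show pairwise similarity.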