Mariño-Ramírez Leonardo, Minor Jonathan L, Reading Nicola, Hu James C
Department of Biochemistry and Biophysics and Center for Advanced Biomolecular Research, Texas A&M University, College Station, Texas 77843-2128, USA.
J Bacteriol. 2004 Mar;186(5):1311-9. doi: 10.1128/JB.186.5.1311-1319.2004.
Self-assembling proteins and protein fragments encoded by the Escherichia coli genome were identified from E. coli K-12 strain MG1655. Libraries of random DNA fragments cloned into a series of lambda repressor fusion vectors were subjected to selection for immunity to infection by phage lambda. Survivors were identified by sequencing the ends of the inserts, and the fused protein sequence was inferred from the known genomic sequence. Four hundred sixty-three nonredundant open reading frame-encoded interacting sequence tags (ISTs) were recovered from sequencing 2,089 candidates. These ISTs, which range from 16 to 794 amino acids in length, were clustered into families of overlapping fragments, identifying potential homotypic interactions encoded by 232 E. coli genes. Repressor fusions identified ISTs from genes in every protein-based functional category, but membrane proteins were underrepresented. The IST-containing genes were enriched for regulatory proteins and for proteins that form higher-order oligomers. Forty-eight (20.7%) homotypic proteins identified by ISTs are predicted to contain coiled coils. Although most of the IST-containing genes are identifiably related to proteins in other bacterial genomes, more than half of the ISTs do not have identifiable homologs in the Protein Data Bank, suggesting that they may include many novel structures. The data are available online at http://oligomers.tamu.edu/.
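The abstract says that the ISTs "were clustered into families of overlapping fragments" but does not give the clustering rule. The Python sketch below shows one plausible version of that step. It assumes, without support from the paper, that two ISTs belong to the same family when they map to the same gene and their residue ranges overlap, either directly or through a chain of overlapping fragments. That rule is ordinary interval merging within each gene. The `IST` tuple layout, the `cluster_ists` function, and the example genes and coordinates (`geneA`, `geneB`) are hypothetical illustrations, not data from the study. The last line checks the reported percentage: 48 coiled-coil proteins out of 232 IST-containing genes.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical IST record: (gene name, first residue, last residue), 1-based, inclusive.
IST = Tuple[str, int, int]


def cluster_ists(ists: List[IST]) -> Dict[str, List[List[IST]]]:
    """Group ISTs into families of overlapping fragments, gene by gene.

    Assumed rule (not stated in the paper): fragments from the same gene join
    one family if their residue ranges share at least one residue, directly
    or through a chain of overlapping fragments.
    """
    by_gene: Dict[str, List[IST]] = defaultdict(list)
    for ist in ists:
        by_gene[ist[0]].append(ist)

    families: Dict[str, List[List[IST]]] = {}
    for gene, frags in by_gene.items():
        frags.sort(key=lambda f: (f[1], f[2]))  # sort by start, then end
        groups: List[List[IST]] = []
        group_end = 0
        for frag in frags:
            if groups and frag[1] <= group_end:
                # Overlaps the current family: extend it.
                groups[-1].append(frag)
                group_end = max(group_end, frag[2])
            else:
                # No overlap: start a new family.
                groups.append([frag])
                group_end = frag[2]
        families[gene] = groups
    return families


if __name__ == "__main__":
    ists = [("geneA", 10, 120), ("geneA", 90, 200), ("geneA", 300, 350), ("geneB", 1, 60)]
    for gene, groups in cluster_ists(ists).items():
        print(gene, len(groups))  # prints "geneA 2", then "geneB 1"
    print(f"{48 / 232:.1%}")  # prints "20.7%"
```

Sorting by start coordinate lets each gene be processed in a single pass. A stricter criterion, such as a minimum overlap length, would change only the `frag[1] <= group_end` test.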