Gerstein M, Lin J, Hegyi H
Department of Molecular Biophysics & Biochemistry, Yale University, New Haven, CT 06520, USA.
Pac Symp Biocomput. 2000:30-41. doi: 10.1142/9789814447331_0004.
We survey the protein folds in the worm genome using pairwise and multiple-sequence comparison methods (FASTA and PSI-BLAST, respectively). Overall, we find that approximately 250 folds match approximately 8000 domains in approximately 4500 ORFs, that is, about 32 matches per fold, involving about a quarter of all worm ORFs. We compare the folds in the worm genome to those in other model organisms, in particular yeast and E. coli, and find that the worm shares more folds with the phylogenetically closer yeast than with E. coli. There appear to be 36 folds that occur in the worm but in neither of these two model organisms, and many of these are clearly implicated in aspects of multicellularity. The most common fold in the worm genome is the immunoglobulin fold, and many of the common folds are repeated in various combinations and permutations in multidomain proteins. In addition, we present an approach for identifying "sure" and "marginal" membrane proteins. Applied to the worm genome, it reveals a much greater relative prevalence of proteins with seven transmembrane helices than in the other completely sequenced genomes, none of which are metazoan. Combining these analyses with some other simple filters allows one to identify ORFs that potentially code for soluble proteins of unknown fold, which may be promising targets for experimental investigation in structural genomics. A regularly updated worm fold analysis will be available from bioinfo.mbb.yale.edu/genome/worm.
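The headline figures and the cross-genome comparison above can be illustrated with a minimal sketch. It checks the "about 32 matches per fold" arithmetic from the approximate counts in the abstract and shows how shared and genome-specific folds could be computed with set operations. The function names and the fold identifiers (`fold_A` and so on) are hypothetical placeholders, not data or code from the study.

```python
def matches_per_fold(n_domains, n_folds):
    """Average number of matched domains per fold."""
    return n_domains / n_folds


def shared_folds(a, b):
    """Folds present in both genomes."""
    return a & b


def unique_folds(target, *others):
    """Folds present in `target` but in none of the `others`."""
    return target.difference(*others)


# Approximate counts quoted in the abstract.
ratio = matches_per_fold(8000, 250)
print(ratio)  # 32.0

# Toy fold sets (hypothetical identifiers, for illustration only).
worm = {"fold_A", "fold_B", "fold_C", "fold_D"}
yeast = {"fold_A", "fold_B", "fold_E"}
ecoli = {"fold_A", "fold_F"}

worm_yeast = shared_folds(worm, yeast)
worm_ecoli = shared_folds(worm, ecoli)
worm_only = unique_folds(worm, yeast, ecoli)

print(sorted(worm_yeast))  # ['fold_A', 'fold_B']
print(sorted(worm_ecoli))  # ['fold_A']
print(sorted(worm_only))   # ['fold_C', 'fold_D']
```

In this toy data the worm shares more folds with yeast than with E. coli, mirroring the qualitative pattern the abstract reports; the real analysis would populate these sets from the FASTA and PSI-BLAST fold assignments.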