Sonnhammer E L, Durbin R
Sanger Centre, Cambridge, United Kingdom.
Genomics. 1997 Dec 1;46(2):200-16. doi: 10.1006/geno.1997.4989.
The Caenorhabditis elegans genome sequencing project has completed over half of this nematode's 100-Mb genome. Proteins predicted in the finished sequence have been compiled and released in the database Wormpep. Presented here is a comprehensive analysis of protein domain families in Wormpep 11, which comprises 7299 proteins. The relative abundance of common protein domain families was counted by comparing all Wormpep proteins to the Pfam collection of protein families, which is based on recognition by hidden Markov models. This analysis also identified a number of previously unannotated domains. To investigate new, apparently nematode-specific protein families, Wormpep was clustered into domain families on the basis of sequence similarity using the Domainer program. The largest clusters that lacked clear homology to proteins outside Nematoda were analyzed in further detail, after which some could be assigned a putative function. We compared all proteins in Wormpep 11 to proteins in the human, Saccharomyces cerevisiae, and Haemophilus influenzae genomes. Among the results are the estimate that over two-thirds of the currently known human proteins are likely to have a homologue in the whole C. elegans genome, and the finding that a significant number of proteins that are well conserved between C. elegans and H. influenzae are not found in S. cerevisiae.