Remm M, Sonnhammer E
Center for Genomics Research, Karolinska Institute, Stockholm, 17177 Sweden.
Genome Res. 2000 Nov;10(11):1679-89. doi: 10.1101/gr.gr-1491r.
The complete genome sequence of the nematode Caenorhabditis elegans provides an excellent basis for studying the distribution and evolution of protein families in higher eukaryotes. Three fundamental questions are as follows: How many paralog clusters exist in one species, how many of these are shared with other species, and how many proteins can be assigned a functional counterpart in other species? We have addressed these questions in a detailed study of predicted membrane proteins in C. elegans and their mammalian homologs. All worm proteins predicted to contain at least two transmembrane segments were clustered on the basis of sequence similarity. This resulted in 189 groups with two or more sequences, containing, in total, 2647 worm proteins. Hidden Markov models (HMMs) were created for each family, and were used to retrieve mammalian homologs from the SWISSPROT, TREMBL, and VTS databases. About one-half of these clusters had mammalian homologs. Putative worm-mammalian orthologs were extracted by use of nine different phylogenetic methods and BLAST. Eight clusters initially thought to be worm-specific were assigned mammalian homologs after searching EST and genomic sequences. A compilation of 174 orthology assignments made with high confidence is presented.
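The clustering step described in the abstract (grouping proteins by sequence similarity and keeping groups with two or more sequences) can be illustrated with a minimal sketch. The abstract does not state which clustering algorithm was used, so single-linkage clustering over precomputed similarity pairs is an assumption made here for illustration only. The function name `cluster_by_similarity` and the protein identifiers are hypothetical.

```python
from collections import defaultdict


def cluster_by_similarity(proteins, similar_pairs, min_size=2):
    """Group proteins by single-linkage clustering (assumed method).

    proteins: iterable of protein identifiers.
    similar_pairs: iterable of (a, b) pairs judged similar by some
        upstream comparison, for example a thresholded alignment score.
    min_size: smallest group to keep (the abstract reports groups
        with two or more sequences).
    """
    proteins = list(proteins)
    parent = {p: p for p in proteins}

    def find(x):
        # Follow parent links to the root, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge the groups of every pair reported as similar.
    for a, b in similar_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Collect members under their root and drop groups that are too small.
    groups = defaultdict(list)
    for p in proteins:
        groups[find(p)].append(p)
    return sorted(sorted(g) for g in groups.values() if len(g) >= min_size)


# Hypothetical identifiers; P6 has no similar partner and is dropped.
proteins = ["P1", "P2", "P3", "P4", "P5", "P6"]
pairs = [("P1", "P2"), ("P2", "P3"), ("P4", "P5")]
clusters = cluster_by_similarity(proteins, pairs)
```

In this sketch P1 and P3 end up in the same group through P2 even though they were never compared directly, which is the defining behaviour of single-linkage clustering. A stricter linkage criterion would give smaller, tighter groups.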