Galat Andrzej
Département d'Ingénierie et d'Etudes des Protéines, DSV/CEA, Gif-sur-Yvette Cedex, France.
Proteins. 2004 Sep 1;56(4):808-20. doi: 10.1002/prot.20156.
The 18 kDa archetypal cyclosporin-A binding protein, cyclophilin-A, has multiple paralogues in the human genome. Only 18 of those paralogues have been detected as mRNAs or proteins, with masses ranging from 18 to 354 kDa, whereas the functional significance of the open reading frames (ORFs) encoding the other paralogues of cyclophilin-A remains unknown. The genomes of Drosophila melanogaster, Caenorhabditis elegans, Arabidopsis thaliana, Schizosaccharomyces pombe, and Saccharomyces cerevisiae encode different numbers of cyclophilin paralogues, some of which are orthologous to the human cyclophilins. A library of novel algorithms was developed and used to compute the conservation levels of the hydrophobicity profiles, bulkiness profiles, and amino acid compositions (AACs) of 303 aligned cyclophilin sequences. The majority of the paralogues and orthologues encoded in these 6 genomes differ considerably from each other. Some orthologues and paralogues, however, have high correlation coefficients (CCFs) for their pairwise-compared hydrophobicity and bulkiness profiles, and their AACs differ only slightly. Convergence of these three properties of the polypeptide chain, together with apparent conservation of the typical sequence hallmarks and parameters, allowed the functionally related orthologues and paralogues of the cyclophilins to be clustered. The clustering method sorted the cyclophilins into several distinct classes. Analyses of the overlapping clusters of sequences permitted delineation of some hypothetical pathways that might have led to the creation of certain cyclophilin paralogues in the eukaryotic genomes.
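The pairwise comparison described in the abstract can be illustrated with a minimal sketch. It is not the author's algorithm library, whose details the abstract does not give. It assumes the Kyte-Doolittle hydropathy scale as the hydrophobicity scale, a Pearson correlation taken over alignment columns where neither sequence has a gap, and half the summed absolute difference in residue fractions as the AAC difference. The function names (`profile`, `pearson`, `aac`, `aac_difference`) are hypothetical, and the two sequences in the usage lines are made-up toy strings, not real cyclophilins. A bulkiness profile could be compared the same way by passing a bulkiness scale to `profile`.

```python
from collections import Counter
from math import sqrt

# Kyte-Doolittle hydropathy values (assumed here; the abstract names no scale).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def profile(aligned_seq, scale):
    """Map each aligned residue to its scale value; gaps and unknown symbols become None."""
    return [scale.get(res) for res in aligned_seq.upper()]


def pearson(profile_a, profile_b):
    """Pearson correlation over alignment columns where both profiles have a value."""
    pairs = [(a, b) for a, b in zip(profile_a, profile_b)
             if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two shared ungapped columns")
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs)
    var_b = sum((b - mean_b) ** 2 for _, b in pairs)
    if var_a == 0 or var_b == 0:
        raise ValueError("a constant profile has no defined correlation")
    return cov / sqrt(var_a * var_b)


def aac(aligned_seq):
    """Fraction of each of the 20 standard residues, ignoring gaps and unknown symbols."""
    residues = [res for res in aligned_seq.upper() if res in KYTE_DOOLITTLE]
    if not residues:
        raise ValueError("sequence contains no standard residues")
    counts = Counter(residues)
    return {aa: counts[aa] / len(residues) for aa in KYTE_DOOLITTLE}


def aac_difference(seq_a, seq_b):
    """Half the summed absolute difference in residue fractions (0 = identical, 1 = disjoint)."""
    frac_a, frac_b = aac(seq_a), aac(seq_b)
    return 0.5 * sum(abs(frac_a[aa] - frac_b[aa]) for aa in KYTE_DOOLITTLE)


if __name__ == "__main__":
    # Toy aligned sequences for illustration only.
    seq_1 = "MVNPTVFFDI-AVDGEPLGR"
    seq_2 = "MSNPRVFFDMSAVGGQPAGR"
    ccf = pearson(profile(seq_1, KYTE_DOOLITTLE), profile(seq_2, KYTE_DOOLITTLE))
    print(f"hydrophobicity CCF: {ccf:.3f}")
    print(f"AAC difference:     {aac_difference(seq_1, seq_2):.3f}")
```

Pairs with a high CCF and a low AAC difference would be the candidates for grouping into the same cluster. The thresholds for "high" and "low" are not stated in the abstract and are left to the reader.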