Takkinen K, Vidgren G, Ekstrand J, Hellman U, Kalkkinen N, Wernstedt C, Pettersson R F
Recombinant DNA Laboratory, University of Helsinki, Finland.
J Gen Virol. 1988 Mar;69(Pt 3):603-12. doi: 10.1099/0022-1317-69-3-603.
The nucleotide sequence of the rubella virus capsid protein (C) gene has been determined from a cDNA clone derived from the 40S genomic RNA. The sequence covers the coding region of the C protein (831 nucleotides), 70 nucleotides of the 5' untranslated region, and the 5' end of the downstream E2 membrane protein gene. The capsid gene is unusually rich in C (41.6%) and G (31.2%) residues (G + C 72.8%), and poor in A (15.4%) and U (11.8%) residues. Some regions contain long stretches with up to 45% C or 35% G residues. Codon usage is non-random, with a strong preference for C and G residues in the third position. Starting from two in-frame AUG codons (seven amino acid residues apart), an open reading frame (ORF) was identified that continues in frame into the ORF coding for the downstream E2 membrane protein. Since the amino terminus of the capsid protein is blocked, we could not determine which of the two AUGs serves as the initiation codon. To verify that the deduced ORF was correct, we determined the amino acid sequences of 13 tryptic peptides, which together correspond to one-third of the C protein. Our data show that the C protein is about 277 residues in length (Mr about 30750). It is very hydrophilic and rich in proline (14.1%) and arginine (14.4%) residues, clusters of which are concentrated in the amino-terminal third of the protein. No sequence homology to the capsid proteins of several alphaviruses was observed. Together with our previous sequence data, this completes the sequence of the genes coding for the rubella virus structural proteins C, E2 and E1.
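The composition figures quoted above can be reproduced for any sequence with a few lines of Python. The following is a minimal sketch, not the authors' analysis: the function names and the short `toy_orf` string are hypothetical and are not taken from the rubella sequence. Only the percentages and the 831-nucleotide length in the consistency checks come from the abstract.

```python
from collections import Counter


def base_composition(seq: str) -> dict[str, float]:
    """Return the percentage of A, C, G and U in an RNA (or cDNA) sequence."""
    seq = seq.upper().replace("T", "U")  # accept cDNA input as well as RNA
    counts = Counter(seq)
    return {base: 100.0 * counts[base] / len(seq) for base in "ACGU"}


def gc3_percent(orf: str) -> float:
    """Return the percentage of codons whose third position is G or C."""
    orf = orf.upper().replace("T", "U")
    usable = len(orf) - len(orf) % 3   # ignore a trailing partial codon
    third_positions = orf[2:usable:3]  # every third base, starting at index 2
    return 100.0 * sum(b in "GC" for b in third_positions) / len(third_positions)


# Hypothetical toy ORF (7 codons), invented purely for illustration.
toy_orf = "AUGGCCUCCACCACGCCCAUC"
composition = base_composition(toy_orf)
gc3 = gc3_percent(toy_orf)

# Consistency checks on the figures reported in the abstract.
reported = {"C": 41.6, "G": 31.2, "A": 15.4, "U": 11.8}
gc_total = round(reported["C"] + reported["G"], 1)  # G + C content
all_total = round(sum(reported.values()), 1)        # should account for every base
codons = 831 // 3                                   # coding region length in codons
```

With the reported values, G + C sums to 72.8% and the four bases together to 100%, while 831 nucleotides correspond to 277 codons, matching the stated protein length of about 277 residues.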