Smith G L, Chan Y S, Howard S T
Sir William Dunn School of Pathology, University of Oxford, U.K.
J Gen Virol. 1991 Jun;72(Pt 6):1349-76. doi: 10.1099/0022-1317-72-6-1349.
The nucleotide sequence of 42090 bp of vaccinia virus strain WR is presented. The sequence includes the SalI L, F, G and I fragments; it starts near the centre of the HindIII A fragment and extends rightwards towards the genomic terminus, finishing approximately 0.5 kb internal to the inverted terminal repeat (ITR). Translation of this region has identified 65 open reading frames (ORFs) of greater than 65 amino acids in length. Fifty-one of these, which do not extensively overlap other larger ORFs, have been subjected to further analysis; the other 14 are termed minor ORFs. In the rightmost 28.7 kb, the genes are, with one exception, transcribed towards the genomic terminus, similar to the arrangement of genes at the left end of the virus genome. Internal to this region the genes are expressed from either DNA strand but are still predominantly transcribed rightwards. ORFs are tightly packed, with few intergenic non-coding regions of greater than 250 bp. Protein sequence comparisons have established a remarkably high number of homologies with entries in existing protein databases. Of these, DNA ligase, thymidylate kinase, two serine-threonine protein kinases, two serine proteinase inhibitors (serpins), two interleukin-1 receptor homologues and a discontinuous ORF related to tumour necrosis factor receptor have been reported. Other homologies include lectins, profilin, 3β-hydroxysteroid dehydrogenase, superoxide dismutase, guanylate kinase, ankyrin and complement factor H. In addition, there are a number of polypeptides with the predicted properties of membrane-associated proteins, secretory proteins or glycoproteins. Twelve gene families are described here and elsewhere. There is considerable similarity between genes from the right and left ends of the virus genome, which may have arisen by terminal transposition events. Several differences from the corresponding region of the vaccinia virus strain Copenhagen sequence are noted. Near the right terminus the sequences diverge completely, and internal to this there are multiple examples of deletion of short sequences (eight to ten nucleotides) that lie within penta- or hexanucleotide direct repeats.
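The ORF criterion stated above (open reading frames of greater than 65 amino acids, found on either DNA strand) can be illustrated with a minimal six-frame scan. This is a sketch only, not the method used in the paper: it assumes ORFs run from an ATG start codon to the first in-frame stop codon of the standard genetic code, and the names `find_orfs`, `reverse_complement` and `min_codons` are hypothetical.

```python
# Minimal six-frame ORF scan (illustrative sketch, not the authors' method).
# Assumptions: ORFs start at ATG and end at the first in-frame stop codon
# of the standard genetic code; input contains only A, C, G, T.

STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=66):
    """Return a list of (strand, frame, start, end, n_codons) tuples.

    min_codons=66 corresponds to "greater than 65 amino acids";
    the stop codon is not counted. start and end are 0-based,
    end-exclusive offsets on the scanned strand, so for strand "-"
    they refer to the reverse-complemented sequence.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first ATG opens the ORF
                elif start is not None and codon in STOPS:
                    n_codons = (i - start) // 3  # excludes the stop codon
                    if n_codons >= min_codons:
                        orfs.append((strand, frame, start, i + 3, n_codons))
                    start = None  # look for the next ORF in this frame
    return orfs
```

Applied to a real sequence, a scan like this would also return ORFs nested inside larger ones; separating major ORFs from the "minor ORFs" mentioned above would need an additional overlap filter, which is not shown here.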