Ba Abdullah Mohammed M, Palermo Richard D, Palser Anne L, Grayson Nicholas E, Kellam Paul, Correia Samantha, Szymula Agnieszka, White Robert E
Section of Virology, Imperial College Faculty of Medicine, St. Mary's Hospital, Norfolk Place, London, United Kingdom.
Wellcome Trust Sanger Institute, Hinxton, Cambridge, United Kingdom.
J Virol. 2017 Nov 14;91(23). doi: 10.1128/JVI.00920-17. Print 2017 Dec 1.
Epstein-Barr virus (EBV) is a ubiquitous pathogen of humans that can cause several types of lymphoma and carcinoma. Like other herpesviruses, EBV has diversified through both coevolution with its host and genetic exchange between virus strains. Sequence analysis of the EBV genome is unusually challenging because of the large number and lengths of repeat regions within the virus. Here we describe the sequence assembly and analysis of the large internal repeat 1 of EBV (IR1; also known as the BamW repeats) for more than 70 strains. The diversity of the latency protein EBV nuclear antigen leader protein (EBNA-LP) resides predominantly within the exons downstream of IR1. The integrity of the putative BWRF1 open reading frame (ORF) is retained in over 80% of strains, and deletions truncating IR1 always spare BWRF1. Conserved regions include the IR1 latency promoter (Wp) and one zone upstream of and two within BWRF1. IR1 is heterogeneous in 70% of strains, and this heterogeneity arises from sequence exchange between strains as well as from spontaneous mutation, with interstrain recombination being more common in tumor-derived viruses. This genetic exchange often incorporates regions of <1 kb, and allelic gene conversion changes the frequency of small regions within the repeat but not close to the flanks. These observations suggest that IR1 (and, by extension, EBV) diversifies through both recombination and breakpoint repair, while concerted evolution of IR1 is driven by gene conversion of small regions. Finally, the prototype EBV strain B95-8 contains four nonconsensus variants within a single IR1 repeat unit, including a stop codon in the EBNA-LP gene. Repairing IR1 improves EBNA-LP levels and the quality of transformation by the B95-8 bacterial artificial chromosome (BAC).
Epstein-Barr virus (EBV) infects the majority of the world population but causes illness in only a small minority of people. Nevertheless, over 1% of cancers worldwide are attributable to EBV. Recent sequencing projects investigating virus diversity to see if different strains have different disease impacts have excluded regions of repeating sequence, as they are more technically challenging. Here we analyze the sequence of the largest repeat in EBV (IR1). We first characterized the variations in protein sequences encoded across IR1. In studying variations within the repeat of each strain, we identified a mutation in the main laboratory strain of EBV that impairs virus function, and we suggest that tumor-associated viruses may be more likely to contain DNA mixed from two strains. The patterns of this mixing suggest that sequences can spread between strains (and also within the repeat) by copying sequence from another strain (or repeat unit) to repair DNA damage.