Kuchibhatla Durga B, Sherman Westley A, Chung Betty Y W, Cook Shelley, Schneider Georg, Eisenhaber Birgit, Karlin David G
Bioinformatics Institute (BII), A*STAR (Agency for Science, Technology and Research), Matrix, Singapore.
J Virol. 2014 Jan;88(1):10-20. doi: 10.1128/JVI.02595-13. Epub 2013 Oct 23.
The genome sequences of new viruses often contain many "orphan" or "taxon-specific" proteins apparently lacking homologs. However, because viral proteins evolve very fast, commonly used sequence similarity detection methods such as BLAST may overlook homologs. We analyzed a data set of proteins from RNA viruses characterized as "genus specific" by BLAST. More powerful methods developed recently, such as HHblits or HHpred (available through web-based, user-friendly interfaces), could detect distant homologs of a quarter of these proteins, suggesting that these methods should be used to annotate viral genomes. In-depth manual analyses of a subset of the remaining sequences, guided by contextual information such as taxonomy, gene order, or domain co-occurrence, identified distant homologs of another third. Thus, a combination of powerful automated methods and manual analyses can uncover distant homologs of many proteins thought to be orphans. We expect these methodological results to apply to cellular organisms as well, since cellular organisms generally evolve much more slowly than RNA viruses. As an application, we reanalyzed the genome of a bee pathogen, Chronic bee paralysis virus (CBPV). We could identify homologs of most of its proteins thought to be orphans; in each case, identifying homologs provided functional clues. We discovered that CBPV encodes a domain homologous to the Alphavirus methyltransferase-guanylyltransferase; a putative membrane protein, SP24, with homologs in unrelated insect viruses and insect-transmitted plant viruses having different morphologies (cileviruses, higreviruses, blunerviruses, negeviruses); and a putative virion glycoprotein, ORF2, also found in negeviruses. SP24 and ORF2 are probably major structural components of the virions.