Upton Chris, Slack Stephanie, Hunter Arwen L, Ehlers Angelika, Roper Rachel L
Department of Biochemistry and Microbiology, University of Victoria, Victoria, British Columbia, Canada.
J Virol. 2003 Jul;77(13):7590-600. doi: 10.1128/jvi.77.13.7590-7600.2003.
Increasingly complex bioinformatic analysis is necessitated by the plethora of sequence information currently available. A total of 21 poxvirus genomes have now been completely sequenced and annotated, and many more genomes will be available in the next few years. First, we describe the creation of a database of continuously corrected and updated genome sequences and an easy-to-use and extremely powerful suite of software tools for the analysis of genomes, genes, and proteins. These tools are available free to all researchers and, in most cases, alleviate the need for using multiple Internet sites for analysis. Further, we describe the use of these programs to identify conserved families of genes (poxvirus orthologous clusters) and have named the software suite POCs, which is available at www.poxvirus.org. Using POCs, we have identified a set of 49 absolutely conserved gene families, that is, those that are conserved between the highly diverged families of insect-infecting entomopoxviruses and vertebrate-infecting chordopoxviruses. An additional set of 41 gene families conserved in chordopoxviruses was also identified. Thus, 90 genes are completely conserved in chordopoxviruses and comprise the minimum essential genome, and these will make excellent drug, antibody, vaccine, and detection targets. Finally, we describe the use of these tools to identify necessary annotation and sequencing updates in poxvirus genomes. For example, using POCs, we identified 19 genes that were widely conserved in poxviruses but missing from the vaccinia virus strain Tian Tan 1998 GenBank file. We have reannotated and resequenced fragments of this genome and verified that these genes are conserved in Tian Tan. The results for poxvirus genes and genomes are discussed in light of evolutionary processes.
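The set logic behind the counts above (49 families shared by all poxviruses, plus 41 shared only within chordopoxviruses, giving 90) and behind the search for genes missing from a single annotation can be illustrated with a minimal sketch. This is not the POCs implementation, which the abstract does not describe; the genome names, family identifiers, function names, and the `min_fraction` parameter below are all hypothetical, and the sketch assumes each genome has already been reduced to the set of gene families it contains.

```python
from typing import Dict, List, Set

# Toy presence table (hypothetical data): genome name -> gene families annotated in it.
presence: Dict[str, Set[str]] = {
    "chordo_A": {"F1", "F2", "F3", "F4"},
    "chordo_B": {"F1", "F2", "F3"},
    "chordo_C": {"F1", "F2", "F4"},
    "entomo_A": {"F1", "F5"},
}


def conserved_families(table: Dict[str, Set[str]], genomes: List[str]) -> Set[str]:
    """Return the families present in every genome listed in `genomes`."""
    if not genomes:
        return set()
    return set.intersection(*(table[g] for g in genomes))


def missing_in_target(
    table: Dict[str, Set[str]],
    target: str,
    references: List[str],
    min_fraction: float = 1.0,
) -> Set[str]:
    """Return families found in at least `min_fraction` of the reference
    genomes but absent from the target genome's annotation."""
    counts: Dict[str, int] = {}
    for genome in references:
        for family in table[genome]:
            counts[family] = counts.get(family, 0) + 1
    threshold = min_fraction * len(references)
    widespread = {fam for fam, n in counts.items() if n >= threshold}
    return widespread - table[target]


chordo = ["chordo_A", "chordo_B", "chordo_C"]

# Families conserved across every genome (analogous to the 49 in the abstract).
core_all = conserved_families(presence, list(presence))

# Families conserved in all chordopoxviruses but not in all poxviruses
# (analogous to the additional 41).
chordo_only = conserved_families(presence, chordo) - core_all

# Candidate annotation gaps: families present in the other chordopoxviruses
# but absent from one target genome.
missing = missing_in_target(presence, "chordo_B", ["chordo_A", "chordo_C"])
```

A family flagged by `missing_in_target` is only a candidate: as the abstract notes for Tian Tan, an apparent absence may reflect an annotation or sequencing gap rather than a true gene loss, so it needs to be checked against the sequence itself.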