Curation of viral genomes: challenges, applications and the way forward.

Author information

Kulkarni-Kale Urmila, Bhosle Shriram G, Manjari G Sunitha, Joshi Manali, Bansode Sandeep, Kolaskar Ashok S

Affiliation

Bioinformatics Centre, University of Pune, Pune 411 007 India.

Publication information

BMC Bioinformatics. 2006 Dec 18;7 Suppl 5(Suppl 5):S12. doi: 10.1186/1471-2105-7-S5-S12.

Abstract

BACKGROUND

Whole-genome sequence data are a step towards generating the 'parts list' of life, which is needed to understand the underlying principles of biocomplexity. Genome sequencing initiatives for human and model organisms are targeted efforts to understand the principles of evolution, with envisaged applications in improving human health. These efforts culminated in the development of dedicated resources. In contrast, a large number of viral genomes have been sequenced by groups or individuals interested in studying antigenic variation among strains and species. These independent efforts made viruses the 'best-represented taxa', with the highest number of sequenced genomes. However, for lack of concerted efforts, viral genomic sequences remained mere entries in public repositories until recently.

RESULTS

VirGen is a curated resource of viral genomes and their analyses. Since its first release, it has grown both in its coverage of viral families and in the number of modules for annotation and analysis. The current release (2.0) includes data for twenty-five families with a broad host range, compared with eight in the first release. The taxonomic description of viruses in VirGen follows the ICTV nomenclature. For every viral species, a well-characterised strain is identified as the 'representative entry'. The resulting non-redundant dataset is used for subsequent annotation and analyses using sequence-based bioinformatics approaches. VirGen archives precomputed data on genome and proteome comparisons. A new data module that provides the structures of viral proteins available in the PDB has recently been incorporated. A unique feature of VirGen is its predicted conformational and sequential epitopes of known antigenic proteins, computed with algorithms developed in-house; this is a step towards reverse vaccinology.
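The idea of keeping one representative entry per viral species, so that later analyses run on a non-redundant dataset, can be illustrated with a minimal Python sketch. This is not VirGen's actual procedure: the abstract does not state how a strain is judged to be well-characterised, so the sketch assumes a hypothetical score (`annotated_features`) and breaks ties by accession. The record type, field names, and function names are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenomeEntry:
    """Hypothetical genome record; the field names are assumptions."""
    accession: str           # placeholder identifier, not a real accession
    species: str
    strain: str
    annotated_features: int  # assumed stand-in for "well-characterised"


def _rank(entry):
    # Lower rank is better: more annotated features first,
    # then the alphabetically smaller accession as a tie-breaker.
    return (-entry.annotated_features, entry.accession)


def pick_representatives(entries):
    """Return a dict mapping each species to its single best-ranked entry."""
    best = {}
    for entry in entries:
        current = best.get(entry.species)
        if current is None or _rank(entry) < _rank(current):
            best[entry.species] = entry
    return best


if __name__ == "__main__":
    # Made-up records, purely to exercise the function.
    demo = [
        GenomeEntry("ACC2", "species A", "strain 1", 5),
        GenomeEntry("ACC1", "species A", "strain 2", 9),
        GenomeEntry("ACC3", "species B", "strain 3", 2),
    ]
    for species, entry in sorted(pick_representatives(demo).items()):
        print(species, entry.accession, entry.strain)
```

Returning a dictionary keyed by species makes the non-redundancy explicit: however many strains are supplied, each species contributes exactly one entry to downstream annotation.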

CONCLUSION

Structured organization of genomic data facilitates the use of data mining tools, which in turn provides opportunities for knowledge discovery. One approach to this goal is to carry out functional annotation using comparative genomics. VirGen, a comprehensive viral genome resource that serves as an annotation and analysis pipeline, has been developed for the curation of public-domain viral genome data (http://bioinfo.ernet.in/virgen/virgen.html). The steps in the curation and annotation of the genomic data, and the applications of the value-added derived data, are substantiated with case studies.
