VIROME: a standard operating procedure for analysis of viral metagenome sequences.

Author information

Wommack K Eric, Bhavsar Jaysheel, Polson Shawn W, Chen Jing, Dumas Michael, Srinivasiah Sharath, Furman Megan, Jamindar Sanchita, Nasko Daniel J

Affiliation

Delaware Biotechnology Institute, University of Delaware, Newark, DE 19711.

Publication information

Stand Genomic Sci. 2012 Jul 30;6(3):427-39. doi: 10.4056/sigs.2945050. Epub 2012 Jul 27.

Abstract

One consistent finding among studies using shotgun metagenomics to analyze whole viral communities is that most viral sequences show no significant homology to known sequences. Thus, bioinformatic analyses based on sequence collections such as GenBank nr, which are largely composed of sequences from known organisms, tend to ignore a majority of sequences within most shotgun viral metagenome libraries. Here we describe a bioinformatic pipeline, the Viral Informatics Resource for Metagenome Exploration (VIROME), that emphasizes the classification of viral metagenome sequences (predicted open reading frames) based on homology search results against both known and environmental sequences. Functional and taxonomic information is derived from five annotated sequence databases which are linked to the UniRef 100 database. Environmental classifications are obtained from hits against a custom database, MetaGenomes On-Line, which contains 49 million predicted environmental peptides. Each predicted viral metagenomic ORF run through the VIROME pipeline is placed into one of seven ORF classes; thus, every sequence receives a meaningful annotation. Additionally, the pipeline includes quality control measures to remove contaminating and poor-quality sequences, and it assesses the potential amount of cellular DNA contamination in a viral metagenome library by screening for rRNA genes. Access to the VIROME pipeline and analysis results is provided through a web-application interface that is dynamically linked to a relational back-end database. The VIROME web-application interface is designed to give users flexibility in retrieving sequences (reads, ORFs, predicted peptides) and search results for focused secondary analyses.
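The abstract states that every predicted ORF is placed into exactly one of seven classes depending on whether it hits known sequences (via the UniRef 100-linked databases) or environmental sequences (via MetaGenomes On-Line), but it does not spell out the class names or the decision rules. The Python sketch below illustrates one plausible decision scheme under loudly stated assumptions: the class labels, the `OrfHits` fields, the split of environmental hits into viral and microbial libraries, and the order in which the checks are applied are all hypothetical and are not VIROME's documented logic.

```python
from dataclasses import dataclass
from enum import Enum


class OrfClass(Enum):
    # HYPOTHETICAL labels for the seven ORF classes; the abstract only states
    # that there are seven classes, not their names or precedence.
    FUNCTIONAL_PROTEIN = "functional protein"
    UNASSIGNED_PROTEIN = "unassigned protein"
    TOP_VIRAL_HIT = "top viral hit"
    VIRAL_ONLY = "viral only"
    MICROBIAL_ONLY = "microbial only"
    VIRAL_AND_MICROBIAL = "viral and microbial"
    ORFAN = "ORFan"


@dataclass
class OrfHits:
    """Summary of one ORF's homology-search results (all fields hypothetical)."""
    has_uniref_hit: bool = False             # significant hit to known (UniRef 100) sequences
    top_hit_is_viral: bool = False           # assumed flag: best known hit comes from a virus
    has_functional_annotation: bool = False  # known hit linked to an annotated database entry
    env_viral_hit: bool = False              # assumed: hit to a viral library in MetaGenomes On-Line
    env_microbial_hit: bool = False          # assumed: hit to a microbial library in MetaGenomes On-Line


def classify_orf(h: OrfHits) -> OrfClass:
    """Return exactly one class per ORF; the check order is an assumption."""
    # Known-sequence hits are checked first.
    if h.has_uniref_hit:
        if h.top_hit_is_viral:
            return OrfClass.TOP_VIRAL_HIT
        if h.has_functional_annotation:
            return OrfClass.FUNCTIONAL_PROTEIN
        return OrfClass.UNASSIGNED_PROTEIN
    # No known hit: fall back to environmental hits.
    if h.env_viral_hit and h.env_microbial_hit:
        return OrfClass.VIRAL_AND_MICROBIAL
    if h.env_viral_hit:
        return OrfClass.VIRAL_ONLY
    if h.env_microbial_hit:
        return OrfClass.MICROBIAL_ONLY
    # No significant hit anywhere.
    return OrfClass.ORFAN


if __name__ == "__main__":
    print(classify_orf(OrfHits()).value)  # ORFan
    print(classify_orf(OrfHits(env_viral_hit=True, env_microbial_hit=True)).value)  # viral and microbial
```

Because the last branch is a catch-all, every ORF receives exactly one label, which mirrors the abstract's point that no sequence is left without an annotation, including sequences with no homology to known organisms.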

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ec51/3558967/b0189b9068f3/sigs.2945050-f1.jpg