VADR: validation and annotation of virus sequence submissions to GenBank.

Affiliations

Cancer Data Science Laboratory, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA.

National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.

Publication information

BMC Bioinformatics. 2020 May 24;21(1):211. doi: 10.1186/s12859-020-3537-3.

DOI:10.1186/s12859-020-3537-3

PMID:32448124

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7245624/

Abstract

BACKGROUND

GenBank contains over 3 million viral sequences. The National Center for Biotechnology Information (NCBI) previously made available a tool for validating and annotating influenza virus sequences that is used to check submissions to GenBank. Before this project, there was no analogous tool in use for non-influenza viral sequence submissions.

RESULTS

We developed a system called VADR (Viral Annotation DefineR) that validates and annotates viral sequences in GenBank submissions. The annotation system is based on the analysis of the input nucleotide sequence using models built from curated RefSeqs. Hidden Markov models are used to classify sequences by determining the RefSeq they are most similar to, and feature annotation from the RefSeq is mapped based on a nucleotide alignment of the full sequence to a covariance model. Predicted proteins encoded by the sequence are validated with nucleotide-to-protein alignments using BLAST. The system identifies 43 types of "alerts" that (unlike the previous BLAST-based system) provide deterministic and rigorous feedback to researchers who submit sequences with unexpected characteristics. VADR has been integrated into GenBank's submission processing pipeline allowing for viral submissions passing all tests to be accepted and annotated automatically, without the need for any human (GenBank indexer) intervention. Unlike the previous submission-checking system, VADR is freely available (https://github.com/nawrockie/vadr) for local installation and use. VADR has been used for Norovirus submissions since May 2018 and for Dengue virus submissions since January 2019. Since March 2020, VADR has also been used to check SARS-CoV-2 sequence submissions. Other viruses with high numbers of submissions will be added incrementally.
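The feature-mapping and validation steps described above can be illustrated with a small sketch. It is a toy illustration only, not VADR's implementation: it assumes a pairwise reference-to-query alignment is already available as two gapped strings (VADR itself aligns to a covariance model and validates proteins with BLAST), and the function names `map_feature` and `check_cds`, the `Alert` class, and the alert labels are all hypothetical and do not correspond to VADR's 43 alert types.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


@dataclass
class Alert:
    code: str    # hypothetical label, NOT one of VADR's real alert codes
    detail: str


def map_feature(ref_aln: str, qry_aln: str,
                ref_start: int, ref_end: int) -> Optional[Tuple[int, int]]:
    """Map a 1-based inclusive reference interval onto query coordinates.

    ref_aln and qry_aln are the two rows of a pairwise alignment, with '-'
    marking gaps. Returns None if no query residue aligns inside the interval.
    """
    if len(ref_aln) != len(qry_aln):
        raise ValueError("alignment rows must have equal length")
    ref_pos = qry_pos = 0
    q_start = q_end = None
    for r, q in zip(ref_aln, qry_aln):
        if r != "-":
            ref_pos += 1
        if q != "-":
            qry_pos += 1
        # Only columns where both rows have a residue define the boundaries.
        if r != "-" and q != "-" and ref_start <= ref_pos <= ref_end:
            if q_start is None:
                q_start = qry_pos
            q_end = qry_pos
    return None if q_start is None else (q_start, q_end)


def check_cds(query: str, start: int, end: int) -> List[Alert]:
    """Run simple deterministic checks on a mapped CDS (1-based, inclusive)."""
    cds = query[start - 1:end].upper()
    alerts: List[Alert] = []
    if len(cds) % 3 != 0:
        alerts.append(Alert("length_not_multiple_of_3", f"length {len(cds)}"))
    if not cds.startswith("ATG"):
        alerts.append(Alert("bad_start_codon", cds[:3]))
    # Split into complete codons, ignoring any trailing partial codon.
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    if not codons or codons[-1] not in STOP_CODONS:
        alerts.append(Alert("missing_stop_codon", codons[-1] if codons else ""))
    for idx, codon in enumerate(codons[:-1], start=1):
        if codon in STOP_CODONS:
            alerts.append(Alert("early_stop_codon", f"codon {idx}"))
            break
    return alerts


# Reference CDS occupies positions 3..14 (ATGAAACCCTGA) of a made-up sequence.
ref_aln = "CCATGAAACCCTGAGG"
qry_aln = "-CATGAAACCCTGAGG"          # query lacks the first reference base
coords = map_feature(ref_aln, qry_aln, 3, 14)
query = qry_aln.replace("-", "")
alerts = check_cds(query, *coords)
print(coords, [a.code for a in alerts])
```

In this sketch a sequence would pass only if the alert list is empty. Because every check is a fixed rule applied to the mapped coordinates, the same input always yields the same alerts, which is the sense in which such feedback is deterministic.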

CONCLUSION

VADR improves the speed with which non-flu virus submissions to GenBank can be checked and improves the content and quality of the GenBank annotations. The availability and portability of the software allow researchers to run the GenBank checks prior to submitting their viral sequences, and thereby gain confidence that their submissions will be accepted immediately without the need to correspond with GenBank staff. Reciprocally, the adoption of VADR frees GenBank staff to spend more time on services other than checking routine viral sequence submissions.





