Center for Applied Genomics, Children's Hospital of Philadelphia, PA 19104, USA.
Nucleic Acids Res. 2010 Sep;38(16):e164. doi: 10.1093/nar/gkq603. Epub 2010 Jul 3.
High-throughput sequencing platforms are generating massive amounts of genetic variation data for diverse genomes, but it remains a challenge to pinpoint a small subset of functionally important variants. To fill these unmet needs, we developed the ANNOVAR tool to annotate single nucleotide variants (SNVs) and insertions/deletions, such as examining their functional consequence on genes, inferring cytogenetic bands, reporting functional importance scores, finding variants in conserved regions, or identifying variants reported in the 1000 Genomes Project and dbSNP. ANNOVAR can utilize annotation databases from the UCSC Genome Browser or any annotation data set conforming to Generic Feature Format version 3 (GFF3). We also illustrate a 'variants reduction' protocol on 4.7 million SNVs and indels from a human genome, including two causal mutations for Miller syndrome, a rare recessive disease. Through a stepwise procedure, we excluded variants that are unlikely to be causal, and identified 20 candidate genes including the causal gene. Using a desktop computer, ANNOVAR requires ∼4 min to perform gene-based annotation and ∼15 min to perform variants reduction on 4.7 million variants, making it practical to handle hundreds of human genomes in a day. ANNOVAR is freely available at http://www.openbioinformatics.org/annovar/.
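The stepwise 'variants reduction' idea described above can be illustrated with a minimal Python sketch. This is not ANNOVAR's implementation or its command-line interface. The `Variant` fields, the order of the filtering steps, and the `candidate_genes` rule (keeping genes with at least two surviving variants, a simplified reading of a recessive model) are all assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class Variant:
    """Hypothetical pre-annotated variant record (field names are illustrative)."""
    chrom: str
    pos: int
    ref: str
    alt: str
    gene: str
    is_exonic_or_splicing: bool
    is_synonymous: bool
    in_conserved_region: bool
    in_1000_genomes: bool
    in_dbsnp: bool


# Each step is a (label, keep-predicate) pair; a variant survives a step
# only if the predicate returns True. The order here is an assumption.
Step = Tuple[str, Callable[[Variant], bool]]

STEPS: List[Step] = [
    ("exonic or splicing", lambda v: v.is_exonic_or_splicing),
    ("non-synonymous", lambda v: not v.is_synonymous),
    ("in conserved region", lambda v: v.in_conserved_region),
    ("absent from 1000 Genomes", lambda v: not v.in_1000_genomes),
    ("absent from dbSNP", lambda v: not v.in_dbsnp),
]


def reduce_variants(
    variants: Iterable[Variant], steps: List[Step] = STEPS
) -> Tuple[List[Variant], List[Tuple[str, int]]]:
    """Apply the filters in order and record how many variants remain after each."""
    remaining = list(variants)
    counts = [("input", len(remaining))]
    for label, keep in steps:
        remaining = [v for v in remaining if keep(v)]
        counts.append((label, len(remaining)))
    return remaining, counts


def candidate_genes(variants: Iterable[Variant], min_hits: int = 2) -> List[str]:
    """Return genes with at least `min_hits` surviving variants.

    Simplification: this ignores zygosity, so a single homozygous
    variant would not be counted as satisfying a recessive model.
    """
    hits = Counter(v.gene for v in variants)
    return sorted(gene for gene, n in hits.items() if n >= min_hits)
```

Recording the count after every step makes it easy to see which filter removes the most variants and to reorder or relax steps when the causal gene is lost along the way.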