Efficient population-scale variant analysis and prioritization with VAPr.

Affiliation

Center for Computational Biology and Bioinformatics, Department of Medicine, University of California, San Diego, La Jolla, CA, USA.

Publication information

Bioinformatics. 2018 Aug 15;34(16):2843-2845. doi: 10.1093/bioinformatics/bty192.

DOI:10.1093/bioinformatics/bty192

PMID:29659724

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6084604/

Abstract

SUMMARY

With the growing availability of population-scale whole-exome and whole-genome sequencing, demand for reproducible, scalable variant analysis has spread within genomic research communities. To address this need, we introduce the Python package Variant Analysis and Prioritization (VAPr). VAPr leverages existing annotation tools ANNOVAR and MyVariant.info with MongoDB-based flexible storage and filtering functionality. It offers biologists and bioinformatics generalists easy-to-use and scalable analysis and prioritization of genomic variants from large cohort studies.
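As a rough illustration of the document-style storage and filtering idea described above, the following minimal sketch prioritizes annotated variants held as plain Python dictionaries. It does not use VAPr, ANNOVAR, MyVariant.info, or MongoDB; the field names (`gene`, `func`, `pop_freq`, `samples`), the sample data, and the 1% frequency threshold are all assumptions made for the example and do not reflect VAPr's actual schema or API. In a MongoDB collection, a comparable filter might be expressed as the query document `{"func": "exonic", "pop_freq": {"$lt": 0.01}}`.

```python
from typing import Any

# Hypothetical annotated-variant documents. Field names and values are
# illustrative only; they are not VAPr's real schema or real data.
variants: list[dict[str, Any]] = [
    {"chr": "chr1", "start": 1000, "ref": "A", "alt": "G",
     "gene": "GENE1", "func": "exonic", "pop_freq": 0.0004, "samples": ["s1", "s2"]},
    {"chr": "chr2", "start": 2000, "ref": "C", "alt": "T",
     "gene": "GENE2", "func": "intronic", "pop_freq": 0.0001, "samples": ["s1"]},
    {"chr": "chr3", "start": 3000, "ref": "G", "alt": "A",
     "gene": "GENE3", "func": "exonic", "pop_freq": 0.2, "samples": ["s2", "s3"]},
    # Flexible schema: this document has no population frequency at all.
    {"chr": "chr4", "start": 4000, "ref": "T", "alt": "C",
     "gene": "GENE4", "func": "exonic", "samples": ["s3"]},
    {"chr": "chr5", "start": 5000, "ref": "A", "alt": "T",
     "gene": "GENE5", "func": "exonic", "pop_freq": 0.00002, "samples": ["s1"]},
]


def is_rare_coding(doc: dict[str, Any], max_freq: float = 0.01) -> bool:
    """Return True for exonic variants with a known frequency below max_freq.

    Documents lacking a frequency are skipped here; a real analysis might
    instead treat them as novel variants and keep them.
    """
    freq = doc.get("pop_freq")
    return doc.get("func") == "exonic" and freq is not None and freq < max_freq


def prioritize(docs: list[dict[str, Any]], max_freq: float = 0.01) -> list[dict[str, Any]]:
    """Filter to rare coding variants and sort them rarest first."""
    hits = [d for d in docs if is_rare_coding(d, max_freq)]
    return sorted(hits, key=lambda d: d["pop_freq"])


rare = prioritize(variants)
for doc in rare:
    print(doc["gene"], doc["pop_freq"], ",".join(doc["samples"]))
```

Keeping each variant as a self-contained document means annotations that exist for only some variants can simply be absent, which is the kind of flexibility a schema-less store offers over a fixed table layout.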

AVAILABILITY AND IMPLEMENTATION

VAPr is developed in Python and is available for free use and extension under the MIT License. An install package is available on PyPi at https://pypi.python.org/pypi/VAPr, while source code and extensive documentation are on GitHub at https://github.com/ucsd-ccbb/VAPr.
