Department of Biology, Brigham Young University, Provo, UT, 84602, USA.
F1000Res. 2020 Oct 8;9:1211. doi: 10.12688/f1000research.26848.2. eCollection 2020.
Compound Heterozygous (CH) variant identification requires distinguishing maternally derived from paternally derived nucleotides, a process that relies on numerous computational tools. Using such tools often introduces unforeseen challenges, such as operating-system-specific installation procedures, software dependencies that must be installed, and formatting requirements for input files. To overcome these challenges, we developed the Compound Heterozygous Variant Identification Pipeline (CompoundHetVIP), which uses a single Docker image to encapsulate commonly used software tools for file aggregation, VCF liftover, joint genotyping, file conversion, phasing, variant normalization, annotation, relational database generation, and identification of CH, homozygous alternate, and de novo variants in a series of 13 steps. To begin using our tool, researchers need only install the Docker engine and download the CompoundHetVIP Docker image. Subject to the limitations of the underlying software, the tools provided in CompoundHetVIP can be applied to whole-genome, whole-exome, or targeted exome sequencing data from individual samples or trios (a child and both parents), using VCF or gVCF files as initial input. Each step of the pipeline produces an analysis-ready output file that can be evaluated further. To illustrate its use, we applied CompoundHetVIP to data from a publicly available Ashkenazim trio and identified two genes with a candidate CH variant and two genes with a candidate homozygous alternate variant after filtering based on user-set thresholds for global minor allele frequency, Combined Annotation Dependent Depletion, and Gene Damage Index. Although this example uses genomic data from a healthy child, we anticipate that most researchers will use CompoundHetVIP to uncover missing heritability in human diseases and other phenotypes.
CompoundHetVIP is open-source software and can be found at https://github.com/dmiller903/CompoundHetVIP; this repository also provides detailed, step-by-step examples.
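To make the core idea concrete, the following is a minimal Python sketch of how candidate CH and homozygous alternate genes could be flagged once a child's genotypes have been phased. It is not the CompoundHetVIP implementation. The `PhasedVariant` record, the function name, and the default thresholds are hypothetical and chosen only for illustration. The sketch assumes that each phased genotype is written as `first|second` haplotype and that the two haplotypes correspond to the two parents; which side is maternal depends on the phasing tool. The Gene Damage Index filter is omitted for brevity.

```python
from collections import defaultdict
from typing import NamedTuple


class PhasedVariant(NamedTuple):
    """Hypothetical record for one annotated, phased variant in the child."""
    gene: str
    variant_id: str
    genotype: str  # phased genotype, e.g. "1|0" (first haplotype | second haplotype)
    maf: float     # global minor allele frequency
    cadd: float    # Combined Annotation Dependent Depletion score


def candidate_genes(variants, max_maf=0.01, min_cadd=15.0):
    """Return (compound_het, hom_alt) dictionaries keyed by gene.

    The thresholds are illustrative defaults, not values from the paper.
    """
    first_hap = defaultdict(list)   # alternate allele on the first haplotype only
    second_hap = defaultdict(list)  # alternate allele on the second haplotype only
    hom_alt = defaultdict(list)     # alternate allele on both haplotypes

    for v in variants:
        # Keep only rare, predicted-deleterious variants.
        if v.maf > max_maf or v.cadd < min_cadd:
            continue
        if v.genotype == "1|0":
            first_hap[v.gene].append(v.variant_id)
        elif v.genotype == "0|1":
            second_hap[v.gene].append(v.variant_id)
        elif v.genotype == "1|1":
            hom_alt[v.gene].append(v.variant_id)

    # A gene is a CH candidate only if both haplotypes carry a qualifying variant.
    compound_het = {
        gene: (ids, second_hap[gene])
        for gene, ids in first_hap.items()
        if gene in second_hap
    }
    return compound_het, dict(hom_alt)


example = [
    PhasedVariant("GENE_A", "v1", "1|0", 0.001, 25.0),
    PhasedVariant("GENE_A", "v2", "0|1", 0.002, 20.0),  # opposite haplotype: CH
    PhasedVariant("GENE_B", "v3", "1|0", 0.001, 22.0),
    PhasedVariant("GENE_B", "v4", "1|0", 0.001, 30.0),  # same haplotype: not CH
    PhasedVariant("GENE_C", "v5", "1|1", 0.003, 18.0),  # homozygous alternate
    PhasedVariant("GENE_D", "v6", "1|0", 0.001, 21.0),
    PhasedVariant("GENE_D", "v7", "0|1", 0.200, 21.0),  # too common: filtered out
]
ch_genes, hom_alt_genes = candidate_genes(example)
```

In this toy input only `GENE_A` qualifies as a CH candidate, because its two rare variants lie on opposite haplotypes. This is why phasing is a prerequisite: without it, `GENE_A` and `GENE_B` would look identical.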