Lee Sejoon, Lee Soohyun, Ouellette Scott, Park Woong-Yang, Lee Eunjung A, Park Peter J
Samsung Genome Institute, Samsung Medical Center, Seoul, 06351, South Korea.
SD Genomics Co., Ltd, Seoul, 06336, South Korea.
Nucleic Acids Res. 2017 Jun 20;45(11):e103. doi: 10.1093/nar/gkx193.
In many next-generation sequencing (NGS) studies, multiple samples or data types are profiled for each individual. An important quality control (QC) step in these studies is to ensure that datasets from the same subject are properly paired. Given the heterogeneity of data types, file types and sequencing depths in a multi-dimensional study, a robust program that provides a standardized metric for genotype comparisons would be useful. Here, we describe NGSCheckMate, a user-friendly software package for verifying sample identities from FASTQ, BAM or VCF files. This tool uses a model-based method to compare allele read fractions at known single-nucleotide polymorphisms, considering depth-dependent behavior of similarity metrics for identical and unrelated samples. Our evaluation shows that NGSCheckMate is effective for a variety of data types, including exome sequencing, whole-genome sequencing, RNA-seq, ChIP-seq, targeted sequencing and single-cell whole-genome sequencing, with a minimal requirement for sequencing depth (>0.5X). An alignment-free module can be run directly on FASTQ files for a quick initial check. We recommend using this software as a QC step in NGS studies.
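The comparison of allele read fractions at known SNPs can be illustrated with a minimal sketch. It assumes that per-SNP reference and alternate read counts have already been obtained for each sample, for example from a BAM or VCF file. It computes the alternate-allele read fraction at each sufficiently covered SNP. It then uses Pearson correlation across the SNPs shared by two samples as an illustrative similarity metric. The input format, the function names (`allele_fractions`, `fraction_correlation`, `same_individual`), and the fixed 0.7 threshold are hypothetical. NGSCheckMate's own model accounts for the depth-dependent behavior of the metric, and that model is not reproduced here.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical input format: SNP id -> (reference-allele reads, alternate-allele reads).
Counts = Dict[str, Tuple[int, int]]


def allele_fractions(counts: Counts, min_depth: int = 1) -> Dict[str, float]:
    """Alternate-allele read fraction at each SNP covered by at least min_depth reads."""
    fractions = {}
    for snp, (ref, alt) in counts.items():
        depth = ref + alt
        if depth >= min_depth:
            fractions[snp] = alt / depth
    return fractions


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient; NaN if either input has zero variance."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / math.sqrt(vx * vy)


def fraction_correlation(a: Counts, b: Counts, min_depth: int = 1) -> float:
    """Correlate allele fractions over SNPs that pass the depth filter in both samples."""
    fa = allele_fractions(a, min_depth)
    fb = allele_fractions(b, min_depth)
    shared = sorted(fa.keys() & fb.keys())
    if len(shared) < 2:
        return float("nan")  # too few shared SNPs to compare
    return pearson([fa[s] for s in shared], [fb[s] for s in shared])


def same_individual(a: Counts, b: Counts, threshold: float = 0.7, min_depth: int = 1) -> bool:
    """Hypothetical fixed-threshold call; the real tool models depth dependence instead."""
    r = fraction_correlation(a, b, min_depth)
    return not math.isnan(r) and r >= threshold
```

With this metric, two datasets from the same individual should show allele fractions that cluster near 0, 0.5 and 1 at the same SNPs, which gives a high correlation. Unrelated samples disagree at many sites. At low depth, allele fractions become noisy even for matched samples, which is why a fixed threshold like the one above is only a placeholder for a depth-aware decision rule.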