Yi Ming, Zhao Yongmei, Jia Li, He Mei, Kebebew Electron, Stephens Robert M
Advanced Biomedical Computing Center, SAIC-Frederick, Inc., Frederick National Laboratory for Cancer Research, Frederick, MD 21702, USA. Current address: Cancer Research and Technology Program, Frederick National Laboratory for Cancer Research, Leidos Biomedical Research, Inc., PO Box B, Frederick, MD 21702.
Advanced Biomedical Computing Center, SAIC-Frederick, Inc., Frederick National Laboratory for Cancer Research, Frederick, MD 21702, USA.
Nucleic Acids Res. 2014 Jul;42(12):e101. doi: 10.1093/nar/gku392. Epub 2014 May 15.
To apply exome-seq-derived variants in the clinical setting, there is an urgent need to identify the best variant caller(s) from a large collection of available options. We used an Illumina exome-seq dataset as a benchmark, with two validation scenarios (family pedigree information and SNP array data for the same samples) that permit global high-throughput cross-validation. With these, we evaluated the quality of SNP calls from several popular open-source and commercial variant discovery tools using a set of designated quality metrics. To the best of our knowledge, this is the first large-scale performance comparison of exome-seq variant discovery tools that uses high-throughput validation with both Mendelian inheritance checking and SNP array data. This validation gives direct insight into the accuracy of SNP calling, whereas previously reported comparison studies assessed only the concordance among tools without directly assessing the quality of the derived SNPs. More importantly, the main purpose of our study was to establish a reusable procedure that applies high-throughput validation to compare the quality of SNP discovery tools, with a focus on exome-seq, and that can be used to evaluate any forthcoming tool(s) of interest.
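The two validation scenarios named in the abstract, Mendelian inheritance checking within a pedigree and comparison against SNP array genotypes, can be illustrated with a minimal sketch. The abstract does not define the study's quality metrics, so everything below is an assumption made for illustration: the function names are hypothetical, genotypes are assumed to be VCF-style strings such as "0/1", trios are assumed to be ordered (child, father, mother), and only diploid autosomal sites are considered.

```python
from itertools import product


def alleles(gt):
    """Split a genotype string such as '0/1' or '0|1' into a sorted allele tuple."""
    return tuple(sorted(gt.replace("|", "/").split("/")))


def is_mendelian_consistent(child, father, mother):
    """True if the child's genotype can be formed from one allele of each parent."""
    possible = {
        tuple(sorted(pair))
        for pair in product(alleles(father), alleles(mother))
    }
    return alleles(child) in possible


def mendelian_error_rate(trio_calls):
    """Fraction of fully called trio sites that violate Mendelian inheritance.

    trio_calls is a list of (child, father, mother) genotype strings.
    Sites with any missing allele ('.') are skipped (an assumption).
    """
    called = [t for t in trio_calls if all("." not in gt for gt in t)]
    if not called:
        return 0.0
    errors = sum(not is_mendelian_consistent(*t) for t in called)
    return errors / len(called)


def array_concordance(seq_calls, array_calls):
    """Fraction of sites genotyped on both platforms where the genotypes agree.

    Both arguments map a site identifier to a genotype string.
    """
    shared = [site for site in seq_calls if site in array_calls]
    if not shared:
        return 0.0
    agree = sum(
        alleles(seq_calls[site]) == alleles(array_calls[site])
        for site in shared
    )
    return agree / len(shared)
```

Under these assumptions, a caller with a lower Mendelian error rate and a higher array concordance on the same samples would be judged to produce higher-quality SNP calls; the metrics actually designated in the study may differ.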