

Variant callers for next-generation sequencing data: a comparison study.

Affiliations

Department of Psychiatry, Division of Human Genetics, Yale University School of Medicine, New Haven, Connecticut, United States of America; VA CT Health Care Center, West Haven, Connecticut, United States of America.

Publication information

PLoS One. 2013 Sep 27;8(9):e75619. doi: 10.1371/journal.pone.0075619. eCollection 2013.

Abstract

Next-generation sequencing (NGS) has been leading the genetic study of human disease into an era of unprecedented productivity. Many bioinformatics pipelines have been developed to call variants from NGS data. The performance of these pipelines depends crucially on the variant caller used and on the calling strategies implemented. We studied the performance of four prevailing callers, SAMtools, GATK, glftools and Atlas2, using single-sample and multiple-sample variant-calling strategies. Using the same aligner, BWA, we built four single-sample and three multiple-sample calling pipelines and applied them to whole exome sequencing data from 20 individuals. For validation, we obtained genotypes generated by the Illumina Infinium HumanExome v1.1 BeadChip, and then used Sanger sequencing as a "gold-standard" method to resolve discrepancies in selected regions of high discordance. Finally, we compared the sensitivity of three of the single-sample calling pipelines using simulated whole genome sequence data with known variants as a gold standard. Overall, for single-sample calling, the called variants were highly consistent across callers, with a pairwise overlapping rate of about 0.9. Compared with other callers, GATK had the highest rediscovery rate (0.9969) and specificity (0.99996), and the Ti/Tv ratio of its calls was closest to the expected value of 3.02. Multiple-sample calling increased sensitivity. Results from the simulated data suggested that GATK outperformed SAMtools and glfSingle in sensitivity, especially for low-coverage data. Further, in the selected discrepant regions evaluated by Sanger sequencing, variant genotypes called by exome sequencing were more accurate than those called by the exome array, even though the average variant sensitivity and overall genotype consistency rate were as high as 95.87% and 99.82%, respectively. In conclusion, GATK showed several advantages over other variant callers for general-purpose NGS analyses. The GATK pipelines we developed perform very well.
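The abstract compares callers using several summary metrics. The Python sketch below shows one way three of them might be computed from sets of called variants. It assumes a hypothetical representation of each variant as a (chrom, pos, ref, alt) tuple. The Ti/Tv ratio follows the standard definition: transitions (A↔G, C↔T) divided by transversions, counted over single-nucleotide variants. The definitions used here for the pairwise overlap rate (Jaccard index) and the rediscovery rate (fraction of array-called variants also called from sequencing) are assumptions for illustration; the paper's exact definitions may differ.

```python
from typing import Iterable, Set, Tuple

# Hypothetical representation: a biallelic variant as (chrom, pos, ref, alt).
Variant = Tuple[str, int, str, str]

# Transitions: purine<->purine (A<->G) or pyrimidine<->pyrimidine (C<->T).
_TRANSITIONS = {frozenset("AG"), frozenset("CT")}


def ti_tv_ratio(variants: Iterable[Variant]) -> float:
    """Ratio of transitions to transversions among single-nucleotide variants."""
    ti = tv = 0
    for _chrom, _pos, ref, alt in variants:
        ref, alt = ref.upper(), alt.upper()
        # Only count single-base substitutions between distinct A/C/G/T bases;
        # indels, multi-base records and ambiguous bases are skipped.
        if len(ref) != 1 or len(alt) != 1 or ref == alt:
            continue
        if ref not in "ACGT" or alt not in "ACGT":
            continue
        if frozenset((ref, alt)) in _TRANSITIONS:
            ti += 1
        else:
            tv += 1
    return ti / tv if tv else float("inf")


def pairwise_overlap(calls_a: Set[Variant], calls_b: Set[Variant]) -> float:
    """Assumed definition (Jaccard index): shared calls / calls made by either caller."""
    union = calls_a | calls_b
    return len(calls_a & calls_b) / len(union) if union else 1.0


def rediscovery_rate(seq_calls: Set[Variant], array_variants: Set[Variant]) -> float:
    """Assumed definition: fraction of array-called variants also called from sequencing."""
    if not array_variants:
        return float("nan")
    return len(seq_calls & array_variants) / len(array_variants)


if __name__ == "__main__":
    v1 = ("1", 100, "A", "G")   # transition
    v2 = ("1", 200, "C", "T")   # transition
    v3 = ("1", 300, "A", "C")   # transversion
    v4 = ("1", 400, "AT", "A")  # deletion, ignored by ti_tv_ratio
    print(ti_tv_ratio([v1, v2, v3, v4]))                 # 2.0
    print(pairwise_overlap({v1, v2, v3}, {v2, v3, v4}))  # 0.5
    print(rediscovery_rate({v2, v3}, {v1, v2}))          # 0.5
```

Using sets of tuples makes the overlap and rediscovery computations simple set intersections; a real pipeline would first normalize variant representation (for example, left-aligning indels) so that identical variants from different callers compare equal.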


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2f78/3785481/d11840f7eac2/pone.0075619.g001.jpg
