
Toward better understanding of artifacts in variant calling from high-coverage samples.

Affiliation

Medical Population Genetics Program, Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA.

Publication information

Bioinformatics. 2014 Oct 15;30(20):2843-51. doi: 10.1093/bioinformatics/btu356. Epub 2014 Jun 27.

Abstract

MOTIVATION

Whole-genome high-coverage sequencing has been widely used for personal and cancer genomics as well as in various research areas. However, in the absence of an unbiased whole-genome truth set, the global error rate of variant calls and the leading causal artifacts remain unclear, despite considerable effort in evaluating variant calling methods.

RESULTS

We generated 10 single-nucleotide polymorphism (SNP) and INDEL call sets using two read mappers and five variant callers, on both a haploid human genome and a diploid genome at similar coverage. By investigating false heterozygous calls in the haploid genome, we identified erroneous realignment in low-complexity regions and the incompleteness of the reference genome with respect to the sample as the two major sources of errors, which calls for continued improvement in both areas. We estimated that the error rate of raw genotype calls is as high as 1 in 10-15 kb, but that the error rate of post-filtered calls is reduced to 1 in 100-200 kb without significantly compromising sensitivity.
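The reasoning behind the haploid experiment can be illustrated with a small sketch: in a haploid sample, a heterozygous genotype call cannot be real, so counting such calls gives a direct error estimate. The Python below is a toy illustration only, not the authors' pipeline (their scripts are in the lh3/varcmp repository). The function names `count_het_calls` and `kb_per_error`, the toy VCF records, and the figure of 25,000 callable bases are all hypothetical and chosen purely for demonstration.

```python
def count_het_calls(vcf_lines):
    """Count records whose first sample genotype is heterozygous.

    Assumes plain-text VCF lines with a FORMAT column (field 9)
    and a single sample column (field 10).
    """
    n_het = 0
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10:
            continue  # no sample column present
        fmt_keys = fields[8].split(":")
        if "GT" not in fmt_keys:
            continue  # record carries no genotype
        gt = fields[9].split(":")[fmt_keys.index("GT")]
        alleles = gt.replace("|", "/").split("/")
        if "." in alleles:
            continue  # missing genotype
        if len(set(alleles)) > 1:
            n_het += 1  # two different alleles: heterozygous
    return n_het


def kb_per_error(n_errors, callable_bases):
    """Average number of kilobases per error (infinity if no errors)."""
    if n_errors == 0:
        return float("inf")
    return callable_bases / n_errors / 1000.0


# Toy records, invented for illustration (not data from the paper).
toy_vcf = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample",
    "chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT:DP\t1/1:30",
    "chr1\t200\t.\tC\tT\t20\tPASS\t.\tGT:DP\t0/1:28",
    "chr1\t300\t.\tG\tGA\t15\tPASS\t.\tGT:DP\t0|1:31",
    "chr1\t400\t.\tT\tC\t60\tPASS\t.\tGT:DP\t./.:0",
]

n_false_het = count_het_calls(toy_vcf)
print(n_false_het, kb_per_error(n_false_het, 25_000))  # prints: 2 12.5
```

With these toy inputs, two heterozygous calls over 25,000 bases correspond to one error per 12.5 kb, the same form as the "1 in 10-15 kb" figure quoted above. A real analysis would also need to define which regions count as callable and how filters are applied, which this sketch does not attempt.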

AVAILABILITY AND IMPLEMENTATION

BWA-MEM alignment and raw variant calls are available at http://bit.ly/1g8XqRt; scripts and miscellaneous data are available at https://github.com/lh3/varcmp.

CONTACT

hengli@broadinstitute.org

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
