Optimized filtering reduces the error rate in detecting genomic variants by short-read sequencing.

Affiliation

Vesalius Research Center, Vlaams Instituut voor Biotechnologie (VIB), Leuven, Belgium.

Publication information

Nat Biotechnol. 2011 Dec 18;30(1):61-8. doi: 10.1038/nbt.2053.

Abstract

Distinguishing single-nucleotide variants (SNVs) from errors in whole-genome sequences remains challenging. Here we describe a set of filters, together with a freely accessible software tool, that selectively reduce error rates and thereby facilitate variant detection in data from two short-read sequencing technologies, Complete Genomics and Illumina. By sequencing the nearly identical genomes from monozygotic twins and considering shared SNVs as 'true variants' and discordant SNVs as 'errors', we optimized thresholds for 12 individual filters and assessed which of the 1,048 filter combinations were effective in terms of sensitivity and specificity. Cumulative application of all effective filters reduced the error rate by 290-fold, facilitating the identification of genetic differences between monozygotic twins. We also applied an adapted, less stringent set of filters to reliably identify somatic mutations in a highly rearranged tumor and to identify variants in the NA19240 HapMap genome relative to a reference set of SNVs.
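The twin-based evaluation described in the abstract can be illustrated with a minimal sketch. This is not the authors' software tool. The annotation fields (`depth`, `qual`), the two filters, their thresholds, and the toy calls are all hypothetical, chosen only to show how shared SNVs serve as "true variants", how discordant SNVs serve as "errors", and how filter combinations might be scored for sensitivity and error rate.

```python
from itertools import combinations

def concordance(calls_a, calls_b):
    """Split SNV keys into shared ('true variants') and discordant ('errors')."""
    keys_a, keys_b = set(calls_a), set(calls_b)
    return keys_a & keys_b, keys_a ^ keys_b

def apply_filters(calls, filters):
    """Keep only calls whose annotations pass every filter predicate."""
    return {k: ann for k, ann in calls.items() if all(f(ann) for f in filters)}

def evaluate(calls_a, calls_b, filters):
    """Score one filter combination against the unfiltered twin comparison."""
    shared_before, _ = concordance(calls_a, calls_b)
    shared, discordant = concordance(
        apply_filters(calls_a, filters), apply_filters(calls_b, filters)
    )
    total = len(shared) + len(discordant)
    return {
        # Fraction of the originally shared SNVs that survive filtering.
        "sensitivity": len(shared) / len(shared_before) if shared_before else 0.0,
        # Fraction of the remaining calls that are discordant between twins.
        "error_rate": len(discordant) / total if total else 0.0,
    }

def evaluate_combinations(calls_a, calls_b, named_filters):
    """Evaluate every subset of the named filters, including the empty set."""
    names = sorted(named_filters)
    results = {}
    for r in range(len(names) + 1):
        for combo in combinations(names, r):
            results[combo] = evaluate(
                calls_a, calls_b, [named_filters[n] for n in combo]
            )
    return results

# Hypothetical filters and thresholds (assumptions, not values from the paper).
FILTERS = {
    "min_depth": lambda ann: ann["depth"] >= 10,
    "min_qual": lambda ann: ann["qual"] >= 30,
}

# Toy calls keyed by (chromosome, position, alternate allele).
TWIN_A = {
    ("chr1", 100, "A"): {"depth": 30, "qual": 60},
    ("chr1", 200, "T"): {"depth": 28, "qual": 55},
    ("chr2", 400, "C"): {"depth": 25, "qual": 50},
    ("chr2", 300, "G"): {"depth": 5, "qual": 20},   # seen in twin A only
    ("chr4", 600, "T"): {"depth": 40, "qual": 12},  # seen in twin A only
}
TWIN_B = {
    ("chr1", 100, "A"): {"depth": 31, "qual": 58},
    ("chr1", 200, "T"): {"depth": 27, "qual": 52},
    ("chr2", 400, "C"): {"depth": 26, "qual": 45},
    ("chr3", 500, "A"): {"depth": 6, "qual": 15},   # seen in twin B only
}

results = evaluate_combinations(TWIN_A, TWIN_B, FILTERS)
for combo, metrics in results.items():
    print(combo or ("no filter",), metrics)
```

In this toy data the unfiltered error rate is 3 discordant calls out of 6, and the hypothetical depth filter alone removes two of the three discordant calls while keeping all shared ones. Note that a filter that removes a shared SNV from only one twin would turn it into a discordant call, which is one reason sensitivity and error rate need to be tracked together.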

Similar articles

Optimized filtering reduces the error rate in detecting genomic variants by short-read sequencing.

Nat Biotechnol. 2011 Dec 18;30(1):61-8. doi: 10.1038/nbt.2053.

Development of the variant calling algorithm, ADIScan, and its use to estimate discordant sequences between monozygotic twins.

Nucleic Acids Res. 2018 Sep 6;46(15):e92. doi: 10.1093/nar/gky445.

Technical strategy for monozygotic twin discrimination by single-nucleotide variants.

Int J Legal Med. 2024 May;138(3):767-779. doi: 10.1007/s00414-023-03150-7. Epub 2024 Jan 10.

Longshot enables accurate variant calling in diploid genomes from single-molecule long read sequencing.

Nat Commun. 2019 Oct 11;10(1):4660. doi: 10.1038/s41467-019-12493-y.

Revising a personal genome by comparing and combining data from two different sequencing platforms.

PLoS One. 2013 Apr 8;8(4):e60585. doi: 10.1371/journal.pone.0060585. Print 2013.

Statistical modeling for sensitive detection of low-frequency single nucleotide variants.

BMC Genomics. 2016 Aug 22;17 Suppl 7(Suppl 7):514. doi: 10.1186/s12864-016-2905-x.

Aging as accelerated accumulation of somatic variants: whole-genome sequencing of centenarian and middle-aged monozygotic twin pairs.

Twin Res Hum Genet. 2013 Dec;16(6):1026-32. doi: 10.1017/thg.2013.73. Epub 2013 Nov 4.

Precise detection of de novo single nucleotide variants in human genomes.

Proc Natl Acad Sci U S A. 2018 May 22;115(21):5516-5521. doi: 10.1073/pnas.1802244115. Epub 2018 May 7.

From next-generation sequencing alignments to accurate comparison and validation of single-nucleotide variants: the pibase software.

Nucleic Acids Res. 2013 Jan 7;41(1):e16. doi: 10.1093/nar/gks836. Epub 2012 Sep 10.

Quality control metrics improve repeatability and reproducibility of single-nucleotide variants derived from whole-genome sequencing.

Pharmacogenomics J. 2015 Aug;15(4):298-309. doi: 10.1038/tpj.2014.70. Epub 2014 Nov 11.

Cited by

Variant Ataxia-Telangiectasia Presenting as Tremor-Dystonia Syndrome in a Bulgarian Religious Minority.

Genes (Basel). 2025 May 27;16(6):641. doi: 10.3390/genes16060641.

Investigation of the role of miRNA variants in neurodegenerative brain diseases.

Front Genet. 2025 Feb 26;16:1506169. doi: 10.3389/fgene.2025.1506169. eCollection 2025.

Alternative splicing expands the clinical spectrum of NDUFS6-related mitochondrial disorders.

Genet Med. 2024 Jun;26(6):101117. doi: 10.1016/j.gim.2024.101117. Epub 2024 Mar 6.

QTL-seq analysis identified the genomic regions of plant height and days to heading in high-latitude rice.

Front Genet. 2024 Feb 14;15:1305681. doi: 10.3389/fgene.2024.1305681. eCollection 2024.

Combinatorial optimization of gene expression through recombinase-mediated promoter and terminator shuffling in yeast.

Nat Commun. 2024 Feb 7;15(1):1112. doi: 10.1038/s41467-024-44997-7.

Mitochondrial GpC and CpG DNA Hypermethylation Cause Metabolic Stress-Induced Mitophagy and Cholestophagy.

Int J Mol Sci. 2023 Nov 16;24(22):16412. doi: 10.3390/ijms242216412.

Construction and Application of an F1-Derived Doubled-Haploid Population and High-Density Genetic Map for Ornamental Kale Breeding.

Genes (Basel). 2023 Nov 20;14(11):2104. doi: 10.3390/genes14112104.

Construction of High-Density Genetic Map and QTL Mapping for Grain Shape in the Rice RIL Population.

Plants (Basel). 2023 Aug 10;12(16):2911. doi: 10.3390/plants12162911.

Fine Mapping and Candidate Gene Analysis of Rice Grain Length QTL .

Int J Mol Sci. 2023 Jul 14;24(14):11447. doi: 10.3390/ijms241411447.

Integrated genetic analyses of immunodeficiency-associated Epstein-Barr virus- (EBV) positive primary CNS lymphomas.

Acta Neuropathol. 2023 Sep;146(3):499-514. doi: 10.1007/s00401-023-02613-w. Epub 2023 Jul 26.
