A framework for variation discovery and genotyping using next-generation DNA sequencing data.

Affiliations

Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, Massachusetts, USA.

Publication information

Nat Genet. 2011 May;43(5):491-8. doi: 10.1038/ng.806. Epub 2011 Apr 10.

DOI: 10.1038/ng.806

PMID: 21478889

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3083463/

Abstract

Recent advances in sequencing technology make it possible to comprehensively catalog genetic variation in population samples, creating a foundation for understanding human disease, ancestry and evolution. The amounts of raw data produced are prodigious, and many computational steps are required to translate this output into high-quality variant calls. We present a unified analytic framework to discover and genotype variation among multiple samples simultaneously that achieves sensitive and specific results across five sequencing technologies and three distinct, canonical experimental designs. Our process includes (i) initial read mapping; (ii) local realignment around indels; (iii) base quality score recalibration; (iv) SNP discovery and genotyping to find all potential variants; and (v) machine learning to separate true segregating variation from machine artifacts common to next-generation sequencing technologies. We here discuss the application of these tools, instantiated in the Genome Analysis Toolkit, to deep whole-genome, whole-exome capture and multi-sample low-pass (∼4×) 1000 Genomes Project datasets.
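Step (iii) above, base quality score recalibration, compares the quality scores reported by the sequencer with the error rate actually observed in the data. The Python sketch below illustrates only that core idea, under stated assumptions: every mismatch at a reference position outside a known-variant set is counted as a sequencing error, bases are binned jointly by read group, reported quality and machine cycle, and Laplace pseudocounts smooth sparse bins. The function names and tuple layout are hypothetical, and GATK's own recalibrator is more elaborate (it also uses dinucleotide context, for example).

```python
import math
from collections import defaultdict


def phred(p_error):
    """Convert an error probability to a Phred-scaled quality score."""
    return -10.0 * math.log10(p_error)


def empirical_qualities(observations):
    """Build a recalibration table from aligned bases (hypothetical layout).

    observations: iterable of (read_group, reported_q, cycle, is_mismatch),
    one tuple per aligned base at a reference position assumed NOT to be a
    known variant site, so every mismatch is counted as a sequencing error.
    Returns {(read_group, reported_q, cycle): empirical Phred quality}.
    """
    counts = defaultdict(lambda: [0, 0])  # bin -> [n_bases, n_mismatches]
    for read_group, reported_q, cycle, is_mismatch in observations:
        tally = counts[(read_group, reported_q, cycle)]
        tally[0] += 1
        tally[1] += int(is_mismatch)
    # Laplace pseudocounts (an assumption of this sketch) keep bins with
    # zero mismatches from getting an infinite quality.
    return {key: phred((mm + 1) / (n + 2)) for key, (n, mm) in counts.items()}


# Toy data: the sequencer reported Q30 (about 1 error per 1000 bases),
# but 9 of these 1000 bases disagree with the reference.
obs = [("RG1", 30, 1, i < 9) for i in range(1000)]
table = empirical_qualities(obs)
print(round(table[("RG1", 30, 1)], 1))  # 20.0
```

In this toy case the reported Q30 is recalibrated down to about Q20, so downstream steps would treat these bases as roughly ten times more error-prone than the sequencer claimed.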

Figure 1 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/48a1/3083463/23a2dbe536ea/nihms281651f1.jpg
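Step (iv) of the pipeline listed in the abstract, SNP discovery and genotyping, scores each candidate genotype by how well it explains the observed bases. A common formulation, and the one sketched here, treats a diploid genotype as two alleles A1 and A2 and lets each read come from either chromosome with probability 1/2: P(D | G = A1A2) = ∏_j [P(b_j | A1)/2 + P(b_j | A2)/2], where P(b | A) = 1 − ε if b = A and ε/3 otherwise, and ε = 10^(−q/10) comes from the base's (recalibrated) Phred quality q. This is a minimal single-sample, biallelic sketch that assumes independent reads and a flat prior; the multi-sample calling, genotype priors and indel handling described in the paper are not reproduced, and the function names are hypothetical.

```python
import math


def base_likelihood(base, allele, q):
    """P(observed base | true allele) for a base with Phred quality q."""
    # Cap the error rate at 0.75, where a base becomes uninformative;
    # this also keeps Q0 bases from producing log10(0).
    error = min(10 ** (-q / 10), 0.75)
    # Assumption: a miscalled base is equally likely to be any of the other 3.
    return 1 - error if base == allele else error / 3


def genotype_log10_likelihoods(pileup, ref, alt):
    """Return log10 P(D | G) for the three diploid genotypes at one site.

    pileup: list of (base, phred_quality) pairs covering the site in one
    sample. Reads are assumed independent, each drawn from either chromosome
    with probability 1/2.
    """
    genotypes = {"hom_ref": (ref, ref), "het": (ref, alt), "hom_alt": (alt, alt)}
    log_likelihoods = {}
    for name, (allele1, allele2) in genotypes.items():
        total = 0.0
        for base, q in pileup:
            p = 0.5 * base_likelihood(base, allele1, q) + 0.5 * base_likelihood(base, allele2, q)
            total += math.log10(p)
        log_likelihoods[name] = total
    return log_likelihoods


# Toy pileup: five reads show the reference A, five show G, all at Q30.
pileup = [("A", 30)] * 5 + [("G", 30)] * 5
ll = genotype_log10_likelihoods(pileup, ref="A", alt="G")
print(max(ll, key=ll.get))  # het
```

With a flat prior, as here, the called genotype is simply the one with the largest likelihood; a full caller would multiply in a genotype prior and, as the abstract notes, evaluate many samples jointly.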

Similar articles

A framework for variation discovery and genotyping using next-generation DNA sequencing data.

Nat Genet. 2011 May;43(5):491-8. doi: 10.1038/ng.806. Epub 2011 Apr 10.

An efficient and scalable analysis framework for variant extraction and refinement from population-scale DNA sequence data.

Genome Res. 2015 Jun;25(6):918-25. doi: 10.1101/gr.176552.114. Epub 2015 Apr 16.

A probabilistic method for the detection and genotyping of small indels from population-scale sequence data.

Bioinformatics. 2011 Aug 1;27(15):2047-53. doi: 10.1093/bioinformatics/btr344. Epub 2011 Jun 7.

A survey of tools for variant analysis of next-generation genome sequencing data.

Brief Bioinform. 2014 Mar;15(2):256-78. doi: 10.1093/bib/bbs086. Epub 2013 Jan 21.

A map of human genome variation from population-scale sequencing.

Nature. 2010 Oct 28;467(7319):1061-73. doi: 10.1038/nature09534.

The functional spectrum of low-frequency coding variation.

Genome Biol. 2011 Sep 14;12(9):R84. doi: 10.1186/gb-2011-12-9-r84.

Impact of post-alignment processing in variant discovery from whole exome data.

BMC Bioinformatics. 2016 Oct 3;17(1):403. doi: 10.1186/s12859-016-1279-z.

Imputation-based assessment of next generation rare exome variant arrays.

Pac Symp Biocomput. 2014:241-52.

Enhancing SNV identification in whole-genome sequencing data through the incorporation of known genetic variants into the minimap2 index.

BMC Bioinformatics. 2024 Jul 13;25(1):238. doi: 10.1186/s12859-024-05862-y.

Accurate detection and genotyping of SNPs utilizing population sequencing data.

Genome Res. 2010 Apr;20(4):537-45. doi: 10.1101/gr.100040.109. Epub 2010 Feb 11.

Genome Res. 2010 Apr;20(4):537-45. doi: 10.1101/gr.100040.109. Epub 2010 Feb 11.

Cited by

Multi-modal characterization of transcriptional programs that drive metastatic cascades to solid sites and ascites in ovarian cancer.

bioRxiv. 2025 Aug 27:2025.08.26.672372. doi: 10.1101/2025.08.26.672372.

Finding easy regions for short-read variant calling from pangenome data.

Gigascience. 2025 Jan 6;14. doi: 10.1093/gigascience/giaf103.

One mother for two species via obligate cross-species cloning in ants.

Nature. 2025 Sep 3. doi: 10.1038/s41586-025-09425-w.

Single-cell transcriptomic and genomic changes in the ageing human brain.

Nature. 2025 Sep 3. doi: 10.1038/s41586-025-09435-8.

Two New Species of Sect. From Sichuan, China.

Ecol Evol. 2025 Aug 27;15(9):e72047. doi: 10.1002/ece3.72047. eCollection 2025 Sep.

Genome-wide selection signal analysis reveals the adaptability of Tibetan sheep to high altitudes.

Front Vet Sci. 2025 Aug 14;12:1632017. doi: 10.3389/fvets.2025.1632017. eCollection 2025.

DFFB suppresses interferon to enable cancer persister cell regrowth.

bioRxiv. 2025 Aug 21:2025.08.15.670603. doi: 10.1101/2025.08.15.670603.

Case-Control Study for 23 Cancer Types With Functional Analysis of : Risk Estimation and Clinical Recommendations in East Asia.

JCO Precis Oncol. 2025 Sep;9:e2400945. doi: 10.1200/PO-24-00945. Epub 2025 Sep 2.

A comprehensive water buffalo pangenome reveals extensive structural variation linked to population-specific signatures of selection.

Gigascience. 2025 Jan 6;14. doi: 10.1093/gigascience/giaf099.

Genetic Consequences of Tree Planting Versus Natural Colonisation: Implications for Afforestation Programmes in the United Kingdom.

Evol Appl. 2025 Aug 27;18(8):e70146. doi: 10.1111/eva.70146. eCollection 2025 Aug.
