A framework for variation discovery and genotyping using next-generation DNA sequencing data.

Affiliation

Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, Massachusetts, USA.

Publication information

Nat Genet. 2011 May;43(5):491-8. doi: 10.1038/ng.806. Epub 2011 Apr 10.

Abstract

Recent advances in sequencing technology make it possible to comprehensively catalog genetic variation in population samples, creating a foundation for understanding human disease, ancestry and evolution. The amounts of raw data produced are prodigious, and many computational steps are required to translate this output into high-quality variant calls. We present a unified analytic framework to discover and genotype variation among multiple samples simultaneously that achieves sensitive and specific results across five sequencing technologies and three distinct, canonical experimental designs. Our process includes (i) initial read mapping; (ii) local realignment around indels; (iii) base quality score recalibration; (iv) SNP discovery and genotyping to find all potential variants; and (v) machine learning to separate true segregating variation from machine artifacts common to next-generation sequencing technologies. We here discuss the application of these tools, instantiated in the Genome Analysis Toolkit, to deep whole-genome, whole-exome capture and multi-sample low-pass (∼4×) 1000 Genomes Project datasets.
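
To make step (iii) more concrete, the following is a minimal Python sketch of the general idea behind base quality score recalibration: group bases by the quality the sequencer reported, count how often they mismatch the reference, and convert the observed error rate back to a Phred-scaled score. It is an illustration under stated assumptions, not the Genome Analysis Toolkit implementation: the function names `phred` and `recalibrate`, the use of reported quality as the only covariate, the +1/+2 smoothing, and the toy data are all hypothetical.

```python
import math
from collections import defaultdict


def phred(error_probability):
    """Convert an error probability to a Phred-scaled quality score."""
    return -10.0 * math.log10(error_probability)


def recalibrate(observations):
    """Build a table mapping reported quality to empirical quality.

    observations: iterable of (reported_quality, is_mismatch) pairs.
    Assumption: reported quality is the only covariate; a real tool
    would use more covariates and skip known variant sites.
    """
    counts = defaultdict(lambda: [0, 0])  # reported quality -> [mismatches, total]
    for reported_q, is_mismatch in observations:
        counts[reported_q][0] += int(is_mismatch)
        counts[reported_q][1] += 1
    # The +1 / +2 smoothing is an assumption that avoids log10(0)
    # when a bin contains no mismatches.
    return {q: phred((m + 1) / (n + 2)) for q, (m, n) in counts.items()}


# Toy data invented for illustration: 998 bases reported as Q30,
# 9 of which mismatch the reference.
toy = [(30, i < 9) for i in range(998)]
table = recalibrate(toy)
```

In this toy case the bases were reported at Q30 (a nominal error rate of 1 in 1,000), but the smoothed observed rate is 10 in 1,000, so the empirical quality in `table[30]` comes out near Q20.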

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/48a1/3083463/23a2dbe536ea/nihms281651f1.jpg