A statistical method for the detection of variants from next-generation resequencing of DNA pools.

Affiliation

Scripps Genomic Medicine, Scripps Translational Science Institute, La Jolla, CA 92037, USA.

Publication information

Bioinformatics. 2010 Jun 15;26(12):i318-24. doi: 10.1093/bioinformatics/btq214.

DOI:10.1093/bioinformatics/btq214

PMID:20529923

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2881398/

Abstract

MOTIVATION

Next-generation sequencing technologies have enabled the sequencing of several human genomes in their entirety. However, the routine resequencing of complete genomes remains infeasible. The massive capacity of next-generation sequencers can be harnessed for sequencing specific genomic regions in hundreds to thousands of individuals. Sequencing-based association studies are currently limited by the low level of multiplexing offered by sequencing platforms. Pooled sequencing represents a cost-effective approach for studying rare variants in large populations. To utilize the power of DNA pooling, it is important to accurately identify sequence variants from pooled sequencing data. Detection of rare variants from pooled sequencing represents a different challenge than detection of variants from individual sequencing.

RESULTS

We describe a novel statistical approach, CRISP [Comprehensive Read analysis for Identification of Single Nucleotide Polymorphisms (SNPs) from Pooled sequencing] that is able to identify both rare and common variants by using two approaches: (i) comparing the distribution of allele counts across multiple pools using contingency tables and (ii) evaluating the probability of observing multiple non-reference base calls due to sequencing errors alone. Information about the distribution of reads between the forward and reverse strands and the size of the pools is also incorporated within this framework to filter out false variants. Validation of CRISP on two separate pooled sequencing datasets generated using the Illumina Genome Analyzer demonstrates that it can detect 80-85% of SNPs identified using individual sequencing while achieving a low false discovery rate (3-5%). Comparison with previous methods for pooled SNP detection demonstrates the significantly lower false positive and false negative rates for CRISP.
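The two ideas named in the abstract can be illustrated with a minimal sketch. This is not the CRISP implementation: the function names, the fixed per-base error rate, and the use of a plain Pearson chi-square statistic and a binomial tail are all assumptions made here for illustration only.

```python
from math import comb


def error_only_tail(n_reads: int, n_alt: int, error_rate: float) -> float:
    """P(X >= n_alt) for X ~ Binomial(n_reads, error_rate).

    Assumption: every read independently shows a non-reference base with the
    same probability `error_rate`. A small tail probability suggests the
    observed non-reference calls are unlikely to be sequencing errors alone.
    Uses exact integer binomial coefficients, so it suits modest read depths.
    """
    return sum(
        comb(n_reads, k) * error_rate**k * (1 - error_rate) ** (n_reads - k)
        for k in range(n_alt, n_reads + 1)
    )


def chi_square_statistic(ref_counts, alt_counts):
    """Pearson chi-square statistic for a 2 x K table (allele x pool).

    `ref_counts[i]` and `alt_counts[i]` are the reference and non-reference
    read counts in pool i. A large value indicates that allele counts are
    distributed unevenly across pools.
    """
    total_ref = sum(ref_counts)
    total_alt = sum(alt_counts)
    total = total_ref + total_alt
    if total == 0:
        raise ValueError("the table contains no reads")
    stat = 0.0
    for ref, alt in zip(ref_counts, alt_counts):
        pool_total = ref + alt
        for observed, row_total in ((ref, total_ref), (alt, total_alt)):
            expected = row_total * pool_total / total
            if expected > 0:  # skip cells with no expected reads
                stat += (observed - expected) ** 2 / expected
    return stat
```

In this sketch, a site would be a candidate variant if either the chi-square statistic is large (allele counts differ between pools) or the binomial tail is small in at least one pool. The thresholds, and the strand and pool-size filters the abstract mentions, are left out.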

AVAILABILITY

Implementation of this method is available at http://polymorphism.scripps.edu/~vbansal/software/CRISP/.
