Estimating DNA polymorphism from next generation sequencing data with high error rate by dual sequencing applications.

Affiliation

State Key Laboratory of Biocontrol and Guangdong Key Laboratory of Plant Resources, Sun Yat-sen University, 135 Xingang West Road, Guangzhou 510275, China.

Publication information

BMC Genomics. 2013 Aug 7;14:535. doi: 10.1186/1471-2164-14-535.

DOI:10.1186/1471-2164-14-535

PMID:23919637

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3750404/

Abstract

BACKGROUND

As the error rate is high and the distribution of errors across sites is non-uniform in next generation sequencing (NGS) data, it has been a challenge to estimate DNA polymorphism (θ) accurately from NGS data.
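A rough back-of-the-envelope calculation shows the scale of the problem. The sketch below is not taken from the paper: it assumes the standard neutral model, in which the expected number of segregating sites is θ·L·a_n (Watterson's correction), and a uniform, independent per-base error rate. All numeric values (`THETA`, `ERROR_RATE`, `LENGTH`, `N_CHROM`, `DEPTH`) are illustrative assumptions.

```python
def harmonic(n_chrom: int) -> float:
    """Watterson's a_n: sum of 1/i for i = 1 .. n_chrom - 1."""
    return sum(1.0 / i for i in range(1, n_chrom))


def expected_true_sites(theta: float, length: int, n_chrom: int) -> float:
    """Expected number of segregating sites under the standard neutral model."""
    return theta * length * harmonic(n_chrom)


def expected_error_sites(error_rate: float, length: int, depth: int) -> float:
    """Expected number of sites where at least one of `depth` reads has an error."""
    return length * (1.0 - (1.0 - error_rate) ** depth)


# Illustrative values only (assumptions, not from the paper).
THETA = 0.001       # per-site polymorphism
ERROR_RATE = 0.01   # per-base sequencing error rate
LENGTH = 100_000    # number of sites surveyed
N_CHROM = 20        # chromosomes in the sample (10 diploid individuals)
DEPTH = 40          # total reads covering each site

true_sites = expected_true_sites(THETA, LENGTH, N_CHROM)
error_sites = expected_error_sites(ERROR_RATE, LENGTH, DEPTH)
print(f"expected true segregating sites: {true_sites:.0f}")
print(f"expected sites hit by an error:  {error_sites:.0f}")
```

Under these assumed values, sites touched by at least one error far outnumber truly segregating sites, so counting every observed mismatch as a polymorphism would inflate an estimate of θ considerably.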

RESULTS

By computer simulations, we compare two methods of data acquisition: sequencing each diploid individual separately and sequencing a pooled sample. Under the current NGS error rate, sequencing each individual separately offers little advantage unless the coverage per individual is high (>20X). We hence propose a new method for estimating θ from pooled samples that have been subjected to two separate rounds of DNA sequencing. Since errors from the two sequencing applications are usually non-overlapping, it is possible to separate low-frequency polymorphisms from sequencing errors. Simulation results show that the dual applications method is reliable even when the error rate is high and θ is low.
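The intuition that errors from two independent runs rarely coincide can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions and not the authors' estimator: it uses a crude presence/absence rule (a non-reference base must appear in both runs), a uniform error model, and only singleton variants. The function names and all parameter values are hypothetical.

```python
import random

BASES = "ACGT"


def sequence_site(pool, depth, error_rate, rng):
    """Draw `depth` reads from the pooled chromosomes at one site.

    Each read is miscalled as one of the three other bases with
    probability `error_rate` (a uniform error model, assumed here).
    """
    reads = []
    for _ in range(depth):
        base = rng.choice(pool)
        if rng.random() < error_rate:
            base = rng.choice([b for b in BASES if b != base])
        reads.append(base)
    return reads


def simulate(n_sites, n_chrom, variant_fraction, depth, error_rate, seed=1):
    """Compare single-run calls with calls that require both runs to agree."""
    rng = random.Random(seed)
    counts = {"true_variants": 0, "single_true": 0, "single_false": 0,
              "dual_true": 0, "dual_false": 0}
    ref = "A"
    for _ in range(n_sites):
        pool = [ref] * n_chrom
        is_variant = rng.random() < variant_fraction
        if is_variant:
            pool[0] = "G"  # one chromosome carries the variant (a singleton)
            counts["true_variants"] += 1
        # Two independent sequencing runs of the same pooled sample.
        run1 = {b for b in sequence_site(pool, depth, error_rate, rng) if b != ref}
        run2 = {b for b in sequence_site(pool, depth, error_rate, rng) if b != ref}
        single_call = bool(run1)        # any non-reference base in run 1
        dual_call = bool(run1 & run2)   # same non-reference base in both runs
        # Simplification: any call at a variant site is counted as true.
        label = "true" if is_variant else "false"
        if single_call:
            counts[f"single_{label}"] += 1
        if dual_call:
            counts[f"dual_{label}"] += 1
    return counts


# Illustrative parameters only (assumptions, not from the paper).
result = simulate(n_sites=20_000, n_chrom=20, variant_fraction=0.01,
                  depth=30, error_rate=0.002)
for key, value in result.items():
    print(f"{key}: {value}")
```

Requiring agreement between runs removes most error-only calls at the cost of missing some real singletons, which is why a usable estimator of θ also has to correct for detection probability rather than simply count the concordant sites.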

CONCLUSIONS

In studies of natural populations where the sequencing coverage is usually modest (~2X per individual), the dual applications method on pooled samples should be a reasonable choice.
