Extension of Lander-Waterman theory for sequencing filtered DNA libraries.

Author information

Wendl Michael C, Barbazuk W Brad

Affiliation

Genome Sequencing Center, Washington University, St. Louis, MO 63108, USA.

Publication information

BMC Bioinformatics. 2005 Oct 10;6:245. doi: 10.1186/1471-2105-6-245.

PMID:16216129

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC1280921/

Abstract

BACKGROUND

The degree to which conventional DNA sequencing techniques will be successful for highly repetitive genomes is unclear. Investigators are therefore considering various filtering methods to select against high-copy sequence in DNA clone libraries. The standard model for random sequencing, Lander-Waterman theory, does not account for two important issues in such libraries, discontinuities and position-based sampling biases (the so-called "edge effect"). We report an extension of the theory for analyzing such configurations.
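
For reference, the conventional-library baseline that the abstract compares against can be computed from the classic Lander-Waterman expressions. With N reads of length L on a target of length G, the redundancy is c = NL/G. The expected covered fraction is 1 - exp(-c), and the expected number of contigs is N exp(-c(1 - theta)), where theta is the minimum detectable overlap as a fraction of read length. The Python sketch below evaluates only these textbook formulas. The function names are hypothetical, and the sketch deliberately ignores the library discontinuities and the edge effect that the extended model addresses.

```python
import math


def redundancy(n_reads: int, read_len: int, target_len: int) -> float:
    """Sequence redundancy c = N * L / G."""
    return n_reads * read_len / target_len


def lw_covered_fraction(c: float) -> float:
    """Expected fraction of the target covered by at least one read: 1 - exp(-c)."""
    return 1.0 - math.exp(-c)


def lw_expected_contigs(n_reads: int, c: float, theta: float = 0.0) -> float:
    """Expected number of contigs, N * exp(-c * (1 - theta)).

    theta is the minimum overlap needed to join two reads, as a fraction
    of read length (theta = 0 means any overlap is detected).
    """
    return n_reads * math.exp(-c * (1.0 - theta))


def lw_reads_for_fraction(f: float, target_len: int, read_len: int) -> int:
    """Smallest N whose expected covered fraction reaches f, from 1 - exp(-c) >= f."""
    if not 0.0 <= f < 1.0:
        raise ValueError("f must be in [0, 1)")
    return math.ceil(-target_len * math.log1p(-f) / read_len)


if __name__ == "__main__":
    G, L, N = 1_000_000, 500, 4000
    c = redundancy(N, L, G)
    print(f"c = {c:.2f}")
    print(f"covered fraction = {lw_covered_fraction(c):.4f}")
    print(f"expected contigs = {lw_expected_contigs(N, c):.1f}")
    print(f"reads for 95% coverage = {lw_reads_for_fraction(0.95, G, L)}")
```

According to the abstract, these standard predictions overstate coverage and gap reduction for a filtered library in most cases. They should therefore be read as an upper reference point, not as a model of such libraries.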

RESULTS

The edge effect cannot be neglected in most cases. Specifically, rates of coverage and gap reduction are appreciably lower than those for conventional libraries, as predicted by standard theory. Performance decreases as read length increases relative to island size. Although opposite of what happens in a conventional library, this apparent paradox is readily explained in terms of the edge effect. The model agrees well with prototype gene-tagging experiments for Zea mays and Sorghum bicolor. Moreover, the associated density function suggests well-defined probabilistic milestones for the number of reads necessary to capture a given fraction of the gene space. An exception for applying standard theory arises if sequence redundancy is less than about 1-fold. Here, evolution of the random quantities is independent of library gaps and edge effects. This observation effectively validates the practice of using standard theory to estimate the genic enrichment of a library based on light shotgun sequencing.
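
A deliberately simplified toy model makes the edge effect concrete. This model is an illustrative assumption, not the authors' derivation. It assumes the filtered library consists of K islands of equal length I, that every read of length L lies entirely inside one island, and that read start positions are uniform over all valid starts. Under these assumptions, a base within L of an island edge can be reached by fewer start positions than an interior base, so it is under-sampled. The Python sketch computes the exact expected covered fraction under these assumptions. It then compares that fraction with the Lander-Waterman value 1 - exp(-c) at the same redundancy c = NL/(KI). The function and parameter names are hypothetical.

```python
import math


def island_covered_fraction(n_reads: int, read_len: int,
                            island_len: int, n_islands: int) -> float:
    """Expected covered fraction for a toy filtered library.

    Assumptions (illustrative only): n_islands islands of island_len bases,
    every read of read_len bases lies wholly inside one island, and each
    valid (island, start) pair is equally likely.
    """
    if not 1 <= read_len <= island_len:
        raise ValueError("need 1 <= read_len <= island_len")
    starts_per_island = island_len - read_len + 1
    total_starts = n_islands * starts_per_island
    uncovered = 0.0
    for x in range(island_len):
        # Number of valid starts s with s <= x <= s + read_len - 1.
        # Bases near either edge have fewer such starts: the edge effect.
        hits = min(x, starts_per_island - 1) - max(0, x - read_len + 1) + 1
        p_hit = hits / total_starts  # probability that one read covers base x
        uncovered += (1.0 - p_hit) ** n_reads
    # All islands are identical, so one island's average is the library average.
    return 1.0 - uncovered / island_len


def compare_at_redundancy(c: float, read_len: int,
                          island_len: int = 1000, n_islands: int = 100):
    """Return (toy island-model coverage, Lander-Waterman coverage) at redundancy c."""
    n_reads = round(c * n_islands * island_len / read_len)
    return (island_covered_fraction(n_reads, read_len, island_len, n_islands),
            1.0 - math.exp(-c))


if __name__ == "__main__":
    for L in (100, 250, 500):
        toy, lw = compare_at_redundancy(2.0, L)
        print(f"L/I = {L / 1000:.2f}: island model {toy:.3f} vs Lander-Waterman {lw:.3f}")
```

In this toy setting, the covered fraction at c = 2 falls further below 1 - exp(-c) as L/I grows. For L much smaller than I, it approaches the Lander-Waterman value. This mirrors only the qualitative trend stated in the abstract. The sketch is not the published model and is not calibrated against the Zea mays or Sorghum bicolor data.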

CONCLUSION

Coverage performance using a filtered library is significantly lower than that for an equivalent-sized conventional library, suggesting that directed methods may be more critical for the former. The proposed model should be useful for analyzing future projects.
