Extension of Lander-Waterman theory for sequencing filtered DNA libraries.

Author information

Wendl Michael C, Barbazuk W Brad

Affiliation

Genome Sequencing Center, Washington University, St. Louis, MO 63108, USA.

Publication information

BMC Bioinformatics. 2005 Oct 10;6:245. doi: 10.1186/1471-2105-6-245.

Abstract

BACKGROUND

The degree to which conventional DNA sequencing techniques will succeed on highly repetitive genomes is unclear. Investigators are therefore considering various filtering methods to select against high-copy sequence in DNA clone libraries. The standard model for random sequencing, Lander-Waterman theory, does not account for two important issues in such libraries: discontinuities and position-based sampling biases (the so-called "edge effect"). We report an extension of the theory for analyzing such configurations.
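As background for the extension, the classical Lander-Waterman expectations for an unfiltered, contiguous target can be sketched in a few lines of Python. This is a minimal illustration, not code from the paper: the function name `lander_waterman` and its parameters are hypothetical, and it uses the textbook results for redundancy c = NL/G, expected covered fraction 1 - e^(-c), and expected island (contig) count N e^(-cσ) with σ = 1 - T/L for a minimum detectable overlap T.

```python
import math


def lander_waterman(num_reads: int, read_len: float, genome_len: float,
                    min_overlap: float = 0.0) -> dict:
    """Classical Lander-Waterman expectations for a contiguous (unfiltered) target.

    Hypothetical helper for illustration:
      c       = N * L / G                     redundancy (fold coverage)
      covered = 1 - exp(-c)                   expected fraction of bases covered
      islands = N * exp(-c * sigma)           expected number of islands (contigs),
                with sigma = 1 - T / L        T = minimum detectable overlap
    """
    if read_len <= 0 or genome_len <= 0:
        raise ValueError("read_len and genome_len must be positive")
    c = num_reads * read_len / genome_len
    sigma = 1.0 - min_overlap / read_len
    return {
        "redundancy": c,
        "fraction_covered": 1.0 - math.exp(-c),
        "expected_islands": num_reads * math.exp(-c * sigma),
    }


if __name__ == "__main__":
    # 10,000 reads of 500 bases on a 5 Mb contiguous target
    print(lander_waterman(10_000, 500, 5_000_000))
```

For example, 10,000 reads of 500 bases on a 5 Mb target give c = 1 and an expected covered fraction of 1 - e^(-1), about 0.632.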

RESULTS

The edge effect cannot be neglected in most cases. Specifically, rates of coverage and gap reduction are appreciably lower than those that standard theory predicts for conventional libraries. Performance decreases as read length increases relative to island size. Although this is the opposite of what happens in a conventional library, the apparent paradox is readily explained by the edge effect. The model agrees well with prototype gene-tagging experiments for Zea mays and Sorghum bicolor. Moreover, the associated density function suggests well-defined probabilistic milestones for the number of reads needed to capture a given fraction of the gene space. An exception arises when sequence redundancy is less than about 1-fold: here, the evolution of the random quantities is independent of library gaps and edge effects, so standard theory applies. This observation effectively validates the practice of using standard theory to estimate the genic enrichment of a library from light shotgun sequencing.
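The edge effect can be made concrete with a simplified model. The sketch below assumes, for illustration only, that every read must lie entirely within a single island and that its start position is uniform over the admissible positions; the functions `island_coverage` and `standard_coverage` are hypothetical names, and this is not claimed to be the authors' exact formulation. Bases near an island's ends can be reached by fewer start positions, so at equal redundancy the expected coverage falls below 1 - e^(-c), and the shortfall grows as the ratio of read length L to island size I grows.

```python
import math


def island_coverage(island_len: int, read_len: int, num_reads: float) -> float:
    """Expected covered fraction of one island under a simple edge-effect model.

    Illustrative assumptions: each read lies entirely inside the island and starts
    uniformly at one of the island_len - read_len + 1 admissible positions.
    A base at position x is hit by one read with probability
        p(x) = (# start positions whose read spans x) / (# start positions),
    so it remains uncovered after n independent reads with probability (1 - p(x))**n.
    """
    if not 0 < read_len <= island_len:
        raise ValueError("need 0 < read_len <= island_len")
    starts = island_len - read_len + 1
    uncovered = 0.0
    for x in range(island_len):
        lo = max(0, x - read_len + 1)        # earliest start whose read reaches x
        hi = min(x, island_len - read_len)   # latest admissible start covering x
        p = (hi - lo + 1) / starts
        uncovered += (1.0 - p) ** num_reads
    return 1.0 - uncovered / island_len


def standard_coverage(redundancy: float) -> float:
    """Lander-Waterman expected covered fraction, 1 - exp(-c)."""
    return 1.0 - math.exp(-redundancy)


if __name__ == "__main__":
    island, c = 2000, 3.0  # illustrative island size (bases) and redundancy
    for L in (100, 500, 1000):
        n = c * island / L  # number of reads giving c-fold redundancy
        print(f"L/I={L / island:.2f}  edge-effect model={island_coverage(island, L, n):.3f}  "
              f"standard={standard_coverage(c):.3f}")
```

Because each read covers exactly L bases, the sketch gives an expected coverage of exactly L/I after a single read, the same first-order value as standard theory, which is in line with the observation that edge effects matter little at low redundancy.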

CONCLUSION

Coverage performance using a filtered library is significantly lower than that for an equivalent-sized conventional library, suggesting that directed methods may be more critical for the former. The proposed model should be useful for analyzing future projects.
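The comparison with an equivalent-sized conventional library, and the probabilistic read-count milestones mentioned in the results, can be explored by simulation. The following Monte Carlo sketch rests on illustrative assumptions (reads confined to islands, start positions uniform across the whole library) and is not the paper's density function; `reads_to_target`, `milestone`, and `conventional_reads` are hypothetical helpers. It reports the median and an approximate 95th-percentile number of reads needed to cover a target fraction of the island bases, alongside the Lander-Waterman estimate for a contiguous target of the same total size.

```python
import math
import random
import statistics
from itertools import accumulate


def reads_to_target(island_lens, read_len, target, rng):
    """Draw reads one at a time until at least `target` of all island bases is covered.

    Illustrative assumptions (not taken from the paper): every read lies entirely
    inside one island, and each of the library's admissible start positions is
    equally likely, so an island is picked in proportion to its start positions.
    """
    if any(n < read_len for n in island_lens):
        raise ValueError("every island must be at least read_len long")
    if not 0 < target <= 1:
        raise ValueError("target must be in (0, 1]")
    starts = [n - read_len + 1 for n in island_lens]
    cum = list(accumulate(starts))
    covered = [bytearray(n) for n in island_lens]  # 1 marks an already-covered base
    need = target * sum(island_lens)
    hit = reads = 0
    while hit < need:
        i = rng.choices(range(len(island_lens)), cum_weights=cum)[0]
        s = rng.randrange(starts[i])
        row = covered[i]
        for x in range(s, s + read_len):
            if not row[x]:
                row[x] = 1
                hit += 1
        reads += 1
    return reads


def milestone(island_lens, read_len, target, trials=100, seed=1):
    """Median and approximate 95th-percentile read counts over simulated runs."""
    rng = random.Random(seed)
    runs = sorted(reads_to_target(island_lens, read_len, target, rng)
                  for _ in range(trials))
    return statistics.median(runs), runs[int(0.95 * (len(runs) - 1))]


def conventional_reads(total_len, read_len, target):
    """Lander-Waterman estimate: solve 1 - exp(-N*L/G) = target for N (target < 1)."""
    return -math.log(1.0 - target) * total_len / read_len


if __name__ == "__main__":
    islands = [2000] * 5  # illustrative filtered library: five 2 kb islands
    med, p95 = milestone(islands, 500, 0.9)
    print(f"filtered: median={med}, ~95th pct={p95}; "
          f"conventional estimate={conventional_reads(sum(islands), 500, 0.9):.1f}")
```

Reading off a high percentile rather than the mean gives a conservative milestone; the island sizes, read length, and target used here are arbitrary illustrative values.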
