
A general coverage theory for shotgun DNA sequencing.

Author information

Wendl Michael C

Affiliation

Genome Sequencing Center, Washington University, St. Louis, Missouri 63108, USA.

Publication information

J Comput Biol. 2006 Jul-Aug;13(6):1177-96. doi: 10.1089/cmb.2006.13.1177.

Abstract

The classical theory of shotgun DNA sequencing accounts for neither the placement dependencies that are a fundamental consequence of the forward-reverse sequencing strategy, nor the edge effect that arises for small to moderate-sized genomic targets. These phenomena are relevant to a number of sequencing scenarios, including large-insert BAC and fosmid clones, filtered genomic libraries, and macro-nuclear chromosomes. Here, we report a model that considers these two effects and provides both the expected value of coverage and its variance. Comparison to methyl-filtered maize data shows significant improvement over classical theory. The model is used to analyze coverage performance over a range of small to moderately sized genomic targets. We find that the read pairing effect and the edge effect interact in a non-trivial fashion. Shorter reads give superior coverage per unit sequence depth relative to longer ones. In principle, end-sequences can be optimized with respect to template insert length; however, optimal performance is unlikely to be realized in most cases because of inherent size variation in any set of targets. Conversely, single-stranded reads exhibit roughly the same coverage attributes as optimized end-reads. Although linking information is lost, single-stranded data should not pose a significant assembly liability if the target represents predominantly low-copy sequence. We also find that random sequencing should be halted at substantially lower redundancies than those now associated with larger projects. Given the enormous amount of data generated per cycle on pyrosequencing instruments, this observation suggests devising schemes to split each run cycle between two or more projects. This would prevent over-sequencing and would further leverage the pyrosequencing method.
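To make the edge effect concrete, the following is a minimal Python sketch. It is not the model reported in the paper. It compares the classical expected coverage, 1 - exp(-rho) with rho = N*L/G, against a simplified edge-aware calculation for a linear target. The simplifications are assumptions made here for illustration: reads are single-stranded and unpaired (so the read pairing effect is ignored), all reads have the same length, and each read must lie entirely inside the target with its start position chosen uniformly at random. The function names are hypothetical.

```python
import math


def classical_coverage(n_reads, read_len, target_len):
    """Classical expected fraction of the target covered, ignoring edges."""
    rho = n_reads * read_len / target_len  # sequence depth (redundancy)
    return 1.0 - math.exp(-rho)


def edge_aware_coverage(n_reads, read_len, target_len):
    """Expected covered fraction when reads must fit inside a linear target.

    Assumption (not from the paper): start positions are uniform over the
    target_len - read_len + 1 admissible offsets, reads are independent.
    """
    n_starts = target_len - read_len + 1
    covered = 0.0
    for x in range(target_len):
        # Count the admissible start offsets whose read spans position x.
        lo = max(0, x - read_len + 1)
        hi = min(x, target_len - read_len)
        k = hi - lo + 1
        # Position x is missed only if all n_reads reads avoid it.
        covered += 1.0 - (1.0 - k / n_starts) ** n_reads
    return covered / target_len


if __name__ == "__main__":
    # Same depth (rho = 2) on a small and on a large target.
    for n, length, size in [(8, 500, 2000), (800, 500, 200000)]:
        print(size,
              classical_coverage(n, length, size),
              edge_aware_coverage(n, length, size))
```

Under these assumptions, positions near either end of the target can be reached by fewer read placements than interior positions, so the edge-aware value for the small target falls below the classical one, while for the large target the two values nearly agree.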

