Computational techniques for human genome resequencing using mated gapped reads.

Author information

Carnevali Paolo, Baccash Jonathan, Halpern Aaron L, Nazarenko Igor, Nilsen Geoffrey B, Pant Krishna P, Ebert Jessica C, Brownley Anushka, Morenzoni Matt, Karpinchyk Vitali, Martin Bruce, Ballinger Dennis G, Drmanac Radoje

Affiliation

Complete Genomics Inc., Mountain View, California 94043, USA.

Publication information

J Comput Biol. 2012 Mar;19(3):279-92. doi: 10.1089/cmb.2011.0201. Epub 2011 Dec 16.

Abstract

Unchained base reads on self-assembling DNA nanoarrays have recently emerged as a promising approach to low-cost, high-quality resequencing of human genomes. Because of unique characteristics of these mated pair reads, existing computational methods for resequencing assembly, such as those based on map-consensus calling, are not adequate for accurate variant calling. We describe novel computational methods developed for accurate calling of SNPs and short substitutions and indels (<100 bp); the same methods apply to evaluation of hypothesized larger, structural variations. We use an optimization process that iteratively adjusts the genome sequence to maximize its a posteriori probability given the observed reads. For each candidate sequence, this probability is computed using Bayesian statistics with a simple read generation model and simplifying assumptions that make the problem computationally tractable. The optimization process iteratively applies one-base substitutions, insertions, and deletions until it converges to an optimal diploid sequence. A local de novo assembly procedure that generalizes approaches based on De Bruijn graphs is used to seed the optimization process in order to reduce the chance of converging to local optima. Finally, a correlation-based filter is applied to reduce the false positive rate caused by the presence of repetitive regions in the reference genome.
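The core loop described above (score a candidate diploid sequence by its posterior probability given the reads, then apply one-base substitutions, insertions, and deletions until no edit helps) can be illustrated with a toy Python sketch. This is not the authors' implementation. The read model is an assumption chosen for brevity: reads are ungapped, unpaired, and of fixed length; each read comes from either haplotype with probability 1/2, starts at a uniformly chosen position, and carries independent per-base substitution errors at rate `eps`. The prior is assumed flat, so the log posterior equals the log likelihood up to a constant. The function names (`read_log_likelihood`, `diploid_log_posterior`, `single_base_edits`, `optimize`) and the example sequences are hypothetical. Mate-pair gaps, the De Bruijn-based seeding, and the correlation filter are all omitted.

```python
import math

BASES = "ACGT"

def logsumexp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    if m == float("-inf"):
        return m
    return m + math.log(sum(math.exp(x - m) for x in xs))

def read_log_likelihood(read, hap, eps):
    """log P(read | hap) under the assumed toy model: uniform start, per-base error eps."""
    n_starts = len(hap) - len(read) + 1
    if n_starts <= 0:
        return float("-inf")
    log_match, log_mismatch = math.log(1 - eps), math.log(eps / 3)
    terms = []
    for start in range(n_starts):
        window = hap[start:start + len(read)]
        mism = sum(a != b for a, b in zip(read, window))
        terms.append(mism * log_mismatch + (len(read) - mism) * log_match)
    # Average over start positions (uniform placement assumption).
    return logsumexp(terms) - math.log(n_starts)

def diploid_log_posterior(reads, haps, eps):
    """Sum over reads of log(0.5 P(r|h1) + 0.5 P(r|h2)); flat prior assumed."""
    return sum(
        logsumexp([math.log(0.5) + read_log_likelihood(r, h, eps) for h in haps])
        for r in reads
    )

def single_base_edits(seq):
    """Every sequence one substitution, deletion, or insertion away from seq."""
    for i in range(len(seq)):
        yield seq[:i] + seq[i + 1:]                  # deletion
        for b in BASES:
            if b != seq[i]:
                yield seq[:i] + b + seq[i + 1:]      # substitution
    for i in range(len(seq) + 1):
        for b in BASES:
            yield seq[:i] + b + seq[i:]              # insertion

def optimize(reads, start_haps, eps=0.01, max_rounds=50):
    """Greedy hill climbing: apply the best-scoring single edit per round until none helps."""
    haps = tuple(start_haps)
    best = diploid_log_posterior(reads, haps, eps)
    for _ in range(max_rounds):
        best_move = None
        for k in (0, 1):
            for cand in sorted(set(single_base_edits(haps[k]))):
                trial = (cand, haps[1]) if k == 0 else (haps[0], cand)
                score = diploid_log_posterior(reads, trial, eps)
                if score > best + 1e-9:
                    best, best_move = score, trial
        if best_move is None:
            break  # converged: no single-base edit raises the score
        haps = best_move
    return haps, best

def kmers(seq, k):
    """All length-k substrings of seq, used here as error-free toy reads."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

# Hypothetical example: a heterozygous C>T at index 6.
REF = "ACGTTGCAAGTC"
ALT = "ACGTTGTAAGTC"
READS = kmers(REF, 6) + kmers(ALT, 6)

if __name__ == "__main__":
    haps, score = optimize(READS, (REF, REF))
    print(sorted(haps), round(score, 2))
```

In this toy setup, starting from the reference on both haplotypes, the first round should select the C>T substitution on one haplotype, after which no single-base edit improves the score. Greedy single-edit search of this kind can stall in local optima, which is the problem the abstract's De Bruijn-style seeding is meant to reduce.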
