PyroHMMsnp: an SNP caller for Ion Torrent and 454 sequencing data.

Affiliation

Bioinformatics Division, TNLIST/Department of Automation, Tsinghua University, Beijing 100084, China.

Publication information

Nucleic Acids Res. 2013 Jul;41(13):e136. doi: 10.1093/nar/gkt372. Epub 2013 May 21.

Abstract

Both 454 and Ion Torrent sequencers can produce large numbers of long, high-quality sequencing reads. However, because both methods sequence a homopolymer in a single cycle, they suffer from homopolymer uncertainty and incorporation asynchronization. In mapping, such sequencing errors can shift alignments around homopolymers and thus induce incorrect mismatches, which have become a critical barrier to the accurate detection of single nucleotide polymorphisms (SNPs). In this article, we propose a hidden Markov model (HMM) that statistically and explicitly models homopolymer sequencing errors as overcalls, undercalls, insertions and deletions. We use a hierarchical model to describe the sequencing and base-calling processes, and we estimate the parameters of the HMM from resequencing data with an expectation-maximization algorithm. Based on the HMM, we develop a realignment-based SNP-calling program, termed PyroHMMsnp, which realigns read sequences around homopolymers according to the error model and then infers the underlying genotype using a Bayesian approach. Simulation experiments show that the performance of PyroHMMsnp is exceptional across various sequencing coverages in terms of sensitivity, specificity and F1 measure, compared with other tools. Analysis of human resequencing data shows that PyroHMMsnp predicts 12.9% more SNPs than Samtools while achieving a higher specificity. The program is available at http://code.google.com/p/pyrohmmsnp/.
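The Bayesian genotype inference step mentioned in the abstract can be illustrated with a toy example. The sketch below is not the PyroHMMsnp model: it assumes the reads are already aligned, that each observed base is independent, and that errors are uniform substitutions rather than the homopolymer overcalls and undercalls the HMM describes. The function names, the `error_rate` value and the `variant_prior` weight are hypothetical and chosen only for illustration.

```python
import math

BASES = "ACGT"
# The 10 unordered diploid genotypes: AA, AC, AG, AT, CC, ..., TT
GENOTYPES = [a + b for i, a in enumerate(BASES) for b in BASES[i:]]


def genotype_posteriors(observed, ref, error_rate=0.01, variant_prior=1e-3):
    """Posterior probability of each diploid genotype at a single site.

    observed      -- string of bases seen in the reads covering the site
    ref           -- reference base at the site
    error_rate    -- assumed per-base substitution error rate (hypothetical)
    variant_prior -- assumed prior weight of any non-reference genotype,
                     relative to a weight of 1.0 for homozygous reference
    """
    log_scores = {}
    for genotype in GENOTYPES:
        weight = 1.0 if genotype == ref + ref else variant_prior
        log_score = math.log(weight)
        for base in observed:
            # Each read samples one of the two alleles with probability 1/2.
            p = sum(
                0.5 * ((1.0 - error_rate) if base == allele else error_rate / 3.0)
                for allele in genotype
            )
            log_score += math.log(p)
        log_scores[genotype] = log_score

    # Normalize in log space to avoid underflow at high coverage.
    top = max(log_scores.values())
    total = sum(math.exp(s - top) for s in log_scores.values())
    return {g: math.exp(s - top) / total for g, s in log_scores.items()}


def call_genotype(observed, ref, **kwargs):
    """Return the genotype with the highest posterior probability."""
    posteriors = genotype_posteriors(observed, ref, **kwargs)
    return max(posteriors, key=posteriors.get)
```

With these assumed parameters, `call_genotype("AAAAAGGGGG", "A")` selects the heterozygous genotype `"AG"`, whereas a pileup consisting only of `A` bases selects `"AA"`. In a realignment-based caller the per-read likelihoods would come from the error model rather than from a fixed substitution rate.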

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3db4/3711422/89800449c585/gkt372f1p.jpg
