Analysis of nanopore data using hidden Markov models.

Authors

Schreiber Jacob, Karplus Kevin

Affiliation

Nanopore Group, Department of Biomolecular Engineering, University of California Santa Cruz, CA 95064, USA.

Publication information

Bioinformatics. 2015 Jun 15;31(12):1897-903. doi: 10.1093/bioinformatics/btv046. Epub 2015 Feb 3.

DOI:10.1093/bioinformatics/btv046

PMID:25649617

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4553831/

Abstract

MOTIVATION

Nanopore-based sequencing techniques can reconstruct properties of biosequences by analyzing the sequence-dependent ionic current steps produced as biomolecules pass through a pore. Typically this involves alignment of new data to a reference, where both reference construction and alignment have been performed by hand.

RESULTS

We propose an automated method for aligning nanopore data to a reference through the use of hidden Markov models. Several features that arise from prior processing steps and from the class of enzyme used can be simply incorporated into the model. Previously, the M2MspA nanopore was shown to be sensitive enough to distinguish between cytosine, methylcytosine and hydroxymethylcytosine. We validated our automated methodology on a subset of that data by automatically calculating an error rate for the distinction between the three cytosine variants and show that the automated methodology produces a 2-3% error rate, lower than the 10% error rate from previous manual segmentation and alignment.
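As an illustration of aligning segmented nanopore data to a reference with a hidden Markov model, the sketch below runs a Viterbi pass over a simple left-to-right model. It is not the authors' model: the function name `viterbi_align`, the Gaussian emissions, the stay/step/skip transitions and their probabilities, and the example current levels are all assumptions chosen for illustration. Here "stay" stands in for a level that was split into several segments and "skip" for a level that was missed.

```python
import math

def log_gauss(x, mean, std):
    """Log density of a normal distribution at x."""
    return -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)

def viterbi_align(segments, reference, p_stay=0.1, p_step=0.8, p_skip=0.1):
    """Return the most likely reference level index for each segment.

    segments  : mean current of each detected segment (hypothetical units: pA)
    reference : one (mean, std) pair per expected current level
    The transition probabilities are illustrative assumptions, not fitted values.
    """
    if not segments or not reference:
        return []
    # Allowed moves along the reference: 0 = stay, 1 = step, 2 = skip one level.
    moves = {0: math.log(p_stay), 1: math.log(p_step), 2: math.log(p_skip)}
    n, m = len(segments), len(reference)
    neg_inf = float("-inf")
    score = [[neg_inf] * m for _ in range(n)]
    back = [[0] * m for _ in range(n)]

    # Assumption: the read starts at the first or second reference level.
    for j in range(min(2, m)):
        score[0][j] = log_gauss(segments[0], *reference[j])

    for i in range(1, n):
        for j in range(m):
            emit = log_gauss(segments[i], *reference[j])
            for jump, log_p in moves.items():
                k = j - jump  # previous reference level for this move
                if k < 0 or score[i - 1][k] == neg_inf:
                    continue
                cand = score[i - 1][k] + log_p + emit
                if cand > score[i][j]:
                    score[i][j] = cand
                    back[i][j] = k

    # Trace back from the best-scoring final level.
    j = max(range(m), key=lambda s: score[n - 1][s])
    path = [j]
    for i in range(n - 1, 0, -1):
        j = back[i][j]
        path.append(j)
    path.reverse()
    return path

# Made-up reference levels and a read with one repeated and one missed level.
reference = [(50.0, 1.0), (30.0, 1.0), (40.0, 1.0), (20.0, 1.0)]
print(viterbi_align([50.2, 30.1, 29.8, 19.9], reference))  # [0, 1, 1, 3]
```

In this toy read, the second and third segments both map to reference level 1, and level 2 is skipped, which is the kind of segmentation artifact the extra transitions are meant to absorb.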

AVAILABILITY AND IMPLEMENTATION

The data, output, scripts and tutorials replicating the analysis are available at https://github.com/UCSCNanopore/Data/tree/master/Automation.
