Probabilistic error correction for RNA sequencing.

Affiliation

Machine Learning Department, Carnegie Mellon University, 5000 Forbes Avenue Pittsburgh, PA 15217, USA.

Publication information

Nucleic Acids Res. 2013 May 1;41(10):e109. doi: 10.1093/nar/gkt215. Epub 2013 Apr 4.

Abstract

Sequencing of RNAs (RNA-Seq) has revolutionized the field of transcriptomics, but the reads obtained often contain errors. Read error correction can have a large impact on our ability to accurately assemble transcripts. This is especially true for de novo transcriptome analysis, where a reference genome is not available. Current read error correction methods, developed for DNA sequence data, cannot handle the overlapping effects of non-uniform abundance, polymorphisms and alternative splicing. Here we present SEquencing Error CorrEction in Rna-seq data (SEECER), a hidden Markov Model (HMM)-based method, which is the first to successfully address these problems. SEECER efficiently learns hundreds of thousands of HMMs and uses these to correct sequencing errors. Using human RNA-Seq data, we show that SEECER greatly improves on previous methods in terms of quality of read alignment to the genome and assembly accuracy. To illustrate the usefulness of SEECER for de novo transcriptome studies, we generated new RNA-Seq data to study the development of the sea cucumber Parastichopus parvimensis. Our corrected assembled transcripts shed new light on two important stages in sea cucumber development. Comparison of the assembled transcripts to known transcripts in other species has also revealed novel transcripts that are unique to sea cucumber, some of which we have experimentally validated. Supporting website: http://sb.cs.cmu.edu/seecer/.
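
To make the idea of model-based read correction concrete, here is a minimal sketch. It is not SEECER's algorithm: it assumes the reads are already aligned to one another without gaps, and it learns only position-wise emission probabilities (comparable to the match states of a profile HMM, with no insert or delete states and no transition model). The function names `learn_emissions` and `correct_read`, the pseudocount, and the `min_consensus_prob` threshold are all hypothetical choices made for illustration.

```python
from collections import Counter

ALPHABET = "ACGT"


def learn_emissions(aligned_reads, pseudocount=1.0):
    """Estimate per-position base probabilities from gap-free aligned reads.

    Assumption: all reads have the same length and the same offset.
    """
    length = len(aligned_reads[0])
    emissions = []
    for pos in range(length):
        counts = Counter(read[pos] for read in aligned_reads)
        total = sum(counts[b] for b in ALPHABET) + pseudocount * len(ALPHABET)
        emissions.append(
            {b: (counts[b] + pseudocount) / total for b in ALPHABET}
        )
    return emissions


def correct_read(read, emissions, min_consensus_prob=0.6):
    """Replace a base only when one consensus base clearly dominates.

    Positions where no base reaches the threshold are left untouched,
    so a well-supported second allele is not overwritten.
    """
    corrected = []
    for base, probs in zip(read, emissions):
        best = max(ALPHABET, key=lambda b: probs[b])
        if base != best and probs[best] >= min_consensus_prob:
            corrected.append(best)
        else:
            corrected.append(base)
    return "".join(corrected)


# Six identical reads plus one read with a single mismatch at position 2.
reads = ["ACGTAC"] * 6 + ["ACCTAC"]
emissions = learn_emissions(reads)
fixed = [correct_read(r, emissions) for r in reads]
```

In this toy input the lone `C` at position 2 is replaced by the dominant `G`. If two bases are both well supported at a position (for example four reads with `G` and three with `C`), neither reaches the threshold and the read is left unchanged, which loosely mirrors the abstract's point that polymorphisms should not be treated as sequencing errors.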
