An Annotation Agnostic Algorithm for Detecting Nascent RNA Transcripts in GRO-Seq.

Publication information

IEEE/ACM Trans Comput Biol Bioinform. 2017 Sep-Oct;14(5):1070-1081. doi: 10.1109/TCBB.2016.2520919. Epub 2016 Jan 26.

Abstract

We present a fast and simple algorithm to detect nascent RNA transcription in global nuclear run-on sequencing (GRO-seq). GRO-seq is a relatively new protocol that captures nascent transcripts from actively engaged polymerase, providing a direct read-out of bona fide transcription. Most traditional assays, such as RNA-seq, measure steady-state RNA levels, which are affected by transcription, post-transcriptional processing, and RNA stability. GRO-seq data, however, present unique analysis challenges that are only beginning to be addressed. Here, we describe a new algorithm, Fast Read Stitcher (FStitch), that takes advantage of two popular machine-learning techniques, hidden Markov models and logistic regression, to classify which regions of the genome are transcribed. Given a small user-defined training set, our algorithm is accurate, robust to varying read depth, annotation agnostic, and fast. Analysis of GRO-seq data without requiring a priori annotation uncovers surprising new insights into several aspects of the transcription process.
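The abstract names the two ingredients (logistic regression and a hidden Markov model) but not how they fit together. The following is a minimal illustrative sketch of one way such a combination could work, not the FStitch implementation: a logistic model is fit on a small labeled set of windows, and its per-window probabilities are then smoothed by a two-state Viterbi pass so that adjacent active windows are stitched into regions. Everything here is an assumption made for illustration: the single feature (log of read count per window), the fixed `stay` transition probability, the use of the logistic output as a stand-in for emission probabilities, and all function names and toy numbers.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit p(active | x) = sigmoid(w*x + b) by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        gw = gb = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y
            gw += err * x
            gb += err
        w -= lr * gw / n
        b -= lr * gb / n
    return w, b

def viterbi_two_state(p_active, stay=0.9):
    """Most likely path over states 0 (inactive) and 1 (active).

    Assumption: the logistic probability p is used as the score of
    state 1 and (1 - p) as the score of state 0 for each window.
    """
    eps = 1e-9
    log_trans = [[math.log(stay), math.log(1 - stay)],
                 [math.log(1 - stay), math.log(stay)]]

    def emit(p):
        p = min(max(p, eps), 1 - eps)
        return [math.log(1 - p), math.log(p)]

    score = [math.log(0.5) + e for e in emit(p_active[0])]
    back = []
    for p in p_active[1:]:
        e = emit(p)
        new_score, ptr = [], []
        for s in (0, 1):
            cands = [score[prev] + log_trans[prev][s] for prev in (0, 1)]
            best = 0 if cands[0] >= cands[1] else 1
            ptr.append(best)
            new_score.append(cands[best] + e[s])
        score = new_score
        back.append(ptr)
    # Trace back from the best final state.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path

def call_regions(path):
    """Merge consecutive active windows into half-open [start, end) intervals."""
    regions, start = [], None
    for i, s in enumerate(path):
        if s == 1 and start is None:
            start = i
        elif s == 0 and start is not None:
            regions.append((start, i))
            start = None
    if start is not None:
        regions.append((start, len(path)))
    return regions

# Hypothetical training set: read counts per window with user-supplied labels.
train_counts = [0, 1, 0, 2, 30, 45, 25, 60]
train_labels = [0, 0, 0, 0, 1, 1, 1, 1]
w, b = train_logistic([math.log1p(c) for c in train_counts], train_labels)

# Hypothetical unlabeled windows along one strand of one chromosome.
counts = [0, 1, 0, 40, 35, 0, 50, 42, 38, 1, 0, 0]
probs = [sigmoid(w * math.log1p(c) + b) for c in counts]
path = viterbi_two_state(probs)
regions = call_regions(path)
print(regions)
```

In this sketch the transition penalty is what does the stitching: a single low-coverage window inside an otherwise active stretch can stay labeled active when switching states twice would cost more than one poorly supported window. Whether a given gap is bridged depends on the fitted weights and on `stay`, which a real tool would estimate from the training data rather than fix by hand.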
