An Annotation Agnostic Algorithm for Detecting Nascent RNA Transcripts in GRO-Seq.

Publication information

IEEE/ACM Trans Comput Biol Bioinform. 2017 Sep-Oct;14(5):1070-1081. doi: 10.1109/TCBB.2016.2520919. Epub 2016 Jan 26.

DOI:10.1109/TCBB.2016.2520919

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC5667649/

Abstract

We present a fast and simple algorithm to detect nascent RNA transcription in global nuclear run-on sequencing (GRO-seq). GRO-seq is a relatively new protocol that captures nascent transcripts from actively engaged polymerase, providing a direct read-out on bona fide transcription. Most traditional assays, such as RNA-seq, measure steady state RNA levels which are affected by transcription, post-transcriptional processing, and RNA stability. GRO-seq data, however, presents unique analysis challenges that are only beginning to be addressed. Here, we describe a new algorithm, Fast Read Stitcher (FStitch), that takes advantage of two popular machine-learning techniques, hidden Markov models and logistic regression, to classify which regions of the genome are transcribed. Given a small user-defined training set, our algorithm is accurate, robust to varying read depth, annotation agnostic, and fast. Analysis of GRO-seq data without a priori need for annotation uncovers surprising new insights into several aspects of the transcription process.
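The combination of logistic regression and a hidden Markov model named in the abstract can be illustrated with a minimal sketch. This is not FStitch's actual implementation: the single feature (log-scaled read count per genomic bin), the function names, the learning rate, and the `stay` transition probability are all assumptions chosen for illustration. A logistic regression fitted on a small labeled set scores each bin, and a two-state Viterbi pass then smooths those scores into contiguous transcribed regions.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit P(active | x) = sigmoid(w*x + b) by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        gw = gb = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y
            gw += err * x
            gb += err
        w -= lr * gw / n
        b -= lr * gb / n
    return w, b

def viterbi_two_state(p_active, stay=0.95):
    """Most likely inactive (0) / active (1) path given per-bin P(active)."""
    if not p_active:
        return []
    eps = 1e-9
    log_stay, log_switch = math.log(stay), math.log(1.0 - stay)
    # Per-bin log scores for (inactive, active), taken from the classifier.
    emit = [(math.log(max(1.0 - p, eps)), math.log(max(p, eps)))
            for p in p_active]
    score = [emit[0][0] + math.log(0.5), emit[0][1] + math.log(0.5)]
    back = []
    for e in emit[1:]:
        new, ptr = [], []
        for s in (0, 1):
            cands = [score[prev] + (log_stay if prev == s else log_switch)
                     for prev in (0, 1)]
            best = 0 if cands[0] >= cands[1] else 1
            new.append(cands[best] + e[s])
            ptr.append(best)
        score = new
        back.append(ptr)
    # Trace back from the better final state.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path

def regions_from_path(path):
    """Collapse a 0/1 path into half-open (start, end) bin intervals."""
    regions, start = [], None
    for i, s in enumerate(path):
        if s == 1 and start is None:
            start = i
        elif s == 0 and start is not None:
            regions.append((start, i))
            start = None
    if start is not None:
        regions.append((start, len(path)))
    return regions

def call_regions(coverage, train_counts, train_labels, stay=0.95):
    """Train on labeled bin counts, then call active regions in `coverage`."""
    xs = [math.log1p(c) for c in train_counts]
    w, b = train_logistic(xs, train_labels)
    probs = [sigmoid(w * math.log1p(c) + b) for c in coverage]
    return regions_from_path(viterbi_two_state(probs, stay))
```

In this sketch the `stay` parameter controls how strongly neighboring bins are tied together: with a high value, a single low-scoring bin inside a run of high-scoring bins tends to be absorbed into the surrounding region rather than splitting it, and an isolated high-scoring bin tends to be ignored. Using classifier probabilities directly as emission scores is a simplification made here, not a claim about how the published method couples the two models.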

Similar articles

Vespucci: a system for building annotated databases of nascent transcripts.

Nucleic Acids Res. 2014 Feb;42(4):2433-47. doi: 10.1093/nar/gkt1237. Epub 2013 Dec 4.

Global Run-on Sequencing (GRO-Seq).

Methods Mol Biol. 2021;2351:25-39. doi: 10.1007/978-1-0716-1597-3_2.

Global Run-On Sequencing (GRO-Seq).

Methods Mol Biol. 2017;1468:111-20. doi: 10.1007/978-1-4939-4035-6_9.

GRO-seq, A Tool for Identification of Transcripts Regulating Gene Expression.

Methods Mol Biol. 2017;1543:45-55. doi: 10.1007/978-1-4939-6716-2_3.

Computational Approaches for Mining GRO-Seq Data to Identify and Characterize Active Enhancers.

Methods Mol Biol. 2017;1468:121-38. doi: 10.1007/978-1-4939-4035-6_10.

Nascent RNA sequencing reveals distinct features in plant transcription.

Proc Natl Acad Sci U S A. 2016 Oct 25;113(43):12316-12321. doi: 10.1073/pnas.1603217113. Epub 2016 Oct 11.

Protocol for affordable and efficient profiling of nascent RNAs in bread wheat using GRO-seq.

STAR Protoc. 2022 Sep 16;3(3):101657. doi: 10.1016/j.xpro.2022.101657. Epub 2022 Sep 2.

CodingQuarry: highly accurate hidden Markov model gene prediction in fungal genomes using RNA-seq transcripts.

BMC Genomics. 2015 Mar 11;16(1):170. doi: 10.1186/s12864-015-1344-4.

Multi-Genome Annotation with AUGUSTUS.

Methods Mol Biol. 2019;1962:139-160. doi: 10.1007/978-1-4939-9173-0_8.

Cited by

eNRSA: a faster and more powerful approach for nascent transcriptome analysis.

Gigascience. 2025 Jan 6;14. doi: 10.1093/gigascience/giaf071.

Atlas of nascent RNA transcripts reveals tissue-specific enhancer to gene linkages.

BMC Genomics. 2025 Apr 25;26(1):406. doi: 10.1186/s12864-025-11568-z.

LIET model: capturing the kinetics of RNA polymerase from loading to termination.

Nucleic Acids Res. 2025 Apr 10;53(7). doi: 10.1093/nar/gkaf246.

TF Profiler: a transcription factor inference method that broadly measures transcription factor activity and identifies mechanistically distinct networks.

Genome Biol. 2025 Apr 9;26(1):92. doi: 10.1186/s13059-025-03545-2.

Atlas of nascent RNA transcripts reveals enhancer to gene linkages.

bioRxiv. 2023 Dec 8:2023.12.07.570626. doi: 10.1101/2023.12.07.570626.

Deconvolution of multiplexed transcriptional responses to wood smoke particles defines rapid aryl hydrocarbon receptor signaling dynamics.

J Biol Chem. 2021 Oct;297(4):101147. doi: 10.1016/j.jbc.2021.101147. Epub 2021 Sep 11.

Liver Transcriptome Dynamics During Hibernation Are Shaped by a Shifting Balance Between Transcription and RNA Stability.

Front Physiol. 2021 May 21;12:662132. doi: 10.3389/fphys.2021.662132. eCollection 2021.

PEPPRO: quality control and processing of nascent RNA profiling data.

Genome Biol. 2021 May 15;22(1):155. doi: 10.1186/s13059-021-02349-4.

Global Analyses to Identify Direct Transcriptional Targets of p53.

Methods Mol Biol. 2021;2267:19-56. doi: 10.1007/978-1-0716-1217-0_3.

Combining signal and sequence to detect RNA polymerase initiation in ATAC-seq data.

PLoS One. 2020 Apr 30;15(4):e0232332. doi: 10.1371/journal.pone.0232332. eCollection 2020.
