Context-aware transcript quantification from long-read RNA-seq data with Bambu.

Affiliations

Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), Singapore, Republic of Singapore.

Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Republic of Singapore.

Publication information

Nat Methods. 2023 Aug;20(8):1187-1195. doi: 10.1038/s41592-023-01908-w. Epub 2023 Jun 12.

DOI:10.1038/s41592-023-01908-w

PMID:37308696

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10448944/

Abstract

Most approaches to transcript quantification rely on fixed reference annotations; however, the transcriptome is dynamic, and depending on the context, such static annotations contain inactive isoforms for some genes while remaining incomplete for others. Here we present Bambu, a method that performs machine-learning-based transcript discovery to enable quantification specific to the context of interest using long-read RNA sequencing. To identify novel transcripts, Bambu estimates the novel discovery rate, which replaces arbitrary per-sample thresholds with a single, interpretable, precision-calibrated parameter. Bambu retains the full-length and unique read counts, enabling accurate quantification in the presence of inactive isoforms. Compared to existing methods for transcript discovery, Bambu achieves greater precision without sacrificing sensitivity. We show that context-aware annotations improve quantification for both novel and known transcripts. We apply Bambu to quantify isoforms from repetitive HERVH-LTR7 retrotransposons in human embryonic stem cells, demonstrating its capacity for context-specific transcript expression analysis.
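The idea of a single novel discovery rate (NDR) threshold can be illustrated with a small Python sketch. This is a simplified reading of the abstract, not Bambu's implementation (Bambu is distributed as an R package). It assumes each candidate transcript already has a classifier score and a flag saying whether it matches the reference annotation, and it treats the NDR at a given rank as the fraction of unannotated candidates among all candidates ranked at or above it. The function names `running_ndr` and `select_novel` and the toy scores are hypothetical.

```python
from typing import List, Tuple

Candidate = Tuple[float, bool]  # (classifier score, matches reference annotation?)


def running_ndr(candidates: List[Candidate]) -> List[Tuple[float, bool, float]]:
    """Rank candidates by descending score and attach the running NDR.

    Assumption: NDR at rank k = (unannotated candidates in top k) / k.
    """
    ranked = sorted(candidates, key=lambda c: c[0], reverse=True)
    result = []
    novel_so_far = 0
    for rank, (score, is_annotated) in enumerate(ranked, start=1):
        if not is_annotated:
            novel_so_far += 1
        result.append((score, is_annotated, novel_so_far / rank))
    return result


def select_novel(candidates: List[Candidate], max_ndr: float) -> List[float]:
    """Return scores of novel candidates in the longest ranked prefix
    whose running NDR does not exceed max_ndr."""
    ranked = running_ndr(candidates)
    cutoff = 0
    for position, (_, _, ndr) in enumerate(ranked, start=1):
        if ndr <= max_ndr:
            cutoff = position
    return [score for score, is_annotated, _ in ranked[:cutoff] if not is_annotated]


# Toy data: three annotated and three unannotated candidates.
toy = [(0.95, True), (0.90, True), (0.85, False),
       (0.80, True), (0.60, False), (0.40, False)]

accepted = select_novel(toy, max_ndr=0.3)
```

With these toy values, a threshold of 0.3 keeps only the highest-scoring unannotated candidate, while a stricter threshold of 0.1 keeps none. The point of the sketch is that one threshold on a rate, rather than a per-sample score or read-count cutoff, decides how many novel transcripts are accepted.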

Similar articles

RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome.

BMC Bioinformatics. 2011 Aug 4;12:323. doi: 10.1186/1471-2105-12-323.

ESPRESSO: Robust discovery and quantification of transcript isoforms from error-prone long-read RNA-seq data.

Sci Adv. 2023 Jan 20;9(3):eabq5072. doi: 10.1126/sciadv.abq5072.

Full-length isoform concatenation sequencing to resolve cancer transcriptome complexity.

BMC Genomics. 2024 Jan 29;25(1):122. doi: 10.1186/s12864-024-10021-x.

TEQUILA-seq: a versatile and low-cost method for targeted long-read RNA sequencing.

Nat Commun. 2023 Aug 8;14(1):4760. doi: 10.1038/s41467-023-40083-6.

AtRTD - a comprehensive reference transcript dataset resource for accurate quantification of transcript-specific expression in Arabidopsis thaliana.

New Phytol. 2015 Oct;208(1):96-101. doi: 10.1111/nph.13545. Epub 2015 Jun 25.

Transcript Identification Through Long-Read Sequencing.

Methods Mol Biol. 2021;2284:531-541. doi: 10.1007/978-1-0716-1307-8_29.

Transcript Profiling Using Long-Read Sequencing Technologies.

Methods Mol Biol. 2018;1783:121-147. doi: 10.1007/978-1-4939-7834-2_6.

A high-resolution single-molecule sequencing-based Arabidopsis transcriptome using novel methods of Iso-seq analysis.

Genome Biol. 2022 Jul 7;23(1):149. doi: 10.1186/s13059-022-02711-0.

Partitioning RNAs by length improves transcriptome reconstruction from short-read RNA-seq data.

Nat Biotechnol. 2022 May;40(5):741-750. doi: 10.1038/s41587-021-01136-7. Epub 2022 Jan 10.

Cited by

DKC1-mediated pseudouridylation of rRNA targets hnRNP A1 to sustain IRES-dependent translation and ATF4-driven metabolic adaptation.

Sci Adv. 2025 Aug 29;11(35):eadv9401. doi: 10.1126/sciadv.adv9401.

Enhancing transcriptome expression quantification through accurate assignment of long RNA sequencing reads with TranSigner.

Genome Biol. 2025 Aug 28;26(1):257. doi: 10.1186/s13059-025-03723-2.

Activation of Pvt1b isoform contributes to local Pvt1 abundance to repress Myc during stress.

PLoS Genet. 2025 Jul 31;21(7):e1011790. doi: 10.1371/journal.pgen.1011790. eCollection 2025 Jul.

Long-read RNA-sequencing reveals transcript-specific regulation in human-derived cortical neurons.

Open Biol. 2025 Jul;15(7):250200. doi: 10.1098/rsob.250200. Epub 2025 Jul 30.

Integrating Artificial Intelligence in Next-Generation Sequencing: Advances, Challenges, and Future Directions.

Curr Issues Mol Biol. 2025 Jun 19;47(6):470. doi: 10.3390/cimb47060470.

Single cell and spatial alternative splicing analysis with Nanopore long read sequencing.

Nat Commun. 2025 Jul 19;16(1):6654. doi: 10.1038/s41467-025-60902-2.

Long-read RNA sequencing unveils a novel cryptic exon in MNAT1 along with its full-length transcript structure in TDP-43 proteinopathy.

Commun Biol. 2025 Jul 16;8(1):1056. doi: 10.1038/s42003-025-08463-4.

APALORD: An R-based tool for differential alternative polyadenylation analysis of long-read RNA-seq data.

bioRxiv. 2025 Jun 17:2025.06.11.658931. doi: 10.1101/2025.06.11.658931.

Oarfish: enhanced probabilistic modeling leads to improved accuracy in long read transcriptome quantification.

Bioinformatics. 2025 Jul 1;41(Supplement_1):i304-i313. doi: 10.1093/bioinformatics/btaf240.

Quantitative isoform profiling using deep coverage long-read RNA sequencing across early endothelial differentiation.

bioRxiv. 2025 Jun 2:2025.05.30.656561. doi: 10.1101/2025.05.30.656561.
