IntAPT: integrated assembly of phenotype-specific transcripts from multiple RNA-seq profiles.

Affiliations

Bradley Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, USA.

Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA.

Publication information

Bioinformatics. 2021 May 5;37(5):650-658. doi: 10.1093/bioinformatics/btaa852.

DOI: 10.1093/bioinformatics/btaa852

PMID: 33016988

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8097681/

Abstract

MOTIVATION

High-throughput RNA sequencing has revolutionized the scope and depth of transcriptome analysis. Accurate reconstruction of a phenotype-specific transcriptome is challenging due to the noise and variability of RNA-seq data. This requires computational identification of transcripts from multiple samples of the same phenotype, given the underlying consensus transcript structure.

RESULTS

We present a Bayesian method, integrated assembly of phenotype-specific transcripts (IntAPT), that identifies phenotype-specific isoforms from multiple RNA-seq profiles. IntAPT features a novel two-layer Bayesian model that captures the presence of isoforms at the group layer and quantifies the abundance of isoforms at the sample layer. A spike-and-slab prior is used to model isoform expression and to enforce sparsity among the expressed isoforms. Dependencies between the presence of isoforms and their expression are modeled explicitly to facilitate parameter estimation. Model parameters are estimated iteratively using Gibbs sampling to infer the joint posterior distribution, from which the presence and abundance of isoforms can be reliably determined. Studies using both simulated and real datasets show that IntAPT consistently outperforms existing methods for phenotype-specific transcript assembly. Experimental results demonstrate that, despite sequencing errors, IntAPT performs robustly across multiple samples, notably improving the identification of expressed isoforms of low abundance.
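The abstract describes the two-layer model only at a high level. The following is a toy sketch, under our own assumptions, of how a group-level presence indicator shared by all samples can be combined with sample-level spike-and-slab abundances and sampled by Gibbs sampling. It is not IntAPT's actual model. The Gaussian likelihood over segment coverage, the 0/1 isoform-segment matrix A, the hyperparameters pi, tau and sigma, and the function name gibbs_spike_slab are all hypothetical choices made for illustration.

```python
import math
import random


def _sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def gibbs_spike_slab(Y, A, n_iter=2000, burn_in=500, pi=0.5, tau=10.0, sigma=1.0, seed=0):
    """Toy two-layer spike-and-slab Gibbs sampler (hypothetical, not IntAPT's model).

    Y[s][j]: coverage of segment j in sample s (S samples, J segments).
    A[j][k]: 1 if isoform k covers segment j, else 0 (J x K).
    Group layer: z[k] in {0, 1}, shared by all samples (isoform present or absent).
    Sample layer: x[s][k] = z[k] * w[s][k], slab w[s][k] ~ N(0, tau^2).
    Likelihood: Y[s][j] ~ N(sum_k A[j][k] * x[s][k], sigma^2).
    Returns the Monte Carlo estimate of P(z[k] = 1 | Y) for each isoform k.
    """
    rng = random.Random(seed)
    S, J, K = len(Y), len(A), len(A[0])
    s2, t2 = sigma ** 2, tau ** 2
    z = [1] * K
    x = [[0.0] * K for _ in range(S)]
    counts = [0] * K
    for it in range(n_iter):
        for k in range(K):
            a = [A[j][k] for j in range(J)]
            aa = sum(v * v for v in a)
            # a . r_s, where r_s is sample s's residual with isoform k removed.
            ar = []
            for s in range(S):
                r = [Y[s][j] - sum(A[j][m] * x[s][m] for m in range(K) if m != k)
                     for j in range(J)]
                ar.append(sum(a[j] * r[j] for j in range(J)))
            # Log Bayes factor of z_k = 1 vs z_k = 0 with each w[s][k] integrated out;
            # evidence is summed over samples because z_k is shared by the group.
            log_bf = sum(-0.5 * math.log(1.0 + t2 * aa / s2)
                         + 0.5 * t2 * v * v / (s2 * (s2 + t2 * aa)) for v in ar)
            z[k] = 1 if rng.random() < _sigmoid(math.log(pi / (1.0 - pi)) + log_bf) else 0
            # Given z_k = 1, each w[s][k] has a Gaussian conditional posterior;
            # given z_k = 0 the isoform contributes nothing to any sample.
            prec = aa / s2 + 1.0 / t2
            for s in range(S):
                x[s][k] = rng.gauss(ar[s] / s2 / prec, math.sqrt(1.0 / prec)) if z[k] else 0.0
        if it >= burn_in:
            for k in range(K):
                counts[k] += z[k]
    return [c / (n_iter - burn_in) for c in counts]


# Toy data: 4 segments and 3 candidate isoforms. Isoforms 0 and 2 are expressed
# at sample-specific levels in 4 samples, and isoform 1 is absent.
A = [[1, 1, 0],
     [1, 0, 1],
     [0, 1, 1],
     [1, 1, 0]]
data_rng = random.Random(1)
true_x = [[10.0 + s, 0.0, 8.0 - s] for s in range(4)]
Y = [[sum(A[j][k] * true_x[s][k] for k in range(3)) + data_rng.gauss(0.0, 1.0)
      for j in range(4)] for s in range(4)]
probs = gibbs_spike_slab(Y, A)
print([round(p, 3) for p in probs])
```

Because z[k] is shared across samples, the per-sample log Bayes factors add up. Modest evidence from several samples can therefore decide presence at the group layer, while abundances stay sample-specific. The Gaussian slab allows negative abundances, and the Gaussian coverage likelihood ignores read-level counts. Both are simplifications chosen here to get closed-form conditionals. They are not features of IntAPT.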

AVAILABILITY AND IMPLEMENTATION

The IntAPT package is available at http://github.com/henryxushi/IntAPT.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
