ACTOR: a latent Dirichlet model to compare expressed isoform proportions to a reference panel.

Author information

Department of Biostatistics, University of North Carolina at Chapel Hill, 135 Dauer Drive, Chapel Hill, NC 27599-7400, USA.

Department of Statistics and Operations Research, University of North Carolina at Chapel Hill, 318 Hanes Hall, Chapel Hill, NC 27599-3260, USA and Department of Biostatistics, University of North Carolina at Chapel Hill, 135 Dauer Drive, Chapel Hill, NC 27599-7400, USA.

Publication information

Biostatistics. 2023 Apr 14;24(2):388-405. doi: 10.1093/biostatistics/kxab013.

Abstract

The relative proportion of RNA isoforms expressed for a given gene has been associated with disease states in cancer, retinal diseases, and neurological disorders. Examination of relative isoform proportions can help determine biological mechanisms, but such analyses often require a per-gene investigation of splicing patterns. Leveraging large public data sets produced by genomic consortia as a reference, one can compare splicing patterns in a data set of interest with those of a reference panel in which samples are divided into distinct groups, such as tissue of origin or disease status. We propose A latent Dirichlet model to Compare expressed isoform proportions TO a Reference panel (ACTOR), a latent Dirichlet model with Dirichlet-multinomial observations that compares expressed isoform proportions in a data set to those of an independent reference panel. We use a variational Bayes procedure to estimate posterior distributions for the group membership of one or more samples. Using the Genotype-Tissue Expression project as a reference data set, we evaluate ACTOR on simulated and real RNA-seq data sets to determine tissue-type classifications of genes. ACTOR is publicly available as an R package at https://github.com/mccabes292/actor.
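To make the idea of comparing isoform counts against reference groups concrete, the following is a minimal sketch in Python, not the ACTOR implementation. It assumes a single gene, a uniform prior over groups, and per-group Dirichlet concentration parameters that have already been estimated from a reference panel. The function names, group labels, and all numbers are hypothetical. ACTOR itself fits a latent Dirichlet model with variational Bayes, whereas this sketch applies Bayes' rule directly to a Dirichlet-multinomial likelihood.

```python
import math


def dm_log_likelihood(counts, alpha):
    """Log Dirichlet-multinomial probability of isoform counts.

    counts: observed read counts per isoform for one gene in one sample.
    alpha:  Dirichlet concentration parameters for one reference group
            (hypothetical values; assumed to be estimated beforehand).
    """
    n = sum(counts)
    a0 = sum(alpha)
    # Multinomial coefficient: n! / prod(c_i!)
    ll = math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts)
    # Ratio of Dirichlet normalizing constants
    ll += math.lgamma(a0) - math.lgamma(n + a0)
    ll += sum(math.lgamma(c + a) - math.lgamma(a) for c, a in zip(counts, alpha))
    return ll


def group_posterior(counts, group_alphas, prior=None):
    """Posterior probability of each reference group given the counts.

    Uses a uniform prior unless one is supplied (an assumption of this sketch).
    """
    names = list(group_alphas)
    if prior is None:
        prior = {g: 1.0 / len(names) for g in names}
    log_post = {
        g: math.log(prior[g]) + dm_log_likelihood(counts, group_alphas[g])
        for g in names
    }
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(log_post.values())
    weights = {g: math.exp(v - m) for g, v in log_post.items()}
    z = sum(weights.values())
    return {g: weights[g] / z for g in names}


# Hypothetical reference groups for a gene with three isoforms.
reference = {
    "brain": [8.0, 1.0, 1.0],  # isoform 1 dominates
    "liver": [1.0, 8.0, 1.0],  # isoform 2 dominates
}
sample_counts = [40, 5, 5]
posterior = group_posterior(sample_counts, reference)
```

Here `posterior` maps each hypothetical group label to a probability. A sample whose reads fall mostly on the first isoform is assigned mostly to the group whose reference proportions favor that isoform.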
