Department of Biostatistics, University of North Carolina at Chapel Hill, 135 Dauer Drive, Chapel Hill, NC 27599-7400, USA.
Department of Statistics and Operations Research, University of North Carolina at Chapel Hill, 318 Hanes Hall, Chapel Hill, NC 27599-3260, USA and Department of Biostatistics, University of North Carolina at Chapel Hill, 135 Dauer Drive, Chapel Hill, NC 27599-7400, USA.
Biostatistics. 2023 Apr 14;24(2):388-405. doi: 10.1093/biostatistics/kxab013.
The relative proportion of RNA isoforms expressed for a given gene has been associated with disease states in cancer, retinal diseases, and neurological disorders. Examination of relative isoform proportions can help determine biological mechanisms, but such analyses often require a per-gene investigation of splicing patterns. Using large public data sets produced by genomic consortia as a reference, one can compare splicing patterns in a data set of interest with those of a reference panel in which samples are divided into distinct groups, such as tissue of origin or disease status. We propose A latent Dirichlet model to Compare expressed isoform proportions TO a Reference panel (ACTOR), which uses Dirichlet-multinomial observations to compare the isoform proportions expressed in a data set with those of an independent reference panel. We use a variational Bayes procedure to estimate posterior distributions for the group membership of one or more samples. Using the Genotype-Tissue Expression project as a reference data set, we evaluate ACTOR on simulated and real RNA-seq data sets to determine tissue-type classifications of genes. ACTOR is publicly available as an R package at https://github.com/mccabes292/actor.
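The abstract does not spell out ACTOR's update equations. As a rough illustration of the two ingredients it names (Dirichlet-multinomial observations and a variational Bayes posterior over group membership), the sketch below assumes an LDA-style structure. Each gene in a sample is assigned to one reference group, the sample has Dirichlet-distributed group proportions with prior `eta`, and the per-gene, per-group Dirichlet parameters estimated from the reference panel are held fixed. The function names (`dm_logpmf`, `mean_field_membership`), `eta`, and the toy numbers are hypothetical. This is not the ACTOR package's API or its exact algorithm.

```python
import math

def digamma(x):
    """Digamma function via upward recurrence plus an asymptotic series (x > 0)."""
    result = 0.0
    # Shift x upward so the asymptotic expansion is accurate.
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    result += math.log(x) - 0.5 * inv - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result

def dm_logpmf(counts, alpha):
    """Dirichlet-multinomial log-probability of one gene's isoform counts."""
    n = sum(counts)
    a = sum(alpha)
    out = math.lgamma(n + 1) + math.lgamma(a) - math.lgamma(n + a)
    for x, al in zip(counts, alpha):
        out += math.lgamma(x + al) - math.lgamma(al) - math.lgamma(x + 1)
    return out

def mean_field_membership(counts, alphas, eta, max_iter=200, tol=1e-8):
    """Hypothetical LDA-style coordinate-ascent variational Bayes for one sample.

    counts[g]    : isoform counts for gene g in the sample of interest
    alphas[g][k] : Dirichlet parameters for gene g in reference group k (held fixed)
    eta[k]       : Dirichlet prior on the sample's group proportions (assumed)
    Returns (gamma, phi) with q(theta) = Dirichlet(gamma), q(z_g = k) = phi[g][k].
    """
    n_groups, n_genes = len(eta), len(counts)
    # Reference parameters are fixed, so the DM log-likelihoods are computed once.
    loglik = [[dm_logpmf(counts[g], alphas[g][k]) for k in range(n_groups)]
              for g in range(n_genes)]
    gamma = [e + n_genes / n_groups for e in eta]
    phi = []
    for _ in range(max_iter):
        # E_q[log theta_k] under the current Dirichlet(gamma).
        psi_total = digamma(sum(gamma))
        elog_theta = [digamma(gk) - psi_total for gk in gamma]
        phi = []
        for g in range(n_genes):
            logits = [elog_theta[k] + loglik[g][k] for k in range(n_groups)]
            m = max(logits)  # subtract the max for a numerically stable softmax
            w = [math.exp(v - m) for v in logits]
            s = sum(w)
            phi.append([v / s for v in w])
        # Update q(theta): prior plus expected number of genes assigned to each group.
        new_gamma = [eta[k] + sum(phi[g][k] for g in range(n_genes))
                     for k in range(n_groups)]
        converged = max(abs(a - b) for a, b in zip(new_gamma, gamma)) < tol
        gamma = new_gamma
        if converged:
            break
    return gamma, phi

# Toy example: two hypothetical reference groups with opposite isoform usage.
alphas = [
    [[40.0, 10.0], [10.0, 40.0]],
    [[30.0, 30.0, 5.0], [5.0, 30.0, 30.0]],
    [[50.0, 5.0], [5.0, 50.0]],
]
sample = [[80, 20], [45, 50, 6], [90, 12]]  # resembles group 0 at every gene
gamma, phi = mean_field_membership(sample, alphas, eta=[1.0, 1.0])
```

Because the reference parameters are treated as known, each iteration only reweights fixed per-gene likelihoods by the current expected log group proportions. In practice one would likely use `scipy.special.digamma` and `scipy.special.gammaln` rather than the hand-written digamma above.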