Rouanet Anaïs, Johnson Rob, Strauss Magdalena, Richardson Sylvia, Tom Brian D, White Simon R, Kirk Paul D W
MRC Biostatistics Unit, School of Clinical Medicine, University of Cambridge, U.K.
EMBL-EBI, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK.
J R Stat Soc Ser C Appl Stat. 2024 Mar 11;73(2):314-339. doi: 10.1093/jrsssc/qlad097. Epub 2023 Nov 8.
The identification of sets of co-regulated genes that share a common function is a key question in modern genomics. Bayesian profile regression is a semi-supervised mixture modelling approach that uses a response to guide inference toward relevant clusterings. Previous applications of profile regression have considered univariate continuous, categorical, and count outcomes. In this work, we extend Bayesian profile regression to cases where the outcome is longitudinal (or multivariate continuous) and provide PReMiuMlongi, an updated version of PReMiuM, the R package for profile regression. We consider multivariate normal and Gaussian process regression response models and provide proof-of-principle applications in four simulation studies. We then apply the model to budding yeast data to identify groups of genes co-regulated during the cell cycle. We identify four distinct groups of genes associated with specific patterns of gene expression trajectories, along with the bound transcription factors likely involved in their co-regulation.