Inference Group, Department of Computing Science, University of Glasgow, and Gartnavel General Hospital, 1053 Great Western Road, Glasgow G12 0YN, UK.
Nucleic Acids Res. 2010 Nov;38(20):6831-40. doi: 10.1093/nar/gkq550. Epub 2010 Jun 22.
This article describes and illustrates a novel method of microarray data analysis that couples model-based clustering and binary classification to form clusters of 'response-relevant' genes; that is, genes that are informative when discriminating between the different values of the response. Predictions are subsequently made using an appropriate statistical summary of each gene cluster, which we call the 'meta-covariate' representation of the cluster, in a probit regression model. We first illustrate this method by analysing a leukaemia expression dataset, before focusing closely on the meta-covariate analysis of a renal gene expression dataset in a rat model of salt-sensitive hypertension. We explore the biological insights provided by our analysis of these data. In particular, we identify a highly influential cluster of 13 genes, including three transcription factors (Arntl, Bhlhe41 and Npas2), that is implicated as being protective against hypertension in response to increased dietary sodium. Functional and canonical pathway analysis of this cluster using Ingenuity Pathway Analysis implicated transcriptional activation and circadian rhythm signalling, respectively. Although we illustrate our method using only expression data, the method is applicable to any high-dimensional dataset. Expression data are available at ArrayExpress (accession number E-MEXP-2514) and code is available at http://www.dcs.gla.ac.uk/inference/metacovariateanalysis/.
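To make the prediction step concrete, the following minimal Python sketch shows how a meta-covariate representation might feed a probit model. It is an illustration only, not the authors' implementation: the abstract describes clustering and classification as coupled, whereas this sketch assumes the cluster assignments and regression coefficients are already given. It also assumes the cluster mean as the statistical summary, which the abstract does not specify. The function names (`meta_covariates`, `probit_predict`), the toy expression values, the assignments and the weights are all hypothetical.

```python
from math import erf, sqrt
from statistics import mean


def normal_cdf(z):
    """Standard normal CDF, which maps the linear predictor to a probability."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def meta_covariates(expression, assignments, n_clusters):
    """Summarise each gene cluster by its mean expression in every sample.

    expression:  list of genes, each a list of per-sample values.
    assignments: cluster index for each gene (assumed given here).
    Returns one row per sample holding n_clusters summary values.
    Assumes every cluster has at least one gene; mean() raises otherwise.
    """
    n_samples = len(expression[0])
    rows = []
    for s in range(n_samples):
        row = []
        for k in range(n_clusters):
            members = [gene[s] for gene, a in zip(expression, assignments) if a == k]
            row.append(mean(members))
        rows.append(row)
    return rows


def probit_predict(row, weights, intercept):
    """P(response = 1) under a probit model on the meta-covariates."""
    z = intercept + sum(w * x for w, x in zip(weights, row))
    return normal_cdf(z)


# Toy data (hypothetical): 3 genes measured in 4 samples.
expression = [
    [1.0, 1.2, -0.9, -1.1],  # gene 0
    [0.8, 1.0, -1.1, -0.9],  # gene 1
    [0.1, -0.1, 0.1, -0.1],  # gene 2
]
assignments = [0, 0, 1]  # genes 0 and 1 share a cluster (assumed)
weights = [2.0, 0.0]     # only cluster 0 is response-relevant (assumed)
intercept = 0.0

X = meta_covariates(expression, assignments, 2)
probs = [probit_predict(row, weights, intercept) for row in X]
```

In this toy setting, samples with high mean expression in cluster 0 receive a probability above 0.5 and the others fall below it. The zero weight on cluster 1 mimics a cluster that carries no information about the response.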