Vinciotti Veronica, Liu Xiaohui, Turk Rolf, de Meijer Emile J, 't Hoen Peter A C
Department of Information Systems and Computing, Brunel University, Uxbridge UB8 3PH, UK.
BMC Bioinformatics. 2006 Apr 3;7:183. doi: 10.1186/1471-2105-7-183.
The identification of biologically interesting genes in a temporal expression profiling dataset is challenging and complicated by high levels of experimental noise. Most statistical methods used in the literature do not fully exploit the temporal ordering in the dataset and are not suited to the case where temporal profiles are measured for a number of different biological conditions. We present a statistical test that makes explicit use of the temporal order in the data by fitting polynomial functions to the temporal profile of each gene and for each biological condition. A Hotelling T2 statistic is then derived to detect the genes whose polynomial parameters differ significantly between conditions.
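The idea above can be sketched for a single gene as follows. This is a minimal illustration under stated assumptions, not the authors' R implementation. It assumes an ordinary least-squares polynomial fit per condition, independent samples across conditions, and contrasts of every condition against the last one. The statistic is the Wald/Hotelling-type quadratic form T2 = d' S^-1 d, where d stacks the coefficient differences beta_k - beta_K and S is their estimated covariance. The function names (`fit_polynomial`, `temporal_t2`) are hypothetical. How the paper estimates variances and assesses significance may differ. The linear algebra is written in pure Python so the sketch has no dependencies.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def solve(a: Matrix, b: Matrix) -> Matrix:
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [[float(v) for v in a[i]] + [float(v) for v in b[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[piv][col]) < 1e-12:
            raise ValueError("matrix is (numerically) singular")
        aug[col], aug[piv] = aug[piv], aug[col]
        pivot = aug[col][col]
        aug[col] = [v / pivot for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0.0:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fit_polynomial(times: Sequence[float], values: Sequence[float],
                   degree: int) -> Tuple[List[float], Matrix]:
    """OLS fit of values ~ 1 + t + ... + t^degree; returns (beta, Cov(beta))."""
    p = degree + 1
    x = [[float(t) ** j for j in range(p)] for t in times]
    xt = transpose(x)
    xtx = matmul(xt, x)
    xty = matmul(xt, [[float(v)] for v in values])
    beta = [row[0] for row in solve(xtx, xty)]
    dof = len(values) - p
    if dof <= 0:
        raise ValueError("need more observations than polynomial coefficients")
    fitted = [sum(b * xij for b, xij in zip(beta, row)) for row in x]
    sigma2 = sum((v - f) ** 2 for v, f in zip(values, fitted)) / dof
    identity = [[1.0 if i == j else 0.0 for j in range(p)] for i in range(p)]
    cov = [[sigma2 * v for v in row] for row in solve(xtx, identity)]
    return beta, cov


def temporal_t2(groups: Sequence[Tuple[Sequence[float], Sequence[float]]],
                degree: int = 2) -> float:
    """Hotelling-type T2 for H0: beta_1 = ... = beta_K (one gene, K conditions)."""
    if len(groups) < 2:
        raise ValueError("need at least two conditions")
    fits = [fit_polynomial(t, y, degree) for t, y in groups]
    p, k = degree + 1, len(fits)
    beta_ref, cov_ref = fits[-1]
    # Contrasts against the last condition: d_k = beta_k - beta_K, stacked.
    d = [b - r for beta, _ in fits[:-1] for b, r in zip(beta, beta_ref)]
    # Assuming independent conditions: Cov(d_k, d_l) = [k == l] * V_k + V_K.
    m = (k - 1) * p
    s = [[0.0] * m for _ in range(m)]
    for ka in range(k - 1):
        for kb in range(k - 1):
            for i in range(p):
                for j in range(p):
                    own = fits[ka][1][i][j] if ka == kb else 0.0
                    s[ka * p + i][kb * p + j] = cov_ref[i][j] + own
    w = solve(s, [[v] for v in d])
    return sum(di * wi[0] for di, wi in zip(d, w))


# Toy data (two replicates per time point, deterministic "noise").
times = [0, 0, 1, 1, 2, 2, 3, 3]
noise = [0.1, -0.1] * 4
flat = [1.0 + e for e in noise]
rising = [1.0 + 2.0 * t + e for t, e in zip(times, noise)]
print(temporal_t2([(times, flat)] * 3, degree=1))  # 0.0
print(temporal_t2([(times, flat), (times, flat), (times, rising)], degree=1))  # approximately 5600
```

Identical profiles give T2 = 0. A single condition with a different slope gives a large value. Because T2 is a quadratic form in all the contrasts jointly, it does not depend on which condition is chosen as the reference. Converting T2 into a p-value, and choosing the polynomial degree, are left out here, since the abstract does not specify how the authors handle them.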
We validate the temporal Hotelling T2-test on muscle gene expression data from four mouse strains profiled at different ages: dystrophin-, beta-sarcoglycan- and gamma-sarcoglycan-deficient mice, and wild-type mice. The first three are animal models for different muscular dystrophies. Extensive biological validation shows that the method is capable of finding genes with temporal profiles significantly different across the four strains, as well as identifying potential biomarkers for each form of the disease. The added value of the temporal test compared with an otherwise identical test that ignores temporal ordering is demonstrated through a simulation study and by confirming the expression profiles of selected genes with quantitative PCR experiments. The proposed method maximises the detection of biologically interesting genes whilst minimising false detections.
The temporal Hotelling T2-test is capable of finding relatively small and robust sets of genes that display different temporal profiles between the conditions of interest. The test is simple: it can be applied to gene expression data generated from any experimental design and for any number of conditions, and it allows fast interpretation of the temporal behaviour of genes. The R code is available from V.V. (Veronica Vinciotti). The microarray data have been submitted to GEO under series GSE1574 and GSE3523.