Guillot Gilles, Olsson Maja, Benson Mikael, Rudemo Mats
INRA, Applied Mathematics Department, Paris, France.
Math Biosci. 2007 Feb;205(2):195-203. doi: 10.1016/j.mbs.2006.08.007. Epub 2006 Aug 24.
Comparing gene expression between two groups of individuals forms an important subclass of microarray experiments. We study multivariate procedures, in particular the use of Hotelling's T² statistic to discriminate between the groups, with special emphasis on methods based on only a few genes. We apply the methods to data from an experiment in which a group of atopic dermatitis patients is compared with a control group. We also compare our methodology with other recently proposed methods on publicly available datasets. We find that (i) using several genes discriminates between the groups much better than using one gene only, (ii) the genes that play the most important role in the multivariate analysis are not necessarily those that rank first in univariate comparisons of the groups, and (iii) Linear Discriminant Analysis carried out with sets of 2-5 genes selected according to their Hotelling T² gives results comparable to state-of-the-art methods that use many more genes, a feature of our method that might be crucial in clinical applications. Finding groups of genes that together give optimal multivariate discrimination (for a given group size) can identify crucial pathways and networks of genes responsible for a disease. The computer code that we developed for these computations is available as an R package.
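As a rough illustration of the selection idea described above, the following Python sketch computes the standard two-sample Hotelling T² statistic, T² = n1·n2/(n1+n2) · d' S⁻¹ d, where d is the difference of the group mean vectors and S is the pooled covariance matrix, and then scores every gene subset of a given size. It is not the authors' R package: the function names `hotelling_t2` and `best_gene_subset` are hypothetical, the exhaustive search over subsets is an assumption (the abstract does not say how subsets are searched), and the small data matrices are invented purely for illustration.

```python
from itertools import combinations

import numpy as np


def hotelling_t2(x, y):
    """Two-sample Hotelling T^2 for x (n1 x p) and y (n2 x p), rows = individuals."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n1, n2 = x.shape[0], y.shape[0]
    diff = x.mean(axis=0) - y.mean(axis=0)
    # Pooled within-group covariance; atleast_2d handles the single-gene case.
    sx = np.atleast_2d(np.cov(x, rowvar=False))
    sy = np.atleast_2d(np.cov(y, rowvar=False))
    pooled = ((n1 - 1) * sx + (n2 - 1) * sy) / (n1 + n2 - 2)
    # Solve a linear system instead of forming the explicit inverse.
    return float(n1 * n2 / (n1 + n2) * diff @ np.linalg.solve(pooled, diff))


def best_gene_subset(x, y, size):
    """Exhaustively score all gene subsets of the given size (assumed strategy).

    Returns (T^2, gene indices) for the highest-scoring subset. Only feasible
    for small subset sizes and a pre-filtered list of candidate genes.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    scored = (
        (hotelling_t2(x[:, list(idx)], y[:, list(idx)]), idx)
        for idx in combinations(range(x.shape[1]), size)
    )
    return max(scored)


# Invented toy data: 4 individuals per group, 3 genes (columns).
patients = np.array([[0, 0.1, 0], [1, 0.9, 1], [2, 2.1, 0], [3, 2.9, 1]])
controls = np.array([[0, 1.1, 0.5], [1, 1.9, 1.5], [2, 3.1, 0.5], [3, 3.9, 1.5]])

print(best_gene_subset(patients, controls, 1))  # best single gene
print(best_gene_subset(patients, controls, 2))  # best pair of genes
```

In this toy example gene 0 has identical means in both groups and so ranks last on its own, yet it belongs to the best pair because it is strongly correlated with gene 1; this mirrors finding (ii). The number of subsets grows combinatorially with the number of genes, so a practical implementation would presumably restrict the candidate genes or use a heuristic search.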