Yan Xin, Zheng Tian
Russell Investments, Tacoma, WA, USA.
BMC Genomics. 2008 Sep 16;9(Suppl 2):S14. doi: 10.1186/1471-2164-9-S2-S14.
Gene expression data from microarray experiments have been used to study differences in the mRNA abundance of genes under different conditions. In a single such experiment, thousands of genes are measured simultaneously, which provides a high-dimensional feature space for discriminating between sample classes. However, most of these dimensions carry no information about the between-class difference and add noise to the discriminant analysis.
In this paper we propose and study feature selection methods that evaluate the "informativeness" of a set of genes. We consider two measures of information based on multigene expression profiles and use them in a backward, information-driven screening approach for selecting important gene features. Considering multigene expression profiles lets us exploit interaction information among these genes. We illustrate the methods on a breast cancer data set and compare their performance with that of existing methods.
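As a rough illustration of what a backward, information-driven screening loop might look like, the sketch below starts from all genes and repeatedly drops the gene whose removal costs the least information about the class label. The abstract does not specify the two information measures, so the score used here (empirical conditional entropy of the label given the joint discretized profile) is an assumption made only for illustration, as are the names `conditional_entropy`, `backward_screen`, and the toy genes.

```python
from collections import Counter, defaultdict
from math import log2


def conditional_entropy(profiles, labels):
    """Empirical H(label | profile) in bits; lower means more informative."""
    n = len(labels)
    groups = defaultdict(Counter)
    for profile, y in zip(profiles, labels):
        groups[profile][y] += 1
    h = 0.0
    for counts in groups.values():
        m = sum(counts.values())
        h -= sum((c / n) * log2(c / m) for c in counts.values())
    return h


def backward_screen(data, labels, keep):
    """Backward elimination over genes (hypothetical helper).

    data maps a gene name to its discretized value in each sample.
    At every step, drop the gene whose removal leaves the remaining
    set with the lowest conditional entropy of the label.
    """
    selected = list(data)
    while len(selected) > keep:
        def score_without(gene):
            rest = [g for g in selected if g != gene]
            profiles = list(zip(*(data[g] for g in rest)))
            return conditional_entropy(profiles, labels)

        selected.remove(min(selected, key=score_without))
    return selected


# Toy data (assumed): the label depends on geneA and geneB jointly,
# while geneC is unrelated to the label.
toy = {
    "geneA": [0, 0, 1, 1, 0, 0, 1, 1],
    "geneB": [0, 1, 0, 1, 0, 1, 0, 1],
    "geneC": [0, 0, 0, 0, 1, 1, 1, 1],
}
toy_labels = [a ^ b for a, b in zip(toy["geneA"], toy["geneB"])]
kept = backward_screen(toy, toy_labels, keep=2)
```

With thousands of genes and few samples, almost every joint profile is unique, so a plug-in entropy like this one would be trivially zero; a practical measure would need to be robust to that sparsity.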
We illustrate that methods that account for gene-gene interactions have better classification power in gene expression analysis. Among the important genes we identify, some have relatively large p-values in single-gene tests. These genes carry weak marginal information but strong interaction information, and would be overlooked by strategies that examine genes only individually.
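A small constructed example can make "weak marginal information but strong interaction information" concrete. In the sketch below, the class label is the exclusive-or of two binarized gene states, so each gene alone tells us nothing about the label while the pair determines it completely. The data and the helper `mutual_information` are hypothetical and are not taken from the paper's breast cancer analysis.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Empirical mutual information I(X; Y) in bits for discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in joint.items()
    )


# Toy data (assumed): 20 samples, two binarized genes, label = g1 XOR g2.
g1 = [0, 0, 1, 1] * 5
g2 = [0, 1, 0, 1] * 5
label = [a ^ b for a, b in zip(g1, g2)]

mi_g1 = mutual_information(g1, label)                   # marginal, gene 1
mi_g2 = mutual_information(g2, label)                   # marginal, gene 2
mi_pair = mutual_information(list(zip(g1, g2)), label)  # joint profile

print(f"I(g1; label)     = {mi_g1:.3f} bits")
print(f"I(g2; label)     = {mi_g2:.3f} bits")
print(f"I(g1, g2; label) = {mi_pair:.3f} bits")
```

Here both marginal values are 0 bits and the joint value is 1 bit, so a single-gene test would find neither gene significant even though together they classify every sample correctly.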