Department of Mathematics, The University of Mississippi, University, MS 38677, USA.
BMC Bioinformatics. 2009 Oct 8;10 Suppl 11(Suppl 11):S19. doi: 10.1186/1471-2105-10-S11-S19.
Microarray technology has made it possible to monitor the expression levels of thousands of genes simultaneously in a single experiment. However, the large number of genes greatly complicates the analysis, comprehension and interpretation of the resulting mass of data. Selecting a subset of important genes is therefore an essential step in addressing this challenge. Gene selection has been investigated extensively over the last decade. Most selection procedures, however, are insufficient for accurate inference of the underlying biology, because biologically significant genes are not necessarily statistically significant. Additional biological knowledge needs to be integrated into the gene selection procedure.
We propose a general framework for gene ranking. We construct a bipartite graph from the Gene Ontology (GO) and gene expression data. The graph describes the relationships between genes and their associated molecular functions. Under a given species condition, each edge weight is set to the expression level of the corresponding gene. Such a graph provides a mathematical means of representing both species-independent and species-dependent biological information. We also develop a new ranking algorithm that analyzes the weighted graph via a kernelized spatial depth (KSD) approach. Consequently, the importance of genes and molecular functions can be ranked simultaneously by a single real-valued measure, KSD, which incorporates both the global and the local structure of the graph. Over-expressed and under-expressed genes can also be ranked separately.
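The graph construction and depth-based ranking can be illustrated with a short NumPy sketch. It is a minimal illustration under stated assumptions, not the authors' implementation. Each gene is represented by its row of the weighted gene-by-function biadjacency matrix and each function by its column. The kernel is Gaussian with a hypothetical bandwidth sigma. The helper names (build_biadjacency, gaussian_kernel, kernelized_spatial_depth) and the toy data are invented for illustration. The depth follows the usual kernelized spatial depth definition, D(x) = 1 - || (1/n) * sum_i (phi(x) - phi(x_i)) / ||phi(x) - phi(x_i)|| ||, evaluated entirely through kernel values.

```python
import numpy as np


def build_biadjacency(expression, annotations, functions):
    """Hypothetical helper: gene-by-function weight matrix W.

    W[i, j] is the expression level of gene i if gene i is annotated with
    GO molecular function j, and 0 otherwise (assumed construction).
    """
    genes = sorted(expression)
    col = {f: j for j, f in enumerate(functions)}
    W = np.zeros((len(genes), len(functions)))
    for i, g in enumerate(genes):
        for f in annotations.get(g, ()):
            W[i, col[f]] = expression[g]
    return genes, W


def gaussian_kernel(A, B, sigma):
    """k(a, b) = exp(-||a - b||^2 / (2 sigma^2)) for every pair of rows."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))


def kernelized_spatial_depth(X, sigma=1.0):
    """Kernelized spatial depth of each row of X with respect to all rows."""
    n = X.shape[0]
    K = gaussian_kernel(X, X, sigma)
    diag = np.diag(K)
    depths = np.empty(n)
    for a in range(n):
        kxx, kx = K[a, a], K[a]
        # Squared feature-space distances ||phi(x) - phi(x_i)||^2.
        dist_sq = kxx + diag - 2.0 * kx
        keep = dist_sq > 1e-12  # points equal to x contribute a zero sign vector
        delta = np.sqrt(dist_sq[keep])
        kk, ks = K[np.ix_(keep, keep)], kx[keep]
        # <phi(x)-phi(x_i), phi(x)-phi(x_j)> = k(x,x) + k(x_i,x_j) - k(x,x_i) - k(x,x_j)
        inner = kxx + kk - ks[:, None] - ks[None, :]
        norm_sq = (inner / np.outer(delta, delta)).sum()
        depths[a] = 1.0 - np.sqrt(max(norm_sq, 0.0)) / n
    return depths


if __name__ == "__main__":
    # Toy data (invented): expression levels and GO function annotations.
    expression = {"g1": 2.0, "g2": 1.5, "g3": -0.5, "g4": 3.0}
    annotations = {"g1": ["F1", "F2"], "g2": ["F1"], "g3": ["F2", "F3"], "g4": ["F3"]}
    functions = ["F1", "F2", "F3"]
    genes, W = build_biadjacency(expression, annotations, functions)

    # Genes are rows of W, functions are columns, so both get a depth value.
    for name, d in zip(genes, kernelized_spatial_depth(W)):
        print(f"gene {name}: {d:.3f}")
    for name, d in zip(functions, kernelized_spatial_depth(W.T)):
        print(f"function {name}: {d:.3f}")

    # Rank over- and under-expressed genes separately.
    up = [i for i, g in enumerate(genes) if expression[g] > 0]
    down = [i for i, g in enumerate(genes) if expression[g] < 0]
    for idx in (up, down):
        if idx:
            d = kernelized_spatial_depth(W[idx])
            print([genes[i] for i in idx], np.round(d, 3).tolist())
```

Sorting by depth gives one possible ranking. Points near the center of the data receive depth close to 1, while outlying points receive depth close to 0. This sketch does not settle which end counts as "important" in the paper, nor the exact way the two expression groups are formed. It simply ranks the positive and negative groups independently.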
The gene-function bigraph integrates molecular function annotations into gene expression data. Relationships among genes are captured in the graph through the functions they share. The proposed method provides an exploratory framework for gene data analysis.
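One way to see how a gene-function bigraph relates genes through common functions is its one-mode projection onto the gene side. Multiplying the binary gene-by-function annotation matrix by its transpose counts the functions each pair of genes shares. This is an illustrative sketch with a hypothetical helper (shared_function_counts) and invented gene and function names, not a step described in the paper.

```python
import numpy as np


def shared_function_counts(annotations, functions):
    """Hypothetical helper: S[i, j] = number of GO functions shared by genes i and j."""
    genes = sorted(annotations)
    col = {f: j for j, f in enumerate(functions)}
    A = np.zeros((len(genes), len(functions)), dtype=int)  # binary gene-function incidence
    for i, g in enumerate(genes):
        for f in annotations[g]:
            A[i, col[f]] = 1
    S = A @ A.T             # one-mode projection onto the gene side
    np.fill_diagonal(S, 0)  # ignore self-pairs
    return genes, S


if __name__ == "__main__":
    annotations = {"g1": ["F1", "F2"], "g2": ["F1"], "g3": ["F2", "F3"], "g4": ["F3"]}
    genes, S = shared_function_counts(annotations, ["F1", "F2", "F3"])
    for i in range(len(genes)):
        for j in range(i + 1, len(genes)):
            if S[i, j] > 0:
                print(genes[i], genes[j], "share", S[i, j], "function(s)")
```

Replacing the 0/1 entries with expression levels would give an expression-weighted variant. The abstract does not say whether the authors use either form.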