Rosa Bruce A, Oh Sookyung, Montgomery Beronda L, Chen Jin, Qin Wensheng
Int J Biochem Mol Biol. 2010;1(1):51-68. Epub 2010 Jun 15.
Computational analysis methods for gene expression data gathered in microarray experiments can be used to identify the functions of previously unstudied genes. Obtaining the expression data is not difficult, but interpreting the datasets and extracting information from them is challenging. In this study, light regulation was examined with a knowledge-based approach that identifies and saves important functional genes before the data are filtered by variability and fold-change differences. Two clustering methods were applied to the filtered datasets, and in each result the cluster containing a key light regulatory gene was located. The genes common to both of these clusters were identified and ranked by their coexpression with the key gene. This process was repeated for 11 key genes in 3 treatment combinations. The initial filtering method reduced the dataset size from 22,814 probes to an average of 1,134 genes, and the resulting common cluster lists contained an average of only 14 genes. These common cluster lists achieved higher gene enrichment scores than either of the two individual clustering methods. In addition, the filtering method increased the proportion of light-responsive genes in the dataset from 1.8% to 15.2%, and the cluster lists increased this proportion to 18.4%. Because these common cluster lists are short compared with gene groups generated through typical clustering methods or coexpression networks, they narrow the search for novel functional genes while increasing the likelihood that the candidates are biologically relevant.
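The workflow summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the two clustering algorithms, the filter thresholds, or the coexpression measure, so the cluster labels are taken as given inputs, the cutoffs `min_sd` and `min_log2_fc` are hypothetical, and Pearson correlation is assumed as the coexpression score. All gene IDs and expression values are toy data.

```python
from statistics import mean, pstdev


def filter_genes(expr, protected, min_sd=0.5, min_log2_fc=1.0):
    """Keep protected (knowledge-based) genes unconditionally; keep any
    other gene only if it passes both the variability and the fold-change
    cutoff. `expr` maps gene ID -> list of log2 expression values.
    The cutoffs are hypothetical, not taken from the paper."""
    kept = {}
    for gene, values in expr.items():
        variable = pstdev(values) >= min_sd
        changed = max(values) - min(values) >= min_log2_fc
        if gene in protected or (variable and changed):
            kept[gene] = values
    return kept


def pearson(x, y):
    """Pearson correlation (assumed coexpression measure)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0


def common_cluster(labels_a, labels_b, key_gene):
    """Genes that share a cluster with key_gene in BOTH clusterings.
    Each labels dict maps gene ID -> cluster label."""
    in_a = {g for g, c in labels_a.items() if c == labels_a[key_gene]}
    in_b = {g for g, c in labels_b.items() if c == labels_b[key_gene]}
    return (in_a & in_b) - {key_gene}


def rank_by_coexpression(expr, genes, key_gene):
    """Sort candidate genes by correlation with the key gene, highest first."""
    scored = [(g, pearson(expr[g], expr[key_gene])) for g in genes]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy data: log2 expression across four conditions.
expr = {
    "KEY": [1.0, 2.0, 3.0, 4.0],   # key light regulatory gene
    "P1":  [3.0, 3.0, 3.1, 3.0],   # flat, but protected by prior knowledge
    "G1":  [1.1, 2.1, 2.9, 4.2],
    "G2":  [4.0, 3.0, 2.0, 1.0],
    "G3":  [5.0, 5.1, 5.0, 5.1],   # flat and unprotected: filtered out
    "G4":  [2.0, 2.5, 3.5, 4.0],
}
filtered = filter_genes(expr, protected={"KEY", "P1"})

# Cluster labels as they might come from two different clustering methods.
labels_a = {"KEY": 0, "G1": 0, "G4": 0, "G2": 1, "P1": 1}
labels_b = {"KEY": 1, "G1": 1, "G4": 1, "G2": 1, "P1": 0}

common = common_cluster(labels_a, labels_b, "KEY")
ranked = rank_by_coexpression(filtered, common, "KEY")
for gene, r in ranked:
    print(f"{gene}\t{r:.3f}")
```

Protecting known functional genes before filtering keeps them available as anchors even when their own expression changes little, and intersecting the two cluster memberships is what shortens the candidate list relative to either clustering taken alone.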