Bessonov Kyrylo, Walkey Christopher J, Shelp Barry J, van Vuuren Hennie J J, Chiu David, van der Merwe George
Department of Molecular and Cellular Biology, University of Guelph, Guelph, Ontario, Canada; School of Computer Science, University of Guelph, Guelph, Ontario, Canada.
PLoS One. 2013 Oct 9;8(10):e77192. doi: 10.1371/journal.pone.0077192. eCollection 2013.
Analyzing time-course expression data from microarray datasets is difficult because the vast data space is represented by relatively few samples compared with the thousands of genes assayed. Here, we developed the Interdependent Correlation Clustering (ICC) method to analyze relationships among genes conditioned on the expression of a specific target gene in microarray data. Based on Correlation Clustering, the ICC method analyzes a large set of correlation values derived from the gene expression profiles in a given microarray dataset. ICC can be applied to any microarray dataset and any target gene. We applied this method to microarray data generated from wine fermentations and selected NSF1, which encodes a C2H2 zinc finger-type transcription factor, as the target gene. The method was validated by its accurate identification of the previously known functional roles of NSF1. In addition, we identified and verified potential new functions for this gene: NSF1 is a negative regulator of the expression of sulfur metabolism genes, the nuclear localization of the Nsf1 protein (Nsf1p) is controlled in a sulfur-dependent manner, and the transcription of NSF1 is regulated by Met4p, an important transcriptional activator of sulfur metabolism genes. The interdisciplinary approach adopted here highlights the accuracy and relevance of the ICC method for mining novel gene functions from complex microarray datasets with a limited number of samples.
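The idea of relating genes to one another through a chosen target gene can be illustrated with a minimal sketch. This is not the ICC algorithm, whose details the abstract does not give. It assumes a plain Pearson correlation against the target profile and a hypothetical cutoff of 0.8. The function names, the gene labels GENE_A to GENE_C, and the expression values are invented for illustration only.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is flat)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def group_by_target_correlation(profiles, target, threshold=0.8):
    """Split genes by how their time-course profile correlates with the target.

    `threshold` is an assumed cutoff, not a value taken from the paper.
    """
    target_profile = profiles[target]
    groups = {"positive": [], "negative": [], "unassigned": []}
    for gene, profile in profiles.items():
        if gene == target:
            continue
        r = pearson(profile, target_profile)
        if r >= threshold:
            groups["positive"].append(gene)
        elif r <= -threshold:
            groups["negative"].append(gene)
        else:
            groups["unassigned"].append(gene)
    return groups


# Toy expression values over four time points (made up, not real data).
profiles = {
    "NSF1": [1.0, 2.0, 3.0, 4.0],
    "GENE_A": [2.0, 4.1, 5.9, 8.0],   # rises with the target
    "GENE_B": [4.0, 3.1, 1.9, 1.0],   # falls as the target rises
    "GENE_C": [1.0, 3.0, 1.0, 3.0],   # weak relationship to the target
}

groups = group_by_target_correlation(profiles, "NSF1")
print(groups)
```

In this toy setting, a gene in the "negative" group would be a candidate for negative regulation by the target. Such a grouping only suggests a hypothesis, which would still need the kind of experimental verification the abstract describes.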