Nikkilä Janne, Roos Christophe, Savia Eerika, Kaski Samuel
Laboratory of Computer and Information Science, Helsinki University of Technology, P.O. Box 5400, FI-02015 HUT, Finland.
Int J Neural Syst. 2005 Aug;15(4):237-46. doi: 10.1142/S0129065705000220.
We model dependencies between m multivariate continuous-valued information sources by combining (i) a generalized canonical correlations analysis (gCCA), which reduces dimensionality while preserving the dependencies among m - 1 of the sources, and (ii) associative clustering, which summarizes the dependencies between that reduced representation and the remaining source. This new combination of methods avoids multiway associative clustering, which would require a multiway contingency table and hence suffer from the curse of dimensionality of the table. The method is applied to summarizing properties of yeast stress by searching for dependencies (commonalities) between the expression of genes of baker's yeast, Saccharomyces cerevisiae, under various stressful treatments, and then to summarizing stress regulation by adding data about transcription factor binding sites.
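As a rough illustration of step (i), the following is a minimal sketch of one common gCCA formulation: whiten each source separately, concatenate the whitened sources, and keep the leading principal components of the result. It is not claimed to be the authors' implementation. The function name `gcca`, the regularization parameter `reg`, and the synthetic data are assumptions made for illustration only, and the associative clustering of step (ii) is not shown.

```python
import numpy as np

def gcca(sources, n_components, reg=1e-8):
    """Sketch of gCCA via per-source whitening followed by PCA (an assumed formulation).

    sources: list of arrays with the same number of rows (samples).
    Returns the shared low-dimensional representation and the eigenvalues.
    """
    whitened = []
    for X in sources:
        Xc = X - X.mean(axis=0)                      # center each source
        cov = Xc.T @ Xc / (len(Xc) - 1)
        cov += reg * np.eye(cov.shape[0])            # small ridge for stability (assumption)
        evals, evecs = np.linalg.eigh(cov)
        W = evecs @ np.diag(evals ** -0.5) @ evecs.T  # inverse square root of cov
        whitened.append(Xc @ W)                      # within-source covariance ~ identity
    Z = np.hstack(whitened)
    # After whitening, the remaining variance structure comes from
    # between-source dependencies, so PCA on Z picks up shared directions.
    evals, evecs = np.linalg.eigh(Z.T @ Z / (len(Z) - 1))
    order = np.argsort(evals)[::-1][:n_components]
    return Z @ evecs[:, order], evals[order]

# Hypothetical toy data: two sources sharing one latent signal plus noise columns.
rng = np.random.default_rng(0)
n = 500
shared = rng.standard_normal(n)
X1 = np.column_stack([shared + 0.1 * rng.standard_normal(n), rng.standard_normal((n, 3))])
X2 = np.column_stack([shared + 0.1 * rng.standard_normal(n), rng.standard_normal((n, 2))])
Y, eigenvalues = gcca([X1, X2], n_components=1)
```

In this formulation the eigenvalues lie between 0 and the number of sources, and values well above 1 indicate directions shared across sources. The reduced representation `Y` would then be paired with the remaining source in a two-way contingency table, whose size with K clusters per margin is K × K rather than the K^m cells of an m-way table.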