Laboratory for Health Protection Research, National Institute for Public Health and the Environment, Bilthoven, The Netherlands.
Stem Cells Dev. 2011 Jan;20(1):115-26. doi: 10.1089/scd.2010.0181. Epub 2010 Aug 18.
A combined analysis of data from a series of literature studies can yield more reliable results than an analysis based on a single study. A common problem in performing combined analyses of literature microarray gene expression data is that the original raw data are not always available and are not always easy to combine in one analysis. We propose an approach that does not require analyzing the original raw data, but instead takes literature gene sets derived from (supplementary) tables as input and uses gene co-occurrence in these sets to map a co-regulation network. An algorithm for this method was applied to a collection of literature-derived gene sets related to embryonic stem cell (ESC) differentiation. In the resulting network, genes involved in similar biological processes or expressed at similar time points during differentiation were found to cluster together. Using this information, we identified 43 genes not previously associated with cardiac ESC differentiation, to which we were able to assign a putative novel biological function. For 6 of these genes (Apobec2, Cth, Ptges, Rrad, Zfp57, and 2410146L05Rik), literature data on mouse knockout phenotypes support their putative function. Three other genes (Rcor2, Zfp503, and Hspb3) are part of major pathways within the network and are therefore likely to be mechanistically relevant candidate genes. We anticipate that these 43 genes can help to improve the understanding of the molecular events underlying ESC differentiation. Moreover, the approach introduced here can be applied more widely to identify possible novel gene functions in biological processes.
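The core idea of building a network from gene co-occurrence in literature gene sets can be illustrated with a minimal sketch. This is not the algorithm used in the study, whose scoring and clustering steps are not described in the abstract. It assumes the simplest possible rule: two genes are linked when they appear together in at least a chosen number of gene sets. The function name `cooccurrence_network`, the parameter `min_shared_sets`, and the toy gene sets with placeholder gene names are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_network(gene_sets, min_shared_sets=2):
    """Link two genes if they co-occur in at least `min_shared_sets` gene sets.

    gene_sets: dict mapping a set name to a list of gene symbols.
    Returns a dict {(gene_a, gene_b): number_of_shared_sets}.
    The threshold rule is an assumption made for illustration only.
    """
    pair_counts = Counter()
    for genes in gene_sets.values():
        # Deduplicate and sort so each unordered pair has one canonical key.
        for pair in combinations(sorted(set(genes)), 2):
            pair_counts[pair] += 1
    return {pair: n for pair, n in pair_counts.items() if n >= min_shared_sets}


# Hypothetical toy input: set names and memberships are invented placeholders.
toy_gene_sets = {
    "set_1": ["geneA", "geneB", "geneC"],
    "set_2": ["geneA", "geneB", "geneD"],
    "set_3": ["geneC", "geneE", "geneF"],
}

edges = cooccurrence_network(toy_gene_sets, min_shared_sets=2)
print(edges)  # {('geneA', 'geneB'): 2}
```

In this toy input only geneA and geneB share two sets, so they form the single edge. A realistic analysis would likely also need a statistical measure of whether a co-occurrence count exceeds what is expected by chance, which this sketch omits.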