Costa Ivan G, Roepcke Stefan, Schliep Alexander
Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Berlin, Germany.
BMC Immunol. 2007 Oct 9;8:25. doi: 10.1186/1471-2172-8-25.
The regulatory processes that govern cell proliferation and differentiation are central to developmental biology. The lymphoid system is particularly well studied in this respect because of its importance for basic biology and for clinical applications. Gene expression measured in lymphoid cells at several distinguishable developmental stages helps elucidate the underlying molecular processes, which change gradually over time and lock cells into the B cell, T cell, or natural killer cell lineage. Large-scale analysis of these gene expression trees requires computational support for tasks ranging from visualization, querying, and finding clusters of similar genes to answering detailed questions about the functional roles of individual genes.
We present the first statistical framework designed to analyze gene expression data collected over the course of lymphoid development, using clusters of co-expressed genes together with additional heterogeneous data. We introduce dependence trees for continuous variates, which naturally model the dependencies inherent in the differentiation process as gene expression trees. Several trees are combined in a mixture model to allow inference of potentially overlapping clusters of co-expressed genes. Additionally, we predict microRNA targets.
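To make the idea of a mixture of dependence trees concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes that each developmental stage is Gaussian with a mean that depends linearly on the expression in its parent stage, and that cluster membership is read from posterior probabilities. The stage names, the tree topology in `PARENT`, the parameter values, and the function names are all hypothetical.

```python
import math

# Hypothetical developmental tree: stage -> parent stage (the root has None).
PARENT = {"HSC": None, "CLP": "HSC", "proB": "CLP", "proT": "CLP", "NK": "CLP"}


def gauss_logpdf(x, mean, var):
    """Log-density of a univariate Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def tree_loglik(profile, params):
    """Log-density of one gene's profile under one dependence tree.

    params[stage] = (intercept, slope, variance). Assumption: each stage is
    Gaussian with mean intercept + slope * (expression in the parent stage);
    the root uses the intercept alone.
    """
    total = 0.0
    for stage, parent in PARENT.items():
        intercept, slope, var = params[stage]
        mean = intercept if parent is None else intercept + slope * profile[parent]
        total += gauss_logpdf(profile[stage], mean, var)
    return total


def responsibilities(profile, weights, components):
    """Posterior probability of each mixture component for one gene."""
    logs = [math.log(w) + tree_loglik(profile, p)
            for w, p in zip(weights, components)]
    top = max(logs)  # subtract the maximum for numerical stability
    exps = [math.exp(v - top) for v in logs]
    norm = sum(exps)
    return [e / norm for e in exps]


# Two hypothetical clusters: one where expression simply tracks the parent
# stage, and one that is up-regulated on entry into the B cell lineage.
flat = {stage: (0.0, 1.0, 1.0) for stage in PARENT}
b_up = dict(flat, proB=(2.0, 1.0, 1.0))

profile = {"HSC": 0.0, "CLP": 0.1, "proB": 2.2, "proT": 0.0, "NK": 0.1}
post = responsibilities(profile, [0.5, 0.5], [flat, b_up])
```

Because `responsibilities` returns soft posteriors rather than a hard assignment, a gene whose profile fits two trees about equally well can be reported in both clusters, which is one simple way to obtain overlapping clusters.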
Computational results for several data sets from the lymphoid system demonstrate the relevance of our framework. We recover well-known biological facts and identify promising novel regulatory elements of genes and their functional assignments. The implementation of our method (licensed under the GPL) is available at http://algorithmics.molgen.mpg.de/Supplements/ExpLym/.