Costa Ivan G, Krause Roland, Opitz Lennart, Schliep Alexander
Department Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Berlin, Germany.
BMC Bioinformatics. 2007;8 Suppl 10(Suppl 10):S3. doi: 10.1186/1471-2105-8-S10-S3.
Gene expression measurements during the development of the fly Drosophila melanogaster are routinely used to find functional modules of temporally co-expressed genes. Complementary large data sets of in situ RNA hybridization images for different stages of the fly embryo elucidate the spatial expression patterns.
Using a semi-supervised approach, constrained clustering with mixture models, we can find clusters of genes exhibiting spatio-temporal similarities in expression, or syn-expression. The temporal gene expression measurements are taken as primary data, for which pairwise constraints are computed in an automated fashion from raw in situ images without the need for manual annotation. We investigate the influence of these pairwise constraints on the clustering and discuss the biological relevance of our results.
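The idea of constrained clustering with mixture models can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a spherical Gaussian mixture, a single mean-field update of the posteriors per EM iteration, and one fixed constraint weight, whereas the abstract describes constraints derived from in situ images. The function name `constrained_gmm` and the parameters `must_link`, `cannot_link`, and `weight` are hypothetical names chosen for illustration.

```python
import numpy as np

def constrained_gmm(X, k, must_link=(), cannot_link=(), weight=10.0, n_iter=50):
    """Sketch of EM for a spherical Gaussian mixture with soft pairwise constraints.

    must_link / cannot_link are iterables of index pairs (i, j).
    `weight` is an assumed, fixed penalty strength shared by all constraints.
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape

    # Symmetric constraint matrix: positive entries attract, negative repel.
    W = np.zeros((n, n))
    for i, j in must_link:
        W[i, j] += weight
        W[j, i] += weight
    for i, j in cannot_link:
        W[i, j] -= weight
        W[j, i] -= weight

    # Deterministic farthest-first initialisation of the component means.
    idx = [0]
    for _ in range(1, k):
        dist = ((X[:, None, :] - X[idx][None, :, :]) ** 2).sum(axis=2).min(axis=1)
        idx.append(int(dist.argmax()))
    means = X[idx].copy()
    var = np.full(k, X.var() + 1e-6)      # one variance per component
    mix = np.full(k, 1.0 / k)             # mixing proportions
    resp = np.full((n, k), 1.0 / k)       # posterior assignment probabilities

    for _ in range(n_iter):
        # E-step: Gaussian log-likelihood plus a constraint term computed
        # from the neighbours' current posteriors (mean-field approximation).
        sq = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        log_post = np.log(mix) - 0.5 * d * np.log(2 * np.pi * var) - 0.5 * sq / var
        log_post += W @ resp
        log_post -= log_post.max(axis=1, keepdims=True)  # numerical stability
        resp = np.exp(log_post)
        resp /= resp.sum(axis=1, keepdims=True)

        # M-step: standard weighted updates of the mixture parameters.
        nk = resp.sum(axis=0) + 1e-12
        mix = nk / n
        means = (resp.T @ X) / nk[:, None]
        sq = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        var = (resp * sq).sum(axis=0) / (d * nk) + 1e-6

    return resp.argmax(axis=1), resp
```

In this sketch the rows of `X` would play the role of temporal expression profiles, and each pair in `must_link` would stand for two genes whose in situ images were judged similar. Calling `constrained_gmm(X, 2, must_link=[(6, 3)])` biases points 6 and 3 toward the same cluster even when the expression data alone would separate them.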
Spatial information contributes to a detailed, biologically meaningful analysis of temporal gene expression data. Semi-supervised learning provides a flexible, robust and efficient framework for integrating data sources of differing quality and abundance.