Automatic annotation of spatial expression patterns via sparse Bayesian factor models.

Affiliation

Institute for Genome Sciences and Policy, Duke University, Durham, North Carolina, United States of America.

Publication information

PLoS Comput Biol. 2011 Jul;7(7):e1002098. doi: 10.1371/journal.pcbi.1002098. Epub 2011 Jul 21.

DOI:10.1371/journal.pcbi.1002098

PMID:21814502

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3140966/

Abstract

Advances in reporters for gene expression have made it possible to document and quantify expression patterns in 2D-4D. In contrast to microarrays, which provide data for many genes but averaged and/or at low resolution, images reveal the high spatial dynamics of gene expression. Developing computational methods to compare, annotate, and model gene expression based on images is imperative, considering that available data are rapidly increasing. We have developed a sparse Bayesian factor analysis model in which the observed expression diversity among a large set of high-dimensional images is modeled by a small number of hidden common factors. We apply this approach to embryonic expression patterns from a Drosophila RNA in situ image database, and show that the automatically inferred factors provide a meaningful decomposition and represent common co-regulation or biological functions. The low-dimensional set of factor mixing weights is further used as features by a classifier to annotate expression patterns with functional categories. On human-curated annotations, our sparse approach reaches similar or better classification of expression patterns at different developmental stages, when compared to other automatic image annotation methods using thousands of hard-to-interpret features. Our study therefore outlines a general framework for large microscopy data sets, in which both the generative model itself and its application to analysis tasks such as automated annotation can provide insight into biological questions.
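
The idea of describing each image by a few sparse mixing weights over shared factors can be illustrated with a small numerical sketch. This is not the paper's Bayesian inference procedure: it assumes the factors are already known and held fixed, uses an L1 penalty solved by iterative soft-thresholding (ISTA) as a simple stand-in for a sparsity-inducing prior, and runs on synthetic data. All function names, sizes, and parameter values below are hypothetical and chosen only for illustration.

```python
import numpy as np

def soft_threshold(z, t):
    """Shrink each entry of z toward zero by t (proximal operator of the L1 norm)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def simulate_images(n_images, n_pixels, n_factors, density, noise_sd, rng):
    """Draw synthetic 'images' as sparse mixtures of a few shared factors plus noise."""
    factors = rng.normal(size=(n_factors, n_pixels))       # one row per hidden factor
    mask = rng.random((n_images, n_factors)) < density     # which factors each image uses
    weights = mask * rng.normal(loc=2.0, scale=0.3, size=(n_images, n_factors))
    noise = noise_sd * rng.normal(size=(n_images, n_pixels))
    return weights @ factors + noise, factors, weights

def sparse_weights(images, factors, penalty, n_iter=200):
    """ISTA for min_W 0.5 * ||X - W F||^2 + penalty * ||W||_1 with F held fixed.

    Assumption: factors are given; the paper infers factors and weights jointly.
    """
    gram = factors @ factors.T
    step = 1.0 / np.linalg.eigvalsh(gram).max()             # 1 / Lipschitz constant
    weights = np.zeros((images.shape[0], factors.shape[0]))
    for _ in range(n_iter):
        grad = (weights @ factors - images) @ factors.T     # gradient of the squared error
        weights = soft_threshold(weights - step * grad, step * penalty)
    return weights

rng = np.random.default_rng(0)
images, factors, true_w = simulate_images(
    n_images=60, n_pixels=400, n_factors=5, density=0.3, noise_sd=0.1, rng=rng
)
est_w = sparse_weights(images, factors, penalty=20.0)

# Each 400-pixel image is now summarized by 5 mixing weights, most of them exactly zero.
print(images.shape, "->", est_w.shape)
print("fraction of zero weights:", (est_w == 0).mean())
```

Under these assumptions, each row of `est_w` is a low-dimensional feature vector that could be handed to any standard classifier in place of the raw pixels, which mirrors the role the abstract assigns to the factor mixing weights.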
