Wu Siqi, Joseph Antony, Hammonds Ann S, Celniker Susan E, Yu Bin, Frise Erwin
Department of Statistics, University of California, Berkeley, CA 94720; Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720;
Department of Statistics, University of California, Berkeley, CA 94720; Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720; Walmart Labs, San Bruno, CA 94066;
Proc Natl Acad Sci U S A. 2016 Apr 19;113(16):4290-5. doi: 10.1073/pnas.1521171113. Epub 2016 Apr 6.
Spatial gene expression patterns enable the detection of local covariability and are extremely useful for identifying local gene interactions during normal development. The abundance of spatial expression data in recent years has led to the modeling and analysis of regulatory networks. The inherent complexity of such data makes it a challenge to extract biological information. We developed staNMF, a method that combines a scalable implementation of nonnegative matrix factorization (NMF) with a new stability-driven model selection criterion. When applied to a set of Drosophila early embryonic spatial gene expression images, one of the largest datasets of its kind, staNMF identified 21 principal patterns (PP). Providing a compact yet biologically interpretable representation of Drosophila expression patterns, PP are comparable to a fate map generated experimentally by laser ablation and show exceptional promise as a data-driven alternative to manual annotations. Our analysis mapped genes to cell-fate programs and assigned putative biological roles to uncharacterized genes. Finally, we used the PP to generate local transcription factor regulatory networks. Spatially local correlation networks were constructed for six PP that span along the embryonic anterior-posterior axis. Using a two-tail 5% cutoff on correlation, we reproduced 10 of the 11 links in the well-studied gap gene network. The performance of PP with the Drosophila data suggests that staNMF provides informative decompositions and constitutes a useful computational lens through which to extract biological insight from complex and often noisy gene expression data.
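The idea of pairing NMF with a stability-driven choice of the number of patterns can be illustrated with a minimal sketch. This is not the staNMF implementation: the abstract does not give the solver or the exact stability criterion, so the code below assumes a plain multiplicative-update NMF, a correlation-based dissimilarity between dictionaries learned from different random initializations, and a toy random matrix in place of the expression images. All function names (`nmf`, `dissimilarity`, `instability`) and parameters are hypothetical.

```python
import numpy as np

def nmf(X, k, n_iter=200, seed=0, eps=1e-9):
    """Basic NMF, X ~ W @ H with W, H >= 0, via multiplicative updates.
    Assumption: a simple stand-in for the scalable solver in the abstract."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

def dissimilarity(W1, W2):
    """Assumed stability score: 0 when the two dictionaries contain the same
    patterns (columns) up to reordering and scaling, larger otherwise."""
    k = W1.shape[1]
    # Cross-correlation between every pattern in W1 and every pattern in W2.
    C = np.abs(np.nan_to_num(np.corrcoef(W1.T, W2.T)[:k, k:]))
    # Each pattern is matched to its best counterpart in the other dictionary.
    return 1.0 - 0.5 * (C.max(axis=0).mean() + C.max(axis=1).mean())

def instability(X, k, n_runs=4):
    """Mean pairwise dissimilarity over NMF runs with different random seeds."""
    dicts = [nmf(X, k, seed=s)[0] for s in range(n_runs)]
    pairs = [dissimilarity(dicts[i], dicts[j])
             for i in range(n_runs) for j in range(i + 1, n_runs)]
    return float(np.mean(pairs))

# Toy data (hypothetical): 60 "pixels" x 40 "genes" built from 3 nonnegative patterns.
rng = np.random.default_rng(0)
X = rng.random((60, 3)) @ rng.random((3, 40))

# Score each candidate number of patterns and keep the most stable one.
scores = {k: instability(X, k) for k in range(2, 6)}
best_k = min(scores, key=scores.get)
print(scores, best_k)
```

In this sketch the columns of `W` play the role of the principal patterns, and the candidate `k` with the lowest instability is the one selected. On real data the range of `k`, the number of runs, and the solver would all need to be chosen with far more care than shown here.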