Lineage-based identification of cellular states and expression programs.

Affiliation

Department of Computer Science and Electrical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA.

Publication information

Bioinformatics. 2012 Jun 15;28(12):i250-7. doi: 10.1093/bioinformatics/bts204.

DOI:10.1093/bioinformatics/bts204

PMID:22689769

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC3371836/

Abstract

We present a method, LineageProgram, that uses the developmental lineage relationship of observed gene expression measurements to improve the learning of developmentally relevant cellular states and expression programs. We find that incorporating lineage information allows us to significantly improve both the predictive power and interpretability of expression programs that are derived from expression measurements from in vitro differentiation experiments. The lineage tree of a differentiation experiment is a tree graph whose nodes describe all of the unique expression states in the input expression measurements, and whose edges describe the experimental perturbations applied to cells. Our method, LineageProgram, is based on a log-linear model with parameters that reflect changes along the lineage tree. Regularization with L1-based methods controls the parameters in three distinct ways: the number of genes that change between two cellular states, the number of unique cellular states, and the number of underlying factors responsible for changes in cell state. The model is estimated with proximal operators to quickly discover a small number of key cell states and gene sets. Comparisons with existing factorization techniques, such as singular value decomposition and non-negative matrix factorization, show that our method provides higher predictive power in held-out tests while inducing sparse and biologically relevant gene sets.
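
To make the role of L1 regularization and proximal operators concrete, the following is a minimal sketch and not the LineageProgram model itself. It assumes a simplified squared-error objective (the paper uses a log-linear model) for a single parent-to-child edge of a lineage tree, and it estimates a sparse change vector with the soft-thresholding operator, which is the proximal operator of the L1 norm. The function names `soft_threshold` and `sparse_change`, the penalty weight `lam`, and the expression values are all hypothetical and chosen only for illustration.

```python
def soft_threshold(values, lam):
    """Proximal operator of lam * ||x||_1, applied elementwise."""
    out = []
    for v in values:
        if v > lam:
            out.append(v - lam)   # shrink positive entries toward zero
        elif v < -lam:
            out.append(v + lam)   # shrink negative entries toward zero
        else:
            out.append(0.0)       # small entries are set exactly to zero
    return out


def sparse_change(parent, child, lam):
    """Minimize 0.5 * ||child - (parent + delta)||^2 + lam * ||delta||_1.

    For this separable objective the minimizer has a closed form:
    soft-thresholding of the residual child - parent.
    """
    residual = [c - p for p, c in zip(parent, child)]
    return soft_threshold(residual, lam)


# Hypothetical log-expression values for four genes (illustrative only).
parent = [2.0, 0.5, 1.0, 3.0]
child = [2.1, 2.5, 0.9, 1.0]

delta = sparse_change(parent, child, lam=0.5)
changed = [i for i, d in enumerate(delta) if d != 0.0]
print("estimated change:", delta)
print("genes that change:", changed)
```

In this toy setting, a larger `lam` zeroes out more entries of the change vector, which mirrors how the L1 penalty in the abstract limits the number of genes that change between two cellular states.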
