Alter Orly, Golub Gene H
Department of Biomedical Engineering and Institute for Cellular and Molecular Biology, University of Texas, Austin, TX 78712, USA.
Proc Natl Acad Sci U S A. 2004 Nov 23;101(47):16577-82. doi: 10.1073/pnas.0406767101. Epub 2004 Nov 15.
We describe an integrative, data-driven mathematical framework that formulates any number of genome-scale molecular biological data sets in terms of one chosen set of data samples, or of profiles extracted mathematically from data samples, designated the "basis" set. Using pseudoinverse projection, the molecular biological profiles of the data samples are approximated, in the least-squares sense, as superpositions of the basis profiles. Reconstruction of the data in the basis simulates experimental observation of only those cellular states manifest in the data that correspond to those of the basis. Classification of the data samples according to their reconstruction in the basis, rather than their overall measured profiles, maps the cellular states of the data onto those of the basis and gives a global picture of the correlations, and possibly also the causal coordination, of these two sets of states. We illustrate this framework by integrating genome-scale data on the DNA binding of yeast proteins with cell cycle mRNA expression time course data. The analysis predicts a novel correlation between DNA replication initiation and RNA transcription during the yeast cell cycle, which might be due to a previously unknown mechanism of regulation.
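The pseudoinverse projection named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `project_onto_basis` and `classify_by_basis`, the synthetic matrices, and the rule of assigning each sample to the basis profile with the largest-magnitude coefficient are all assumptions made for illustration. The sketch only shows the least-squares step, in which a genes-by-samples data matrix is approximated as a superposition of the columns of a genes-by-profiles basis matrix.

```python
import numpy as np


def project_onto_basis(data, basis):
    """Least-squares projection of data samples onto basis profiles.

    data:  (genes x samples) matrix, one measured profile per column.
    basis: (genes x profiles) matrix, one basis profile per column.
    Returns the coefficient matrix (profiles x samples) and the
    reconstruction of the data in the basis (genes x samples).
    """
    # The Moore-Penrose pseudoinverse gives the coefficients that
    # minimize ||data - basis @ coeffs|| in the least-squares sense.
    coeffs = np.linalg.pinv(basis) @ data
    reconstruction = basis @ coeffs
    return coeffs, reconstruction


def classify_by_basis(coeffs):
    """Hypothetical classification rule (an assumption of this sketch):
    assign each sample to the basis profile whose coefficient has the
    largest magnitude."""
    return np.argmax(np.abs(coeffs), axis=0)


# Synthetic example (fabricated toy numbers, not data from the paper):
# 50 "genes", 3 basis profiles, 2 data samples.
rng = np.random.default_rng(0)
basis = rng.standard_normal((50, 3))
true_coeffs = np.array([[2.0, 0.0],
                        [0.0, -1.5],
                        [0.1, 0.2]])
data = basis @ true_coeffs + 0.01 * rng.standard_normal((50, 2))

coeffs, reconstruction = project_onto_basis(data, basis)
labels = classify_by_basis(coeffs)

# The part of the data not captured by the basis is orthogonal to
# every basis profile; this is what "least-squares" means here.
residual = data - reconstruction
```

In this sketch, `reconstruction` plays the role of the data "observed" only through the states represented in the basis, and `residual` is whatever the basis cannot express. How samples are actually grouped after reconstruction is not specified in the abstract, so the `argmax` rule should be read as a placeholder.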