Ben-Dor Amir, Chor Benny, Karp Richard, Yakhini Zohar
Agilent Laboratories, 12741 NE 39th Street, Bellevue, WA 98005, USA.
J Comput Biol. 2003;10(3-4):373-84. doi: 10.1089/10665270360688075.
This paper concerns the discovery of patterns in gene expression matrices, in which each element gives the expression level of a given gene in a given experiment. Most existing methods for pattern discovery in such matrices are based on clustering genes by comparing their expression levels in all experiments, or clustering experiments by comparing their expression levels for all genes. Our work goes beyond such global approaches by looking for local patterns that manifest themselves when we focus simultaneously on a subset G of the genes and a subset T of the experiments. Specifically, we look for order-preserving submatrices (OPSMs), in which the expression levels of all genes induce the same linear ordering of the experiments (we show that the OPSM search problem is NP-hard in the worst case). Such a pattern might arise, for example, if the experiments in T represent distinct stages in the progress of a disease or in a cellular process and the expression levels of all genes in G vary across the stages in the same way. We define a probabilistic model in which an OPSM is hidden within an otherwise random matrix. Guided by this model, we develop an efficient algorithm for finding the hidden OPSM in the random matrix. In data generated according to the model, the algorithm recovers the hidden OPSM with a very high success rate. Application of the methods to breast cancer data seems to reveal significant local patterns.
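To make the OPSM definition concrete, the following is a minimal Python sketch that checks whether a given gene subset G and experiment subset T form an order-preserving submatrix, and that plants one in a random matrix. It is not the search algorithm from the paper: it only verifies a candidate (G, T), which is easy, whereas finding one is the NP-hard part. The function names (`induced_order`, `is_order_preserving`, `plant_opsm`) are hypothetical, and the uniform-random background with sorted planted values is an assumed stand-in for the abstract's "otherwise random matrix", whose exact distribution is not stated here.

```python
import numpy as np


def induced_order(row, columns):
    """Return the given columns sorted by increasing expression level in this row."""
    cols = np.asarray(columns)
    return tuple(cols[np.argsort(row[cols], kind="stable")])


def is_order_preserving(matrix, genes, columns):
    """True if every gene in `genes` induces the same ordering of `columns`."""
    matrix = np.asarray(matrix)
    orders = {induced_order(matrix[g], columns) for g in genes}
    return len(orders) == 1


def plant_opsm(n_genes, n_experiments, genes, columns, seed=0):
    """Build a random matrix in which `genes` x `columns` is order-preserving.

    Assumption: background entries are i.i.d. uniform on [0, 1); for each
    planted gene the values in `columns` are rearranged so that they
    increase along the order in which `columns` is listed.
    """
    rng = np.random.default_rng(seed)
    matrix = rng.random((n_genes, n_experiments))
    cols = np.asarray(columns)
    for g in genes:
        matrix[g, cols] = np.sort(matrix[g, cols])
    return matrix


if __name__ == "__main__":
    # Rows 0 and 1 rank the three experiments identically; row 2 does not.
    small = np.array([[1, 5, 3],
                      [2, 9, 4],
                      [7, 1, 3]])
    print(is_order_preserving(small, [0, 1], [0, 1, 2]))
    print(is_order_preserving(small, [0, 1, 2], [0, 1, 2]))

    planted = plant_opsm(20, 10, genes=[1, 4, 7], columns=[8, 2, 5, 0])
    print(is_order_preserving(planted, [1, 4, 7], [8, 2, 5, 0]))
```

Verifying a candidate costs one sort per gene in G. The difficulty the abstract refers to lies in choosing G and T among exponentially many subsets, which is why the authors rely on a model-guided heuristic rather than exhaustive search.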