Shi Yanxin, Klustein Michael, Simon Itamar, Mitchell Tom, Bar-Joseph Ziv
School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA.
Bioinformatics. 2007 Jul 1;23(13):i459-67. doi: 10.1093/bioinformatics/btm218.
When analyzing expression experiments, researchers are often interested in identifying the set of biological processes that are up- or down-regulated under the experimental condition studied. Current approaches, including clustering expression profiles and averaging the expression profiles of genes known to participate in specific processes, fail to provide an accurate estimate of the activity levels of many biological processes.
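As a point of reference, the averaging baseline mentioned above can be sketched in a few lines. This is only an illustrative sketch and not code from the paper: the function name `average_process_activity`, the dictionary-based inputs, and the toy gene and process names are all assumptions made for the example.

```python
from statistics import mean


def average_process_activity(expression, annotations):
    """Estimate process activity by averaging annotated genes' profiles.

    expression:  dict mapping gene name -> list of expression values,
                 one per time point (hypothetical input format).
    annotations: dict mapping process name -> list of gene names known
                 to participate in that process (hypothetical format).
    Returns a dict mapping process name -> averaged profile over time.
    """
    activity = {}
    for process, genes in annotations.items():
        # Keep only genes that were actually measured in this experiment.
        profiles = [expression[g] for g in genes if g in expression]
        if not profiles:
            continue  # no measured genes, so no estimate for this process
        # Average across genes separately at each time point.
        activity[process] = [mean(values) for values in zip(*profiles)]
    return activity


# Toy data (invented for illustration only).
expression = {
    "geneA": [1.0, 2.0, 3.0],
    "geneB": [3.0, 2.0, 1.0],
    "geneC": [0.0, 0.5, 1.0],
}
annotations = {
    "process1": ["geneA", "geneB"],
    "process2": ["geneC", "geneD"],  # geneD is not measured
}
result = average_process_activity(expression, annotations)
```

In this toy example the opposing trends of `geneA` and `geneB` cancel, so `process1` appears flat even though both genes change over time. This is one way a plain average can misrepresent process activity.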
We introduce a probabilistic Continuous Hidden Process Model (CHPM) for time series expression data. CHPM simultaneously determines the most probable assignment of genes to processes and the activity level of each process over time. To estimate model parameters, CHPM uses multiple time series datasets and incorporates prior biological knowledge. Applying CHPM to yeast expression data, we show that our algorithm produces more accurate functional assignments for genes than other expression analysis methods. The inferred process activity levels can be used to study the relationships between biological processes. We also report new biological experiments confirming some of the process activity levels predicted by CHPM.
A Java implementation is available at http://www.cs.cmu.edu/~yanxins/chpm.
Supplementary data are available at Bioinformatics online.