Pratanwanich Naruemon, Lio Pietro
Computer Laboratory, University of Cambridge, JJ Thomson Avenue, Cambridge CB3 0FD, United Kingdom.
Comput Biol Chem. 2014 Dec;53 Pt A:144-52. doi: 10.1016/j.compbiolchem.2014.08.019. Epub 2014 Aug 24.
Analysis of cellular responses to diverse stimuli enables exploration of the complexity of functional genomics. Typically, high-throughput microarray data allow us to identify genes that are differentially expressed under a condition of interest. To extract meaning from long lists of differentially expressed genes, we present a new method, "pathway-based LDA", to determine which pathways/gene sets are perturbed after exposure to different chemicals. In this study, a pathway is defined as a group of functionally related genes. Specifically, we implemented a probabilistic Latent Dirichlet Allocation (LDA) model that learns drug-pathway-gene relations, taking known gene-pathway memberships as prior knowledge. We applied the pathway-based LDA model, with 236 known pathways, to gene expression data for 1169 drugs in order to determine pathway responsiveness. Our method yielded better predictive performance on pathway responsiveness to drug treatments than existing methods. Moreover, pathway-based LDA revealed the genes contributing most to each predefined pathway, through a probability distribution over genes. In doing so, our method could provide a useful estimator of the pathway complexity of a genome.
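The abstract does not describe the inference procedure, so the following is only a minimal sketch of how an LDA model with gene-pathway memberships as prior knowledge might be set up. It assumes that each drug is a "document" whose "words" are differentially expressed gene tokens, that each pathway is a topic, and that the membership prior is implemented by restricting each pathway's gene distribution to its member genes (a symmetric Dirichlet with concentration `beta` over members only). Inference uses collapsed Gibbs sampling. These modelling choices, the toy data, and the names `pathway_lda` and `top_genes` are hypothetical illustrations, not the authors' implementation.

```python
import random
from collections import defaultdict

def pathway_lda(docs, pathways, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampler for an LDA variant whose topics are fixed pathways.

    docs:     dict drug -> list of gene tokens (hypothetical input format)
    pathways: dict pathway -> set of member genes (the membership prior)
    Returns theta (drug -> pathway -> prob) and phi (pathway -> gene -> prob).
    """
    rng = random.Random(seed)
    names = list(pathways)
    K = len(names)
    # Assumed prior: a pathway may only emit its own member genes.
    allowed = defaultdict(list)
    for k, p in enumerate(names):
        for g in pathways[p]:
            allowed[g].append(k)
    size = [len(pathways[p]) for p in names]
    # Genes outside every pathway cannot be explained by any topic; drop them.
    tokens = {d: [g for g in genes if g in allowed] for d, genes in docs.items()}

    n_dk = {d: [0] * K for d in tokens}         # drug-pathway counts
    n_kg = [defaultdict(int) for _ in range(K)]  # pathway-gene counts
    n_k = [0] * K                                # tokens per pathway
    z = {}
    for d, genes in tokens.items():              # random initial assignment
        z[d] = []
        for g in genes:
            k = rng.choice(allowed[g])
            z[d].append(k)
            n_dk[d][k] += 1; n_kg[k][g] += 1; n_k[k] += 1

    for _ in range(iters):
        for d, genes in tokens.items():
            for i, g in enumerate(genes):
                k = z[d][i]                      # remove the current assignment
                n_dk[d][k] -= 1; n_kg[k][g] -= 1; n_k[k] -= 1
                cand = allowed[g]
                # p(z=k) ∝ (n_dk + alpha) * (n_kg + beta) / (n_k + beta * |pathway k|)
                w = [(n_dk[d][c] + alpha) * (n_kg[c][g] + beta)
                     / (n_k[c] + beta * size[c]) for c in cand]
                k = rng.choices(cand, weights=w)[0]
                z[d][i] = k
                n_dk[d][k] += 1; n_kg[k][g] += 1; n_k[k] += 1

    theta = {}
    for d in tokens:
        total = len(tokens[d]) + K * alpha
        theta[d] = {names[k]: (n_dk[d][k] + alpha) / total for k in range(K)}
    phi = {}
    for k, p in enumerate(names):
        denom = n_k[k] + beta * size[k]
        phi[p] = {g: (n_kg[k][g] + beta) / denom for g in pathways[p]}
    return theta, phi

def top_genes(phi, pathway, n=3):
    """Genes with the highest probability under a pathway's gene distribution."""
    return sorted(phi[pathway], key=phi[pathway].get, reverse=True)[:n]

# Toy example (hypothetical data): gene C is shared by both pathways.
pathways = {"P1": {"A", "B", "C"}, "P2": {"C", "D", "E"}}
docs = {"drug1": ["A", "A", "A", "B", "C"], "drug2": ["D", "E", "E", "E", "C"]}
theta, phi = pathway_lda(docs, pathways, iters=100)
print(top_genes(phi, "P1", 1))  # ['A']
print(top_genes(phi, "P2", 1))  # ['E']
```

Restricting each topic's support to pathway members is what turns generic LDA topics into interpretable pathways: `theta` then reads as per-drug pathway responsiveness, and ranking `phi` within a pathway gives the genes that contribute most to it. A shared gene such as C is split between pathways according to the context of each drug.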