Predicting genetic regulatory response using classification.

Author information

Middendorf Manuel, Kundaje Anshul, Wiggins Chris, Freund Yoav, Leslie Christina

Affiliation

Department of Physics, Columbia University, NY, NY 10027, USA.

Publication information

Bioinformatics. 2004 Aug 4;20 Suppl 1:i232-40. doi: 10.1093/bioinformatics/bth923.

Abstract

MOTIVATION

Studying gene regulatory mechanisms in simple model organisms through analysis of high-throughput genomic data has emerged as a central problem in computational biology. Most approaches in the literature have focused either on finding a few strong regulatory patterns or on learning descriptive models from training data. However, these approaches are not yet adequate for making accurate predictions about which genes will be up- or down-regulated in new or held-out experiments. By introducing a predictive methodology for this problem, we can use powerful tools from machine learning and assess the statistical significance of our predictions.

RESULTS

We present a novel classification-based method for learning to predict gene regulatory response. Our approach is motivated by the hypothesis that in simple organisms such as Saccharomyces cerevisiae, we can learn a decision rule for predicting whether a gene is up- or down-regulated in a particular experiment based on (1) the presence of binding site subsequences ('motifs') in the gene's regulatory region and (2) the expression levels of regulators such as transcription factors in the experiment ('parents'). Thus, our learning task integrates two qualitatively different data sources: genome-wide cDNA microarray data across multiple perturbation and mutant experiments, and motif profile data from regulatory sequences. We convert the regression task of predicting real-valued gene expression measurements to a classification task of predicting +1 and -1 labels, corresponding to up- and down-regulation beyond the levels of biological and measurement noise in microarray measurements. The learning algorithm employed is boosting with alternating decision trees, a margin-based generalization of decision trees. This large-margin classifier is sufficiently flexible to allow complex logical functions, yet sufficiently simple to give insight into the combinatorial mechanisms of gene regulation. We observe encouraging prediction accuracy on experiments based on the Gasch S. cerevisiae dataset, and we show that we can accurately predict up- and down-regulation on held-out experiments. We also show how to extract significant regulators, motifs and motif-regulator pairs from the learned models for various stress responses. Our method thus provides predictive hypotheses, suggests biological experiments, and provides interpretable insight into the structure of genetic regulatory networks.
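
The labelling and learning steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' MLJava implementation: it substitutes plain AdaBoost over single-feature decision stumps for alternating decision trees, and the noise threshold, the (motif, regulator, state) feature encoding, and all gene, motif, regulator, and experiment names are hypothetical toy values.

```python
import math


def discretize(log_ratio, threshold):
    """Map a real-valued log expression ratio to +1, -1, or 0 (within noise)."""
    if log_ratio > threshold:
        return 1
    if log_ratio < -threshold:
        return -1
    return 0


def build_examples(expression, motifs, parents, threshold):
    """Build one (feature_set, label) pair per gene-experiment pair.

    Pairs whose expression change lies inside the noise band are skipped.
    A feature (motif, regulator, state) is active when the motif occurs in
    the gene's regulatory region and the regulator is in that nonzero state
    in the experiment. This encoding is an assumption made for illustration.
    """
    examples = []
    for gene, by_experiment in expression.items():
        for experiment, log_ratio in by_experiment.items():
            label = discretize(log_ratio, threshold)
            if label == 0:
                continue
            features = {
                (motif, regulator, state)
                for motif in motifs[gene]
                for regulator, state in parents[experiment].items()
                if state != 0
            }
            examples.append((features, label))
    return examples


def stump(feature, polarity, features):
    """Predict `polarity` if the feature is active, otherwise `-polarity`."""
    return polarity if feature in features else -polarity


def train_adaboost(examples, rounds):
    """Discrete AdaBoost over stumps; returns a list of (alpha, feature, polarity)."""
    n = len(examples)
    weights = [1.0 / n] * n
    candidates = sorted({f for features, _ in examples for f in features})
    model = []
    for _ in range(rounds):
        best = None
        for feature in candidates:
            for polarity in (1, -1):
                error = sum(
                    w for w, (features, y) in zip(weights, examples)
                    if stump(feature, polarity, features) != y
                )
                if best is None or error < best[0]:
                    best = (error, feature, polarity)
        error, feature, polarity = best
        error = min(max(error, 1e-10), 1 - 1e-10)  # avoid log(0) and division by zero
        alpha = 0.5 * math.log((1 - error) / error)
        model.append((alpha, feature, polarity))
        # Increase the weight of misclassified examples, then renormalize.
        weights = [
            w * math.exp(-alpha * y * stump(feature, polarity, features))
            for w, (features, y) in zip(weights, examples)
        ]
        total = sum(weights)
        weights = [w / total for w in weights]
    return model


def predict(model, features):
    """Sign of the weighted vote of all stumps."""
    score = sum(alpha * stump(f, p, features) for alpha, f, p in model)
    return 1 if score >= 0 else -1


# Toy data (entirely made up): log ratios, motif sets, and regulator states.
expression = {
    "g1": {"e1": 2.0, "e2": -1.5},
    "g2": {"e1": 1.8, "e2": -2.0},
    "g3": {"e1": 0.1, "e2": 0.2},  # within noise, so no training example
}
motifs = {"g1": {"M1"}, "g2": {"M1"}, "g3": {"M2"}}
parents = {"e1": {"R1": 1}, "e2": {"R1": -1}}

examples = build_examples(expression, motifs, parents, threshold=1.0)
model = train_adaboost(examples, rounds=3)
```

Calling `predict(model, features)` on a new feature set returns +1 or -1. Because each learned stump names a motif-regulator pair and a regulator state, the model stays inspectable, which loosely mirrors the interpretability argument made for alternating decision trees.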

AVAILABILITY

The MLJava package is available upon request to the authors. Supplementary: Additional results are available from http://www.cs.columbia.edu/compbio/geneclass
