Predicting genetic regulatory response using classification.

Middendorf Manuel, Kundaje Anshul, Wiggins Chris, Freund Yoav, Leslie Christina

Department of Physics, Columbia University, NY, NY 10027, USA.

Bioinformatics. 2004 Aug 4;20 Suppl 1:i232-40. doi: 10.1093/bioinformatics/bth923.

MOTIVATION

Studying gene regulatory mechanisms in simple model organisms through analysis of high-throughput genomic data has emerged as a central problem in computational biology. Most approaches in the literature have focused either on finding a few strong regulatory patterns or on learning descriptive models from training data. However, these approaches are not yet adequate for making accurate predictions about which genes will be up- or down-regulated in new or held-out experiments. By introducing a predictive methodology for this problem, we can use powerful tools from machine learning and assess the statistical significance of our predictions.

RESULTS

We present a novel classification-based method for learning to predict gene regulatory response. Our approach is motivated by the hypothesis that in simple organisms such as Saccharomyces cerevisiae, we can learn a decision rule that predicts whether a gene is up- or down-regulated in a particular experiment based on (1) the presence of binding site subsequences ('motifs') in the gene's regulatory region and (2) the expression levels of regulators such as transcription factors in that experiment ('parents'). Our learning task therefore integrates two qualitatively different data sources: genome-wide cDNA microarray data across multiple perturbation and mutant experiments, and motif profile data from regulatory sequences. We convert the regression task of predicting real-valued gene expression measurements into a classification task of predicting +1 and -1 labels, corresponding to up- and down-regulation beyond the levels of biological and measurement noise in microarray measurements. The learning algorithm is boosting with alternating decision trees, a margin-based generalization of decision trees. This large-margin classifier is flexible enough to represent complex logical functions, yet simple enough to give insight into the combinatorial mechanisms of gene regulation. We observe encouraging prediction accuracy on experiments based on the Gasch S. cerevisiae dataset, and we show that we can accurately predict up- and down-regulation on held-out experiments. We also show how to extract significant regulators, motifs and motif-regulator pairs from the learned models for various stress responses. Our method thus generates predictive hypotheses, suggests biological experiments and provides interpretable insight into the structure of genetic regulatory networks.
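To make the two feature sources and the alternating decision tree (ADTree) concrete, here is a minimal Python sketch. It is not the authors' MLJava implementation and it omits the boosting procedure that learns the tree; it only shows, under stated assumptions, how real-valued expression could be discretized into labels and how an already-learned ADTree scores one (gene, experiment) example. The noise cutoff `NOISE_CUTOFF`, the feature names `motif:M1`, `parent:P1` and `parent:P2`, and every tree weight are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical noise cutoff on a log2 expression ratio (an assumption;
# the abstract does not state the threshold the authors used).
NOISE_CUTOFF = 0.5

Example = Dict[str, int]  # feature name -> value for one (gene, experiment) pair


def discretize(log_ratio: float, cutoff: float = NOISE_CUTOFF) -> int:
    """Return +1 (up), -1 (down), or 0 (within noise) for a log2 ratio."""
    if log_ratio > cutoff:
        return 1
    if log_ratio < -cutoff:
        return -1
    return 0


@dataclass
class PredictionNode:
    value: float  # real-valued contribution added to the score
    splitters: List["Splitter"] = field(default_factory=list)


@dataclass
class Splitter:
    condition: Callable[[Example], bool]  # e.g. "motif present" or "parent up"
    yes: PredictionNode
    no: PredictionNode


def score(node: PredictionNode, x: Example) -> float:
    """Sum the values of every prediction node reachable for example x.

    Unlike an ordinary decision tree, a prediction node may hold several
    splitters, and x follows one branch of each of them.
    """
    total = node.value
    for s in node.splitters:
        total += score(s.yes if s.condition(x) else s.no, x)
    return total


def classify(root: PredictionNode, x: Example) -> int:
    """Predicted label is the sign of the score; |score| acts as a margin."""
    return 1 if score(root, x) > 0 else -1


# Hypothetical toy tree: feature names and weights are invented for illustration.
root = PredictionNode(-0.1, [
    # Nested splitter: the path "M1 present AND P1 up" encodes a motif-parent pair.
    Splitter(lambda x: x.get("motif:M1", 0) == 1,
             yes=PredictionNode(0.4, [
                 Splitter(lambda x: x.get("parent:P1", 0) == 1,
                          yes=PredictionNode(0.6), no=PredictionNode(-0.2)),
             ]),
             no=PredictionNode(-0.3)),
    Splitter(lambda x: x.get("parent:P2", 0) == -1,
             yes=PredictionNode(-0.5), no=PredictionNode(0.1)),
])

# One example: motif presence is 0/1, parent states are discretized expression.
x = {"motif:M1": 1, "parent:P1": discretize(1.2), "parent:P2": discretize(-0.1)}
print(score(root, x), classify(root, x))
```

In an ADTree, boosting adds one splitter (with its two prediction nodes) per round beneath an existing prediction node, so the final score is a sum of many small contributions. In this sketch, (gene, experiment) pairs whose target label discretizes to 0 would simply be left out of training, and the magnitude of the score can be read as the confidence of the prediction.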

AVAILABILITY

The MLJava package is available upon request to the authors. Supplementary information: Additional results are available at http://www.cs.columbia.edu/compbio/geneclass
