Suppr
超能文献

Learning regulatory programs that accurately predict differential expression with MEDUSA.

Author information

Kundaje Anshul, Lianoglou Steve, Li Xuejing, Quigley David, Arias Marta, Wiggins Chris H, Zhang Li, Leslie Christina

Affiliations

Department of Computer Science, Center for Computational Learning Systems, Columbia University, New York, NY 10065, USA.

Publication information

Ann N Y Acad Sci. 2007 Dec;1115:178-202. doi: 10.1196/annals.1407.020. Epub 2007 Oct 12.

DOI:10.1196/annals.1407.020

PMID:17934055

Abstract

Inferring gene regulatory networks from high-throughput genomic data is one of the central problems in computational biology. In this paper, we describe a predictive modeling approach for studying regulatory networks, based on a machine learning algorithm called MEDUSA. MEDUSA integrates promoter sequence, mRNA expression, and transcription factor occupancy data to learn gene regulatory programs that predict the differential expression of target genes. Instead of using clustering or correlation of expression profiles to infer regulatory relationships, MEDUSA determines condition-specific regulators and discovers regulatory motifs that mediate the regulation of target genes. In this way, MEDUSA meaningfully models biological mechanisms of transcriptional regulation. MEDUSA solves the problem of predicting the differential (up/down) expression of target genes by using boosting, a technique from statistical learning, which helps to avoid overfitting as the algorithm searches through the high-dimensional space of potential regulators and sequence motifs. Experimental results demonstrate that MEDUSA achieves high prediction accuracy on held-out experiments (test data), that is, data not seen in training. We also present context-specific analysis of MEDUSA regulatory programs for DNA damage and hypoxia, demonstrating that MEDUSA identifies key regulators and motifs in these processes. A central challenge in the field is the difficulty of validating reverse-engineered networks in the absence of a gold standard. Our approach of learning regulatory programs provides at least a partial solution for the problem: MEDUSA's prediction accuracy on held-out data gives a concrete and statistically sound way to validate how well the algorithm performs. With MEDUSA, statistical validation becomes a prerequisite for hypothesis generation and network building rather than a secondary consideration.
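
To make the boosting and held-out validation ideas in the abstract concrete, here is a minimal sketch. It is not the MEDUSA implementation: it assumes weak rules of the form "motif m is in the promoter and regulator r is in state s", combined by confidence-rated boosting with abstaining rules, and it runs on synthetic data with two planted rules. The rule form, the data generator, and all names (`make_data`, `boost`, `predict`) are illustrative assumptions, not details taken from the paper.

```python
import math
import random


def make_data(n_genes=60, n_expts=60, n_motifs=6, n_regs=6, seed=0):
    """Synthetic data (hypothetical): motif presence per gene, regulator
    state per experiment, and an up/down label per (gene, experiment)."""
    rng = random.Random(seed)
    motif = [[rng.randint(0, 1) for _ in range(n_motifs)] for _ in range(n_genes)]
    reg = [[rng.choice((-1, 1)) for _ in range(n_regs)] for _ in range(n_expts)]
    label = {}
    for g in range(n_genes):
        for e in range(n_expts):
            score = 0
            if motif[g][0] and reg[e][0] == 1:  # planted activating rule
                score += 1
            if motif[g][1] and reg[e][1] == 1:  # planted repressing rule
                score -= 1
            # Unexplained or conflicting cases get a coin-flip label (noise).
            label[g, e] = score if score != 0 else rng.choice((-1, 1))
    return motif, reg, label


def fires(rule, motif, reg, g, e):
    """A rule (m, r, s) fires when motif m is present and regulator r is in state s."""
    m, r, s = rule
    return motif[g][m] == 1 and reg[e][r] == s


def boost(examples, labels, motif, reg, rules, n_rounds=10, eps=1e-3):
    """Confidence-rated boosting: each round adds the abstaining rule with the
    smallest normalizer Z and reweights the examples it fires on."""
    n = len(examples)
    w = [1.0 / n] * n
    hits = {rule: [i for i, (g, e) in enumerate(examples)
                   if fires(rule, motif, reg, g, e)] for rule in rules}
    model = {}
    for _ in range(n_rounds):
        best = None
        for rule, idx in hits.items():
            w_pos = sum(w[i] for i in idx if labels[i] == 1)
            w_neg = sum(w[i] for i in idx if labels[i] == -1)
            z = 1.0 - w_pos - w_neg + 2.0 * math.sqrt(w_pos * w_neg)
            if best is None or z < best[0]:
                best = (z, rule, w_pos, w_neg)
        _, rule, w_pos, w_neg = best
        alpha = 0.5 * math.log((w_pos + eps) / (w_neg + eps))  # smoothed vote
        model[rule] = model.get(rule, 0.0) + alpha
        for i in hits[rule]:
            w[i] *= math.exp(-alpha * labels[i])
        total = sum(w)
        w = [x / total for x in w]
    return model


def predict(model, motif, reg, g, e):
    """Predict up (+1) or down (-1) from the summed votes of firing rules."""
    score = sum(a for rule, a in model.items() if fires(rule, motif, reg, g, e))
    return 1 if score >= 0 else -1


motif, reg, label = make_data()
n_genes, n_expts = len(motif), len(reg)
rules = [(m, r, s) for m in range(len(motif[0]))
         for r in range(len(reg[0])) for s in (-1, 1)]

# Hold out whole experiments: train on the first 40, test on the last 20.
train = [(g, e) for g in range(n_genes) for e in range(40)]
test = [(g, e) for g in range(n_genes) for e in range(40, n_expts)]

model = boost(train, [label[x] for x in train], motif, reg, rules)
acc = sum(predict(model, motif, reg, g, e) == label[g, e] for g, e in test) / len(test)
top = sorted(model.items(), key=lambda kv: -abs(kv[1]))[:2]

print("held-out accuracy:", round(acc, 3))
print("strongest rules (motif, regulator, state) and votes:", top)
```

Because many labels in this toy data are pure noise, held-out accuracy is expected to sit well below 100% but above chance. The point of the sketch is that accuracy on experiments never seen in training provides a check on the learned rules before they are read as regulatory hypotheses.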

Similar articles

Predicting genetic regulatory response using classification.

Bioinformatics. 2004 Aug 4;20 Suppl 1:i232-40. doi: 10.1093/bioinformatics/bth923.

A framework for elucidating regulatory networks based on prior information and expression data.

Ann N Y Acad Sci. 2007 Dec;1115:240-8. doi: 10.1196/annals.1407.002. Epub 2007 Oct 9.

Reconstruction of metabolic networks from high-throughput metabolite profiling data: in silico analysis of red blood cell metabolism.

Ann N Y Acad Sci. 2007 Dec;1115:102-15. doi: 10.1196/annals.1407.013. Epub 2007 Oct 9.

Reverse engineering of dynamic networks.

Ann N Y Acad Sci. 2007 Dec;1115:168-77. doi: 10.1196/annals.1407.012. Epub 2007 Oct 9.

LICORN: learning cooperative regulation networks from gene expression data.

Bioinformatics. 2007 Sep 15;23(18):2407-14. doi: 10.1093/bioinformatics/btm352. Epub 2007 Aug 24.

Validating module network learning algorithms using simulated data.

BMC Bioinformatics. 2007 May 3;8 Suppl 2(Suppl 2):S5. doi: 10.1186/1471-2105-8-S2-S5.

Alternative pathway approach for automating analysis and validation of cell perturbation networks and design of perturbation experiments.

Ann N Y Acad Sci. 2007 Dec;1115:267-85. doi: 10.1196/annals.1407.011. Epub 2007 Oct 9.

Algorithmic issues in reverse engineering of protein and gene networks via the modular response analysis method.

Ann N Y Acad Sci. 2007 Dec;1115:132-41. doi: 10.1196/annals.1407.001. Epub 2007 Oct 9.

Data requirements of reverse-engineering algorithms.

Ann N Y Acad Sci. 2007 Dec;1115:142-53. doi: 10.1196/annals.1407.008. Epub 2007 Oct 9.

Cited by

Optimizing data integration improves gene regulatory network inference in Arabidopsis thaliana.

Bioinformatics. 2024 Jul 1;40(7). doi: 10.1093/bioinformatics/btae415.

Dynamic regulatory module networks for inference of cell type-specific transcriptional networks.

Genome Res. 2022 Jul;32(7):1367-1384. doi: 10.1101/gr.276542.121. Epub 2022 Jun 15.

Inference of cell type specific regulatory networks on mammalian lineages.

Curr Opin Syst Biol. 2017 Apr;2:130-139. doi: 10.1016/j.coisb.2017.04.001. Epub 2017 Apr 17.

The recurrent architecture of tumour initiation, progression and drug sensitivity.

Nat Rev Cancer. 2017 Feb;17(2):116-130. doi: 10.1038/nrc.2016.124. Epub 2016 Dec 15.

Functional characterization of somatic mutations in cancer using network-based inference of protein activity.

Nat Genet. 2016 Aug;48(8):838-47. doi: 10.1038/ng.3593. Epub 2016 Jun 20.

Development of a novel prediction method of cis-elements to hypothesize collaborative functions of cis-element pairs in iron-deficient rice.

Rice (N Y). 2013 Sep 22;6(1):22. doi: 10.1186/1939-8433-6-22.

Genomic analysis of immune response against Vibrio cholerae hemolysin in Caenorhabditis elegans.

PLoS One. 2012;7(5):e38200. doi: 10.1371/journal.pone.0038200. Epub 2012 May 31.

G = MAT: linking transcription factor expression and DNA binding data.

PLoS One. 2011 Jan 31;6(1):e14559. doi: 10.1371/journal.pone.0014559.

FastMEDUSA: a parallelized tool to infer gene regulatory networks.

Bioinformatics. 2010 Jul 15;26(14):1792-3. doi: 10.1093/bioinformatics/btq275. Epub 2010 May 30.
