Integrative random forest for gene regulatory network inference.

Author information

Petralia Francesca, Wang Pei, Yang Jialiang, Tu Zhidong

Affiliation

Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA.

Publication information

Bioinformatics. 2015 Jun 15;31(12):i197-205. doi: 10.1093/bioinformatics/btv268.

DOI: 10.1093/bioinformatics/btv268

PMID: 26072483

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4542785/

Abstract

MOTIVATION

Gene regulatory network (GRN) inference based on genomic data is one of the most actively pursued problems in computational biology. Because different types of biological data usually provide complementary information about the underlying GRN, a model that integrates large-scale data of diverse types is expected to increase both the power and the accuracy of GRN inference. Towards this goal, we propose a novel algorithm named iRafNet: integrative random forest for gene regulatory network inference.

RESULTS

iRafNet is a flexible, unified integrative framework that allows information from heterogeneous data, such as protein-protein interactions, transcription factor (TF)-DNA binding and gene knock-down, to be jointly considered for GRN inference. Using test data from the DREAM4 and DREAM5 challenges, we demonstrate that iRafNet outperforms the original random forest-based network inference algorithm (GENIE3) and is highly comparable to the community learning approach. We apply iRafNet to construct a GRN in Saccharomyces cerevisiae and demonstrate that it improves performance in predicting TF-target gene regulations and provides additional functional insights into the predicted gene regulations.
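
The integration idea can be illustrated, as a hedged sketch only, by letting supporting data bias which candidate regulators a tree considers at each split, rather than sampling them uniformly as a standard random forest does. The Python below assumes that each supporting data type (for example PPI or TF-DNA-binding evidence) has already been turned into a non-negative weight per candidate regulator of one target gene. The function name `sample_candidate_regulators` and all weights are hypothetical and are not taken from the iRafNet R code; the sampling scheme used (Efraimidis-Spirakis weighted sampling without replacement) is a generic technique chosen for illustration.

```python
import random
from collections import Counter
from typing import Dict, List, Optional


def sample_candidate_regulators(
    weights: Dict[str, float],
    k: int,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Draw up to k distinct candidate regulators, favouring high-weight genes.

    Hypothetical helper, not part of the iRafNet R package. It uses
    Efraimidis-Spirakis weighted sampling without replacement: each gene
    gets the key u ** (1 / w) with u ~ Uniform[0, 1), and the k largest
    keys are kept. Genes with weight 0 are never drawn.
    """
    if rng is None:
        rng = random.Random()
    keyed = []
    for gene, w in weights.items():
        if w < 0:
            raise ValueError(f"weight for {gene} must be non-negative")
        if w == 0:
            continue  # no support from the auxiliary data: never a candidate
        keyed.append((rng.random() ** (1.0 / w), gene))
    keyed.sort(reverse=True)  # largest keys first
    return [gene for _, gene in keyed[:k]]


if __name__ == "__main__":
    # Hypothetical weights for the candidate regulators of one target gene,
    # e.g. derived from TF-DNA-binding evidence (values are made up).
    prior = {"TF1": 5.0, "TF2": 1.0, "TF3": 1.0, "TF4": 0.0}
    rng = random.Random(42)
    # With k = 1, gene i is drawn with probability w_i / sum(w):
    # roughly 5/7 for TF1 and 1/7 each for TF2 and TF3, never TF4.
    draws = Counter(sample_candidate_regulators(prior, 1, rng)[0] for _ in range(7000))
    print(draws)
```

Setting every weight to the same positive value recovers uniform sampling, so in this sketch the data-driven weights only change which regulators are more likely to be tried at a split; how the fitted trees are then turned into ranked regulatory edges is outside its scope.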

AVAILABILITY AND IMPLEMENTATION

The R code implementing iRafNet and a tutorial are available at: http://research.mssm.edu/tulab/software/irafnet.html

Similar articles

BRANE Cut: biologically-related a priori network enhancement with graph cuts for gene regulatory network inference.

BMC Bioinformatics. 2015 Nov 4;16:368. doi: 10.1186/s12859-015-0754-2.

NIMEFI: gene regulatory network inference using multiple ensemble feature importance algorithms.

PLoS One. 2014 Mar 25;9(3):e92709. doi: 10.1371/journal.pone.0092709. eCollection 2014.

Enhancing gene regulatory networks inference through hub-based data integration.

Comput Biol Chem. 2021 Dec;95:107589. doi: 10.1016/j.compbiolchem.2021.107589. Epub 2021 Oct 6.

SIN-KNO: A method of gene regulatory network inference using single-cell transcription and gene knockout data.

J Bioinform Comput Biol. 2019 Dec;17(6):1950035. doi: 10.1142/S0219720019500355.

BRANE Clust: Cluster-Assisted Gene Regulatory Network Inference Refinement.

IEEE/ACM Trans Comput Biol Bioinform. 2018 May-Jun;15(3):850-860. doi: 10.1109/TCBB.2017.2688355. Epub 2017 Mar 28.

Inference of Gene Regulatory Networks Based on Multi-view Hierarchical Hypergraphs.

Interdiscip Sci. 2024 Jun;16(2):318-332. doi: 10.1007/s12539-024-00604-3. Epub 2024 Feb 11.

PEAK: Integrating Curated and Noisy Prior Knowledge in Gene Regulatory Network Inference.

J Comput Biol. 2017 Sep;24(9):863-873. doi: 10.1089/cmb.2016.0199. Epub 2017 Mar 15.

GRNBoost2 and Arboreto: efficient and scalable inference of gene regulatory networks.

Bioinformatics. 2019 Jun 1;35(12):2159-2161. doi: 10.1093/bioinformatics/bty916.

HSCVFNT: Inference of Time-Delayed Gene Regulatory Network Based on Complex-Valued Flexible Neural Tree Model.

Int J Mol Sci. 2018 Oct 15;19(10):3178. doi: 10.3390/ijms19103178.

Cited by

Uncovering key biomarkers, potential therapeutic targets and development of deep learning model in heart failure.

PLoS One. 2025 Sep 3;20(9):e0330780. doi: 10.1371/journal.pone.0330780. eCollection 2025.

Identification of neutrophil extracellular traps-related genes for the diagnosis of acute myocardial infarction based on bioinformatics and experimental verification.

J Inflamm (Lond). 2025 Aug 27;22(1):35. doi: 10.1186/s12950-025-00462-w.

Perspective on recent developments and challenges in regulatory and systems genomics.

Bioinform Adv. 2025 May 9;5(1):vbaf106. doi: 10.1093/bioadv/vbaf106. eCollection 2025.

Integrating bioinformatics and machine learning to investigate the mechanisms by which three major respiratory infectious diseases exacerbate heart failure.

Sci Rep. 2025 Jul 2;15(1):23526. doi: 10.1038/s41598-025-07090-7.

Bulk and single-cell RNA-sequencing analyses revealed potential key genes and the role of CCL19/CCL21-CCR7 axis in hidradenitis suppurativa.

PLoS One. 2025 Jun 2;20(6):e0322565. doi: 10.1371/journal.pone.0322565. eCollection 2025.

Machine learning based identification of anoikis related gene classification patterns and immunoinfiltration characteristics in diabetic nephropathy.

Sci Rep. 2025 May 1;15(1):15271. doi: 10.1038/s41598-025-99395-w.

Identification of hub genes for the diagnosis associated with heart failure using multiple cell death patterns.

ESC Heart Fail. 2025 Aug;12(4):2898-2908. doi: 10.1002/ehf2.15299. Epub 2025 Apr 10.

Explainable artificial intelligence of DNA methylation-based brain tumor diagnostics.

Nat Commun. 2025 Feb 20;16(1):1787. doi: 10.1038/s41467-025-57078-0.

Feature graphs for interpretable unsupervised tree ensembles: centrality, interaction, and application in disease subtyping.

BioData Min. 2025 Feb 15;18(1):15. doi: 10.1186/s13040-025-00430-3.

Leveraging prior knowledge to infer gene regulatory networks from single-cell RNA-sequencing data.

Mol Syst Biol. 2025 Mar;21(3):214-230. doi: 10.1038/s44320-025-00088-3. Epub 2025 Feb 12.
