Geurts Pierre, Touleimat Nizar, Dutreix Marie, d'Alché-Buc Florence
IBISC FRE CNRS 2873 & Epigenomics project, GENOPOLE, Evry, France.
BMC Bioinformatics. 2007 May 3;8 Suppl 2(Suppl 2):S4. doi: 10.1186/1471-2105-8-S2-S4.
Elucidating biological networks between proteins is now regarded as one of the most important challenges in systems biology. Computational approaches to this problem are important to complement high-throughput technologies and to help biologists design new experiments. In this work, we focus on completing a biological network from various sources of experimental data.
We propose a new machine learning approach for the supervised inference of biological networks, based on a kernelization of the output space of regression trees. It inherits several features of tree-based algorithms, such as interpretability, robustness to irrelevant variables, and input scalability. We applied this method to the inference of two kinds of networks in the yeast S. cerevisiae: a protein-protein interaction network and an enzyme network. In both cases, we obtained results competitive with existing approaches. We also show that our method provides relevant insights into how the input data relate to the existence of interactions. Furthermore, we confirm the biological validity of our predictions through an analysis of gene expression data.
Output kernel tree based methods provide an efficient tool for the inference of biological networks from experimental data. Their simplicity and interpretability should make them of great value for biologists.
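The abstract does not spell out the algorithm, so the following is only a minimal sketch of what "kernelization of the output space of regression trees" can mean in practice, not the authors' implementation. It assumes that a regression-tree split is scored by variance reduction, and that the variance of the outputs in a kernel-induced feature space can be computed from a Gram matrix alone, using the identity mean(K_ii) - mean(K_ij). The function names `kernel_variance` and `best_split`, the single numeric input feature, and the toy data are all hypothetical choices made for illustration.

```python
import numpy as np


def kernel_variance(K, idx):
    """Output variance in feature space for the samples in idx.

    Uses only Gram-matrix entries: (1/n) * sum_i K_ii - (1/n^2) * sum_ij K_ij.
    """
    idx = np.asarray(idx, dtype=int)
    n = len(idx)
    if n == 0:
        return 0.0
    sub = K[np.ix_(idx, idx)]
    return np.trace(sub) / n - sub.sum() / n**2


def best_split(x, K):
    """Threshold on a 1-D feature x that maximizes kernelized variance reduction.

    Assumption: candidate thresholds are midpoints between consecutive
    distinct values of x, as in a standard regression tree.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    all_idx = np.arange(n)
    total = kernel_variance(K, all_idx)
    best_thr, best_score = None, 0.0
    values = np.unique(x)
    for lo, hi in zip(values[:-1], values[1:]):
        thr = (lo + hi) / 2.0
        left = all_idx[x < thr]
        right = all_idx[x >= thr]
        score = (
            total
            - len(left) / n * kernel_variance(K, left)
            - len(right) / n * kernel_variance(K, right)
        )
        if score > best_score:
            best_thr, best_score = thr, score
    return best_thr, best_score


# Toy data (hypothetical): a linear kernel on explicit outputs, so the
# kernelized variance coincides with the ordinary variance of Y.
Y = np.array([[0.0], [0.0], [1.0], [1.0]])
K = Y @ Y.T
x = np.array([0.1, 0.2, 0.8, 0.9])
thr, score = best_split(x, K)
```

In a network-inference setting, `K` would instead be a kernel between the proteins of the known part of the network, so the tree is grown without ever forming an explicit output vector. Which kernel to use is a modelling decision that the abstract does not specify.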