IBISC EA 4526, Université d'Évry-Val d'Essonne, 23 Boulevard de France, 91037, Évry, France.
BMC Bioinformatics. 2013 Sep 12;14:273. doi: 10.1186/1471-2105-14-273.
Gene regulatory network inference remains a challenging problem in systems biology despite the numerous approaches that have been proposed. When substantial knowledge about a gene regulatory network is already available, supervised network inference is appropriate. Such a method builds a binary classifier able to assign a class (Regulation/No regulation) to an ordered pair of genes. Once learnt, the pairwise classifier can be used to predict new regulations. In this work, we explore the framework of Markov Logic Networks (MLN), which combines features of probabilistic graphical models with the expressivity of first-order logic rules.
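For readers unfamiliar with the formalism, the standard definition of an MLN from the literature (not restated in this abstract) can be written as follows. The symbols $w_i$, $n_i$, $F$ and $Z$ are notation assumed here for illustration: $w_i$ is the weight attached to the $i$-th first-order rule, $n_i(x)$ counts the true groundings of that rule in a possible world $x$, $F$ is the number of rules, and $Z$ is the normalizing constant.

```latex
P(X = x) \;=\; \frac{1}{Z} \exp\!\left( \sum_{i=1}^{F} w_i \, n_i(x) \right),
\qquad
Z \;=\; \sum_{x'} \exp\!\left( \sum_{i=1}^{F} w_i \, n_i(x') \right)
```

Under this definition, a world that violates a rule becomes less probable rather than impossible, and a larger weight makes the violation more costly.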
We propose to learn a Markov Logic Network, i.e. a set of weighted rules that conclude on the predicate "regulates", starting from a known gene regulatory network involved in the proliferation/differentiation switch of keratinocyte cells, a set of experimental transcriptomic data, and various descriptions of genes, all encoded in first-order logic. As the training data are unbalanced, we use asymmetric bagging to learn a set of MLNs. The prediction of a new regulation is then obtained by averaging the predictions of the individual MLNs. As a side contribution, we propose three in silico tests to assess the performance of any pairwise classifier in various network inference tasks on real datasets. The first test measures the average performance on a balanced edge prediction problem; the second deals with the ability of the classifier, once enhanced by asymmetric bagging, to update a given network. Finally, our main result concerns a third test that measures the ability of the method to predict regulations with a new set of genes. As expected, MLN, when provided with only discretized numerical gene expression data, does not perform as well as a pairwise SVM in terms of AUPR. However, when a more complete description of gene properties is provided by heterogeneous sources, MLN achieves the same performance as a black-box model such as a pairwise SVM while providing relevant insights into the predictions.
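The asymmetric bagging step described above can be illustrated with a minimal sketch. It assumes the common formulation in which each bag keeps every positive example (known regulations) and draws an equally sized random subset of negatives; the abstract does not specify the sampling details. The base learner here is a hypothetical centroid-based scorer used purely as a stand-in, not an MLN, and all function names are invented for illustration.

```python
import random
from statistics import mean


def asymmetric_bags(positives, negatives, n_bags, seed=0):
    """Yield balanced bags: all positives plus an equal-size random draw of negatives.

    Assumption: negatives are sampled without replacement within each bag.
    """
    rng = random.Random(seed)
    k = min(len(positives), len(negatives))
    for _ in range(n_bags):
        yield list(positives), rng.sample(negatives, k)


def fit_centroid_scorer(pos, neg):
    """Hypothetical stand-in base learner (NOT an MLN).

    Returns a function giving a score in [0, 1] that is higher when the
    input is closer to the positive centroid than to the negative one.
    """
    dim = len(pos[0])
    c_pos = [mean(x[j] for x in pos) for j in range(dim)]
    c_neg = [mean(x[j] for x in neg) for j in range(dim)]

    def score(x):
        d_pos = sum((a - b) ** 2 for a, b in zip(x, c_pos)) ** 0.5
        d_neg = sum((a - b) ** 2 for a, b in zip(x, c_neg)) ** 0.5
        total = d_pos + d_neg
        return 0.5 if total == 0 else d_neg / total

    return score


def train_ensemble(positives, negatives, n_bags=10, seed=0):
    """Train one base learner per balanced bag."""
    return [
        fit_centroid_scorer(p, n)
        for p, n in asymmetric_bags(positives, negatives, n_bags, seed)
    ]


def bagged_predict(x, scorers):
    """Average the scores of the individual learners."""
    return mean(s(x) for s in scorers)


if __name__ == "__main__":
    # Toy feature vectors for ordered gene pairs (made-up values).
    pos = [(1.0, 1.0), (0.9, 1.1)]
    neg = [(0.0, 0.0), (0.1, -0.1), (-0.2, 0.1), (0.2, 0.2), (0.0, 0.3)]
    ensemble = train_ensemble(pos, neg, n_bags=5)
    print(bagged_predict((1.0, 0.9), ensemble))
```

Because every bag reuses all positives, the scarce class is never discarded, while the variation across bags comes only from which negatives are drawn.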
The numerical studies show that MLN achieves very good predictive performance while offering some interpretability of its decisions. Besides suggesting new regulations, such an approach makes it possible to cross-validate experimental data against existing knowledge.