Grafahrend-Belau Eva, Schreiber Falk, Heiner Monika, Sackmann Andrea, Junker Björn H, Grunwald Stefanie, Speer Astrid, Winder Katja, Koch Ina
Technical University of Applied Sciences Berlin, FB VI/FB V, Bioinformatics/Biotechnology, Seestr. 64, 13347 Berlin, Germany.
BMC Bioinformatics. 2008 Feb 8;9:90. doi: 10.1186/1471-2105-9-90.
Structural analysis of biochemical networks is a growing field in bioinformatics and systems biology. The increasing amount of data available from molecular biological networks promises a deeper understanding but confronts researchers with the problem of combinatorial explosion. The amount of qualitative network data is growing much faster than the amount of quantitative data, such as enzyme kinetics. In many cases it is impossible to measure quantitative data at all, because of limitations of experimental methods or for ethical reasons. Thus, a huge amount of qualitative data, such as interaction data, is available, but until now it has not been sufficiently used for modeling purposes. New approaches have been developed, but the complexity of the data often limits the application of many of these methods. Biochemical Petri nets make it possible to explore static and dynamic qualitative system properties. One Petri net approach is model validation based on the computation of the system's invariant properties, focusing on t-invariants. T-invariants correspond to subnetworks, which describe the basic system behavior. With increasing system complexity, the basic behavior can only be expressed by a huge number of t-invariants. Our validation criteria for biochemical Petri nets require verifying the biological meaning of each subnetwork (t-invariant), and doing this by manual interpretation is no longer possible. Thus, an automated, biologically meaningful classification would help in analyzing t-invariants and support the understanding of the basic behavior of the considered biological system.
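To make the notion of a t-invariant concrete, the following is a minimal sketch and not code from the paper. It uses the standard Petri net definition: for an incidence matrix C (rows are places, columns are transitions), a t-invariant is a non-zero, non-negative integer vector x with C·x = 0, and its support is the set of transitions with non-zero entries. The toy net, the transition names, and the helper functions are hypothetical and chosen only for illustration.

```python
def is_t_invariant(incidence, x):
    """Return True if x is a non-zero, non-negative solution of C.x = 0."""
    if any(v < 0 for v in x) or not any(x):
        return False
    # Every place (row) must see a net token change of zero.
    return all(sum(c * v for c, v in zip(row, x)) == 0 for row in incidence)


def support(x, transition_names):
    """Return the set of transitions that occur in the invariant x."""
    return {name for name, v in zip(transition_names, x) if v != 0}


# Hypothetical toy net: t_in produces A, t_conv turns A into B, t_out consumes B.
TRANSITIONS = ["t_in", "t_conv", "t_out"]
INCIDENCE = [
    [1, -1, 0],   # place A
    [0, 1, -1],   # place B
]
```

In this toy net the vector [1, 1, 1] satisfies the definition, because firing each transition once leaves the token count of A and B unchanged, and its support is the whole pathway. The vector [1, 1, 0] does not, because it leaves a surplus token on B.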
Here, we introduce a new approach to automatically classify t-invariants to cope with network complexity. We apply clustering techniques such as UPGMA, Complete Linkage, Single Linkage, and Neighbor Joining in combination with different distance measures to get biologically meaningful clusters (t-clusters), which can be interpreted as modules. To find the optimal number of t-clusters to consider for interpretation, the cluster validity measure, Silhouette Width, is applied.
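The pipeline described above can be sketched in a few lines of pure Python. This is an illustrative sketch under stated assumptions, not the authors' implementation: each t-invariant is represented only by its support (a set of transition names), the Tanimoto distance is taken as 1 minus the ratio of shared to total transitions, UPGMA is implemented naively as average-linkage merging, and the number of t-clusters is chosen by the highest mean Silhouette Width. All function names and the toy supports are hypothetical.

```python
from itertools import combinations


def tanimoto_distance(a, b):
    """1 - |a & b| / |a | b| for two transition sets (assumed set-based form)."""
    union = a | b
    return 0.0 if not union else 1.0 - len(a & b) / len(union)


def upgma(items, k, dist=tanimoto_distance):
    """Naive average-linkage clustering; returns k clusters as lists of indices."""
    clusters = [[i] for i in range(len(items))]

    def average(c1, c2):
        total = sum(dist(items[i], items[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > k:
        # Merge the pair of clusters with the smallest average distance.
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda p: average(clusters[p[0]], clusters[p[1]]))
        merged = clusters[a] + clusters[b]
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)]
        clusters.append(merged)
    return clusters


def mean_silhouette(items, clusters, dist=tanimoto_distance):
    """Mean Silhouette Width; needs at least two clusters, singletons score 0."""
    scores = []
    for ci, cluster in enumerate(clusters):
        for i in cluster:
            if len(cluster) == 1:
                scores.append(0.0)
                continue
            # a: mean distance to own cluster; b: mean distance to nearest other cluster.
            a = sum(dist(items[i], items[j]) for j in cluster if j != i) / (len(cluster) - 1)
            b = min(sum(dist(items[i], items[j]) for j in other) / len(other)
                    for oi, other in enumerate(clusters) if oi != ci)
            scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


def best_clustering(items):
    """Try k = 2 .. n-1 and keep the clustering with the highest mean silhouette."""
    candidates = [upgma(items, k) for k in range(2, len(items))]
    return max(candidates, key=lambda c: mean_silhouette(items, c))


# Hypothetical supports of four t-invariants (transition names are made up).
SUPPORTS = [
    {"t1", "t2", "t3"},
    {"t1", "t2", "t4"},
    {"t5", "t6"},
    {"t5", "t6", "t7"},
]
```

For these toy supports, `best_clustering(SUPPORTS)` groups the first two invariants into one t-cluster and the last two into another, since invariants within a group share most transitions and the groups share none. The other linkage methods named above differ only in how the distance between two clusters is computed; Complete Linkage, for example, would replace the average with the maximum pairwise distance.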
We considered two different case studies as examples: a small signal transduction pathway (pheromone response pathway in Saccharomyces cerevisiae) and a medium-sized gene regulatory network (gene regulation of Duchenne muscular dystrophy). We automatically classified the t-invariants into functionally distinct t-clusters, which could be interpreted biologically as functional modules in the network. We found differences in the suitability of the various distance measures as well as the clustering methods. In terms of a biologically meaningful classification of t-invariants, the best results are obtained using the Tanimoto distance measure. Considering clustering methods, the obtained results suggest that UPGMA and Complete Linkage are suitable for clustering t-invariants with respect to the biological interpretability.
We propose a new approach for the biological classification of Petri net t-invariants based on cluster analysis. Due to the biologically meaningful data reduction and structuring of network processes, large sets of t-invariants can be evaluated, allowing for model validation of qualitative biochemical Petri nets. This approach can also be applied to elementary mode analysis.