Matthias Dehmer, Nicola Barbarini, Kurt Varmuza, Armin Graber
Institute for Bioinformatics and Translational Research, UMIT, Eduard Wallnoefer Zentrum 1, Hall in Tyrol, Austria.
BMC Struct Biol. 2010 Jun 17;10:18. doi: 10.1186/1472-6807-10-18.
Topological descriptors, other graph measures, and, in a broader sense, graph-theoretical methods have proven to be powerful tools for biological network analysis. However, most of the descriptors and graph-theoretical methods developed so far cannot take vertex and edge labels into account, e.g., atom and bond types in molecular graphs. Yet this capability is important for characterizing biological networks more meaningfully than purely topological information allows.
In this paper, we focus on a special type of biological network, namely biochemical structures. First, we derive entropic measures to calculate the information content of vertex- and edge-labeled graphs and investigate some of their useful properties. Second, we combine these measures with other well-known descriptors and apply them to supervised machine learning methods for predicting Ames mutagenicity. Moreover, we investigate how the choice of topological descriptors (measures for unlabeled graphs only vs. measures for labeled graphs) influences the prediction performance of the underlying graph classification problem.
Our study demonstrates that applying entropic measures to graphs representing molecules is useful for characterizing such structures meaningfully. For instance, we found that extending the measures for the structural information content of unlabeled graphs to labeled graphs increases the uniqueness of the resulting indices. Because measures that structurally characterize labeled graphs are clearly underrepresented so far, further development of such methods might be valuable and fruitful for solving problems in biological network analysis.
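The abstract does not state the formulas of the entropic measures, so the following is only a minimal sketch of the general idea, not the authors' descriptors. It assumes a simple Shannon entropy over vertex classes: the unlabeled index partitions vertices by degree, while the labeled index partitions them by (atom type, degree) and adds the entropy of the bond types. The function names, the graph encoding, and the three hydrogen-suppressed example molecules are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def shannon_entropy(items):
    """Shannon entropy in bits of the empirical distribution of `items`."""
    counts = Counter(items)
    total = sum(counts.values())
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def degrees(atoms, bonds):
    """Vertex degrees of a graph given as {vertex: atom type} and (u, v, bond type) triples."""
    deg = {v: 0 for v in atoms}
    for u, v, _bond_type in bonds:
        deg[u] += 1
        deg[v] += 1
    return deg


def unlabeled_index(atoms, bonds):
    """Topology only: entropy of the vertex partition induced by degree."""
    return shannon_entropy(degrees(atoms, bonds).values())


def labeled_index(atoms, bonds):
    """Illustrative labeled variant (an assumption, not the paper's measure):
    entropy of (atom type, degree) classes plus entropy of bond types."""
    deg = degrees(atoms, bonds)
    vertex_classes = [(atoms[v], deg[v]) for v in atoms]
    bond_types = [t for _u, _v, t in bonds]
    return shannon_entropy(vertex_classes) + shannon_entropy(bond_types)


# Hydrogen-suppressed molecular graphs that share one topology (a 3-vertex path)
# and differ only in their atom or bond labels.
MOLECULES = {
    "propane": ({0: "C", 1: "C", 2: "C"}, [(0, 1, "single"), (1, 2, "single")]),
    "ethanol": ({0: "C", 1: "C", 2: "O"}, [(0, 1, "single"), (1, 2, "single")]),
    "propene": ({0: "C", 1: "C", 2: "C"}, [(0, 1, "double"), (1, 2, "single")]),
}

for name, (atoms, bonds) in MOLECULES.items():
    print(f"{name}: unlabeled={unlabeled_index(atoms, bonds):.3f} "
          f"labeled={labeled_index(atoms, bonds):.3f}")
```

In this toy setting the three molecules receive the same unlabeled index because their topology is identical, whereas the labeled index assigns each a different value. This is the kind of gain in uniqueness the abstract refers to, shown here only for an illustrative measure.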