Hayashida Morihiro, Akutsu Tatsuya
Bioinformatics Center, Institute for Chemical Research, Kyoto University, Gokasho, Uji, Kyoto, Japan.
BMC Syst Biol. 2010 Sep 13;4 Suppl 2(Suppl 2):S13. doi: 10.1186/1752-0509-4-S2-S13.
Comparison of various kinds of biological data is one of the main problems in bioinformatics and systems biology. Data compression methods have been applied to comparison of large sequence data and protein structure data. Since it is still difficult to compare global structures of large biological networks, it is reasonable to try to apply data compression methods to comparison of biological networks. In existing compression methods, the uniqueness of compression results is not guaranteed because there is some ambiguity in selection of overlapping edges.
This paper proposes novel efficient methods, CompressEdge and CompressVertices, for comparing large biological networks. In the proposed methods, an original network structure is compressed by iteratively contracting identical edges and sets of connected edges. Then, the similarity of two networks is measured by a compression ratio of the concatenated networks. The proposed methods are applied to comparison of metabolic networks of several organisms, H. sapiens, M. musculus, A. thaliana, D. melanogaster, C. elegans, E. coli, S. cerevisiae, and B. subtilis, and are compared with an existing method. These results suggest that our methods can efficiently measure the similarities between metabolic networks.
Our proposed algorithms, which compress node-labeled networks, are useful for measuring the similarity of large biological networks.
各类生物数据的比较是生物信息学和系统生物学中的主要问题之一。数据压缩方法已应用于大序列数据和蛋白质结构数据的比较。由于比较大型生物网络的全局结构仍然困难,尝试将数据压缩方法应用于生物网络的比较是合理的。在现有的压缩方法中,由于重叠边的选择存在一些模糊性,压缩结果的唯一性无法得到保证。
本文提出了用于比较大型生物网络的新颖高效方法CompressEdge和CompressVertices。在所提出的方法中,通过迭代收缩相同边和相连边集来压缩原始网络结构。然后,通过级联网络的压缩率来衡量两个网络的相似度。所提出的方法应用于几种生物(智人、小家鼠、拟南芥、黑腹果蝇、秀丽隐杆线虫、大肠杆菌﹑酿酒酵母和枯草芽孢杆菌)代谢网络的比较,并与现有方法进行比较。这些结果表明我们的方法能够有效地测量代谢网络之间的相似度。
我们提出的压缩带节点标记网络的算法,对于测量大型生物网络的相似度是有用的。