Kojima Kaname, Nagasaki Masao, Jeong Euna, Kato Mitsuru, Miyano Satoru
Human Genome Center, Institute of Medical Science, University of Tokyo, Shirokanedai, Minato-ku, Tokyo, Japan.
BMC Bioinformatics. 2007 Mar 6;8:76. doi: 10.1186/1471-2105-8-76.
Clearly visualized biopathways provide a great help in understanding biological systems. However, manual drawing of large-scale biopathways is time consuming. We proposed a grid layout algorithm that can handle gene-regulatory networks and signal transduction pathways by considering edge-edge crossing, node-edge crossing, distance measure between nodes, and subcellular localization information from Gene Ontology. Consequently, the layout algorithm succeeded in drastically reducing these crossings in the apoptosis model. However, for larger-scale networks, we encountered three problems: (i) the initial layout is often very far from any local optimum because nodes are initially placed at random, (ii) from a biological viewpoint, human layouts still exceed automatic layouts in understanding because except subcellular localization, it does not fully utilize biological information of pathways, and (iii) it employs a local search strategy in which the neighborhood is obtained by moving one node at each step, and automatic layouts suggest that simultaneous movements of multiple nodes are necessary for better layouts, while such extension may face worsening the time complexity.
We propose a new grid layout algorithm. To address problem (i), we devised a new force-directed algorithm whose output is suitable as the initial layout. For (ii), we considered that an appropriate alignment of nodes having the same biological attribute is one of the most important factors of the comprehension, and we defined a new score function that gives an advantage to such configurations. For solving problem (iii), we developed a search strategy that considers swapping nodes as well as moving a node, while keeping the order of the time complexity. Though a naïve implementation increases by one order, the time complexity, we solved this difficulty by devising a method that caches differences between scores of a layout and its possible updates.
Layouts of the new grid layout algorithm are compared with that of the previous algorithm and human layout in an endothelial cell model, three times as large as the apoptosis model. The total cost of the result from the new grid layout algorithm is similar to that of the human layout. In addition, its convergence time is drastically reduced (40% reduction).
清晰可视化的生物通路有助于理解生物系统。然而,手动绘制大规模生物通路非常耗时。我们提出了一种网格布局算法,该算法通过考虑边与边交叉、节点与边交叉、节点间距离度量以及来自基因本体的亚细胞定位信息,来处理基因调控网络和信号转导通路。因此,该布局算法成功地大幅减少了凋亡模型中的这些交叉。然而,对于更大规模的网络,我们遇到了三个问题:(i)初始布局通常离任何局部最优解都很远,因为节点最初是随机放置的;(ii)从生物学角度来看,人工布局在理解方面仍优于自动布局,因为除了亚细胞定位外,它没有充分利用通路的生物学信息;(iii)它采用局部搜索策略,其中邻域是通过每次移动一个节点获得的,而自动布局表明多个节点的同时移动对于更好的布局是必要的,而这种扩展可能会面临时间复杂度恶化的问题。
我们提出了一种新的网格布局算法。为了解决问题(i),我们设计了一种新的力导向算法,其输出适合作为初始布局。对于问题(ii),我们认为具有相同生物学属性的节点的适当对齐是理解的最重要因素之一,并且我们定义了一个新的评分函数,该函数对这种配置给予优势。为了解决问题(iii),我们开发了一种搜索策略,该策略在考虑移动节点的同时也考虑交换节点,同时保持时间复杂度的阶数。虽然简单的实现会使时间复杂度增加一阶,但我们通过设计一种缓存布局分数与其可能更新之间差异的方法解决了这个难题。
在一个比凋亡模型大三倍的内皮细胞模型中,将新网格布局算法的布局与先前算法的布局以及人工布局进行了比较。新网格布局算法结果的总成本与人工布局的总成本相似。此外,其收敛时间大幅减少(减少了40%)。