BioGraphLayout

Motivation

Visualization of biopathways is one of the important tasks for understanding biological systems. A lot of biopathway modeling and simulation applications subsequently have been developed, e.g., Cell Illustrator, Cytoscape, PATIKA, CADLIVE and Pajek. However, it is pretty difficult to manually draw the biopathways containing a large amount of entities and various types of components. Thus, automatic drawing of biopathways is considerd as the essential approach for systems biology.

Biological pathways can be categorized into metabolic pathways, signal transduction pathways, and gene-reguratory networks. While metabolic pathways have several layout algorithms, just a few layout algorithms have been proposed for signal transduction pathways and gene-regulatory networks. However, the layout algorithms for signal transuduction pathways and gene-reguratory networks are supposed to be more efficient and give more understandable layouts than those for metabolic pathways because signal transduction pathways and gene-reguratory networks handle various kinds of the biological reactions and activities and tend to be complicated and confusing whereas only chemical reactions are represented on metabolic pathways . Thus our approach is mainly focused on the signal transduction pathways and gene-reguratory networks.

Project Members

  • Masao Nagasaki
  • Atsushi Doi
  • Euna Jeong
  • Mitsuru Kato(alumnus)
  • Kaname Kojima

Grid layout algorithms

Our graph layout algorithm is based on the grid layout. With the grid layout framework nodes are allowed to be placed only on the grid points. The first application of the grid layout algorithm for the biopathway was given by Li et al. We invented a new layout algorithm for biopathways based on the concept of their grid layout algorithm.

The CB-grid layout algorithm

The CB-grid layout algorithm is one of the greedy algorithms. At each step, the algorithm updates the layout by moving a node to a vacant point such that the movement reduces its cost function the most among the all possible combination of nodes and vacant points. This update process is repeated until the cost function cannot be reduced, and the algorithm finally outputs the optimal layout.

For visualization, a layout with less edge-edge crossings, node-edge crossings, and overlaps between nodes than any other layout is desirable. Overlaps between nodes can be avoidable with ease on the grid layout algorithm since nodes are placed on the different grid points. Thus we proposed to add two types of costs, the edge-edge crossing cost and the node-edge crossing cost to the cost function, which are propotional to the number of edge-edge crossings and the number of node-edge crossings, respectively while the cost function proposed by Li et al. considers only the distance between nodes.

In order to find the pair of the node and the vacant point, which gives the largest reduction to the cost function we need to calculate the cost differences for all the pair of the nodes and the vacant points at each step. In addition, even for the cost difference with respect to the single node movement the costs between the node and all other nodes are needed since the cost is defined between the nodes. Accordingly, the time complexity of the calculation at each step is quite tough with the naive solution. Li et al. then poposed an efficient calculation strategy, which utilizes the cost differences stored at the previous step. The main concept of this strategy is that we can reuse the cost differences calculated at the previous step to calculate the current cost difference efficiently since only one node is moved at each step. To devise the concept they introduced the delta matrix in which the cost differences are stored at each step for the calculation of the cost differences at the next step. The delta matrix proposed by Li et al. is focused only on the distance between nodes and subsequently independent of the configulation of edges, although our cost function is influenced by the configulation of edges as well as the configulation of nodes. Thus we invented new calculation paradigm of the delta matrix to cope with our cost function formalism.

Since our algorithm is for biopathways and the entities and processes in the biopathways are subsequently dealt with as nodes, we put the restrictions on the movement of the nodes to make the nodes statisfed with their subcelluar informations such as extracelluar, membrane and nucleus. Due to the restriction we can reduce the calculation time of our algorithm since the number of combination of the nodes and the vacant points are decreased.

We applied the CB-grid layout algorithm to a pathway model of the apoptosis induced by fas ligand to evaluate its performance. The model consists of 117 nodes and 126 edges. We generated the initial layout by placing nodes randomly as shown in Figure 1 and applied the CB-grid layout algorithm to it. From the result as shown in Figure 2, we see the CB-grid layout algorithm provides clear and understandable layouts due to the drastic reduction of the crossings.

Figure 1: an initial layout for the CB-grid layout algorithm, in which nodes are placed randomly, satisfying their sub-celluar conditions. 1687 edge-edge crossings and 113 node-edge crossings occur in the layout.

Figure 2: the result of the CB-grid layout algorithm, which has 19 edge-edge crossings and 40 node-edge crossings.

The multi-step CB-grid layout algorithm

As is stated above, the CB-grid layout algorithm is one of greedy algorithms and subsequently generates the local optima. Li et al. then proposed to combine the annealing method with their layout algorithm so that the layout algorithm converges on the local optima that is far from the global optima. In order to obtain the better local optima without use of the annealing method we proposed the multi-step CB-grid layout algorithm. Given an integer k > 0, movement of nodes up to next k th step is searched in the multi-step CB-grid layout: on the other hand, only the next step is searched in the CB-grid layout. The CB-grid layout algorithm updates delta matrix from the previous delta matrix and selects the movement whose value on the matrix is the minimum, and this procedure is repeated until convergence. In contrast, the multi-step CB-grid layout calculate all pattern of the delta matrices up to the next k the step recursively, and selects the movement whose value on the k th delta matrices is the minimum. Like the CB-grid layout algorithm, the multi-step CB-grid layout algorithm repeats each step until the algorithm stops. Since the search space is extended by searching multi-steps, the result of the multi-step CB-grid layout algorithm can be better local optima than that of the CB-grid layout algorithm.

Publications

  1. Kojima K, Nagasaki M, Miyano S. Fast grid layout algorithm for biological networks with sweep calculation. Bioinformatics. 2008; 24(12):1433-41.
  2. Kojima K, Nagasaki M, Jeong E, Kato M, Miyano S. An efficient grid layout algorithm for biological networks utilizing various biological attributes. BMC Bioinformatics 2007;8:76.(.html)
  3. Kato M, Nagasaki M, Doi A, Miyano S. Automatic drawing of biological networks using cross cost and subcomponent data. Genome Inform. 2005;16(2):22-31.(.pdf)

References

  1. Kurata H, Matoba N, and Shimizu N. CADLIVE for constructing a large-scale biochemical network based on a simulation-directed notation and its application to yeast cell cycle. Nucleic Acids Res. 2003;31(14):4071-84.
  2. Kurata H, Masaki K, Sumida Y, Iwasaki R. CADLIVE dynamic simulator: Direct link of biochemical networks to dynamic models. Genome Res. 2005;15(4):590-600.
  3. Li W, Kurata H. A grid layout algorithm for automatic drawing of biochemical networks. Bioinformatics 2005;21(9):2036-42.
  4. Becker M, Rojas I. A graph layout algorithm for drawing metabolic pathways. Bioinformatics 2001;17(5):461-7.
  5. Nagasaki M, Doi A, Matsuno H, Miyano S. Genomic Object Net: I. A platform for modelling and simulating biopathways. Appl Bioinformatics 2003;2(3):181-4.
  6. Shannon P, Markiel A, Ozier O, Baliga, NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T. Cytoscape: A software environment for integrated models of biomolecular interaction networks. Genome Res. 2003;13(11):2498-504.
  7. Sirava M, Schafer T, Eiglsperger M, Kaufmann M, Kohlgacher O, Bornberg-Bauer E, Lenhof HP. BioMiner-modeling, analyzing, and visualizing biochemical pathways and networks. Bioinformatics 2002;18:219-30.
  8. Demir E, Babur O, Dogrusoz U, Gursoy A, Nisanci G, Atalay RC, Ozturk M. PATIKA: an integrated visual environment for collaborative construction and analysis of cellular pathways. Bioinformatics 2002;18(7):996-1003.
  9. Dogrusoz U, Erson EZ,Giral E, Demir E, Babur O, Cetintas A, Colak R. PATIKAweb: a Web interface for analyzing biological pathways through advanced querying and visualization Bioinformatics 2006;22(3):374-5.

Application

  • BioGraphLayout online