{"id":491,"date":"2007-08-15T14:37:25","date_gmt":"2007-08-15T14:37:25","guid":{"rendered":"http:\/\/scientopia.org\/blogs\/goodmath\/2007\/08\/15\/representing-graphs\/"},"modified":"2007-08-15T14:37:25","modified_gmt":"2007-08-15T14:37:25","slug":"representing-graphs","status":"publish","type":"post","link":"http:\/\/www.goodmath.org\/blog\/2007\/08\/15\/representing-graphs\/","title":{"rendered":"Representing Graphs"},"content":{"rendered":"<p>One of the reasons that I like about graph theory so much is because of<br \/>\nhow it ties into computer science. Graphs are fundamental to many problems in<br \/>\ncomputer science, and a lot of the work in graph theory has direct implications for<br \/>\nalgorithm design.  It also has a lot of resonance for me, because the work I do involves<br \/>\ntons and tons of graphs: I don&#8217;t think I&#8217;ve gotten through a week of work in the last decade without<br \/>\nimplementing some kind of graph code.<\/p>\n<p> Since I&#8217;ve described a lot of graph algorithms, and I&#8217;m going to<br \/>\ndescribe even more, today I&#8217;m going to talk a bit about how to represent graphs<br \/>\nin programs, and some of the tradeoffs between different representations.<\/p>\n<p><!--more--><\/p>\n<p> There are two different basic graph representations: matrix-based representations, and list<br \/>\nbased representations.<\/p>\n<p> The matrix-based representation is called an <em>adjacency matrix<\/em>. For a graph G with N nodes, an adjacency matrix is an N&times;N matrix of true\/false values, where entry [a,b] is true if and only if there is an edge between a and b. If G is an undirected graph, then the matrix is (obviously)<br \/>\nsymmetric: [a,b]=[b,a]. If the graph is directed, then [a,b]=true means that there is an edge <em>from<\/em> a <em>to<\/em> b.<\/p>\n<p><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" alt=\"graph-rep-ex.png\" src=\"https:\/\/i0.wp.com\/scientopia.org\/img-archive\/goodmath\/img_250.png?resize=216%2C184\" width=\"216\" height=\"184\" \/><\/p>\n<p> As usual, things are clearer with an example. So, over the right, we have a simple<br \/>\nten node undirected graph, with vertices A through J. Right below, there&#8217;s a 10&times;10 boolean<br \/>\ntable showing how the graph would be represented as an adjacency matrix. In the matrix,<br \/>\na value of &#8220;1&#8221; in a cell means that there&#8217;s an edge connected the corresponding vertex indices; a &#8220;0&#8221; indicates that there is not. <\/p>\n<p>represented as an adjacency matrix.<\/p>\n<table border=\"1\">\n<tr>\n<th> <\/th>\n<th>A<\/th>\n<th>B<\/th>\n<th>C<\/th>\n<th>D<\/th>\n<th>E<\/th>\n<th>F<\/th>\n<th>G<\/th>\n<th>H<\/th>\n<th>I<\/th>\n<th>J<\/th>\n<\/tr>\n<tr>\n<th>A<\/th>\n<td>0<\/td>\n<td>0<\/td>\n<td>1<\/td>\n<td>0<\/td>\n<td>0<\/td>\n<td>0<\/td>\n<td>0<\/td>\n<td>0<\/td>\n<td>0<\/td>\n<td>0<\/td>\n<\/tr>\n<tr>\n<th>B<\/th>\n<td>0<\/td>\n<td>0<\/td>\n<td>0<\/td>\n<td>0<\/td>\n<td>0<\/td>\n<td>1<\/td>\n<td>1<\/td>\n<td>0<\/td>\n<td>0<\/td>\n<td>0<\/td>\n<\/tr>\n<tr>\n<th>C<\/th>\n<td>1<\/td>\n<td>0<\/td>\n<td>0<\/td>\n<td>0<\/td>\n<td>0<\/td>\n<td>0<\/td>\n<td>1<\/td>\n<td>0<\/td>\n<td>0<\/td>\n<td>0<\/td>\n<\/tr>\n<tr>\n<th>D<\/th>\n<td>0<\/td>\n<td>0<\/td>\n<td>0<\/td>\n<td>0<\/td>\n<td>0<\/td>\n<td>0<\/td>\n<td>1<\/td>\n<td>0<\/td>\n<td>1<\/td>\n<td>0<\/td>\n<\/tr>\n<tr>\n<th>E<\/th>\n<td>0<\/td>\n<td>0<\/td>\n<td>0<\/td>\n<td>0<\/td>\n<td>0<\/td>\n<td>0<\/td>\n<td>1<\/td>\n<td>0<\/td>\n<td>1<\/td>\n<td>1<\/td>\n<\/tr>\n<tr>\n<th>F<\/th>\n<td>0<\/td>\n<td>1<\/td>\n<td>0<\/td>\n<td>0<\/td>\n<td>0<\/td>\n<td>0<\/td>\n<td>1<\/td>\n<td>0<\/td>\n<td>0<\/td>\n<td>0<\/td>\n<\/tr>\n<tr>\n<th>G<\/th>\n<td>0<\/td>\n<td>1<\/td>\n<td>1<\/td>\n<td>1<\/td>\n<td>1<\/td>\n<td>1<\/td>\n<td>0<\/td>\n<td>1<\/td>\n<td>0<\/td>\n<td>1<\/td>\n<\/tr>\n<tr>\n<th>H<\/th>\n<td>0<\/td>\n<td>0<\/td>\n<td>0<\/td>\n<td>0<\/td>\n<td>0<\/td>\n<td>0<\/td>\n<td>1<\/td>\n<td>0<\/td>\n<td>1<\/td>\n<td>1<\/td>\n<\/tr>\n<tr>\n<th>I<\/th>\n<td>0<\/td>\n<td>0<\/td>\n<td>0<\/td>\n<td>1<\/td>\n<td>1<\/td>\n<td>0<\/td>\n<td>0<\/td>\n<td>1<\/td>\n<td>0<\/td>\n<td>0<\/td>\n<\/tr>\n<tr>\n<th>J<\/th>\n<td>0<\/td>\n<td>0<\/td>\n<td>0<\/td>\n<td>0<\/td>\n<td>1<\/td>\n<td>0<\/td>\n<td>1<\/td>\n<td>1<\/td>\n<td>0<\/td>\n<td>0<\/td>\n<\/tr>\n<\/table>\n<p> The list-based representation, commonly called an <em>adjacency list<\/em>, has a list for each vertex in the graph. For<br \/>\na vertex v, that list contains identifiers for the vertices which are adjacent to v. So again for our example:<\/p>\n<table border=\"1\">\n<tr>\n<th>Vertex<\/th>\n<th>Neighbors<\/th>\n<\/tr>\n<tr>\n<td>A<\/td>\n<td>C <\/td>\n<\/tr>\n<tr>\n<td>B<\/td>\n<td>F,G <\/td>\n<\/tr>\n<tr>\n<td>C<\/td>\n<td>A,G <\/td>\n<\/tr>\n<tr>\n<td>D<\/td>\n<td>G,I <\/td>\n<\/tr>\n<tr>\n<td>E<\/td>\n<td>G,I,J <\/td>\n<\/tr>\n<tr>\n<td>F<\/td>\n<td>B,G <\/td>\n<\/tr>\n<tr>\n<td>G<\/td>\n<td>B,D,E,F,H,J <\/td>\n<\/tr>\n<tr>\n<td>H<\/td>\n<td>G,I,J <\/td>\n<\/tr>\n<tr>\n<td>I<\/td>\n<td>D,E,H <\/td>\n<\/tr>\n<tr>\n<td>J<\/td>\n<td>E,G,H <\/td>\n<\/tr>\n<\/table>\n<p> A common optimization of the adjacency list is to use a data object to represent a vertex, and to use<br \/>\npointers\/references to vertex objects as the vertex identifiers in the adjacency list for a vertex. Depending<br \/>\non the application, the adjacency lists could be either arrays, linked lists, or some other similar structure.<\/p>\n<p> To give you some sense of what a simple bit of code looks like, here&#8217;s a pair of Python functions to check if two vertices x and y are<br \/>\nadjacent in a matrix. The first piece of code is for an adjacency matrix; the second for an adjacency list.<\/p>\n<pre>\ndef IsAdjacentInMatrix(matrix, x, y):\nreturn matrix[x][y]\n<\/pre>\n<pre>\ndef IsAdjacentInList(x, y):\nfor n in x.neighbors:\nif y == n:\nreturn true\nreturn false\n<\/pre>\n<p>When you&#8217;re writing a graph-based algorithm, choosing which representation &#8211; and which variation of which representation<br \/>\n&#8211; is one of your most important decisions. There&#8217;s no cut and dried correct answer: which representation is right is dependent on what you want to do, and what constraints you have. How fast you need things to be; how much memory you<br \/>\ncan afford to use; how big your graph is; how many edges you graph will have &#8211; all contribute to the choice of the best representation. <\/p>\n<p>The boolean adjacency matrix is very compact &#8211; you can get away with as little as N<sup>2<\/sup>\/2 bits of storage to<br \/>\nrepresent the edges in a graph. Certain algorithms &#8211; like transitive closure &#8211; are extremely efficient in the adjacency<br \/>\nmatrix. If you really only need to know the structure of the edges, and you don&#8217;t have other data that needs to be<br \/>\nassociated with the edges, then an adjacency matrix can be an extremely good choice. If your matrix is moderately sized,<br \/>\nyou can often get the entire adjacency matrix into the cache at once, which has a dramatic impact on performance.<\/p>\n<p>Even if you need to associate a little bit of information with an edge, that&#8217;s often easily doable using an adjacency<br \/>\nmatrix. For example, to represent a weighted graph, you can use the edge-weight as the value of the matrix entry if there<br \/>\nis an edge. (This does require that you be able to identify a value which will absolutely never be an edge weight, to use<br \/>\nto mark cells that are not edges. If you&#8217;re using floating point numbers for weight, then NaN is a good choice.) On the<br \/>\nother hand, if you have a very large matrix with very few edges, then an adjacency matrix will perform very poorly and<br \/>\nwaste an enormous amount of space. Consider a graph with 1 million vertices, and 1 million edges. That means that each<br \/>\nvertex has, on average, two edges &#8211; but you&#8217;re using one million bits per vertex. That&#8217;s very wasteful, and dealing with a<br \/>\nmatrix of 10<sup>12<\/sup> bits is not going to be particularly fast.<\/p>\n<p> Using an adjacency list works much better if you have a sparse matrix &#8211; it only uses space for edges that are actually<br \/>\npresent in the graph. It&#8217;s also often easier to associate data with vertices and edges in a good way using adjacency lists;<br \/>\nand it can often produce better performance in that case as well, due to good locality. (When you traverse an edge in an<br \/>\nadjacency list using the pointer implementation described above, you&#8217;re pulling the vertex into your cache; then when you<br \/>\ngo to look at the data for the vertex, it&#8217;s hot in the cache. In an adjacency matrix, you&#8217;d use a separate table for that,<br \/>\nwhich would mean fetching from a different area of memory.)<\/p>\n<p> The adjacency list is often better for certain kinds of graph traversals: if you need to walk through a graph<br \/>\ncollecting information about visited vertices, the locality properties of the adjacency list structure can have a great<br \/>\neffect on performance. But if you want to know something about connectivity, or simple pathfinding in a relatively dense<br \/>\ngraph, then the adjacency matrix is usually the right choice.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>One of the reasons that I like about graph theory so much is because of how it ties into computer science. Graphs are fundamental to many problems in computer science, and a lot of the work in graph theory has direct implications for algorithm design. It also has a lot of resonance for me, because [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[25],"tags":[],"class_list":["post-491","post","type-post","status-publish","format-standard","hentry","category-graph-theory"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_shortlink":"https:\/\/wp.me\/p4lzZS-7V","jetpack_sharing_enabled":true,"jetpack_likes_enabled":true,"_links":{"self":[{"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/posts\/491","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/comments?post=491"}],"version-history":[{"count":0,"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/posts\/491\/revisions"}],"wp:attachment":[{"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/media?parent=491"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/categories?post=491"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/tags?post=491"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}