Representing Graphs

One of the reasons that I like about graph theory so much is because of
how it ties into computer science. Graphs are fundamental to many problems in
computer science, and a lot of the work in graph theory has direct implications for
algorithm design. It also has a lot of resonance for me, because the work I do involves
tons and tons of graphs: I don’t think I’ve gotten through a week of work in the last decade without
implementing some kind of graph code.

Since I’ve described a lot of graph algorithms, and I’m going to
describe even more, today I’m going to talk a bit about how to represent graphs
in programs, and some of the tradeoffs between different representations.

There are two different basic graph representations: matrix-based representations, and list
based representations.

The matrix-based representation is called an adjacency matrix. For a graph G with N nodes, an adjacency matrix is an N×N matrix of true/false values, where entry [a,b] is true if and only if there is an edge between a and b. If G is an undirected graph, then the matrix is (obviously)
symmetric: [a,b]=[b,a]. If the graph is directed, then [a,b]=true means that there is an edge from a to b.

As usual, things are clearer with an example. So, over the right, we have a simple
ten node undirected graph, with vertices A through J. Right below, there’s a 10×10 boolean
table showing how the graph would be represented as an adjacency matrix. In the matrix,
a value of “1” in a cell means that there’s an edge connected the corresponding vertex indices; a “0” indicates that there is not.

A B C D E F G H I J
A 0 0 1 0 0 0 0 0 0 0
B 0 0 0 0 0 1 1 0 0 0
C 1 0 0 0 0 0 1 0 0 0
D 0 0 0 0 0 0 1 0 1 0
E 0 0 0 0 0 0 1 0 1 1
F 0 1 0 0 0 0 1 0 0 0
G 0 1 1 1 1 1 0 1 0 1
H 0 0 0 0 0 0 1 0 1 1
I 0 0 0 1 1 0 0 1 0 0
J 0 0 0 0 1 0 1 1 0 0

The list-based representation, commonly called an adjacency list, has a list for each vertex in the graph. For
a vertex v, that list contains identifiers for the vertices which are adjacent to v. So again for our example:

Vertex Neighbors
A C
B F,G
C A,G
D G,I
E G,I,J
F B,G
G B,D,E,F,H,J
H G,I,J
I D,E,H
J E,G,H

A common optimization of the adjacency list is to use a data object to represent a vertex, and to use
pointers/references to vertex objects as the vertex identifiers in the adjacency list for a vertex. Depending
on the application, the adjacency lists could be either arrays, linked lists, or some other similar structure.

To give you some sense of what a simple bit of code looks like, here’s a pair of Python functions to check if two vertices x and y are
adjacent in a matrix. The first piece of code is for an adjacency matrix; the second for an adjacency list.

```def IsAdjacentInMatrix(matrix, x, y):
return matrix[x][y]
```
```def IsAdjacentInList(x, y):
for n in x.neighbors:
if y == n:
return true
return false
```

When you’re writing a graph-based algorithm, choosing which representation – and which variation of which representation
– is one of your most important decisions. There’s no cut and dried correct answer: which representation is right is dependent on what you want to do, and what constraints you have. How fast you need things to be; how much memory you
can afford to use; how big your graph is; how many edges you graph will have – all contribute to the choice of the best representation.

The boolean adjacency matrix is very compact – you can get away with as little as N2/2 bits of storage to
represent the edges in a graph. Certain algorithms – like transitive closure – are extremely efficient in the adjacency
matrix. If you really only need to know the structure of the edges, and you don’t have other data that needs to be
associated with the edges, then an adjacency matrix can be an extremely good choice. If your matrix is moderately sized,
you can often get the entire adjacency matrix into the cache at once, which has a dramatic impact on performance.

Even if you need to associate a little bit of information with an edge, that’s often easily doable using an adjacency
matrix. For example, to represent a weighted graph, you can use the edge-weight as the value of the matrix entry if there
is an edge. (This does require that you be able to identify a value which will absolutely never be an edge weight, to use
to mark cells that are not edges. If you’re using floating point numbers for weight, then NaN is a good choice.) On the
other hand, if you have a very large matrix with very few edges, then an adjacency matrix will perform very poorly and
waste an enormous amount of space. Consider a graph with 1 million vertices, and 1 million edges. That means that each
vertex has, on average, two edges – but you’re using one million bits per vertex. That’s very wasteful, and dealing with a
matrix of 1012 bits is not going to be particularly fast.

Using an adjacency list works much better if you have a sparse matrix – it only uses space for edges that are actually
present in the graph. It’s also often easier to associate data with vertices and edges in a good way using adjacency lists;
and it can often produce better performance in that case as well, due to good locality. (When you traverse an edge in an
adjacency list using the pointer implementation described above, you’re pulling the vertex into your cache; then when you
go to look at the data for the vertex, it’s hot in the cache. In an adjacency matrix, you’d use a separate table for that,
which would mean fetching from a different area of memory.)

The adjacency list is often better for certain kinds of graph traversals: if you need to walk through a graph
collecting information about visited vertices, the locality properties of the adjacency list structure can have a great
effect on performance. But if you want to know something about connectivity, or simple pathfinding in a relatively dense
graph, then the adjacency matrix is usually the right choice.

0 thoughts on “Representing Graphs”

1. Kyle

That’s awesome. I remember adjacency matrices from Combinatorics in college. They were the most interesting thing I have ever seen.
An interesting property of adjacency matrices is that if you square them (or take to the power of n, generally), then the resulting matrix will represent the matrix for all vertices which are connected by 2 edges (or n edges, if generally speaking). This flabbergasted me when I was in school because it doesn’t seem like it “follows” in any sense, until you really think about it.
Typically, you would take the result mod the number that n was in order to just get all 1s and 0s again, just to simplify it.
I wanted to ask: What types of programs have you had to write that involve this type of thing? I’m super-interested in it, but not really sure where to start (other than Maple, which sucks).

2. Kyle

Hmm, maybe I forgot something, but I think the mod n thing was wrong. Someone else know better than I? In any case, it still works well (without the modding). 🙂

3. Ben

I first saw the matrix representation of a graph when dealing with relations in a discrete math course.
We used them to decide certain properties, like whether a relation was reflexive, transitive and so on. Great stuff.

4. Mark C. Chu-Carroll

Kyle:
I haven’t used any existing packages like maple. In the kind of work I do, we just end up writing lots and lots and lots of graphs. In grad school, I did program analysis and compilation; the biggest contribution of my thesis was a graph-based intermediate representation for certain kinds of parallel programming.
After I got into the real world, I did work comparing programming language types in different languages for equivalence. We used a graph-based type representation. Then more compilers (program dependence graphs, call graphs, register interference graphs), software configuration management (version graphs, project topology graphs),
build tools (dependence graphs), more SCM, data marshalling for distributed systems (optimal graph flattening), and so on.
It’s all been very application specific, and implemented by hand on top of programming language libraries. Implementing graphs isn’t generally difficult – that’s part of why people like me like them so much.
For example, if you’ve got Java, putting together an adjacency-list graph using java collections is pretty trivial. I can whip together a basic adjacency-list-by-reference graph with visitors for traversal in Java in about an hour. But when I was at IBM working on the Stellation SCM system, writing the code
to efficiently compute and extract information from a version history graph for complex software system took me about 2 months!

5. Flaky

Adjacency list implemented as pointers/references is of course a quite natural way to represent a graph, especially if you’re not interested in easily iterating through the edges. But this doesn’t work really well with functional programming, since there are no mutable references. One would somehow have to create the whole graph at once. Is something like that even possible in some functional language? What about cyclic data structures in FP in general? (Of course quick googling sorted much of this. But I’m posting anyways.;) )

6. Philip Sternberg

A little note on the adjacency list implementation in python: for many applications, you might be better off using a set rather than a list as the member object that stores neighbors, giving O(1) lookup time.

7. Jolly Bloger

This is all very interesting. I wonder if you could talk about other areas where graph theory is useful. I am just listening to PZ Myers give a talk in Minnesota on the brain (link to the podcast through Pharyngula on Science Blogs). He just talked about how there are about 100 billion neurons, each with about 1000 connections in an adult brain, and this immediately reminded me of the graphs you have been posting about. Is graph theory used in neurology? Can we learn anything interesting about the process or complexity of the brain and thought by looking at neurons as nodes in a graph?

8. Beren

I find this stuff fascinating (:
Presumably, if you needed to associate large amounts of data with an edge in an adjacency matrix you could simply make the value an object reference of some sort instead of a numeric value. You could still test for null or non-null in constant time, and you could store arbitrary amounts of data for each edge.
Does the choice of representation affect the ease of implementing graph-related algorithms, or just the algorithms’ efficiency?

9. Nick

Adjacency matrices are also useful when doing ‘relaxation’ of edges, such as Dijkstra’s – you start with a weighted adjacency matrix, and end up with a matrix of distances from any node to any other.

10. Anonymous

Philip: O(log n), surely?
It’s occasionally useful to also have a collection of all edges, rather than going through all vertices and all of their edges to find, or going through the entire adjacency matrix. One can also represent the graph as just the collection of vertices and edges, but this is usually less useful.

11. Elia Diodati

I’ve always been interested to find out if anyone studies continuous generalizations of graphs, in the sense of the abstract object that corresponds to an “adjacency matrix” whose elements can take values between 0 and 1. Maybe it could be interpreted as an ensemble average of a set of graphs?

12. Mark C. Chu-Carroll

The continuous generalization of graphs are topological spaces. The discrete adjacency relationships become the topological neighborhoods.

13. sfingram

A related matrix to the adjacency matrix that has a zillion practical qualities is the discrete graph Laplacian matrix. It’s used in physics for simulation, machine learning for clustering, digital geometry for mesh smoothing and graph visualization. Also, speaking of continuous generalization of graphs, the discrete laplacian approximates (and converges to in the limit) the continuous Laplacian operator.

14. Anonymous

I’m using adjacency list implementation for direct acyclic graphs in my thesis. I think for DAG implementations, using adjacency list is a standard procedure 🙂 (note: DAG implementations for Task Scheduling)

15. Alper

I’m using adjacency list implementation for direct acyclic graphs in my thesis. I think for DAG implementations, using adjacency list is a standard procedure 🙂
(note: DAG implementations for Task Scheduling)
(note-2: forgot to write my name 🙂

Kyle:

This flabbergasted me when I was in school

I think I know what you mean – my mother studied sociology, and when I peeked into them as a child they discussed this, IIRC, in relationship matrices. (Which I think biologists may use as well.) It was, cool – especially for sociology.

Jolly Bloger:

Can we learn anything interesting about the process or complexity of the brain and thought by looking at neurons as nodes in a graph?

Dunno. Neural networks are used by neuroscientists to give toy models, which should give graphable nodes.
If you are interested, I recommend the paper discussed here, which has a biologically modeled network of the prefrontal cortex showing that symbol-like behavior is spontaneously created and avoids the overtraining problems of simpler networks. As you can see from the model, any graphs are probably best applicable to each of the different populations of neurons making up a working brain.
You could also look at the Blue Brain Project, that aims at similarly modeling a whole mammal brain cortex column. Currently they have modeled 10 000 rat neurons at a schematic level. (Need to complement with the full spectra of different neurotransmitters.)
Maybe their work can give perspective to the conclusions of the above work. They describe the cortical column as:

To model the neocortical column, it is essential to understand the composition, density and distribution of the numerous cortical cell types. Each class of cells is present in specific layers of the column. …

Each neuron is connected to thousands of its neighbors at points where their dendrites or axons touch, known as synapses. In a column with 10,000 neurons, this translates into trillions of possible connections. …

[Add to this that a new type of direct neuron-neuron connection through pores seems to been found, that speeds up communication over that using synapses with a factor 10-100. (Possibly explaining the fast oscillations observed in epilepsy.) These models are far from finished, even if they already hint at some functioning.]

Jolly Bloger:
Seems a sentence got dropped from my last comment. In the end it should say: “So if graphs are initially useful, seems the size of graphs needed for larger models could be bothersome.”
Which of course says the same as I started with: dunno.

19. JBL

There are lots of people interested in questions that begin, “So, I have a graph, but it’s too big to store in any format ….” For example, the blurb on the book “Complex Graphs and Networks,” by Fan Chung and Linyuan Lu, says:
“Through examples of large complex graphs in realistic networks, research in graph theory has been forging ahead into exciting new directions. Graph theory has emerged as a primary tool for detecting numerous hidden structures in various information networks, including Internet graphs, social networks, biological networks, or, more generally, any graph representing relations in massive data sets. [etc.]”

20. Ambitwistor

Jolly Bloger:
People have looked at the network structure of biological neurons before, e.g. this famous paper by Watts and Strogatz (for the nematode worm C. Elegans).

21. g

It seems to me that calling topological spaces *the* continuous generalization of graphs is a bit much. You could equally say that metric spaces fill that role (after all, a graph has a natural notion of distance on it), or something like the “fuzzy graphs” that Elia suggested (keeping the structure discrete but making the metric continuous, in some imprecise sense). I’m sure there are other options.

Ambitwistor:
OT, but while I can’t access the paper for free, I note that the abstracts claim of finding small-world clustering can be understood without graph theory by a simple complexity measure designed for the purpose. The paper finds that constructed networks that are too sparse or too dense in this measure doesn’t behave as the brain (see fig. 3).

23. Jonathan Vos Post

Any comments on the Graph Theoretic component of this fascinating paper posted today on the arXiv?
Fri, 17 Aug 07
[57] arXiv:0708.1362 (cross-list from cond-mat.stat-mech)
Title: Physical limits of inference
Authors: David H. Wolpert
Comments: 36 pages, 2007 CNLS conference on unconventional computation
Subjects: Statistical Mechanics (cond-mat.stat-mech); Computational Complexity (cs.CC); Information Theory (cs.IT); General Relativity and Quantum Cosmology (gr-qc)
We show that physical devices that perform observation, prediction, or recollection share a mathematical structure. We call devices with that structure “inference devices”. We present existence and impossibility results for inference devices. These results hold independent of the precise physical laws of our universe. The impossibility results establish that Laplace was wrong to claim that even in a classical, non-chaotic universe the future can be unerringly predicted. Alternatively, they can be viewed as a non-quantum mechanical “uncertainty principle”. The mathematics of inference devices is related to the theory of Turing Machines (TM’s), e.g., some impossibility results for inference devices are related to the Halting theorem for TM’s. Furthermore, one can define an analog of Universal TM’s (UTM’s) for inference devices, which we call “strong inference devices”. We use strong inference devices to define the “inference complexity” of an inference task, which is analogous to the Kolmogorov complexity of a string. A task-independent bound is derived on the difference in inference complexity of an inference task performed with two different inference devices. This is analogous to the “encoding” bound on the difference in Kolmogorov complexity of a string between two UTM’s. However whereas the Kolmogorov complexity of a string is arbitrary up to specification of the UTM, there is no such arbitrariness in the inference complexity of an inference task. We informally discuss philosophical implications of these results, e.g., for whether the universe “is” a TM. We also derive some graph-theoretic properties governing sets of multiple inference devices. Next we extend the framework to address physical devices used for control, and then to address probabilistic inference.