The next interesting variant on graphs is directed graphs, or digraphs for
short. A digraph is a graph where each edge distinguished between its source and its target – so an edge is from one node, and to another node. Unlike a simple graph, where if A is adjacent to B, then you can follow the edge either from A to B, or from B to A, in a directed graph, A to B and B to A are different edges.
One of the reasons that I like about graph theory so much is because of
how it ties into computer science. Graphs are fundamental to many problems in
computer science, and a lot of the work in graph theory has direct implications for
algorithm design. It also has a lot of resonance for me, because the work I do involves
tons and tons of graphs: I don’t think I’ve gotten through a week of work in the last decade without
implementing some kind of graph code.
Since I’ve described a lot of graph algorithms, and I’m going to
describe even more, today I’m going to talk a bit about how to represent graphs
in programs, and some of the tradeoffs between different representations.
One amusing thing in graph theory is graph traversal. Many of the interesting algorithms on graph
are ultimately based on the idea of iterating through the nodes of the graph in some order that is
related to the structure of the graph.
There are two fundamental orders of graph traversal, known as breadth-first and depth-first.
Another really fun and fundamental weighted graph problem is the shortest path problem. The
question in: given a connected weighted graph G, what is the shortest path between two vertices
v and w in G?
I got started on this series about graph theory by writing a post debunking something foolish that George Gilder said about network theory. Today I’m going to talk about one of the classic problems in graph theory, which happens to be a network problem. It’s called the maximum flow problem.
The idea is that you have a weighted graph, which is modeling a bunch of places connected by pipes of different sizes. The sizes of the pipes are represented by the weight labels on the edges, which
are called capacities. Pick two nodes in the graph, called the source and the sink, how much can you pump from source to sink without exceeding the capacities of the pipes? That value is called the maximum flow of the graph.
Moving on from simple graphs, one of the things you can do to make things more interesting
is to associate numeric values with the edges, called the weights of those edges. This variation
of a graph is called a weighted graph.
Weighted graphs are extremely useful buggers: many real-world optimization problems ultimately
reduce to some kind of weighted graph problem. A few examples include:
- The traveling salesman problem (TSP): given a set of cities, and information about travel-times
between pairs of cities, find the shortest route that visits all cities. This is actually an
incredibly commonly used problem; for example, in large data centers which manage backups,
a set of backup-restore requests is served by computing the seek time between pairs
of requests (where seek time includes finding the correct tape, loading it into a drive, and
scanning to the correct point on tape), and then finding the optimum order in which to fulfill
the requests by using a TSP algorithm
- Shortest path problem: a subset of the TSP; given a set of cities and information about
travel-time between cities with direct transportation routes, find the shortest path between
- The minimum spanning tree problem: given a weighted graph, find the spanning tree
with minimum total edge-weight. Airlines use minimum spanning trees to work out their basic
route system: flights between hubs are low-cost; other flights have varying prices depending on how many people fly them, what airplanes the airports can support, fuel transport costs, etc. The best
set of routes is the minimum spanning tree.
Often, we use graphs as a structured representation of some kind of information. Many interesting problems involving the things we’re representing can then by described in terms of operations over
graphs. Today, I’m going to talk about some of the basic operations that we can do on graphs, and in later posts, we’ll see how those operations are used to solve graph-based problems.
One application of graph theory that’s particularly interesting to me
is called Ramsey theory. It’s particularly interesting as someone who
debunks a lot of creationist nonsense, because Ramsey theory is in
direct opposition to some of the basic ideas used by bozos to
purportedly refute evolution. What Ramsey theory studies is when some
kind of ordered structure *must* appear, even in a pathologically
chaotic process. Ramsey theory is focused largely on *structures* that
exhibit particular properties, and those structures are usually
represented as graphs.
Today, I’m going to talk a bit about two closely related problems in graph theory: the maximal clique detection problem, and the maximal common subgraph problem. The two problems are interesting both on their own as easy-to-understand but hard-to-compute problems; and also because they have so many applications. In particular, the two are used extensively
in bioinformatics and pharmacology for the analysis of complex molecular structure.
One of the things that I find fascinating about graph theory is that it’s so simple, and
yet, it’s got so much depth. Even when we’re dealing with the simplest form of a graph – undirected graphs with no 1-cycles, there are questions that *seem* like that should be obvious, but which we don’t know the answer to.
For example, there’s something called the *reconstruction theorem*. We strongly suspect that it’s really a theorem, but it remains unproven. What it says is a very precise formal version of the idea that a graph is really fully defined by a canonical collection of its subgraphs.