The next interesting variant on graphs is directed graphs, or digraphs for
short. A digraph is a graph where each edge distinguished between its source and its target – so an edge is from one node, and to another node. Unlike a simple graph, where if A is adjacent to B, then you can follow the edge either from A to B, or from B to A, in a directed graph, A to B and B to A are different edges.
A lot of things change when you’re using digraphs instead of simple graphs. For example, the shortest
path: the shortest path from A to B is different than the shortest path from B to A. Look at the
following example of a weighted digraph. The shortest path from A to B is 2 (A→B); the shortest path
from B to A is 9 (B→D→C→E→A).
Digraphs come up in many places – basically anywhere where direction matters. Think, for example, of
New York City streets. Every intersection is a vertex; streets between intersections are edges. How do you
get from 96th street and Madison to 18th street and 4th ave? Since most of the streets in Manhattan are
one-way, you don’t want to try to do this with an undirected graph – you’ll get directions that you can’t
follow. You need to know the directions of the edges. (An important point, which this illustrates, is
that in a digraph, you can have both an edge from A to B and an edge from B to A.
Two way streets get modeled as two edges – one in each direction.)
For a less obvious example of this, we can look at something very close to home for me. There’s
a very famous software tool, first implemented at Bell Labs, called “make”. The purpose of make
is to provide an automated way of compiling software, which ensures that no more work is done compiling than necessary.
This is important, because there are often many steps to compiling a software system. For example, a program that I wrote to generate some fractals consists of:
- a graph description file (GDF).
- A python program which processes the GDF, and generates a C++ program.
- A library – that is, a set of other C++ sources that aren’t specific to one particular
image, but which know how to generate “pdf” image files – that is used by the C++ program to
generate the image.
- The C++ standard library – which is used both by the program generated from my GDF file,
and by the PDF library. I don’t have to compile this – but it’s a component that gets
included to turn anything into an executable.
So, to generate an image, I write a GDF. Then the python program runs on the GDF, generating
a C++ program. Then the C++ program is compiled. The library is compiled; and then the compiled
C++ program and the two libraries are linked and run, to generate the image.
Suppose I write a GDF file which generates a nice picture of the mandelbrot set. Then I run
make to compile it. It generates C++, compiles the C++ and the library, and runs the program
to generate the image.
Then I want to use a different GDF file – this time, I want to generate a Julia fractal. There’s no
reason to recompile the C++ library. I already did that when I was generating a mandelbrot, and
the library hasn’t changed. I can just re-use the library from last time.
I look at my Julia image, and it’s no good – it’s very pixelated. So I add some kind of
scaling factor to the library. But I’m not changing my gdf file, and I’m not changing the python script. I shouldn’t need to rerun the python script on the unchanged GDF; and that means that
the generated C++ program can’t change either. I need to recompile the library, and I need to
relink the compiled C++.
That’s what make is for. It figures out what needs to be redone, based an what changed. The way
that it does that is by using a dependency graph. A dependency graph is a digraph which
has a vertex for every input to, and every output from, any step of a build process. There is an edge
from a vertex A to a vertex B if changing A means that B needs to be recompiled. For example, below is
the digraph for the scenario I described above. In it, I’ve made C++ files red/orange, libraries and executables green, python source blue, and my gdf file purple. Source files are rounded boxes, libraries are ovals, object files are diamonds, and executables are rectangles.
To build the executable for my fractal generator
(“foo.exe”), I need to have the generated source file “foo.cc”, the pdf library (“libpdf.a”), and
the C++ standard library (“libstdc++.a”); to build “foo.cc”, I need
both an input file, “foo.gdf”, and the python script “gen.py”, and so on.
So, now, suppose I edit “foo.gdf”. That means I need to rebuild everything on any path
from “foo.gdf” up to “foo.exe”: so I need to run “gen.py” to build “foo.cc”; then I need to compile
“foo.cc”, and link it against the two libraries to build “foo.exe”.
If I updated the version of my C++ library, then I would just need to recompile and link “foo.cc”.