Directed Graphs

weighted-digraph.png

The next interesting variant on graphs is directed graphs, or digraphs for
short. A digraph is a graph where each edge distinguished between its source and its target – so an edge is from one node, and to another node. Unlike a simple graph, where if A is adjacent to B, then you can follow the edge either from A to B, or from B to A, in a directed graph, A to B and B to A are different edges.

A lot of things change when you’re using digraphs instead of simple graphs. For example, the shortest
path: the shortest path from A to B is different than the shortest path from B to A. Look at the
following example of a weighted digraph. The shortest path from A to B is 2 (A→B); the shortest path
from B to A is 9 (B→D→C→E→A).

Digraphs come up in many places – basically anywhere where direction matters. Think, for example, of
New York City streets. Every intersection is a vertex; streets between intersections are edges. How do you
get from 96th street and Madison to 18th street and 4th ave? Since most of the streets in Manhattan are
one-way, you don’t want to try to do this with an undirected graph – you’ll get directions that you can’t
follow. You need to know the directions of the edges. (An important point, which this illustrates, is
that in a digraph, you can have both an edge from A to B and an edge from B to A.
Two way streets get modeled as two edges – one in each direction.)

For a less obvious example of this, we can look at something very close to home for me. There’s
a very famous software tool, first implemented at Bell Labs, called “make”. The purpose of make
is to provide an automated way of compiling software, which ensures that no more work is done compiling than necessary.

This is important, because there are often many steps to compiling a software system. For example, a program that I wrote to generate some fractals consists of:

  1. a graph description file (GDF).
  2. A python program which processes the GDF, and generates a C++ program.
  3. A library – that is, a set of other C++ sources that aren’t specific to one particular
    image, but which know how to generate “pdf” image files – that is used by the C++ program to
    generate the image.
  4. The C++ standard library – which is used both by the program generated from my GDF file,
    and by the PDF library. I don’t have to compile this – but it’s a component that gets
    included to turn anything into an executable.

So, to generate an image, I write a GDF. Then the python program runs on the GDF, generating
a C++ program. Then the C++ program is compiled. The library is compiled; and then the compiled
C++ program and the two libraries are linked and run, to generate the image.

Suppose I write a GDF file which generates a nice picture of the mandelbrot set. Then I run
make to compile it. It generates C++, compiles the C++ and the library, and runs the program
to generate the image.

Then I want to use a different GDF file – this time, I want to generate a Julia fractal. There’s no
reason to recompile the C++ library. I already did that when I was generating a mandelbrot, and
the library hasn’t changed. I can just re-use the library from last time.

I look at my Julia image, and it’s no good – it’s very pixelated. So I add some kind of
scaling factor to the library. But I’m not changing my gdf file, and I’m not changing the python script. I shouldn’t need to rerun the python script on the unchanged GDF; and that means that
the generated C++ program can’t change either. I need to recompile the library, and I need to
relink the compiled C++.

That’s what make is for. It figures out what needs to be redone, based an what changed. The way
that it does that is by using a dependency graph. A dependency graph is a digraph which
has a vertex for every input to, and every output from, any step of a build process. There is an edge
from a vertex A to a vertex B if changing A means that B needs to be recompiled. For example, below is
the digraph for the scenario I described above. In it, I’ve made C++ files red/orange, libraries and executables green, python source blue, and my gdf file purple. Source files are rounded boxes, libraries are ovals, object files are diamonds, and executables are rectangles.

depgraph.png

To build the executable for my fractal generator
(“foo.exe”), I need to have the generated source file “foo.cc”, the pdf library (“libpdf.a”), and
the C++ standard library (“libstdc++.a”); to build “foo.cc”, I need
both an input file, “foo.gdf”, and the python script “gen.py”, and so on.

So, now, suppose I edit “foo.gdf”. That means I need to rebuild everything on any path
from “foo.gdf” up to “foo.exe”: so I need to run “gen.py” to build “foo.cc”; then I need to compile
“foo.cc”, and link it against the two libraries to build “foo.exe”.

If I updated the version of my C++ library, then I would just need to recompile and link “foo.cc”.

0 thoughts on “Directed Graphs

  1. Jonathan Vos Post

    My favorite subset of Digraphs are: (1) Functional graphs; (2) The functional graphs of prime endofunctions.
    To clarify:
    (1) A functional graph is a digraph in which each vertex has outdegree one, and can therefore be specified by a function mapping {1,…,n} onto itself.
    (2) By a prime endofunction, I mean a digraph from {1,…,n} to {1,..,n}, hence an endomorphism, which is neither the direct sum nor the categorical product of other prime endofunctions. The enumeration of these turns out to be rather tricky. My son and I have been working on a paper about this for a year, drafts of which were final projects for his last Math course at Cal State L.A., where he earned his double B.S. in Math and computer science. Yesterday, still just eighteen years old, he moved into the graduate dorms at USC and began Law School.
    (3) The number of different weakly connected digraphs (which may have loops) on n unlabeled nodes which have at least one node with outdegree greater than 1. A “loop” is an edge from a point to itself. “Simple” means that there cannot be two or more edges of the same direction connecting any given pair of points. A weakly connected digraph is [Weisstein]: “A directed graph in which it is possible to reach any node starting from any other node by traversing edges in some direction (i.e., not necessarily in the direction they point). The nodes in a weakly connected digraph therefore must all have either outdegree or indegree of at least 1.” A003085 is Number of connected [simple] digraphs with n nodes [and no loops]. Simple digraphs with loops = binary relations, so this sequence enumerates binary relations which are neither partial functions nor functions.
    (4) On OEIS, we find the enumeration
    A003085 Number of connected digraphs with n nodes.
    There are also partial endofunctions, and many other sets of some interest. But underlying it all, Digraphs. They rock!

    Reply
  2. Torbjörn Larsson, OM

    Great post, combining two of my favorite functional facilities, streets of Manhattan and make, both very slick.
    Of course, the almost regular digraph of Manhattan is still dynamic – as a frakking furreigner I initially didn’t know that “parallel parking” has another meaning in NY, but got stuck behind a parked 2nd lane in front of a truck. Those truck drivers sure know to honk their horns…

    Reply
  3. Mark C. Chu-Carroll

    Guy:
    All of the graph diagrams, and most of the diagrams used on the blog are drawn using OmniGraffle on the mac. It’s one of the best pieces of software that I’ve ever used: it’s extremely powerful and flexible, and it’s got a beautiful and easy to work with UI.

    Reply
  4. Jonathan Vos Post

    The connection between digraphs and biogenesis and evolution is suggested (I love the word “eigendynamics” which did not exist when I de facto used it in my PhD research mid 1970s) by:
    arXiv:0708.4212
    Title: Aggregate Dynamics in an Evolutionary Network Model
    Authors: Adrian M. Seufert, Frank Schweitzer
    Subjects: Populations and Evolution (q-bio.PE); Adaptation and Self-Organizing Systems (nlin.AO); Molecular Networks (q-bio.MN); Quantitative Methods (q-bio.QM)
    We analyze a model of interacting agents (e.g. prebiotic chemical species) which are represended by nodes of a network, whereas their interactions are mapped onto directed links between these nodes. On a fast time scale, each agent follows an eigendynamics based on catalytic support from other nodes, whereas on a much slower time scale the network evolves through selection and mutation of its nodes-agent. In the first part of the paper, we explain the dynamics of the model by means of characteristic snapshots of the network evolution and confirm earlier findings on crashes an recoveries in the network structure. In the second part, we focus on the aggregate behavior of the network dynamics. We show that the disruptions in the network structure are smoothed out, so that the average evolution can be described by a growth regime followed by a saturation regime, without an initial random regime. For the saturation regime, we obtain a logarithmic scaling between the average connectivity per node $mean{l}_{s}$ and a parameter $m$, describing the average incoming connectivity, which is independent of the system size $N$.

    Reply

Leave a Reply to Guy Eschemann Cancel reply