Monthly Archives: November 2006

Groups and Topology

I’m going to start moving the topology posts in the direction of algebraic topology, which is the part of topology that I’m most interested in. There’s lots more that can be said about homology, homotopy, manifolds, etc., and I may come back to it at some point, but for now, I feel like moving on.
There’s some fun stuff in algebraic topology which comes from the intersection between group theory
and topology. To be able to talk about that, you need the concept of a *topological group*.
First, I’ll run through a very quick review of groups. I wrote a series of posts on group theory for GM/BM when it was at Blogger; if you’re interested in details, you might want to [pop over there and take a skim](http://goodmath.blogspot.com/2006/06/group-theory-index.html). There are also some excellent articles on group theory [at Wolfram’s MathWorld](http://mathworld.wolfram.com/GroupTheory.html),
and [wikipedia](http://en.wikipedia.org/wiki/Group_theory). Then I’ll show you the beginnings of how group theory, abstract algebra, and topology can intersect.
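
As a quick preview (a standard formulation in my own words, not a quote from the full post): a topological group is a group that’s also a topological space, where the group operations behave continuously.

    % Sketch of the standard definition: a topological group is a group (G, \cdot)
    % equipped with a topology such that multiplication and inversion are
    % continuous (with G x G given the product topology).
    m : G \times G \to G,\quad m(g,h) = g \cdot h
    \qquad\text{and}\qquad
    i : G \to G,\quad i(g) = g^{-1}
    \qquad\text{are both continuous.}

The real numbers under addition, with their usual topology, are the simplest example.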

Continue reading

Friday Pathological Programming: Bad Actors in Cruise

Today’s pathological language is a bit of a treat for me. I’m going to show you a twisted,
annoying, and thoroughly pointless language that *I* created.
The language is based on a model of computation called [Actors](http://en.wikipedia.org/wiki/Actor_model), originally proposed by Carl Hewitt and developed extensively by Professor Gul Agha of UIUC. There’ve been some really nice languages built using ideas from Actors, but this is *not* one of them. And that’s exactly where the name comes from. What name comes to mind when you think of *really bad* actors with delusions of adequacy? For me, it’s “Cruise”.
You can get the code for Cruise on Google code, project “Cruise”, or you can grab a bundle containing the code, and a compiled binary in a jarfile [here](http://scienceblogs.com/goodmath/upload/2006/11/cruise.zip). To run it, just “java -jar Cruise.jar cruise-program-file”. Just so you know, the code *sucks*. It’s something I threw together in my spare time, so it’s sloppy, overcomplicated, probably buggy, and slow as a snail on tranquilizers.

Continue reading

Friday Random Ten, November 10

1. **Porcupine Tree, “Prepare Yourself”**. Porcupine Tree is a strange band, which started out as an elaborate joke. This is off of their most progressive album, “The Sky Moves Sideways”. It’s a brilliant piece of work.
2. **Dream Theater, “Blind Faith”**
3. **Dirty Three, “Dream Evie”**. Ah, Dirty Three, one of my favorite post-rock ensembles. Very classical sounding group, wonderful.
4. **Tortoise, “By Dawn”**. More post-rock; unfortunately, I find Tortoise rather dull.
5. **Harry Bradley, “Miss Thornton’s”**. Traditional Irish music played in exquisite style by one of the great masters of the Irish flute. Not to be missed if you like Irish music.
6. **Tony Trischka Band, “Feed the Horse”**. Tony Trischka is one of the great masters of the banjo; he’s Bela Fleck’s banjo teacher. Tony was doing the jazz thing on the banjo long before Bela. This is off of Tony’s first album with his new band. It’s a terrific song, but it’s got some of the most nonsensical lyrics I’ve seen.
7. **The Flower Kings, “Bavarian Skies”**. A track from the latest album by the gods of neo-progressive rock.
8. **Flook, “Wrong Foot Forward”**. More Irish. Flook is one of the most amazingly fun, high-energy, creative trad Irish bands around. Basically, take one of the best Irish tin whistle players in the world; put him together with one of the top bodhrán players, a solid rhythm guitar player, and a terrific *tenor* flute player, set them loose, and watch what happens. I’ve yet to find *anyone* who’s heard them who doesn’t love Flook.
9. **Rachel’s, “Last Things Last”**. Rachel’s… Another really great post-rock ensemble; like the Dirty Three, they’re very classical. But they’re even better at it than the DT or the Clogs. (And the name of the group is “Rachel’s”, not “The Rachels”, not “Rachels” plural, but the possessive of “Rachel”.)
10. **Marillion, “The Damage”**. Marillion’s my long-time favorite prog-rock band. “The Damage” is a track off of their latest album. Not an exceptional track by Marillion, but almost anything by Marillion is at least good.

Woo Math: Steiner and Theosophical Math

I was innocently browsing around the net looking at elementary math curricula. I want to be able to teach my kids some fun math, just like my dad did with me when I was a kid. So I was browsing around, looking at different ways of teaching math, trying to find fun stuff. In the process, I came across woo-math: that is, incredibly crazy woo justified using crazy things derived from legitimate mathematics. And it’s not just a bit of flakiness with a mathematical gloss: it’s big-time, wacky, looney-tunes-grade woo-math: the [Rudolf Steiner Theosophical version of Mathematics](http://www.nct.anth.org.uk/counter.htm). And, well, how could I possibly resist that?

Continue reading

Building Interesting Shapes by Gluing

I thought it would be fun to do a couple of strange shapes to show you the interesting things that you can do with a bit of glue in topology. There are a couple of standard *strange* manifolds, and I’m going to walk through some simple gluing constructions of them.
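
To give a flavor of the constructions (standard examples, stated in my own words rather than quoted from the post): both the torus and the Klein bottle come from gluing the edges of a square.

    % Take the unit square [0,1] x [0,1] and identify its edges:
    %   torus:        left edge to right edge, bottom edge to top edge, both straight;
    %   Klein bottle: same, except the bottom edge is glued to the top with a flip.
    T^2 = [0,1]^2 \,/\, \bigl\{ (0,y) \sim (1,y),\; (x,0) \sim (x,1) \bigr\}
    K   = [0,1]^2 \,/\, \bigl\{ (0,y) \sim (1,y),\; (x,0) \sim (1-x,1) \bigr\}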

Continue reading

Better Glue for Manifolds

After my [initial post about manifolds](http://scienceblogs.com/goodmath/2006/10/manifolds_and_glue.php), I wanted to say a bit more about gluing.
You can form manifolds by gluing manifolds with an arbitrarily small overlap – as little as a single point of contact between the manifolds. The example that I showed, forming a spherical shell out of two circles, used a minimal overlap. If all you want to do is show that the topology you form is a manifold, that kind of trivial gluing is sufficient, and it’s often the easiest way to splice things together.
But there are a lot of applications of manifolds where you need more than that. So today, I’m going to show you how to do proper gluing in a way that preserves things like metric properties of manifolds when they’re glued together.
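
Roughly speaking (my sketch of the standard construction, not the post’s exact wording): to glue two manifolds you pick overlapping open pieces, a homeomorphism between them, and identify points across that map; to keep metric structure intact, that map needs to be an isometry on the overlap.

    % Glue M and N along open sets U \subseteq M and V \subseteq N via a
    % homeomorphism \varphi : U \to V, by taking the disjoint union and
    % identifying each x in U with \varphi(x) in V.
    M \cup_{\varphi} N \;=\; (M \sqcup N) \,/\, \{\, x \sim \varphi(x) : x \in U \,\}
    % If M and N carry metrics and \varphi is an isometry from U to V, the
    % metrics agree on the identified overlap, so the glued space inherits
    % them consistently there.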

Continue reading

Programming in Color (fixed)

Today’s programming pathology is programs as art.
Start with a really simple stack based language, add in a crazy way of encoding instructions using color, and you end up with a masterpiece of beautiful insanity. It’s not too exciting from a purely computational point of view, but the programs are really great to look at. Yes, it’s a pathological language with truly beautiful source code!
*(The original version of this post had some trouble because I linked to the original images in place,
which the owner of the Piet webpage had blocked. I didn’t realize he didn’t want links. I’ve since downloaded
the images, converted them to JPEGs, and posted them here. I initially thought that the problem with the images was formats, which is what I originally said in this explanation. It’s not the image format, but the linking; converting the files to JPEG and uploading them removed the links that caused the problem.)*

Continue reading

Vote for Shelley! (Blogging Scholarships)

As you may have heard from some of the other ScienceBlogs, our SciBling Shelley Batts, of [Retrospectacle](http://www.scienceblogs.com/retrospectacle/), is competing for a scholarship being
given to bloggers. Shelley’s a great writer, and on her way to becoming a great scientist. Please head
over to the [Blogger Scholarships voting](http://www.scholarships-ar-us.org/blog/2006/10/31/vote-for-the-winner-of-the-blogging-scholarship/),
take a look at the finalists, and if you agree with us SBers that Shelley deserves to be the winner, put in a vote for her!

The "C is Efficient" Language Fallacy

I came across an article yesterday about programming languages, which hit on one of my major peeves, so I can’t resist responding. The article is at greythumb.org, and it’s called “Programmer’s rant: what should and should not be added to C/C++”. It’s a variation on the extremely common belief that C and C++ are the best languages to use when you need code to run fast. They’re not. They’re good at things that need to get very close to the hardware – not in the efficiency sense, but in the sense of needing to be able to fairly directly munge the stack, address specific hardware registers, etc. But they are *dreadful* languages for writing real scientific and/or numerical code.

To quote the part of the article that set me off:

> First of all, these fears are nonsense. C and C++ are never going to disappear. Why? Because there
> are classes of programming problems that are still and will always be CPU bound and there is still
> no language as fast as C or C++ for these problems. I highly doubt that there ever will be.
>
> I’m talking about things like: scientific number crunching, game/simulation physics, raytracing,
> real-time 3d graphics, audio processing, codec implementation, high-speed network packet routing,
> evolutionary computation (my personal favorite :), and of course implementing all these high-level
> languages’ runtimes. There are also problems like OS and hardware driver implementation where you
> need something “close to the metal” that can interact closely with and even embed assembly language.
> C is basically shorthand for assembler, which is why it’s the preferred language for that kind of
> thing.
>
> For these tasks, premature optimization at the level of language and framework choice is not evil.
> In some cases it’s a requirement. I predict that at least some of these tasks will still be done in
> C, C++, or some language with similar characteristics 50 years from now. To give you an idea of just
> how much faster C can be for tasks like this, I have found that evolvable instruction set based
> evolutionary computation is almost twice as fast when competently implemented in C than a similar
> competent implementation in Java.

Here’s the problem. C and C++ suck rocks as languages for numerical computing. They are not the fastest, not by a longshot. In fact, the fundamental design of them makes it pretty much impossible to make really good, efficient code in C/C++. There’s a good reason that Fortran is still the language of choice for real, intense scientific applications that require the absolute best performance that can be drawn out of our machines – applications like computational fluid dynamics.

Making real applications run really fast is something that’s done with the help of a compiler. Modern architectures have reached the point where people can’t code effectively in assembler anymore – switching the order of two independent instructions can have a dramatic impact on performance in a modern machine, and the constraints that you need to optimize for are just more complicated than people can generally deal with.

So for modern systems, writing an efficient program is sort of a partnership. The human needs to carefully choose algorithms – the machine can’t possibly do that. And the machine needs to carefully compute instruction ordering, pipeline constraints, memory fetch delays, etc. The two together can build really fast systems. But the two parts aren’t independent: the human needs to express the algorithm in a way that allows the compiler to understand it well enough to be able to really optimize it.

And that’s where C and C++ fall down. C and C++ are strongly pointer-based languages. The real semantics of almost anything interesting end up involving pretty much unrestricted pointers. In C and C++, there’s no such thing as an array – there are just pointers, and subscripting is a shorthand for pointer arithmetic and indirection (`x[n]` in C/C++ is the same thing as `*(x+n)`).
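
Just to make that equivalence concrete, here’s a trivial demo (my example, not from the article): the two expressions read the same memory location.

    /* Hypothetical demo: array subscripting is shorthand for pointer
     * arithmetic plus indirection. */
    #include <stdio.h>

    int main(void) {
        int x[4] = {10, 20, 30, 40};
        printf("%d %d\n", x[2], *(x + 2));   /* prints "30 30" */
        return 0;
    }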

That pointer-based nature means that in a C or C++ program, it’s very hard for a compiler to figure out what things are independent. It comes down to a problem called alias detection: identifying when two variables *might* be referencing the same location. Alias detection becomes a horrific mess in the presence of unrestricted pointers. Let me show you an example:

    /* Stencil update: i and j start at 2 and stop before 19999 so that the
       i-2, j-2, i+1, and j+1 accesses all stay in bounds. */
    for (int i = 2; i < 19999; i++) {
      for (int j = 2; j < 19999; j++) {
        x[i][j] = y[i-2][j+1] * y[i+1][j-2];
      }
    }

If you look at that loop, it can be parallelized or vectorized without any problem if and only if the array pointed to by `x` and the array pointed to by `y` are completely distinct with no overlap. But there’s no way to write code in C or C++ that guarantees that. If it were Fortran-77, you could easily check if they were distinct. If it were Fortran-90, you could check if `x` or `y` were declared as possible pointer targets, and the programmer could make it obvious that they didn’t overlap if they wanted to. But you can’t do that in C or C++. (And Fortran isn’t even the best – an experimental language called Sisal from Lawrence Livermore labs used to be able to beat Fortran by around 20% on typical code!)
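
Here’s a tiny illustration of the kind of thing the compiler gets stuck on (my example, not from the original article): because `dst` and `src` might point into the same array, the compiler has to assume that every store through `dst` can change `*src`, so it can’t keep `src[0]` in a register across iterations.

    /* Hypothetical illustration: the compiler can't prove that dst and src
     * don't overlap, so src[0] has to be re-read on every iteration. */
    void scale(double *dst, double *src, int n) {
        for (int i = 0; i < n; i++) {
            dst[i] = src[0] * 2.0;   /* this store may alias src[0] */
        }
    }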

That example involves parallelization of code, but alias-related problems aren’t just an issue for parallelism; it’s just easiest to show an example for parallelism. The aliasing issues in C and C++ have a very direct impact on real code. Let me tell you about a concrete example of this, and then I’ll stop ranting. About six years ago, I was working on a project where I needed to implement a rather messy algorithm to compute something called the “longest common subsequence” (LCS) of two arrays. The standard algorithm for computing LCS uses something called dynamic programming; it’s **O**(n²) time and **O**(n²) space. There’s an algorithm that was designed by people doing computational biology that can do it in the same time, but using on average **O**(n) space.
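
For reference, here’s a minimal sketch of the textbook dynamic-programming table fill for the LCS length of two integer arrays. This is just my illustration of the standard approach; it is *not* the space-saving comp-bio variant, nor the actual code that produced the timings below.

    /* Sketch of the standard DP for LCS length: table[i][j] holds the LCS
     * length of a[0..i-1] and b[0..j-1]. Illustration only. */
    #include <stdlib.h>

    int lcs_length(const int *a, int n, const int *b, int m) {
        int **table = malloc((n + 1) * sizeof *table);
        for (int i = 0; i <= n; i++)
            table[i] = calloc(m + 1, sizeof **table);

        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                if (a[i - 1] == b[j - 1])
                    table[i][j] = table[i - 1][j - 1] + 1;
                else
                    table[i][j] = table[i - 1][j] > table[i][j - 1]
                                      ? table[i - 1][j] : table[i][j - 1];
            }
        }

        int result = table[n][m];
        for (int i = 0; i <= n; i++)
            free(table[i]);
        free(table);
        return result;
    }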

I didn’t know what language to use for this project, so I decided to do an experiment. I wrote the LCS algorithm in a bunch of different languages, to compare how complex the code was, and how fast it ran. I wrote the comp bio algorithm in C, C++, OCaml, Java, and Python, and recorded the results. What I got timing-wise for running the programs on arrays of 2000 elements each was:

* C: 0.8 seconds.
* C++: 2.3 seconds.
* OCaml: 0.6 seconds interpreted, 0.3 seconds fully compiled.
* Java: 1 minute 20 seconds.
* Python: over 5 minutes.

About a year later, when I tested the code with a new JIT for Java, the Java time was down to 0.7 seconds to run the code, plus about 1 second for the JVM to start up. (The startup times for C, C++, and OCaml weren’t really measurable – they were smaller than the margin of error for the measurements.) The Objective-Caml bytecode interpreter was faster than the carefully hand-optimized C program! Why? Because the OCaml compiler could recognize that the arrays were completely independent – it didn’t need to worry about one iteration of the loop stepping on the values used by another. The C compiler couldn’t apply a lot of useful optimizations, because it couldn’t be sure that they were valid.

And it’s not just non-assignment-based functional languages where you can see supposedly less-efficient high-level languages crushing the performance of C/C++. CMU Common Lisp can beat C/C++ on numeric code. There was a paper a few years back documenting it: on a Sun SPARC workstation, if you use the optional type declarations and write scientific/numeric code in Lisp, using vectors (Lisp arrays) and assignments to implement exactly the same algorithm as the C version, the CMU Common Lisp code will perform better than C code generated by either the Solaris C compiler or GCC with maximum optimization.

ISCID and the Definition of Specified Complexity

A while ago, I wrote about Dembski’s definition of specified complexity, arguing that it was a nonsensical pile of rubbish, because “specified complexity” likes to present itself as being a combination of two distinct concepts: specification and complexity. In various places, Dembski has been fairly clear that his complexity is equivalent to Kolmogorov-Chaitin information complexity, meaning that a complex entity has *high* K-C information content; and in [my debunking of a paper where Dembski tried to define specification][debunk-spec], I argue that his definition of specification is basically an entity with *low* K-C information content. Put those two together, and an entity with specified complexity is “an entity with simultaneously high and low K-C information content.”
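
For reference (the standard definition, not Dembski’s wording): the Kolmogorov-Chaitin complexity of a string is the length of the shortest program that produces it, so *high* K-C content means the thing is essentially incompressible, and *low* K-C content means it has a very short description.

    % Kolmogorov-Chaitin complexity of a string x, relative to a fixed
    % universal machine U: the length of the shortest program p with U(p) = x.
    K_U(x) \;=\; \min \{\, |p| \;:\; U(p) = x \,\}
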
In the comments on that post, and in some rather abusive emails, some Dembski backers took me to task, alleging that I was misrepresenting the IDists’ view of specified complexity; that
the definition of specification used by IDists was *not* low K-C complexity, and that therefore, the definition of specified complexity was *not* self-contradictory.
Well, I was just doing a web search to try to find some article where Dembski makes his case for a fourth law of thermodynamics, and in the search results, I came across a [very interesting discussion thread][iscid-thread] at ISCID (the “International Center for Complexity, Information, and Design”, an alleged professional society of which William Dembski is a fellow in mathematics). In this thread, Salvador Cordova is trying to make an argument that Dembski’s “Fourth Law” actually subsumes the second law. In the course of it, he attempts to define “Specified Complexity”. This thread started back in the spring of 2005, and continues to this day. Here’s how Sal puts it:
>The definition of Specified Complexity you gave is closer to Irreducible Complexity.
>
>Specified Complexity has this thing that is called “Probabilistic Complexity” which means simply
>that it’s improbable.
>
>These defintions are understandably confusing at first, but surmountable.
>
>We have many complexities involved, and seriously each one should be explored, but I’ll have to go
>into the details later:
>
>* Probabilistic Complexity
>* Specificational Complexity
>* Specified Complexity
>* Irreducible Complexity
>* Kolmogorov Complexity
>
>All of these are in Dembski’s book, and should be treated with care, lest one becomes totally
>confused. The diagram addresses 3 of the 5 complexities listed above explicitly.
>
>Probabilistic and Specificational Complexity require a separate discussion.
>
>Specified complexity has these features (per Design Revolution, page 84)
>
>1. Low Specificational Complexity
>2. High Probabilistic Complexity
Sal attempts to wiggle around, but low specificational complexity, in his own words, means that the entity’s description has *low* K-C complexity. Probabilistic complexity is, as far as I know, Sal’s addition. In Dembski’s writings, he’s been
fairly clear that complexity means complexity in the information-theory sense – see the Dembski paper linked above, which quotes him explaining why “probabilistic complexity” isn’t sufficient and introducing K-C complexity to fix it.
So – one of Dembski’s associates, in an extensive message thread quoting Dembski’s books, says that specification is *low* K-C complexity, and tries to wiggle around the fact that the complexity part of “specified complexity”, when examined in terms of information theory, means that the K-C complexity is high. Ergo, specified complexity as described by Dembski *is* the property of simultaneously having both high and low K-C complexity.
Sad, isn’t it?
[debunk-spec]: http://scienceblogs.com/goodmath/2006/06/dembskis_profound_lack_of_comp.php
[iscid-thread]: http://www.iscid.org/boards/ubb-get_topic-f-6-t-000562.html