The Go I Forgot: Concurrency and Go-Routines

A couple of people pointed out that in my wednesday post about Go, I completely left out the concurrency stuff! That’s what I get for rushing the post – I managed to leave out one of the most interesting subjects! Go provides very strong support for communicating processes.

I haven’t done a lot of hacking with the concurrency stuff yet – so my impressions of it are still very preliminary. But my early impressions are very good.

A lot of people have been talking about Go as a language for distributed computation. I don’t think that’s correct: Go programs run in a single, shared address space. You can have multiple concurrent threads; if you have multiple CPUs, those threads can be run in parallel. But it’s only concurrency within a single OS process and a single address space. To me at least, distributed means that a program runs on multiple machines. Go is pretty good for writing distributed programs, but that’s not what the concurrency support in Go is; What it is is really solid support for coroutine-style threading, with explicit communication between the threads.

(Addendum 2011: Go now supports a library called netchan, and with that, it’s really a great language for distributed programs. In fact, in my opinion, it’s the best imperative language for distributed programs.)

The flavor of the communication system is very π-calculus-like. You can, at any time, start a new thread by invoking a function: just prefix the invocation with the word “go”. So “f(x)” is a function call; “go f(x)” is an invocation of a goroutine which is, which runs concurrently with the code that called it.

Once you’ve created a go-routine, you can only talk to it through channels> – you don’t have any handle or identifier for the executing goroutine, unless you create one by using a channel. A channel is very much like the π-calculus concept of a channel name: it’s a first-class value which can be passed around. Channels support two operations: output, and input. Any channel operation blocks until a matching operation is executed by a different goroutine. So writing to a channel blocks until someone reads it; reading from a channel blocks until someone writes something for you to read. The channels are strongly typed – you can only pass a single type of value over a channel.

So, for example, to write a program where you launch a thread to do a computation, and then read the result when it’s done, you could do:

func ComputeAValue(c chan float64) {
  // Do the computation.
  x := 2.3423 / 534.23423
  c <- x
}

func UseAGoroutine() {
  channel := make(chan float64)
  go ComputeAValue(channel)
  // do something else for a while
  value := <- channel
  fmt.Printf("Result was: %v", value)
}

It’s very simple, and very convenient. And since Go has anonymous functions, you could also write it as:

func UseAGoroutine() {
  channel := make(chan float64)
  go func(c chan float64) {
    // Do the computation.
    x := 2.3423 / 534.23423
    c <- x
  }(channel)
  // do something else for a while
  value := <- channel;
  fmt.Printf("Result was: %v", value);
}

Even better, Go has lexical scope – so you don’t actually need to pass the channel:

func UseAGoroutine() {
  channel := make(chan float64)
  go func() {
    // Do the computation.
    x := 2.3423 / 534.23423
    c <- x
  }()
  // do something else for a while
  value := <- channel
  fmt.Printf("Result was: %v", value)
}

Goroutines all run in the same thread, so you can also do implicit communication via shared variables. The standard libraries include things like mutexes so that you can implement other forms of communication and sharing between goroutines, but communication via channels is intended to be the main way that gorountines interact.

In the implementation, the goroutines are very cheap and lightweight, and they’re multiplexed onto OS threads. So you don’t need to worry (much) about creating lots of goroutines – they’ll get mapped onto a manageable set of OS threads. This allows you to do some really neat things where you create tons of goroutines. An example that I really like is in the Go tutorial: it’s a go-routine based version of the sieve algorithm for computing primes.

In the sieve, you start with the set of all integers greater than or equal to 2. 2 is the first prime – so you go through the rest of the list, and get rid of all multiples of 2. Now, the first number remaining in the list is the next prime: 3. So you go through and strike out all multiples of 3. Then the first element of the list is the next prime, 5. And so on.

This translates pretty naturally into a collection of concurrent filters. Each filter removes multiples of one prime. You just pump numbers through the filter in sequence. Anything that passes the filter for 2 gets sent to the filter for three. Anything that passes the filter for three gets sent to the filter for 5. Anything that gets past the last filter is the next prime – so you add another filter to the chain, and return the new prime.

To write that in Go, you start with a go-routine that generates the sequence of integers:

func generate_integers() chan int {
  ch := make(chan int)
  go func(){
    for i := 2; ; i++ {
      ch <- i
    }
  }()
  return ch;
}

That’s very simple: just create a channel, and then start a goroutine that loops through the integers greater than or equal to to, writing them to the channel. It returns the channel that you can read the numbers from.

Next, we need to write a filter that removes all multiples of some number:

func filter_multiples(in chan int, prime int) chan int {
  out := make(chan int)
  go func() {
    for {
      if i := <- in; i % prime != 0 {
        out <- i
      }
    }
  }();
  return out;
}

That creates a channel, which will be used to emit the values that pass the filter. Then it starts a goroutine which actually runs the filter: itreads values from the input channel, and then tests them to see if they’re a multiple of the filter prime. If they’re not, it writes them to the output channel. (You can see another bit of minimalism there; all loops in Go are written using for.)

Now for the clever part: stringing the go-routines together.

func sieve() chan int {
  out := make(chan int)
  go func() {
    ch := generate_integers()
    for {
      prime := <- ch
      out <- prime
      ch = filter_multiples(ch, prime)
    }
  }()
  return out
}

So – we create an initial output channel – that’s the channel from which we’ll be able to read the primes. Then we start running the sieve in a goroutine. The goroutine starts the integer sequence generator. Then it runs a loop. It grabs the next number in sequence, which is the next prime. It writes that to the output channel. Then it creates a new filter, which filters that one. It takes the output from the previous filter, and chains it to be the input to the next one. Soyou wind up building a chain of filters. Each time you get a new prime, it creates a new filter with its own goroutine, and adds it to the chain.

Then you can easily print out the primes:

func main() {
  primes := sieve()
  for {
    fmt.Println(<-primes);
  }
}

That’s obviously a toy example – but it’s a very cool toy example.

More real examples of where this could be useful? Well, in my not-very-abundant free time, I’ve been working on a text editor. I’m building it in a very decoupled way – there’s a backend which has an interface that’s almost like a little programming language based on regular expressions, and a front-end that basically translates keystrokes and UI actions into commands. In Go, it’s a pair of goroutines. The UI sends commands to the backend; the backend sends UI updates to the front-end. It’s all done with a single pair of channels. I came up with that basic design before I knew about Go; I started the original implementation in Java using threads, queues, and locks to build the communication between the components. In Go, it’s much simpler, and much cleaner. And I can add other things into the process just by chaining together channels.

There is, of course, a lot more to goroutines and channels than what I’ve written here. The type system for channels is pretty nice: you can distinguish between generic channels (with which you can do whatever you want), read-only channels, and write-only channels. You can use a switch-like select block to allow you to do things like wait for an input from any of a group of channels. You can set buffer sizes.

Why a whole post about this? Because, these days, we’re all using multi-core computers. For the most part, the way that we’re making things faster isn’t really by making them faster: we’ve been running into some physical barriers that make it hard to keep pushing raw CPU speed. Instead, we’ve been adding CPUs: my laptop has 2 processors, and my server has 8 processors. We’re getting things done faster by dividing the work up into pieces, and doing those pieces simultaneously. But most of our programming languages really suck at describing concurrency.

Goroutines and channels provide the best support I’ve seen outside of Erlang for making use of concurrency. And frankly, I think Go is a lot less ugly than Erlang. (Sorry Erlang fans, but I really don’t like Erlong.) Compared to Java, which I think is the main competitor to Go in this area, Go’s goroutines and channels are just so much easier to work with than Java threads and locks, there’s just absolutely no comparison at all. Go pretty much destroys the competition in this area.

Of course, some of my general complaints about Go carry over. Channels are basically parametric types. But I can’t write my own parametric channels. In other words, I can’t write a piece of code that does something like create generic software-transactional memory, and then let people create a cell of STM that specifically stores integers. I can create an STM cell which stores values – but I can’t do much to say what those values are.

On the other hand, channels do make it easy to do good exception handling of concurrent code. You add a parameter to the goroutine function which is a channel that will be used to signal errors. If anything goes wrong in the goroutine, you have it send an error report to that channel. That’s actually about as good as any thread-based exception handling system I’ve ever seen.

0 thoughts on “The Go I Forgot: Concurrency and Go-Routines

  1. DSimon

    Any channel operation blocks until a matching operation is executed by a different goroutine. So writing to a channel blocks until someone reads it; reading from a channel blocks until someone writes something for you to read.

    I’m a little confused about this. Suppose I’ve got a UI thread and a job thread, and I want the UI to show a spinning hourglass until the job thread finishes computing and returns its result. How do I check to see if the job thread has finished without blocking my UI thread?

    Reply
  2. Nicolai Hähnle

    I suppose you could create another goroutine that blocks on the job finishing and updates the mouse cursor.
    If you’re working with some event-driven GUI in which certain GUI operations can only be performed from the main thread/goroutine, then I suppose your GUI library would use an event channel (instead of an event queue). In that case, it is only a matter of writing an appropriate event into the event channel once the job has finished. (I am assuming here that a channel can be written to from different goroutines.)

    Reply
  3. John

    Not that I’ve run it, but for the example after talking about lexical scope, I think the line “c <- x;” should be “channel <- x;”, right?
    (Ugh, preview mangles the &xx; html codes in the comment box!)

    Reply
  4. Mike

    I think you need to change the name of the channel in the lexical scope version of UseAGoRoutine to match the name inside of func.
    Nipicking aside, this is a great introduction to a very interesting language. Thanks.

    Reply
  5. brett

    On the other hand, channels do make it easy to do good exception handling of concurrent code. You add a parameter to the goroutine function which is a channel that will be used to signal errors. If anything goes wrong in the goroutine, you have it send an error report to that channel. That’s actually about as good as any thread-based exception handling system I’ve ever seen.

    Does Go have some way to wait on multiple channels? It seems like if you have one channel for the result and another for the error and you can only wait on one channel if the other channel gets written to then the thread that started the goroutine will never wake up.

    Reply
  6. Michael Chermside

    You write “So writing to a channel blocks until someone reads it; reading from a channel blocks until someone writes something for you to read.”
    Somehow that seems INCREDIBLY limiting. What am I missing here? From that primitive, how does one build a thread which works on the first available from several different queues? Or like Brett pointed out, how do you wait on EITHER the work OR the exception channel? Or how, from only a non-blocking primitive, do you build something which performs a read with a timeout?
    The only way I can imagine is for each of these cases to have a separate thread whose sole purpose is to multiplex the incoming requests and pass them on to the receiving thread. (And I’m not sure that would allow you to build the timout or the select.) Am I overlooking or misunderstanding something?

    Reply
  7. Mark James

    @How do I check to see if the job thread has finished without blocking my UI thread?
    There’s a select statement, sort of like a switch in C.

    Reply
  8. Slartibartfast

    In primes example, it seems like you can make channel behave like a lazy evaluated list in Haskell. How far can this parallel be pushed?

    Reply
  9. Jedaï

    Nice stuff, amusingly enough your parallel sieve example has recently been treated in a blog article by Neil Brown with the Haskell CHP (Communicating Haskell Process) library.
    Still the lack of user generics (or parametric polymorphism) is an ugly wart…

    Reply
  10. aaron

    How does the gorutines handle functions being closures? It would seem to have potential for causing a lot of problems with multiple gorutines accessing the same variables (from parent scopes).

    Reply
  11. fd

    I’m a bit surprised at all the buzz this is getting *now*. It looks to me like go’s concurrency model was lifted lock, stock, and barrel from newsqueak, also developed by Pike–back in 1989. (See Pike’s google tech-talk on it from 2007.)

    Reply
  12. markmuetz

    Any channel operation blocks until a matching operation is executed by a different goroutine. So writing to a channel blocks until someone reads it; reading from a channel blocks until someone writes something for you to read.

    I think this is only true if you use an unbuffered channel, but you can set the buffer for a channel with e.g.
    out := make(chan int, 20);
    This lets you write asynchronously to a channel (if I understood it correctly, haven’t experimented with it yet). See here for more details.

    Reply
  13. Ben Kudria

    Rörd: 41 lines of clean and straight-forward Go code vs. 85 lines of names, boilerplate, try-catch blocks, @annotations. I think the comparison is … illustrative.

    Reply
  14. Rörd Hinrichsen

    Ben: Yes, I’ve noticed that too. I’m not a big fan of Java either, but I did have a faint memory that the Java library has something similar to channels and I wanted to check this memory.
    PS, here’s an even shorter implementation of that algorithm (though without threads):

    sieve (x:xs) = x : sieve [y | y 0]
    primes = sieve [2..]

    Reply
  15. Rörd Hinrichsen

    I should’ve looked closer at the preview, I hope it’ll work now.
    sieve (x:xs) = x : sieve [y | y <- xs , y `mod` x > 0]
    primes = sieve [2..]

    Reply
  16. Marko

    On one hand, Go provides us with things like
    stepResults := vector.New(0);
    on the other hand, we have to write
    ch := make(chan int);
    How is this New/make dichotomy minimal? Reminds me of Python’s early idiosyncrasies and makes me glad to be a Ruby enthusiast. And what’s with all those colons and semicolons? (Tree-less one-pass parser, I guess…) (-;

    Reply
  17. Robert Israel

    Your example goroutines all have this form:
    func some_func() chan foo {
    ch := make(chan foo);
    go func() {
    // yadda yadda yadda
    // put something in ch
    }();
    return ch;
    }
    If that is idiomatic Go, it would be nice to rewrite it like this:
    func some_func() chan foo {
    return go {
    // yadda yadda yadda
    // put something in _
    }
    }

    Reply
  18. Vincent Foley

    Some of the strong contenders in the concurrent programming domain are Erlang, Haskell and Clojure, and they all give a lot of credit of the success of their concurrent capabilities to immutable data structures. How does Go’s imperative nature fit in the whole concurrency scheme?

    Reply
  19. River

    A lot of this reminds me of Lua, which has less creation/instantiation cruft but no strong typing. I’m especially reminded of it given the coroutine features they both share, though Go adds syntax for it while Lua handles them with library calls. When it comes to being a low-level high-level language, they seem analogous, with Go more suited to systems programming and Lua more suited to use as a glue language. But overall, the more I look at Go, the more I think I’m seeing Lua, repackaged.
    http://www.lua.org/

    Reply
  20. Paul Jurczak

    My mental parser indicates that your 3rd code snippet should use “channel” instead of “c” inside anonymous function.
    Speaking of channels, they seem like a mechanism, which would make distributed implementation of Go relatively straightforward (similar to MPI?).

    Reply
  21. Mark C. Chu-Carroll

    @29:
    And well it should; what I’m trying to do is write something that’s sort-of in-between Sam and Acme. I want the command-execution behaviors of Acme, with the basic structure of Sam. I don’t want the window-management parts of Acme – I think that window management is best left to a window manager. With xmonad or awesome or wmii or ion, I can get better tiled window behavior that I’d be able to implement myself in any reasonable amount of time.

    Reply
  22. Nomen Nescio

    How is this New/make dichotomy minimal? Reminds me of Python’s early idiosyncrasies and makes me glad to be a Ruby enthusiast.

    well, Python did get better. but i’m sure you also remember how long it took.
    what upsets me is the same perception of hypocrisy on the designers’ part as Mark mentioned upset him; that notion of “we’ll use this when building the language, but you don’t get to touch it when using the language” is tremendously grating. and yes, the curly braces and semicolons do make it ugly. no sane programmer will ever fail to use whitespace to render their code readable; no sensible reason to fail to make use of that in the parser too.
    finally, designing a language for the specific goal of great speed of compilation might be worthwhile, and such a language might find itself a niche. but i’m pretty sure that if given a choice, i’d rather be writing my own programs in a domain where speed of execution (saving the users’ time) matters more than making the development process convenient for the programmer. if that means a much slower compiler to better optimize the binary code, i’m all for building that better, smarter, slower compiler.

    Reply
  23. Mark C. Chu-Carroll

    #32:
    Talking about how you’d rather have a slower compiler shows that you’re not working in the same kind of environment as the Go team. At Google, we spend a lot of time on compilation. We’ve had teams of people (including me during my first year at Google) spending years of work on finding ways to reduce build times.
    A large C++ system with a complex component dependency graph can easily take 20 minutes to compile. If you sit down and profile and analyze the time it’s spending, you find some very interesting and surprising things. For example, compiler authors often spend a lot of time worrying about how much time their optimizers take. But in a large C++ system, the time spent optimizing and generating code is actually not a significant factor! What *is* significant (amazingly) is parsing. But, anyone familiar with algorithms should say, parsing *should* be trivial! The catch is that you’re parsing the same thing, over and over and over again. Every time you compile a C++ source file, you need to parse all of its headers. Over, and over, and over again. Back in my C++ compiler days at IBM, I saw real applications that parsed the stdio header file 3000 times. Even if it takes 1/100th of a second, that’s 3 seconds of compile time on a single header file!
    To a great extent, that’s C++ stupidity. i don’t know of *any* other languages that do anything as utterly stupid as the C++ header include thing. But lots of languages do end up with properties that complicate compilation. Language designers really should be aware of what impact various language features will have on compilation times of real
    programs.

    Reply
  24. Nomen Nescio

    Talking about how you’d rather have a slower compiler shows that you’re not working in the same kind of environment as the Go team.

    indeed i’m not, and that’s what my one-liner about how Go might well find a niche was meant to indicate. maybe i should have stressed that more; i do realize that there may very well be applications for such a language (why else would somebody have gone to the trouble of expressly designing one?), it’s just that i would much prefer not to work in such domains. merely personal preference, though.
    i’ll definitely be following along in your reworking of those old Haskell posts, though. i’ve been wanting to learn a type-inferred language, and Haskell has good things said of it. Go seems like it might be a little too low-level for my own tastes, though at least it has memory management, i guess.

    Reply
  25. Dan

    Mark, have you checked out Microsoft’s Task Parallel Library? http://msdn.microsoft.com/en-us/library/dd460717(VS.100).aspx
    BlockingCollection seems to have pretty much the same semantics as a channel.
    TaskFactory.StartNew seems to have the same semantics as the ‘go’ keyword.
    Of course, there’s also a lot more features (Parallel.Do, Parallel.For, futures, concurrent collections, etc) which can be good or bad depending on how you look at it. 🙂

    Reply
  26. Terry Jan Reedy

    In a couple of minutes, I wrote and ran a Python version of your toy example, mostly from your text description rather than the code.
    def plurals():
    i = 2
    while True:
    yield i
    i += 1
    def primefilter(src, prime):
    for i in src:
    if i % prime:
    yield i
    src = plurals()
    while True:
    i = next(src)
    print(i)
    src = primefilter(src, i)
    It stopped at 7877 when it hit the default recursion limit of 1000, which could easily be increased to get out-of-memory error instead.
    I think Google is making a blunder if it moves to another old-fashioned language whose code is bloated with junky boilerplate that doubles the size. It would be much better, for instance, to tweak Python, which it has had great success with, to better run on multiple cores.

    Reply
  27. Bob Foster

    > So writing to a channel blocks until someone reads it; reading from a channel blocks until someone writes something for you to read.
    Um, no. There are buffered channels that don’t block unless the buffer size is exceeded.

    Reply
  28. ΤΖΩΤΖΙΟΥ

    You say: “Even better, Go has lexical scope” and then provide example code; however, the example code uses the name “c” for “channel”, like in the previous example where “c” was the parameter name; in the enclosing scope there is only “channel”.

    Reply
  29. Cedric

    Quoting the article
    “Goroutines all run in the same thread, so you can also do implicit communication via shared variables”
    and
    “goroutines are very cheap and lightweight, and they’re multiplexed onto OS threads”
    Mmmh… so, which is it?

    Reply
  30. Nomen Nescio

    if i understand threads correctly, Cedric, both can be true. threads within a single O/S process share address space, i believe, so can communicate via shared variables — [cg]oroutines within one thread certainly could, then.
    why you’d want to multiplex N goroutines onto M threads (for N > M) i don’t really know, but presumably there’s some reason. i’d think if you wanted to get maximal utility out of multiple processors (or cores) you’d want to use multiple processes, likely not just threads. then again, this is all far from my area of expertise.
    (geez, i feel old. i still remember when talk on the linux-kernel list was all, “processes are too heavy, they take too long to spawn, we need threads like windoze has got” and the eventual result was to both introduce pthreads and speed up fork() until it competed with MS’ thread spawning. now even threads are considered too slow? kids these days, mumble grumble, when i was learning to code we sometimes ran out of ones…)

    Reply
  31. mikero

    >The catch is that you’re parsing the same thing, over and over and over again
    Ah, haven’t we had pre-compiled header support now for 15 years?

    Reply
  32. ML

    Norman Rescio
    > why you’d want to multiplex N goroutines onto M threads (for
    > N > M) i don’t really know, but presumably there’s some reason
    Threads are quite expensive in the amount of memory they allocate, while goroutines (or light-weight-threads, or erlang processes, or fibers, or whatever is the name in context X) typically allocate very little memory initially. Like Erlang, it appears that a Go-program can run hundreds of thousands of goroutines on a single machine, which is not possible to do with standard threads. When it is that cheap to put computation in a separate goroutine, it enables a different style of programming.

    Reply
  33. James Sweet

    Somehow that seems INCREDIBLY limiting. What am I missing here? From that primitive, how does one build a thread which works on the first available from several different queues?

    MarkCC mentioned some kind of select statement that would let you get input from any of a number of different channels…
    I also have questions about how you would manage things like a limited-size readahead buffer, but Mark alluded to a lot more syntax not shown here.
    Fascinating shit. Too bad I’ll probably never have an opportunity to really do anything w ith it… :/

    Reply
  34. Justin Davis

    I agree, but I’m not so sure about what you said at the beginning. Where are you getting your information? I’m not disagreeing, but I’m just wondering how you came to that conclusion.
    Justin Davis
    Author does not represent the legal position of the Darpa Challenge 2009 and expresses opinion only.

    Reply
  35. Daniel Sobral

    Go’s Channels are Datachannels from Functional Reactive Programming. They are powerful and have some nice properties about them, but they are not particularly original.
    As a side note, that implementation is not a Sieve of Eratosthenes. To understand why, one just have to notice that the filter for 5 will be applied to 7, 11, 13, 17 and 19, while in the true sieve it will be applied to 10, 15 and 20. As the prime numbers get bigger, the difference gets exarcebated.

    Reply

Leave a Reply