Why model evolution as search? (updated)

In the comments on [my post mocking Granville Sewell’s dreadful article][sewell], one of the commenters asked me to write something about why evolution is frequently modeled as a search process: since there is no goal or objective in evolution, search seems like a strange approach. It’s a very good question, and so I’m going to do my best to give you a very good answer.
As I’ve said before, math is all about *abstraction*: taking something, stripping it down to its bare-bones essentials, and seeing what it means. Category theory, which I’ve been writing about for the last month, is a good example: it takes the basic idea of a function as a thing that maps things to other things, and asks what that idea means when it’s stripped of everything but the most basic properties of that notion of mapping.
When we build mathematical models of real-world phenomena, we’re doing the same kind of thing: we’re looking at some particular aspect of the phenomenon, and trying to see what it means when the complexities of reality are abstracted away, so that we can see what’s going on in a fundamental sense. When we build bridges, we do static-mechanical analysis of the bridge structures; the most basic static-mechanical analysis is remarkably silly in some ways, but it does manage to capture the basic properties of tension and force distribution through the structure. It lets us understand the most basic properties of the structure without having to worry about wind, gravitational bowing of metal rods, vibration, and so on. (Obviously, when a bridge gets built, all of those things need to get factored in; but you start with the basic analysis.)
So when a mathematician looks at modeling evolution, the way that we approach it is to ask: What are the basic fundamental things that define the process of evolution?
The answer, roughly, is two (or three, depending on how you count them 🙂 ) things:
1. Reproduction with mutation;
2. Selection
Evolution is a continual process in which those steps are repeated: you reproduce a pool of individuals, either mutating while you copy, or copying and then mutating. Then you select some subset of them as “winners” and keep them to copy for the next generation, and you discard the rest.
What’s the selection criterion? From the starting point of a mathematical model, *it doesn’t matter*. What matters is that it’s a cyclic process of reproduction with mutation followed by selection.
So we have a basic model. Now we look at it and ask: does this resemble any problem that I know how to solve? Let’s try drawing a picture of it: we’ll draw labelled dots for individuals, with arrows from a parent to a child. The children that were selected and reproduced, we’ll paint green; the children that were not selected, we’ll mark red; and the current generation, which we haven’t selected from yet, we’ll leave white.
[Figure: the graph described above: labelled dots for individuals, with arrows from parent to child; selected children in green, unselected children in red, and the not-yet-selected current generation in white.]
So, does this look familiar to a mathematician?
Yes: it’s a mathematical structure called a graph. We know a lot about the mathematics of graphs, and evolution looks a lot like a process that builds a graph. Each node in the graph is an individual; each edge in the graph is a reproduce-with-mutation step. Each leaf node of the graph is either a successful individual (in which case the graph is going to continue to grow from that leaf) or an unsuccessful one, in which case that branch is dead.
When you describe it that way, it sounds like you’ve described a classic graph search problem. To complete the model, you need to define the mutation and selection processes; the criteria by which successful children are selected and future generations are created are, of course, a crucial piece of the model. And to be complete, I’ll note that the mutation process is a *random* process, and so the mutation paths generated by running a simulation of this model will be different in each run; and further, selection is undirected: the selection process does *not* have a specific goal. (Note: this paragraph was edited from the original to clarify some points and to expand on the properties of mutation and selection.)
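To make the shape of that cycle concrete, here’s a minimal Python sketch of it. The genome representation, the mutation operator, and the selection rule below are placeholder assumptions, not part of any real model; the only point is that every reproduce-with-mutation step adds an edge, so repeating the cycle builds a graph.

```python
# A minimal sketch of the reproduce-mutate-select cycle, recording the graph
# it builds.  The genome, mutation operator, and selection rule are placeholders.
import random

def mutate(genome):
    # Flip one random bit: a stand-in for any mutation operator.
    i = random.randrange(len(genome))
    return genome[:i] + (1 - genome[i],) + genome[i + 1:]

def evolve(generations=5, pop_size=4, offspring_per_parent=3):
    population = [tuple(random.randint(0, 1) for _ in range(8))
                  for _ in range(pop_size)]
    edges = []                                   # (parent, child) pairs
    for _ in range(generations):
        children = []
        for parent in population:
            for _ in range(offspring_per_parent):
                child = mutate(parent)           # reproduction with mutation
                edges.append((parent, child))    # one edge per reproduction
                children.append(child)
        # Selection: keep pop_size survivors.  For the structural model the
        # criterion doesn't matter; here it's an arbitrary, undirected choice.
        population = random.sample(children, pop_size)
    return edges

print(len(evolve()), "parent-to-child edges recorded")
```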
There are other possibilities for models of evolution; but the most obvious mathematical model for evolution is search: either a graph search or a terrain search, depending on exactly how you want to model the process of mutation.
[sewell]: http://scienceblogs.com/goodmath/2006/07/debunking_a_mathematicians_vie.php

Comments

  1. bmurray

    The “search” analogy is slightly muddled by the fact that the target is actually moving all the time — selection pressure is continuously changing, which is one of the reasons that evolution works so well in the real world: there are better search algorithms for a fixed target, but precious few good ones at hitting close to a target that changes every iteration.

  2. RBH

    The distinction that I think is important, and the distinction that leads me to be wary of modeling biological evolution as a general search process, is between the structure of a solution and the function of a solution.
    Mark wrote

    What’s the selection criteria? From the starting point of a mathematical model, it doesn’t matter. What matters is that it’s a cyclic process of reproduction with mutation followed by selection.

    But it does matter if one is modeling biological evolution. If one uses a selection criterion that focuses on a particular terminal structure, there is only one such solution; if one makes many runs with different randomizations, the terminal node will be structurally identical across runs. On the other hand, if one uses a selection criterion that is based on function, there will almost inevitably be many different structures that satisfy the criterion. If one uses a constant functional selection criterion and a number of different randomizations during runs, every end point will almost certainly be structurally different from all others. See, for example, the 23 structurally different “solutions” evolved with different random mutation sequences in this set evolved in Avida against a constant functional selection criterion. The evolutionary trajectories of each run and the specific structures of the terminal nodes differ among them, but the terminal nodes of the runs are functionally equivalent, at least with respect to the highest peak of the fitness landscape. (They differ in their functionality with respect to lower peaks.) Thus one will have a different graph for every run of the program. Mark’s representation abstracts away a critical distinction, and generalizations drawn from an abstract representation that doesn’t make that distinction in the basis for selection will almost certainly be misleading.
    This is the source of a good deal of confusion and some flat misrepresentations in the various probability calculations made by creationists/IDists and not a few evolutionists. They (mis)represent the process to be structure-based search, then look at some particular terminal node (structure) and calculate the improbability of that particular structure having evolved. Because it is based on functional selection, evolution doesn’t give a damn about the particular structure. Hence without some careful qualifications, modeling evolution as search can be seriously misleading. See all of the creationist/IDist misrepresentations of Dawkins’ WEASEL illustration for examples of those misrepresentations.
    RBH

  3. Mark C. Chu-Carroll

    RBH:
    For the purposes of explaining *why* search is used as a mathematical model of evolution, the selection function doesn’t matter; what matters is the fact that the process of reproduce-and-select can produce a graph-like structure. That doesn’t mean that there is a single fixed graph generated by mutate-and-select; in fact, you would *not* expect to get the same graph from two different runs: the *mutate* is random! But mathematically, the structure of an evolutionary pattern *is* quite natural to model as a graph.
    Of course in reality, the selection function matters. I thought that was obvious enough that it didn’t even need repeating here. As I’ve said in numerous posts criticizing bozo creationists, if you have to describe evolution using a fitness function, it’s got to be an adaptive, non-deterministic function. Evolution doesn’t follow one path through a pregenerated graph; it follows *many* paths through a continually changing/growing graph.

  4. Stogoe

    I would assume it’s because 11’s progeny are dead ends, and contribute nothing to the current generation. 11 might as well have not had ‘offspring’.

  5. Zombie

    What would you think of modelling evolution as something like a diffusion process, through a medium of variable permeability, rather than a search? The metaphor of a search suggests a single searcher, when evolution could involve a population that can divide in different directions, many times.
    If you imagine one axis as what we normally think of as “complexity” or “advancedness” it seems less surprising (and less teleological) that a fraction of the population is diffusing in that direction while others are diffusing along other paths that don’t necessarily make them more “advanced”.

  6. Mark C. Chu-Carroll

    Zombie:
    You probably could model evolution as diffusion… I’m not quite sure how the details would map, and working them out would definitely be non-trivial. The problem with diffusion is that it doesn’t have the strong discrete nature of evolution: in evolution, the underlying genetic code is changed in specific discrete steps, and “motion” through the graph is in generations of distinct individuals. That discrete nature of the biological systems is part of what makes it natural to use graphs. Diffusion is a continuous process, with non-distinct paths and steps.
    WRT the discussion in the post about why math folks tend to end up using search based models, I think that they don’t think of diffusion partly because it’s less familiar (most math folks haven’t really studied it, but most math people have studied some graph theory), and partly because of the discrete/continuous distinction I mentioned above.

  7. PaulC

    I can see why it’s appealing to look at evolution as a search or more generally an optimization, since we can immediately perceive how a particular characteristic is “useful” and then back-fit an objective function to measure this utility. There is also nothing particularly wrong with this approach; we can in fact use evolutionary processes (and not just algorithms inspired by evolution) to implement some searches by controlling the environment in such a way that the selective pressure matches our objective function. Evolution is at least as powerful as some very useful search algorithms, and incorporates both random probing and hill climbing. I’m not convinced that it’s the best way of thinking of evolution, though, since it presupposes that we can set the fitness criteria for purposes of discussion.
    I think a more general sort of abstraction, though, would be to view evolution in terms of attractors in dynamic systems. E.g. fish and marine mammals both swim even though the marine mammals had ancestors that lived on land. There is no absolute trajectory that says moving to land is “more advanced”, but adaptations to land and sea are both basins of attraction. Given enough time some descendants of modern fish could eventually live on land, and some descendants of land animals could eventually swim in the water. On the other hand, not all traits are plausible. I doubt there will ever be an animal that can rearrange itself like a Rubik’s cube, for instance. Even if the mechanics could be made to work and a genetic encoding exists, I cannot imagine the selective pressure that would produce such a thing (caveat: the limits of one’s imagination should not be confused with the limits of reality; a point that creationists usually miss).
    As a computer scientist, I too would think in terms of a graph, but I’m not sure that the obvious operation to perform on that graph is a search. It looks to me more like a multi-agent walk in which moves are based on local criteria and in which agents can sometimes reproduce. We can begin this walk from nearly arbitrary starting conditions (put all the agents on some node or other, or maybe scatter them randomly) and we can give them quite a bit of local randomness. But despite all that, the result of running the walk for sufficiently long looks nothing like uniform randomness. We would see that the probability of a randomly chosen agent being near one particular node varied significantly based on the topology of the graph and whatever other local factors affected the moves of the agents. Over time it is more sensitive to these factors than to the initial conditions.
    Now, you could look at the most probable nodes, backfit an optimization function to them and say that this process is actually a search for those nodes. There’s nothing incorrect about that, but it just seems like an unnecessary step in the interpretation of the system.
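    A rough sketch of that kind of multi-agent walk, with an arbitrary toy graph, move rule, and reproduction rate chosen purely for illustration:

```python
# A rough sketch of a multi-agent walk: agents move to random neighbours
# (a local criterion) and occasionally reproduce in place.  The graph and
# all parameters are arbitrary placeholders.
import random
from collections import Counter

def multi_agent_walk(adjacency, start, steps=2000, n_agents=10,
                     reproduce_prob=0.01, max_agents=200):
    agents = [start] * n_agents
    visits = Counter()
    for _ in range(steps):
        next_agents = []
        for node in agents:
            node = random.choice(adjacency[node])      # local, random move
            visits[node] += 1
            next_agents.append(node)
            if random.random() < reproduce_prob and len(next_agents) < max_agents:
                next_agents.append(node)               # agent reproduces in place
        agents = next_agents
    return visits

# In the long run, where the agents sit reflects the graph's topology
# (node degrees and connectivity) far more than the shared starting node.
adjacency = {0: [1], 1: [0, 2, 3], 2: [1, 3], 3: [1, 2, 4], 4: [3]}
print(multi_agent_walk(adjacency, start=0).most_common())
```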

  8. Torbjörn Larsson

    PaulC,
    I don’t find myself in agreement with your ideas.
    When you say optimization instead of local search, it seems to imply finding an optimum. But natural selection makes life make do: success is to be fit enough to procreate, not to be the fittest. And I don’t think anyone believes fitness criteria are simple, independent or fixed when picturing a hill-climbing model.
    When you say attractors, it seems to imply global goals. I don’t think anyone has ever found those. Exobiologists and SETI would like to know how probable intelligence is, but it seems no biologist can tell them. If evolution randomly explores a fitness space, perhaps in time every fit solution is explored, so the answer is 1. But is it? The fitness space depends on coevolution, both among genes within the organism and with other species. For example, a predator needs prey and a parasite needs a host. Perhaps such nodes are trivial properties of evolution, perhaps not. I think we agree that it would be a less than clear model to say that evolution searches for them.

  9. John Marley

    “See all of the creationist/IDist misrepresentations of Dawkins’ WEASEL illustration for examples of those misrepresentations.”
    Dawkins’ WEASEL example wasn’t even meant to have anything directly to do with evolution. It was meant to show the power of cumulative selection over random generation, nothing more. The fact that evolution deniers misuse it just shows their dishonesty.

  10. PaulC

    Torbjörn: When I say “optimization” I don’t mean a process that necessarily finds the optimal value of a fitness function. Global optima are usually very hard to find unless your algorithm is based on some specific property of the search space or the search space has some property that makes it amenable to a greedy algorithm. Even the designs that humans produce are not likely to be global optima. If I cut a bunch of parts from sheet metal, it’s quite possible I could have fit another part on that sheet and saved some money. But I still tried to optimize the placement. I would call whatever method I used an optimization, just not an exact one. Everyone makes do with good enough solutions, not just evolution but humans as well. That’s one thing I’ve never figured out about ID arguments. I’ve never understood why they think mere human intelligence is more powerful than evolution. Both find approximate solutions and apply them in the absence of having better ones at hand.
    I also don’t mean “attractor” in a very rigorous sense. It’s more of an analogy. What I mean is that in some dynamic system, there is an initial set of probabilities of being in any state. It might not be uniform (actually it probably shouldn’t be, because that would be equivalent to a high entropy starting point as per the other discussion). But as the system functions, the probabilities change. The initial distribution is generally not a good indication of the final one.
    To use a precise example, though a poor model of evolution, consider a simple random walk in an undirected graph. Start at some node and choose an adjacent node randomly. Do this for a certain number of steps. You might wonder what the probability is of being at a particular node at some time in the future. That’s not too hard to calculate for any particular time t by taking a matrix to the t-th power. You can also ask what the equilibrium probability is, i.e. if you do the walk long enough, what the probability is of being at any given node. Actually, you need to be careful that the graph is not bipartite or tripartite, or you get an oscillating distribution, but graphs without these properties will settle at an equilibrium.
    Anyway, a well-known (and in my view a little disappointing) property of such random walks is that the equilibrium probability of being at a particular node is just the number of its neighbors divided by the sum of the numbers of neighbors taken over all nodes. So if the walk goes on long enough, and if there are some nodes with an unusually large number of neighbors, these are the most likely locations of the random walker (though it may be more likely overall to be in one of the others if there are a lot of them).
    Someone could conceivably come along and characterize this as a “search” for the nodes with a lot of neighbors. It’s not a very efficient one, but in some cases it would work. If there were a few nodes with disproportionately many neighbors and you sampled the walk periodically, your sample would be enriched for high degree. So I would say that this is a process that you could think of as a search, though that’s kind of a weird and not very illuminating way to see it.
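    A quick numerical check of that property on a small example graph (the graph is an arbitrary choice): repeatedly applying the transition matrix drives any starting distribution to degree(node) / (2 × number of edges).

```python
# A quick numerical check, on a small arbitrary graph, that the random-walk
# equilibrium probability of a node is degree(node) / (2 * |E|).
import numpy as np

# Adjacency matrix of a small connected, non-bipartite graph (it has a triangle).
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 0],
              [0, 1, 0, 0]], dtype=float)

degrees = A.sum(axis=1)
P = A / degrees[:, None]                 # transition matrix: uniform over neighbours

dist = np.array([1.0, 0.0, 0.0, 0.0])    # start the walker at node 0
for _ in range(200):                     # "matrix to the t-th power", iterated
    dist = dist @ P

print("after many steps:", dist.round(4))
print("degree / 2|E|   :", (degrees / degrees.sum()).round(4))
```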

  11. PaulC

    Correction: I think only bipartite graphs give you an oscillating final distribution of probabilities in a random walk. Ignore “tripartite”; I was not thinking straight.
    Another comment I want to add is that while nobody knows the probability of intelligent life, my working assumption is that it is high enough that you’d get it if you could restart our universe with slightly different initial conditions, and that it would also emerge from a wide class of other dynamic systems at a sufficiently large scale. I cannot prove that, and my only semi-rigorous argument is based on the assumption that the observable universe is all there is. If there were a plethora of other systems, it could conceivably be that intelligence is rare but seems more probable to us because by definition an intelligent observer has to inhabit a universe where it occurred.
    I think, though, that self-replication is a provably ubiquitous phenomenon in dynamic systems, because it is so common to find it in cellular automata. I would be astonished if there were not many planets on which some form of replicating molecule appeared. I would likewise be surprised if they did not evolve into ecosystems with complex organisms. I also think intelligence, which is useful to fitness and reproduction, should occur elsewhere, but may not be an inevitable result of a robust ecosystem. I just have no way of knowing; but I admit I have preferred hypotheses.

  12. RBH

    Mark wrote

    For the purposes of explaining *why* search is used as a mathematical model of evolution, the selection function doesn’t matter; what matters is the fact that the process of reproduce-and-select can produce a graph-like structure. That doesn’t mean that there is a single fixed graph generated by mutate-and-select; in fact, you would *not* expect to get the same graph from two different runs: the *mutate* is random! But mathematically, the structure of an evolutionary pattern *is* quite natural to model as a graph.

    I understand that. Being embedded in the resistance to the evolution deniers in Ohio I tend to be over-sensitive to conceptions that are habitually misrepresented by them. I’m also aware that one can’t always condition one’s language to avoid misrepresentations by evolution deniers.
    At the morphological level I have long been attracted (!) to PaulC’s basin of attraction metaphor. Physical constraints tend to ‘push’ at least some body forms in predictable directions. Underwater predators tend to be of two general sorts, ambush predators or pursuit predators, and the exigencies of selection differ fairly predictably for the two: camouflage vs. muscular streamlined bodies (there’s more to those stories, of course).
    One other remark. The search metaphor often carries a connotation (though not with necessity) of a system starting in some non-target state and, via some process of node to node transitions, arriving at some target state. But in biological evolution a successfully reproducing population starts in a ‘target’ state. Any ‘search’ that occurs is movement from one selectively adequate state to another. That’s the root of Behe’s (and Dembski’s) reliance on irreducible complexity as a barrier to evolution. They claim (in essence) that you can’t get ‘there’ from anywhere via evolutionary transition mechanisms — some nodes on the graph are isolates in selective space. A counter to that is the recent work by Gavrilets (among others). See E. Tellgren’s brief summary on TalkReason and the references there. I’d enjoy reading Mark’s take on that.
    RBH

  13. Torbjörn Larsson

    PaulC:
    And now I find myself in complete agreement with your ideas. 🙂
    The result on equilibrium properties of random walks in undirected graphs makes your point very clear. BTW, I think your discussion of initial states is simplified by the fact that a species starts out in some confined part of fitness space, since it is a more or less well-defined population.
    I feel as you do: reproduction and evolution seem hard to avoid (see for example my comment on abiogenesis in http://scientopia.org/blogs/goodmath/2006/07/debunking-a-mathematicians-view-of-evolution ), and so does intelligence. Your picture makes it seem at least as feasible as a mere local search. If only there were a way to prove that…

  14. Torbjörn Larsson

    RBH:
    Yes, I thought about physical constraints as helping to define or push nodes in evolution too. But then again, isn’t that supplanting a general functional selection criterion with a specific structural selection criterion?

  15. Kris Vanheede

    A lot of confusion arises from mixing up the whole picture of a dynamic, self-referencing evolutionary system with the mathematical picture of a beam search that uses an evolutionary analogue to select its next beam generations.
    Computational evolutionary beam search (CEBS) is the application of a small part of how natural evolution occurs to a GIVEN, UNCHANGING fitness landscape. By “given”, it is not implied that we know the topology of the fitness landscape a priori; if we did, we wouldn’t need to do a search. If we represent all possible genotypes in one plane (XY), and show fitness on the Z axis, we might have a fitness function f() (let it be how fast a phenotypic expression of genotype (x,y) can swim) that generates a certain fitness landscape. The point of the beam search is to randomly select initial genotypes, and to use reproduction, mutation and selection as a good heuristic for converging on descendant genotypes whose phenotypes produce an optimum in the given fitness landscape. This search only works if the optima stay the same; otherwise it devolves into a random walk, since most of the time you will not hit on an optimum after one step. Indeed, we did not specify what the genotype or phenotype of an optimum should be, so different runs might give different solutions; all we know is that the generated solutions are optimized against the GIVEN fitness landscape. So CEBS does imply directionality, and the word search nails it.
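    A minimal sketch of that kind of fixed-landscape beam search (the fitness function, mutation operator, and parameters below are arbitrary placeholders): because the landscape never changes, the beam converges toward one of its optima, and different random runs may land on different, equally fit optima.

```python
# A minimal sketch of an evolutionary beam search over a FIXED fitness
# landscape.  The landscape, mutation step, and parameters are placeholders.
import random

def fitness(x, y):
    # A given, unchanging landscape with two equally good peaks, at (2, -1)
    # and (-3, 4); the maximum possible value is 0.
    return -((x - 2) ** 2 + (y + 1) ** 2) * ((x + 3) ** 2 + (y - 4) ** 2)

def mutate(genotype, step=0.5):
    x, y = genotype
    return (x + random.gauss(0, step), y + random.gauss(0, step))

def beam_search(beam_width=10, offspring=5, generations=300):
    beam = [(random.uniform(-5, 5), random.uniform(-5, 5))
            for _ in range(beam_width)]
    for _ in range(generations):
        children = [mutate(g) for g in beam for _ in range(offspring)]
        # Selection against the fixed landscape: keep the fittest candidates.
        beam = sorted(children, key=lambda g: fitness(*g), reverse=True)[:beam_width]
    return beam[0]

# Different runs may converge on different (but equally fit) peaks.
print("best genotype found:", beam_search())
```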
    Natural evolution can not be a search, since the fitness landscape changes with each new individual. Hence, there is no direction, only movement from one state of affairs to another, where because of the reproduction mechanism local fitness champions are amplified FOR ONE STEP. The next step might see them as less fit because the closed system they live in has different pressures.
    All the creationist arguments are based on the confusion of a directed search algorithm (which fits them because it implies a designer, just as everyone that specifies a fitness function is a designer), with an undirected dynamic system.
    So I would argue that only mathematical tools for analyzing dynamic systems apply to natural evolution, not an abstraction of a search algorithm in a graph, even if the nature of reproduction implies one.
    These are the kinds of confusion that lead people to say that we are the high point of evolution, and that all the intermediate steps had to be the way they are to get to us, which of course seems like a pretty small chance. In this sense, the weak anthropic argument is plain silly in saying that we are a logical consequence of the universe being the way it is: it implies directionality and provides fodder for the opposing argument it is trying to nullify. We should be asking people why they are asking anthropocentric questions, inductively stretching the notions of cause and effect beyond their domain of application.

  16. PaulC

    I had some more thoughts about the erroneous notion of evolution amounting to progress from “less advanced” to “more advanced.” I believe that any perceived “arrow of progress” is purely an artifact of the initial conditions.
    To expand on my example above, we might perceive it is “progress” when the first land-dwelling species emerged from sea-dwelling ancestors. But this is kind of silly in light of the fact that we have seen transitions in both directions. There are many sea creatures whose ancestors lived on land (true, they all breathe air as far as I know; it’s conceivable that some could readapt to start breathing water given enough time). The only real difference between these modes of existence is that there was some point in time when life only existed in water and at that time the transitions could only occur in one direction. When writing our histories, we put special emphasis on “firsts”, so these transitions are emphasized to the exclusion of others.
    Thought experiment: imagine you had some kind of steady-state ecology that had supported life for much longer than present day earth. In that ecology, there would be a variety of survival niches, and you would see transitions from one niche to another. It would no longer look like progress because an individual of some species might have ancestors in a whole series of niches and the evolution between these might well look like a random walk. In short, it only looks like a search if the “goals” have not yet been explored, and this is purely an artifact of the initial conditions.
    Let’s look at a toy model, the “lollipop graph” example used in random walks:
    http://jimbomania.com/lollipop-traversal.html
    It consists of a linear chain of vertices (like a lollipop stick) with a complete graph (the lollipop) at the end in which every node is adjacent to every other node. Suppose you start at the far end of the stick from the lollipop and do a random walk. If you watched this from that starting point, it might look like a “search” for the lollipop. The walker would advance, sometimes forward, sometimes backwards. When it actually made it to the lollipop, the walk would begin to look very different for a while. The probability of moving to another lollipop node would be much higher than the probability of moving back onto the stick, and even if you did, you’d have a much higher probability of going back to the lollipop later before you ever went back to the end of the stick.
    You might erroneously conclude that this is some kind of progress toward the lollipop. But actually, you just haven’t watched long enough. The walker will eventually go back to the end of the stick. In fact, the probability of finding it there at any point after equilibrium is just 1/(2*#edges). That’s not even such a low probability, though it is less than the probability of finding it in a lollipop node.
    Now the above example is far too simple to capture many interesting properties of evolution. However, it illustrates how one can ascribe a false directionality to a process, and how that perception comes from initial conditions. If the walker was initially placed on a randomly chosen node with equilibrium probability, there would be no perception of directionality.
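    A small simulation of that lollipop example, with arbitrary sizes chosen for illustration: after enough steps, the fraction of time the walker spends at the far end of the stick approaches 1/(2*#edges), regardless of how directed the early part of the walk looked.

```python
# A small simulation of a random walk on a lollipop graph: a stick of
# stick_len edges attached to a complete graph on clique_size nodes.
# The sizes and the number of steps are arbitrary choices.
import random

def lollipop(stick_len=5, clique_size=6):
    adj = {i: [] for i in range(stick_len + clique_size)}
    for i in range(stick_len):                 # the stick: 0 - 1 - ... - stick_len
        adj[i].append(i + 1)
        adj[i + 1].append(i)
    clique = range(stick_len, stick_len + clique_size)
    for i in clique:                           # the lollipop: a complete graph
        for j in clique:
            if i < j:
                adj[i].append(j)
                adj[j].append(i)
    return adj

adj = lollipop()
n_edges = sum(len(neighbours) for neighbours in adj.values()) // 2
node, time_at_start, steps = 0, 0, 500_000     # start at the far end of the stick
for _ in range(steps):
    node = random.choice(adj[node])
    time_at_start += (node == 0)
print("empirical fraction at stick end:", time_at_start / steps)
print("predicted 1/(2*#edges)        :", 1 / (2 * n_edges))
```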
    We humans fancy ourselves “advanced”, as a sentient, tool-using, very new species, and apparently the first such species to exist on earth. It also strikes me as likely that the impact of technological life is so severe as to rule out the possibility of the “steady state” ecosystem postulated above occurring on earth. But if it were possible, we would see cases of intelligent life adapting to other niches that require less intelligence, perhaps through greater disease resistance, greater reproductivity, ability to digest new food sources, etc. If it could go on long enough, intelligence would not look like progress from an evolutionary perspective. My hunch is that any robust ecosystem will, given enough time and population, develop intelligence since it seems like it would add to fitness at a fairly small cost. But it is a little misleading to view it as a search.

  17. Gaurav

    Thanks for the post.
    I think the point where I was getting stuck was my assumption of a “static” graph (as opposed to a growing one) for the search model. Your and others’ posts were very helpful.
    Thanks everyone.

  18. Blake Stacey

    Diffusion equations are used in population genetics to study the distribution of alleles in a gene pool. Boiling it down to a synopsis, the idea goes like this:
    Suppose we have a gene of interest which has two alleles, a and A. We’d like to know what percentage of individuals carry the allele A, so we call the proportion of genes which are A (as opposed to a) an unknown x. The proportion x will be in the range from 0 to 1; we then ask with what probability we see the population of N individuals exhibiting an allele proportion x. Call this probability p(x,t), because it can and will vary over time.
    (A technicality: N individuals can have 2N or more total genes to count — polyploidy!)
    The way p(x,t) varies with time can be modeled using a Fokker-Planck diffusion equation. (Incidentally, Fokker-Planck is a close cousin to the Schroedinger equation: you just have to make time an imaginary number to turn one into the other! The relation between the Fokker-Planck equation and the Langevin equation, the other chief way of looking at diffusion, is completely analogous to the relationship between the Schroedinger and Heisenberg “pictures” of quantum mechanics, but that’s getting far afield.) The effects of random mating and mutation go into an abstract diffusion coefficient, which tells how quickly p(x,t) spreads out. Selection pressure — i.e., one allele leading to a more favourable phenotype than the other — is modeled as a “force” parameter.
    This approach is useful for telling when genetic drift will be significant, based on population size, mutation rates and such parameters.
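    For a concrete picture of the discrete process that diffusion approximation describes, here is a sketch of a Wright-Fisher-style simulation; the population size, selection coefficient, and starting frequency are arbitrary choices. Binomial resampling supplies the random drift, and a small selective advantage plays the role of the “force” term.

```python
# A sketch of the discrete process that the diffusion approximation describes:
# Wright-Fisher-style binomial resampling of an allele frequency x, with a
# small selective advantage s for allele A.  All parameters are arbitrary.
import random

def wright_fisher(n_genes=200, x0=0.5, s=0.02, generations=500):
    x, trajectory = x0, [x0]
    for _ in range(generations):
        # Selection nudges the expected frequency of A upward; binomial
        # sampling of the next generation supplies the random drift.
        expected = x * (1 + s) / (x * (1 + s) + (1 - x))
        count = sum(random.random() < expected for _ in range(n_genes))
        x = count / n_genes
        trajectory.append(x)
        if x in (0.0, 1.0):          # allele A lost or fixed
            break
    return trajectory

# Repeated runs show how p(x,t) spreads out; the smaller the population,
# the more the outcomes scatter (drift dominates selection).
print([round(wright_fisher()[-1], 2) for _ in range(20)])
```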

  19. Torbjörn Larsson

    I need to revoke my last comment to RBH. There is a confusion of mine there, and RBH is perfectly correct.
    “My hunch is that any robust ecosystem will, given enough time and population, develop intelligence since it seems like it would add to fitness at a fairly small cost.”
    An ‘exploring the landscape’ picture, perhaps with the addition that intelligence confers benefits, especially if codeveloped, so it is less likely that it is dropped later.
    Though human-level intelligence cost us a lot, IMO: much of our energy consumption and mass goes to the brain, and quite a few adaptations were needed to make it. There were posts at the Loom blog on tenuous connections between nerve-cell sugars, our aggressive T-cells, sicknesses such as MS, and perhaps a further connection to HIV vs. the less aggressive simian SIV. Makes one think. Or perhaps makes one wish not to. 😉

  20. RPM

    On shorter time scales, it’s often easier to model evolution in reverse, starting with the tips of a genealogy and working your way back to the most recent common ancestor (MRCA). This saves time by excluding the dead ends which don’t reveal anything about the extant variation in a population. This reverse model, known as the coalescent, is the most popular model in molecular population genetics.
    From what I’ve seen, the greatest influence of graph theory in molecular biology has been with gene interaction networks.
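    A minimal sketch of that backwards-in-time idea, using the standard neutral-coalescent rates (the sample size and population size are arbitrary choices): start with a sample of lineages and draw the waiting time for each merge until only the most recent common ancestor remains.

```python
# A minimal sketch of a neutral coalescent: work backwards from a sample of
# k lineages, drawing the waiting time to each coalescence until only the
# most recent common ancestor (MRCA) remains.  Parameters are arbitrary.
import random

def coalescent_tmrca(sample_size=10, pop_size=10_000):
    # With k lineages in a diploid population of size N, the waiting time to
    # the next coalescence is approximately exponential with rate
    # k*(k-1)/2 / (2N) per generation.
    t, k = 0.0, sample_size
    while k > 1:
        rate = k * (k - 1) / 2 / (2 * pop_size)
        t += random.expovariate(rate)
        k -= 1                       # two lineages merge into one ancestor
    return t

times = [coalescent_tmrca() for _ in range(1000)]
print("mean time to MRCA (generations):", round(sum(times) / len(times)))
# For comparison, theory gives E[TMRCA] = 4N(1 - 1/n) = 36,000 generations here.
```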

