{"id":1401,"date":"2011-04-16T15:41:14","date_gmt":"2011-04-16T19:41:14","guid":{"rendered":"http:\/\/scientopia.org\/blogs\/goodmath\/?p=1401"},"modified":"2011-04-16T15:41:14","modified_gmt":"2011-04-16T19:41:14","slug":"regular-expressions-and-derivatives","status":"publish","type":"post","link":"http:\/\/www.goodmath.org\/blog\/2011\/04\/16\/regular-expressions-and-derivatives\/","title":{"rendered":"Regular Expressions and Derivatives"},"content":{"rendered":"<p> When you&#8217;re working with regular languages specified in regular expression form, there&#8217;s a really cool idea that you can use for building regular expression matchers, and for describing how to convert from a regular expression to a NFA. It&#8217;s called the <em>Brzozozwksi derivative<\/em> of a regular expression &#8211; or just simply the derivative of a regexp.<\/p>\n<p> The basic idea of the derivative is that given a regular expression, <img src='http:\/\/l.wordpress.com\/latex.php?latex=r&#038;bg=FFFFFF&#038;fg=000000&#038;s=0' title='r' style='vertical-align:1%' class='tex' alt='r' \/>, you can derive a new regular expression called the derivative with respect to symbol <img src='http:\/\/l.wordpress.com\/latex.php?latex=c&#038;bg=FFFFFF&#038;fg=000000&#038;s=0' title='c' style='vertical-align:1%' class='tex' alt='c' \/>, <img src='http:\/\/l.wordpress.com\/latex.php?latex=D_c%28r%29&#038;bg=FFFFFF&#038;fg=000000&#038;s=0' title='D_c(r)' style='vertical-align:1%' class='tex' alt='D_c(r)' \/>. <img src='http:\/\/l.wordpress.com\/latex.php?latex=D_c%28r%29&#038;bg=FFFFFF&#038;fg=000000&#038;s=0' title='D_c(r)' style='vertical-align:1%' class='tex' alt='D_c(r)' \/> is a regular expression describing the string matched by <img src='http:\/\/l.wordpress.com\/latex.php?latex=r&#038;bg=FFFFFF&#038;fg=000000&#038;s=0' title='r' style='vertical-align:1%' class='tex' alt='r' \/> <em>after<\/em> it&#8217;s matched an <img src='http:\/\/l.wordpress.com\/latex.php?latex=r&#038;bg=FFFFFF&#038;fg=000000&#038;s=0' title='r' style='vertical-align:1%' class='tex' alt='r' \/>.<\/p>\n<p><!--more--><\/p>\n<p> To define the derivative, we first need a helper, which we&#8217;ll call <img src='http:\/\/l.wordpress.com\/latex.php?latex=delta&#038;bg=FFFFFF&#038;fg=000000&#038;s=0' title='delta' style='vertical-align:1%' class='tex' alt='delta' \/>. What <img src='http:\/\/l.wordpress.com\/latex.php?latex=delta&#038;bg=FFFFFF&#038;fg=000000&#038;s=0' title='delta' style='vertical-align:1%' class='tex' alt='delta' \/> does is tell us if a given regular expression canmatch the empty string. We&#8217;ll use it a few ways &#8211; both as a part of the derivative, and as part of the process of turning a regular expression into a finite state machine. For convenience, we&#8217;ll define it so that <img src='http:\/\/l.wordpress.com\/latex.php?latex=delta%28r%29&#038;bg=FFFFFF&#038;fg=000000&#038;s=0' title='delta(r)' style='vertical-align:1%' class='tex' alt='delta(r)' \/> returns <img src='http:\/\/l.wordpress.com\/latex.php?latex=epsilon&#038;bg=FFFFFF&#038;fg=000000&#038;s=0' title='epsilon' style='vertical-align:1%' class='tex' alt='epsilon' \/> (a pattern matching the empty string) if <img src='http:\/\/l.wordpress.com\/latex.php?latex=r&#038;bg=FFFFFF&#038;fg=000000&#038;s=0' title='r' style='vertical-align:1%' class='tex' alt='r' \/> can match the empty string, or <img src='http:\/\/l.wordpress.com\/latex.php?latex=emptyset&#038;bg=FFFFFF&#038;fg=000000&#038;s=0' title='emptyset' style='vertical-align:1%' class='tex' alt='emptyset' \/> (the null pattern, a regular expression which never matches anything) if it can&#8217;t.<\/p>\n<p> Given a regular expression <img src='http:\/\/l.wordpress.com\/latex.php?latex=r&#038;bg=FFFFFF&#038;fg=000000&#038;s=0' title='r' style='vertical-align:1%' class='tex' alt='r' \/>, we define <img src='http:\/\/l.wordpress.com\/latex.php?latex=delta%28r%29&#038;bg=FFFFFF&#038;fg=000000&#038;s=0' title='delta(r)' style='vertical-align:1%' class='tex' alt='delta(r)' \/> as follows:<\/p>\n<ul>\n<li> if <img src='http:\/\/l.wordpress.com\/latex.php?latex=r&#038;bg=FFFFFF&#038;fg=000000&#038;s=0' title='r' style='vertical-align:1%' class='tex' alt='r' \/> is <img src='http:\/\/l.wordpress.com\/latex.php?latex=emptyset&#038;bg=FFFFFF&#038;fg=000000&#038;s=0' title='emptyset' style='vertical-align:1%' class='tex' alt='emptyset' \/>, then <img src='http:\/\/l.wordpress.com\/latex.php?latex=delta%28r%29%20%3D%20emptyset&#038;bg=FFFFFF&#038;fg=000000&#038;s=0' title='delta(r) = emptyset' style='vertical-align:1%' class='tex' alt='delta(r) = emptyset' \/>; since the void pattern can&#8217;t ever match anything, it doesn&#8217;t match the empty string.<\/li>\n<li> if <img src='http:\/\/l.wordpress.com\/latex.php?latex=r&#038;bg=FFFFFF&#038;fg=000000&#038;s=0' title='r' style='vertical-align:1%' class='tex' alt='r' \/> is <img src='http:\/\/l.wordpress.com\/latex.php?latex=epsilon&#038;bg=FFFFFF&#038;fg=000000&#038;s=0' title='epsilon' style='vertical-align:1%' class='tex' alt='epsilon' \/>, then <img src='http:\/\/l.wordpress.com\/latex.php?latex=delta%28r%29%20%3D%20epsilon&#038;bg=FFFFFF&#038;fg=000000&#038;s=0' title='delta(r) = epsilon' style='vertical-align:1%' class='tex' alt='delta(r) = epsilon' \/>; obviously, \tthe pattern that (by definition) matches only the empty string does match the empty string.<\/li>\n<li> if <img src='http:\/\/l.wordpress.com\/latex.php?latex=r&#038;bg=FFFFFF&#038;fg=000000&#038;s=0' title='r' style='vertical-align:1%' class='tex' alt='r' \/> is a single-character pattern, then <img src='http:\/\/l.wordpress.com\/latex.php?latex=delta%28r%29%20%3D%20emptyset&#038;bg=FFFFFF&#038;fg=000000&#038;s=0' title='delta(r) = emptyset' style='vertical-align:1%' class='tex' alt='delta(r) = emptyset' \/>. A pattern which matches a specific single character\tcan&#8217;t match anything but that single character &#8211; so it can&#8217;t match the empty string.<\/li>\n<li> if <img src='http:\/\/l.wordpress.com\/latex.php?latex=r&#038;bg=FFFFFF&#038;fg=000000&#038;s=0' title='r' style='vertical-align:1%' class='tex' alt='r' \/> is a sequence <img src='http:\/\/l.wordpress.com\/latex.php?latex=r_1r_2&#038;bg=FFFFFF&#038;fg=000000&#038;s=0' title='r_1r_2' style='vertical-align:1%' class='tex' alt='r_1r_2' \/> then <img src='http:\/\/l.wordpress.com\/latex.php?latex=delta%28r%29%20%3D%20delta%28r_1%29delta%28r_2%29&#038;bg=FFFFFF&#038;fg=000000&#038;s=0' title='delta(r) = delta(r_1)delta(r_2)' style='vertical-align:1%' class='tex' alt='delta(r) = delta(r_1)delta(r_2)' \/>. A sequence matches the empty string if all of the elements of the sequence match the empty string.<\/li>\n<li> if <img src='http:\/\/l.wordpress.com\/latex.php?latex=r&#038;bg=FFFFFF&#038;fg=000000&#038;s=0' title='r' style='vertical-align:1%' class='tex' alt='r' \/> is a choice <img src='http:\/\/l.wordpress.com\/latex.php?latex=r_1%20%7C%20r_2&#038;bg=FFFFFF&#038;fg=000000&#038;s=0' title='r_1 | r_2' style='vertical-align:1%' class='tex' alt='r_1 | r_2' \/>, then <img src='http:\/\/l.wordpress.com\/latex.php?latex=delta%28r%29%20%20%3D%20delta%28r_1%29%20%7C%20delta%28r_2%29&#038;bg=FFFFFF&#038;fg=000000&#038;s=0' title='delta(r)  = delta(r_1) | delta(r_2)' style='vertical-align:1%' class='tex' alt='delta(r)  = delta(r_1) | delta(r_2)' \/>. A sequence matches empty if any of its elements match empty.<\/li>\n<li> if <img src='http:\/\/l.wordpress.com\/latex.php?latex=r&#038;bg=FFFFFF&#038;fg=000000&#038;s=0' title='r' style='vertical-align:1%' class='tex' alt='r' \/> is a starred regular expression, <img src='http:\/\/l.wordpress.com\/latex.php?latex=r_1%5E%2A&#038;bg=FFFFFF&#038;fg=000000&#038;s=0' title='r_1^*' style='vertical-align:1%' class='tex' alt='r_1^*' \/>,  then <img src='http:\/\/l.wordpress.com\/latex.php?latex=delta%28r%29%20%3D%20epsilon&#038;bg=FFFFFF&#038;fg=000000&#038;s=0' title='delta(r) = epsilon' style='vertical-align:1%' class='tex' alt='delta(r) = epsilon' \/>. Starred regular expressions are sequences of zero or more repetitions of some other pattern. Zero repetitions is the same as the empty string &#8211; so all starred patterns match empty.<\/li>\n<\/ul>\n<p> To make that useful, we also need to define how empty and void patterns combine with other patterns:<\/p>\n<ul>\n<li> For any regular expression <img src='http:\/\/l.wordpress.com\/latex.php?latex=r&#038;bg=FFFFFF&#038;fg=000000&#038;s=0' title='r' style='vertical-align:1%' class='tex' alt='r' \/>, the regular expression <img src='http:\/\/l.wordpress.com\/latex.php?latex=epsilon%20%20r%20%3D%20r&#038;bg=FFFFFF&#038;fg=000000&#038;s=0' title='epsilon  r = r' style='vertical-align:1%' class='tex' alt='epsilon  r = r' \/>; in a sequence, concatenating an empty pattern with any regular expression is equivalent to the regular expression <em>without<\/em> the pattern.<\/li>\n<li> For any regular expression <img src='http:\/\/l.wordpress.com\/latex.php?latex=r&#038;bg=FFFFFF&#038;fg=000000&#038;s=0' title='r' style='vertical-align:1%' class='tex' alt='r' \/>, <img src='http:\/\/l.wordpress.com\/latex.php?latex=emptyset%20r%20%3D%20epsilon&#038;bg=FFFFFF&#038;fg=000000&#038;s=0' title='emptyset r = epsilon' style='vertical-align:1%' class='tex' alt='emptyset r = epsilon' \/>. If\tyou&#8217;ve got a sequence of void with a pattern, it&#8217;s equivalent to void.<\/li>\n<li> For any regular expression <img src='http:\/\/l.wordpress.com\/latex.php?latex=r&#038;bg=FFFFFF&#038;fg=000000&#038;s=0' title='r' style='vertical-align:1%' class='tex' alt='r' \/>, <img src='http:\/\/l.wordpress.com\/latex.php?latex=emptyset%20%7C%20r%20%3D%20r&#038;bg=FFFFFF&#038;fg=000000&#038;s=0' title='emptyset | r = r' style='vertical-align:1%' class='tex' alt='emptyset | r = r' \/>. A choice between void and <img src='http:\/\/l.wordpress.com\/latex.php?latex=r&#038;bg=FFFFFF&#038;fg=000000&#038;s=0' title='r' style='vertical-align:1%' class='tex' alt='r' \/> is equivalent to just <img src='http:\/\/l.wordpress.com\/latex.php?latex=r&#038;bg=FFFFFF&#038;fg=000000&#038;s=0' title='r' style='vertical-align:1%' class='tex' alt='r' \/>.<\/li>\n<li> <img src='http:\/\/l.wordpress.com\/latex.php?latex=emptyset%5E%2A%20%3D%20emptyset&#038;bg=FFFFFF&#038;fg=000000&#038;s=0' title='emptyset^* = emptyset' style='vertical-align:1%' class='tex' alt='emptyset^* = emptyset' \/>.<\/li>\n<li> <img src='http:\/\/l.wordpress.com\/latex.php?latex=epsilon%5E%2A%20%3D%20epsilon&#038;bg=FFFFFF&#038;fg=000000&#038;s=0' title='epsilon^* = epsilon' style='vertical-align:1%' class='tex' alt='epsilon^* = epsilon' \/>.<\/li>\n<\/ul>\n<p> Now, it&#8217;s really easy to define the derivative:<\/p>\n<ul>\n<li> If r is the void pattern, then any derivative of it is void.<\/li>\n<li> If r is the empty pattern, then any derivative of it is void.<\/li>\n<li> If r is a character pattern matching character <img src='http:\/\/l.wordpress.com\/latex.php?latex=c&#038;bg=FFFFFF&#038;fg=000000&#038;s=0' title='c' style='vertical-align:1%' class='tex' alt='c' \/>, then <img src='http:\/\/l.wordpress.com\/latex.php?latex=D_c%28r%29%3D%3Depsilon&#038;bg=FFFFFF&#038;fg=000000&#038;s=0' title='D_c(r)==epsilon' style='vertical-align:1%' class='tex' alt='D_c(r)==epsilon' \/>, and the derivative of <img src='http:\/\/l.wordpress.com\/latex.php?latex=r&#038;bg=FFFFFF&#038;fg=000000&#038;s=0' title='r' style='vertical-align:1%' class='tex' alt='r' \/> with respect to any other character is void.<\/li>\n<li> If <img src='http:\/\/l.wordpress.com\/latex.php?latex=r&#038;bg=FFFFFF&#038;fg=000000&#038;s=0' title='r' style='vertical-align:1%' class='tex' alt='r' \/> is a choice pattern between <img src='http:\/\/l.wordpress.com\/latex.php?latex=r_1&#038;bg=FFFFFF&#038;fg=000000&#038;s=0' title='r_1' style='vertical-align:1%' class='tex' alt='r_1' \/> and <img src='http:\/\/l.wordpress.com\/latex.php?latex=r_2&#038;bg=FFFFFF&#038;fg=000000&#038;s=0' title='r_2' style='vertical-align:1%' class='tex' alt='r_2' \/>, then for all characters c, <img src='http:\/\/l.wordpress.com\/latex.php?latex=D_c%28r%29%20%3D%20D_c%28r_1%29%20%7C%20D_c%28r_2%29&#038;bg=FFFFFF&#038;fg=000000&#038;s=0' title='D_c(r) = D_c(r_1) | D_c(r_2)' style='vertical-align:1%' class='tex' alt='D_c(r) = D_c(r_1) | D_c(r_2)' \/>.<\/li>\n<li> If <img src='http:\/\/l.wordpress.com\/latex.php?latex=r&#038;bg=FFFFFF&#038;fg=000000&#038;s=0' title='r' style='vertical-align:1%' class='tex' alt='r' \/> is a sequence pattern consisting of <img src='http:\/\/l.wordpress.com\/latex.php?latex=r_1&#038;bg=FFFFFF&#038;fg=000000&#038;s=0' title='r_1' style='vertical-align:1%' class='tex' alt='r_1' \/> followed by <img src='http:\/\/l.wordpress.com\/latex.php?latex=r_2&#038;bg=FFFFFF&#038;fg=000000&#038;s=0' title='r_2' style='vertical-align:1%' class='tex' alt='r_2' \/>, then for all characters <img src='http:\/\/l.wordpress.com\/latex.php?latex=c&#038;bg=FFFFFF&#038;fg=000000&#038;s=0' title='c' style='vertical-align:1%' class='tex' alt='c' \/>, <img src='http:\/\/l.wordpress.com\/latex.php?latex=D_c%28r%29%20%3D%20D_c%28r_1%29r_2%20%7C%20delta%28r_1%29D_c%28r_2%29&#038;bg=FFFFFF&#038;fg=000000&#038;s=0' title='D_c(r) = D_c(r_1)r_2 | delta(r_1)D_c(r_2)' style='vertical-align:1%' class='tex' alt='D_c(r) = D_c(r_1)r_2 | delta(r_1)D_c(r_2)' \/>. This one might need a bit of explanation. What that means is that for two patterns put together sequentially, you&#8217;ve got a choice. You could match the first pattern in the sequence &#8211; producing <img src='http:\/\/l.wordpress.com\/latex.php?latex=D_c%28r_1%29&#038;bg=FFFFFF&#038;fg=000000&#038;s=0' title='D_c(r_1)' style='vertical-align:1%' class='tex' alt='D_c(r_1)' \/> followed by <img src='http:\/\/l.wordpress.com\/latex.php?latex=r_2&#038;bg=FFFFFF&#038;fg=000000&#038;s=0' title='r_2' style='vertical-align:1%' class='tex' alt='r_2' \/>. Or, <em>if<\/em> <img src='http:\/\/l.wordpress.com\/latex.php?latex=r_1&#038;bg=FFFFFF&#038;fg=000000&#038;s=0' title='r_1' style='vertical-align:1%' class='tex' alt='r_1' \/> could match the empty pattern, then you can drop it, and match <img src='http:\/\/l.wordpress.com\/latex.php?latex=r_2&#038;bg=FFFFFF&#038;fg=000000&#038;s=0' title='r_2' style='vertical-align:1%' class='tex' alt='r_2' \/>. With the rules for combining empty and void patterns with\tother patterns, the statement above using <img src='http:\/\/l.wordpress.com\/latex.php?latex=delta&#038;bg=FFFFFF&#038;fg=000000&#038;s=0' title='delta' style='vertical-align:1%' class='tex' alt='delta' \/> is the same thing as this explanation.<\/li>\n<\/ul>\n<p> The beauty of this is that it <em>is<\/em> really simple. A lot of the earlier mechanisms for decomposing regular expressions were rather complicated. This simple construct makes it very easy. For example, to convert a regular expression <img src='http:\/\/l.wordpress.com\/latex.php?latex=r&#038;bg=FFFFFF&#038;fg=000000&#038;s=0' title='r' style='vertical-align:1%' class='tex' alt='r' \/> to a finite state machine, you do the following:<\/p>\n<ol>\n<li> Create an initial state, labeled with the complete regular expression <img src='http:\/\/l.wordpress.com\/latex.php?latex=r&#038;bg=FFFFFF&#038;fg=000000&#038;s=0' title='r' style='vertical-align:1%' class='tex' alt='r' \/>.<\/li>\n<li> While there are states <img src='http:\/\/l.wordpress.com\/latex.php?latex=r_i&#038;bg=FFFFFF&#038;fg=000000&#038;s=0' title='r_i' style='vertical-align:1%' class='tex' alt='r_i' \/> in the machine which haven&#8217;t been processed yet:\n<ol>\n<li> For each character, <img src='http:\/\/l.wordpress.com\/latex.php?latex=c&#038;bg=FFFFFF&#038;fg=000000&#038;s=0' title='c' style='vertical-align:1%' class='tex' alt='c' \/> in the alphabet, compute\t\t\t\tthe derivative r_i&#8217;.<\/li>\n<li> If there is a state <img src='http:\/\/l.wordpress.com\/latex.php?latex=r_i%27&#038;bg=FFFFFF&#038;fg=000000&#038;s=0' title='r_i'' style='vertical-align:1%' class='tex' alt='r_i'' \/> already in the machine, then add a transition from <img src='http:\/\/l.wordpress.com\/latex.php?latex=r_i&#038;bg=FFFFFF&#038;fg=000000&#038;s=0' title='r_i' style='vertical-align:1%' class='tex' alt='r_i' \/> to <img src='http:\/\/l.wordpress.com\/latex.php?latex=r_i%27&#038;bg=FFFFFF&#038;fg=000000&#038;s=0' title='r_i'' style='vertical-align:1%' class='tex' alt='r_i'' \/> labeled with symbol <img src='http:\/\/l.wordpress.com\/latex.php?latex=c&#038;bg=FFFFFF&#038;fg=000000&#038;s=0' title='c' style='vertical-align:1%' class='tex' alt='c' \/>.<\/li>\n<li> If there is no state <img src='http:\/\/l.wordpress.com\/latex.php?latex=r_i%27&#038;bg=FFFFFF&#038;fg=000000&#038;s=0' title='r_i'' style='vertical-align:1%' class='tex' alt='r_i'' \/>, then add it, and add a transition from <img src='http:\/\/l.wordpress.com\/latex.php?latex=r_i&#038;bg=FFFFFF&#038;fg=000000&#038;s=0' title='r_i' style='vertical-align:1%' class='tex' alt='r_i' \/> to <img src='http:\/\/l.wordpress.com\/latex.php?latex=r_i%27&#038;bg=FFFFFF&#038;fg=000000&#038;s=0' title='r_i'' style='vertical-align:1%' class='tex' alt='r_i'' \/> labeled with the character <img src='http:\/\/l.wordpress.com\/latex.php?latex=c&#038;bg=FFFFFF&#038;fg=000000&#038;s=0' title='c' style='vertical-align:1%' class='tex' alt='c' \/>.<\/li>\n<\/ol>\n<\/li>\n<li> For each state in the machine labeled with a regular expression <img src='http:\/\/l.wordpress.com\/latex.php?latex=r&#038;bg=FFFFFF&#038;fg=000000&#038;s=0' title='r' style='vertical-align:1%' class='tex' alt='r' \/>, it is a final state if and only if <img src='http:\/\/l.wordpress.com\/latex.php?latex=delta%28r%29%20%3D%20epsilon&#038;bg=FFFFFF&#038;fg=000000&#038;s=0' title='delta(r) = epsilon' style='vertical-align:1%' class='tex' alt='delta(r) = epsilon' \/>.<\/li>\n<\/ol>\n<p> For your amusement, I threw together a really quick implementation of the regular expression derivative in Haskell:<\/p>\n<pre>\n\tdata Regexp = CharRE Char\n\t            | ChoiceRE [ Regexp ]\n\t            | SeqRE [ Regexp ]\n\t\t\t\t| StarRE Regexp\n\t            | VoidRE\n\t            | EmptyRE deriving (Eq, Show)\n\n\tdelta :: Regexp -> Bool\n\tdelta (CharRE c) = False\n\tdelta (ChoiceRE (re:res)) =\n\t   if (delta re)\n\t     then True\n\t     else (delta (ChoiceRE res))\n\tdelta (ChoiceRE []) = False\n\tdelta (SeqRE []) = True\n\tdelta (SeqRE (re:res)) =\n\t   (delta re) && (delta (SeqRE res))\n\tdelta VoidRE = False\n\tdelta EmptyRE = True\n\tdelta (StarRE r) = True\n\n\tderivative :: Regexp -> Char -> Regexp\n\tderivative VoidRE c = VoidRE\n\tderivative EmptyRE c = VoidRE\n\tderivative (CharRE c) d =\n\t  if c == d\n\t    then EmptyRE\n\t    else VoidRE\n\n\tderivative (SeqRE (re:res)) c =\n\t   let re' = (derivative re c)\n\t   in case re' of\n\t        VoidRE -> VoidRE\n\t        EmptyRE -> (SeqRE res)\n\t        _ -> (SeqRE (re':res))\n\tderivative (SeqRE []) c = VoidRE\n\n\tderivative (ChoiceRE []) c = VoidRE\n\tderivative (ChoiceRE res) c =\n\t   let derivs = filter ( x -> x \/= VoidRE) (map ( r -> derivative r c) res)\n\t   in case derivs of\n\t     [] -> VoidRE\n\t     [re] -> re\n\t     (r:res) -> (ChoiceRE (r:res))\n\n\tderivative (StarRE r) c =\n\t   let r' = derivative r c\n\t   in case r' of\n\t     EmptyRE -> (StarRE r)\n\t     VoidRE -> VoidRE\n\t     _ -> SeqRE [r', (StarRE r)]\n\n\n<\/pre>\n<p> This can easily be used to implement a regular expression matcher. In fact, it can be used to build an RE matcher that&#8217;s nearly (not quite, but nearly) as efficient as a traditional table-based FSM implementation &#8211; only without the extra step of generating the table, and building code that can interpret it. Basically, for each input symbol, you just take the derivative of the expression. If it&#8217;s not the void expression, then go on to the next character, using the derivative to process the rest of the string. When you get to the end of your input, if <img src='http:\/\/l.wordpress.com\/latex.php?latex=delta&#038;bg=FFFFFF&#038;fg=000000&#038;s=0' title='delta' style='vertical-align:1%' class='tex' alt='delta' \/> of the final RE is empty, then you accept the string. <\/p>\n<p> If you do this intelligently &#8211; i.e., you do something like memoize the derivative  function, so that you&#8217;re not constantly recomputing derivatives, this ends up being a reasonably efficient way of processing regular expressions. (Memoization is a technique where you save the results of every invocation of a function, so that if you call it repeatedly with the same input, it doesn&#8217;t redo the computation, but just returns the same result as the last time it wass called with that input.)<\/p>\n<p> The obvious question about all of this is: why is this called the derivative?<\/p>\n<p> If you think about differential calculus in continuous number mathematics, a simple explanation of the derivative of a function <img src='http:\/\/l.wordpress.com\/latex.php?latex=f&#038;bg=FFFFFF&#038;fg=000000&#038;s=0' title='f' style='vertical-align:1%' class='tex' alt='f' \/> is a function <img src='http:\/\/l.wordpress.com\/latex.php?latex=f%27&#038;bg=FFFFFF&#038;fg=000000&#038;s=0' title='f'' style='vertical-align:1%' class='tex' alt='f'' \/>, which tells you how the the value of <img src='http:\/\/l.wordpress.com\/latex.php?latex=f%28x%29&#038;bg=FFFFFF&#038;fg=000000&#038;s=0' title='f(x)' style='vertical-align:1%' class='tex' alt='f(x)' \/> will change.  The derivative of a regular expression is sort-of similar, if you squint at it just right: what it does is take a regular expression <img src='http:\/\/l.wordpress.com\/latex.php?latex=r&#038;bg=FFFFFF&#038;fg=000000&#038;s=0' title='r' style='vertical-align:1%' class='tex' alt='r' \/>, and show you how it changes.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>When you&#8217;re working with regular languages specified in regular expression form, there&#8217;s a really cool idea that you can use for building regular expression matchers, and for describing how to convert from a regular expression to a NFA. It&#8217;s called the Brzozozwksi derivative of a regular expression &#8211; or just simply the derivative of a [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[79,89],"tags":[107,131,132,176,224],"class_list":["post-1401","post","type-post","status-publish","format-standard","hentry","category-computation","category-haskell","tag-automata","tag-computability-2","tag-computation-2","tag-haskell-2","tag-regular-expressions"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_shortlink":"https:\/\/wp.me\/p4lzZS-mB","jetpack_sharing_enabled":true,"jetpack_likes_enabled":true,"_links":{"self":[{"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/posts\/1401","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/comments?post=1401"}],"version-history":[{"count":0,"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/posts\/1401\/revisions"}],"wp:attachment":[{"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/media?parent=1401"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/categories?post=1401"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/tags?post=1401"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}