{"id":829,"date":"2009-11-23T09:55:45","date_gmt":"2009-11-23T09:55:45","guid":{"rendered":"http:\/\/scientopia.org\/blogs\/goodmath\/2009\/11\/23\/creating-user-defined-types-in-haskell\/"},"modified":"2009-11-23T09:55:45","modified_gmt":"2009-11-23T09:55:45","slug":"creating-user-defined-types-in-haskell","status":"publish","type":"post","link":"http:\/\/www.goodmath.org\/blog\/2009\/11\/23\/creating-user-defined-types-in-haskell\/","title":{"rendered":"Creating User-Defined Types in Haskell"},"content":{"rendered":"<ul>\n<li><em> (This is an edited repost of one of the posts from the earlier<br \/>\nversion of my Haskell tutorial.)<\/em><\/li>\n<li><em> (This file is a literate haskell script. If you save it<br \/>\nas a file whose name ends in &#8220;.lhs&#8221;, it&#8217;s actually loadable and<br \/>\nrunnable in GHCI or Hugs.)<\/em><\/li>\n<\/ul>\n<p> Like any other modern programming language, Haskell has excellent support<br \/>\nfor building user-defined data types. In fact, even though Haskell is very<br \/>\nmuch <em>not<\/em> object-oriented, most Haskell programs end up being centered<br \/>\naround the design and implementation of data structures, using constructions<br \/>\ncalled <em>classes<\/em> and <em>instances<\/em>.<\/p>\n<p> In this post, we&#8217;re going to start looking at how you implement data types<br \/>\nin Haskell. What I&#8217;m going to do is start by showing you how to implement a<br \/>\nsimple binary search tree. I&#8217;ll start with a very simple version, and then<br \/>\nbuild up from there.<\/p>\n<p><!--more--><\/p>\n<p> There are a variety of useful primitive types in Haskell &#8211; a wide range of<br \/>\nnumeric types (arbitrary sized integers, rationals, floats, etc.), characters,<br \/>\nand so on. There are also a lot of basic compound types provided by the<br \/>\nstandard library &#8211; the usual tuples, arrays, lists, and so on. But when you go<br \/>\nto build your own datatypes, the main thing that you use in Haskell is<br \/>\nsomething called an <em>algebraic data types<\/em>. An algebraic data type<br \/>\nconsists of a set of <em>variants<\/em>, each of which is (essentially) a tuple<br \/>\nwhich has been tagged with a marker that identifies which variant it belongs<br \/>\nto. When you define an algebraic type, Haskell creates a set of<br \/>\n<em>constructor functions<\/em> for the type, one for each variant.<\/p>\n<p> As usual, talking about things like this in theory is hopelessly vague, so<br \/>\nlet&#8217;s dive in and look at some code. We&#8217;ll start with a very simple binary<br \/>\nsearch tree containing integers. <\/p>\n<p> The basics are pretty straightforward. A binary search tree is a data<br \/>\nstructure consisting of a collection of nodes. Every node contains a value, a<br \/>\nleft child, and a right child. The left and right children each contain<br \/>\nanother binary search tree which could contain some values, or which could be<br \/>\nempty.<\/p>\n<p> When you&#8217;re describing a data type and you use the word &#8220;or&#8221;, that&#8217;s your<br \/>\nclue that you&#8217;re looking at a variant. The tree type has two cases: trees<br \/>\ncontaining values, and empty trees. A declaration of that type would look<br \/>\nlike:<\/p>\n<pre>\n&gt;data SimpleIntBST = SIBNode Integer SimpleIntBST SimpleIntBST\n&gt;                  | SIBEmpty\n<\/pre>\n<p>What this says is: we&#8217;re defining a data type named<br \/>\n<code>SimpleIntBST<\/code>. There are two kinds of <code>SimpleIntBST<\/code><br \/>\nvalues. The first kind is an <code>SIBNode<\/code>, which contains an integer<br \/>\nvalue, a left child, and a right child. The other kind is an<br \/>\n<code>SIBEmpty<\/code>, which contains no values. In a language<br \/>\nlike C++ or Java, you&#8217;d have each node contain pointers to children<br \/>\nnodes, and an empty child would be represented by a null pointer. But<br \/>\nin Haskell, there are no pointers &#8211; and more importantly, null doesn&#8217;t<br \/>\nhave a well-defined type. In Haskell, when you&#8217;ve got something<br \/>\nlike a null, you need to explicity define the null case using<br \/>\na variant, like <code>SIBEmpty<\/code>.<\/p>\n<p> In the declaration, <code>SIBNode<\/code> and <code>SIBEmpty<\/code> are the<br \/>\nnames of the <em>constructor functions<\/em> that are used to create values. To<br \/>\ncreate an <code>SIBNode<\/code>, you call it like a normal function with<br \/>\nparameters of the correct type: for example, to create an <code>SIBNode<\/code><br \/>\ncontaining the value &#8220;4&#8221; with empty left and right children, you&#8217;d write:<\/p>\n<pre>\nSIBNode 4 SIBEmpty SIBEmpty\n<\/pre>\n<p> Let&#8217;s try that in GHCI:<\/p>\n<pre>\n*Main&gt; SIBEmpty\n&lt;interactive&gt;:1:0:\nNo instance for (Show SimpleIntBST)\narising from use of `print' at &lt;interactive&gt;:1:0-7\nPossible fix: add an instance declaration for (Show SimpleIntBST)\nIn the expression: print it\nIn a 'do' expression: print it\n<\/pre>\n<p> What&#8217;s wrong? The interpreter tried to print out the value returned by the<br \/>\nexpression. The interpreter outputs values using the <code>print<\/code><br \/>\nfunction. In general in Haskell, when we have a problem like this, we look at<br \/>\ntypes. What&#8217;s the type of the <code>print<\/code> function?<\/p>\n<pre>\n*Main&gt; :type print\nprint :: (Show a) =&gt; a -&gt; IO ()\n*Main&gt;\n<\/pre>\n<p> The parameter to <code>print<\/code> must be a value of some type that is<br \/>\nan instance of the <code>Show<\/code> class. But we didn&#8217;t make our binary tree<br \/>\nan instance of <code>Show<\/code>. Fortunately, since <code>Show<\/code> is such<br \/>\na fundamental interface, Haskell provides a nifty shorthand to make the<br \/>\ncompiler <em>generate<\/em> the code necessary to provide an implementation of<br \/>\nwhatever is needed for <code>Show<\/code>, called a <em>deriving<\/em> clause. Let&#8217;s<br \/>\ntry again, this time with a deriving clause. We&#8217;ll call the new version<br \/>\n&#8220;IntBST&#8221; rather than &#8220;SimpleIntBST&#8221;, because treating this post as a literate<br \/>\nfile, we can&#8217;t re-declare &#8220;SimpleIntBST&#8221;.<\/p>\n<pre>\n&gt;data IntBST = IBNode Integer IntBST IntBST\n&gt;                  | IBEmpty deriving (Show)\n<\/pre>\n<p>Now, loading that into GHC, we&#8217;ll see:<\/p>\n<pre>\n*Main&gt; IBNode 4 IBEmpty IBEmpty\nIBNode 4 IBEmpty IBEmpty\n*Main&gt;\n<\/pre>\n<p>Which is exactly what we want: our binary search tree is now printable.<\/p>\n<p>A data structure like this isn&#8217;t useful without some operations to work<br \/>\nwith it. So let&#8217;s go ahead and do some implementing! We&#8217;ll start with an<br \/>\ninsert operator:<\/p>\n<pre>\n&gt;ibstInsert :: IntBST -&gt; Integer -&gt; IntBST\n&gt;ibstInsert (IBNode i left right) v\n&gt;        | i &gt; v = IBNode i (ibstInsert left v) right\n&gt;        | otherwise = IBNode i left (ibstInsert right v)\n&gt;ibstInsert IBEmpty v = (IBNode v IBEmpty IBEmpty)\n<\/pre>\n<p> So let&#8217;s look at that a bit. It starts off with an explicit type<br \/>\ndeclaration. That&#8217;s not strictly necessary: the compiler can infer the type of<br \/>\nthe function without any help. But it&#8217;s a good practice to put it there: it&#8217;s<br \/>\na useful bit of documentation for a human reader, and it&#8217;s useful for<br \/>\ncatching problems during compilation &#8211; if you made any mistakes writing<br \/>\nthe function, the compiler might be able to catch them because the<br \/>\ninferred type won&#8217;t match the declaration.<\/p>\n<p> After the type declaration, we get to the real definition, which is where<br \/>\nthings get interesting. I&#8217;m sure you noticed that there was something a bit<br \/>\nstrange about the type declaration for <code>IntBST<\/code>. In most<br \/>\nnon-functional languages, you define a new data type as some kind of a<br \/>\n<em>record<\/em> or <em>structure<\/em>, which contains a list of <em>named<\/em><br \/>\nfields; when you use the structure, you access its fields by derefencing an<br \/>\ninstance of the data structure using the name of a field. But in our Haskell<br \/>\nbinary search tree, the values belonging to an instance of the data structure<br \/>\n<em>don&#8217;t have names<\/em>. Like most functional languages, Haskell uses<br \/>\n<em>pattern matching<\/em> to access the pieces of a data structure. <\/p>\n<p> A <em>pattern<\/em> looks like a value expression, except that it may<br \/>\ncontain some <em>unbound<\/em> variables. The pattern can also be followed by a<br \/>\n<em>guard expression<\/em> which is a boolean expression using variables bound<br \/>\nin the pattern. If the guard is included, it&#8217;s separated from the patterns by<br \/>\na <code>|<\/code> character, which is generally read as &#8220;such that&#8221;. So the<br \/>\nfirst declaration line of <code>ibstInsert<\/code> has two patterns:<br \/>\n<code>(IBNode i left right)<\/code>, and <code>v<\/code>. It&#8217;s followed by a<br \/>\nguard. So that full declaration line can be read as &#8220;ibstInsert of an IntBST<br \/>\nmatching &#8216;<code>(IBNode i left right)<\/code>&#8216; and a value matching<br \/>\n&#8216;<code>v<\/code>&#8216; <em>such that<\/em> &#8216;<code>i &gt; v<\/code>&#8216; returns<br \/>\n&#8216;<code>(IBNode i left right)<\/code>&#8216;.<\/p>\n<p> As a shorthand, if you have a guard expression on an indented line<br \/>\nfollowing a pattern with a guard, the pattern is reused with the second guard.<br \/>\nSo the second line starting with a &#8220;<code>|<\/code>&#8221; is used if the first guard<br \/>\nfails. As a guard expression, &#8220;<code>otherwise<\/code>&#8221; can only be used as the<br \/>\n<em>last<\/em> guard for a pattern, and it&#8217;s always true.<\/p>\n<p> So the first case of the declaration of <code>ibstInsert<\/code>, in full,<br \/>\ndoes the following:<\/p>\n<ol>\n<li> If the tree parameter is an <code>IBNode<\/code>, then bind &#8220;i&#8221; to<br \/>\nthe value element of the node, &#8220;left&#8221; to the left subtree, and &#8220;right&#8221;<br \/>\nto the right subtree; and bind the variable &#8220;v&#8221; to the second parameter,<br \/>\nwhich is the value to be inserted to the tree. <\/li>\n<li> If the value in the tree node is larger than the value being<br \/>\ninserted, then return a tree that&#8217;s the same as this, except that<br \/>\nthe value is inserted into the <em>left<\/em> subtree.<\/li>\n<li> Otherwise, return a tree like this one, except that the new<br \/>\nvalue is inserted into the <em>right<\/em> subtree.<\/li>\n<\/ol>\n<p> The second case says that if the node is an empty, then return a new<br \/>\nnon-empty node with the new value as its value, and with empty left and right<br \/>\nchildren.<\/p>\n<p> If you look at the code now that you know how to read it, one thing that<br \/>\nshould hopefully be rather striking about it is just how much it looks like<br \/>\nthe explanatory pseudo-code that you&#8217;d find in an algorithms textbook. It&#8217;s<br \/>\nvery straightforward code, written in a very easy to read style. (As a brief<br \/>\naside, this is an example of one of the things that really attracts me to<br \/>\nHaskell programming. Code in Haskell tends to very naturally decompose into<br \/>\nvery small definitions that look and read like explanatory code from a<br \/>\ntextbook.)<\/p>\n<p> So let&#8217;s look at a quick example of using this to build a tree.<\/p>\n<pre>\n&gt; treeone = ibstInsert (ibstInsert (ibstInsert (ibstInsert (ibstInsert IBEmpty 3) 5) 2) 17) 10\n&gt; treetwo = (ibstInsert (ibstInsert (ibstInsert (ibstInsert treeone 4) 6) 1) 9)\n<\/pre>\n<p>That gives us two trees, one of which contains a subset of the values in<br \/>\nthe other. We can ask ghc to show us the values of them:<\/p>\n<pre>\n*Main&gt; treeone\nIBNode 3 (IBNode 2 IBEmpty IBEmpty) (IBNode 5 IBEmpty (IBNode 17 (IBNode 10 IBEmpty IBEmpty) IBEmpty))\n*Main*gt; treetwo\nIBNode 3 (IBNode 2 (IBNode 1 IBEmpty IBEmpty) IBEmpty) (IBNode 5 (IBNode 4 IBEmpty IBEmpty) (IBNode 17 (IBNode 10 (IBNode 6 IBEmpty (IBNode 9 IBEmpty IBEmpty)) IBEmpty) IBEmpty))\n*Main&gt;\n<\/pre>\n<p> The other thing that we want to be able to do with a binary search tree is<br \/>\nfind nodes in it. Since our current BST only contains integers, we&#8217;ll just<br \/>\nwrite a function that returns a boolean value indicating whether or not a<br \/>\nparticular value is in the tree.<\/p>\n<pre>\n&gt;\n&gt;ibstSearch :: IntBST -&gt; Integer -&gt; Bool\n&gt;ibstSearch (IBNode k left right) v\n&gt;    | k == v = True\n&gt;        | k &gt; v = ibstSearch left v\n&gt;        | otherwise = ibstSearch right v\n&gt;ibstSearch IBEmpty _ = False\n&gt;\n<\/pre>\n<p> This one should be pretty easy to read without much explanation. To be<br \/>\npedantic, I&#8217;ll walk through it quickly. If it&#8217;s a non-empty node, and the<br \/>\nvalue in the node equals the value being searched for, return true. If the<br \/>\nvalue in the node is bigger, then return the result of searching the left<br \/>\nsubtree; otherwise, return the result of searching the right subtree. If it&#8217;s<br \/>\nan empty node, then the value isn&#8217;t there, so return false.<\/p>\n<hr \/>\n<p> A binary search tree isn&#8217;t particularly useful if it can only hold<br \/>\nintegers. So as a first step towards making it more real, we&#8217;d like to be able<br \/>\nto store not just integers in the tree, but any value for which the necessary<br \/>\ncomparisons (<code>&gt;<\/code> and <code>==<\/code>) are defined. There&#8217;s a<br \/>\ntypeclass for that, called &#8220;<code>Ord<\/code>&#8220;, for <em>Ordered<\/em>. So now<br \/>\nlet&#8217;s just rewrite <code>IntBST<\/code> as a generic BST using typeclasses:<\/p>\n<pre>\n&gt; data (Ord a) =&gt; GenBST a = GBNode a (GenBST a) (GenBST a) &gt; | GBEmpty\n&gt;      deriving (Show)\n<\/pre>\n<p>The variable on the <em>left<\/em> of the equal in the data type definition<br \/>\nis a type parameter. The type-class constraint preceding the name of the new<br \/>\ndatatype means that the type parameter &#8220;<code>a<\/code>&#8221; can <em>only<\/em> be<br \/>\ninstantiated by a type that is an instance of the type-class<br \/>\n&#8220;<code>Ord<\/code>&#8220;. The method implementations will barely change:<\/p>\n<pre>\n&gt; gbstInsert :: (Ord a) =&gt; GenBST a -&gt; a -&gt; GenBST a\n&gt; gbstInsert (GBNode i left right) v\n&gt;         | i &gt; v = GBNode i (gbstInsert left v) right\n&gt;         | otherwise = GBNode i left (gbstInsert right v)\n&gt; gbstInsert GBEmpty v = (GBNode v GBEmpty GBEmpty)\n&gt;\n&gt; gbstSearch :: (Ord a) =&gt; GenBST a -&gt; a -&gt; Bool\n&gt; gbstSearch (GBNode k left right) v\n&gt;        | k == v = True\n&gt;        | k &gt; v = gbstSearch left v\n&gt;        | otherwise = gbstSearch right v\n&gt; gbstSearch GBEmpty _ = False\n<\/pre>\n<p> With that implementation of <code>gbstInsert<\/code>, we can pretty much<br \/>\nreuse the code to generate a sample instance: <\/p>\n<pre>\n&gt; gtreeone = gbstInsert (gbstInsert\n&gt;        (gbstInsert (gbstInsert (gbstInsert GBEmpty 3) 5) 2) 17) 10\n&gt; gtreetwo = (gbstInsert (gbstInsert (gbstInsert (gbstInsert gtreeone 4) 6) 1) 9)\n<\/pre>\n<p> Look what ghci says about those:<\/p>\n<pre>\n*Main*gt; gtreetwo\nGBNode 3 (GBNode 2 (GBNode 1 GBEmpty GBEmpty) GBEmpty) (GBNode 5 (GBNode 4 GBEmpty GBEmpty) (GBNode 17 (GBNode 10 (GBNode 6 GBEmpty (GBNode 9 GBEmpty GBEmpty)) GBEmpty) GBEmpty))\n*Main&gt; :type gtreetwo\ngtreetwo :: GenBST Integer\n*Main&gt; gbstSearch gtreetwo 17\nTrue\n*Main&gt; gbstSearch gtreetwo 18\nFalse\n*Main&gt; gbstSearch gtreetwo 1\nTrue\n*Main&gt; gbstSearch gtreeone 1\nFalse\n*Main&gt;\n<\/pre>\n<p> We can use the new generic tree in <em>exactly<\/em> the same way that we<br \/>\nused the integer specific one. We don&#8217;t need to declare types of expressions<br \/>\nthat use it. The compiler will happily infer the type parameter, and just make<br \/>\neverything work. But now we&#8217;re not restricted to integers:<\/p>\n<pre>\n*Main&gt; gbstInsert (gbstInsert (gbstInsert GBEmpty \"hi\") \"hello\") \"Salutations\"\nGBNode \"hi\" (GBNode \"hello\" (GBNode \"Salutations\" GBEmpty GBEmpty) GBEmpty) GBEmpty\n*Main&gt; gbstInsert (gbstInsert (gbstInsert (gbstInsert GBEmpty \"hi\") \"hello\") \"S\nalutations\") \"Greetings\"\nGBNode \"hi\" (GBNode \"hello\" (GBNode \"Salutations\" (GBNode \"Greetings\" GBEmpty GBEmpty) GBEmpty) GBEmpty) GBEmpty\n*Main&gt;\n<\/pre>\n<p> That&#8217;s some of the basics. Next post, we&#8217;ll build on the binary search<br \/>\ntree, giving it some more features, and updating the insert routine so that<br \/>\nthe tree stays nicely balanced. That will lead us in to some interesting<br \/>\nthings about how you code in a language like Haskell, where you can&#8217;t actually<br \/>\nmodify data structures in-place.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>(This is an edited repost of one of the posts from the earlier version of my Haskell tutorial.) (This file is a literate haskell script. If you save it as a file whose name ends in &#8220;.lhs&#8221;, it&#8217;s actually loadable and runnable in GHCI or Hugs.) Like any other modern programming language, Haskell has excellent [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[89],"tags":[],"class_list":["post-829","post","type-post","status-publish","format-standard","hentry","category-haskell"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_shortlink":"https:\/\/wp.me\/p4lzZS-dn","jetpack_sharing_enabled":true,"jetpack_likes_enabled":true,"_links":{"self":[{"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/posts\/829","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/comments?post=829"}],"version-history":[{"count":0,"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/posts\/829\/revisions"}],"wp:attachment":[{"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/media?parent=829"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/categories?post=829"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/tags?post=829"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}