{"id":232,"date":"2006-12-03T20:11:53","date_gmt":"2006-12-03T20:11:53","guid":{"rendered":"http:\/\/scientopia.org\/blogs\/goodmath\/2006\/12\/03\/building-datatypes-in-haskell-part-1\/"},"modified":"2006-12-03T20:11:53","modified_gmt":"2006-12-03T20:11:53","slug":"building-datatypes-in-haskell-part-1","status":"publish","type":"post","link":"http:\/\/www.goodmath.org\/blog\/2006\/12\/03\/building-datatypes-in-haskell-part-1\/","title":{"rendered":"Building Datatypes in Haskell (part 1)"},"content":{"rendered":"<p><em> For this post, I&#8217;m doing a bit of an experiment. Haskell includes a &#8220;literate&#8221; syntax mode. In literate mode, and text that doesn&#8217;t start with a &#8220;&gt;&#8221; sign is considered a comment. So this entire post can be copied and used as a source file; just save it with a name ending in `&#8221;.lhs&#8221;`. If this doesn&#8217;t work properly, please post something in the comments to let me know. It&#8217;s more work for me to write the posts this way, so if it&#8217;s not working properly, I&#8217;d rather not waste the effort. I&#8217;ve tested it in both Hugs and ghci, and everything works, but who knows what will happen after a pass through MovableType!<\/em><\/p>\n<p> Like most modern programming languages, Haskell has excellent support for building user-defined data types. In fact, even though Haskell is very much <em>not<\/em> object-oriented, most Haskell programs end up being centered around the design and implementation of data structures.<\/p>\n<p>So today, I&#8217;m going to start looking at how you implement data types in Haskell. What I&#8217;m<br \/>\ngoing to do is start by showing you how to implement a simple binary search tree. I&#8217;ll start with<br \/>\na very simple version, and then build on that.<\/p>\n<p><!--more--><\/p>\n<p>The most common form of user-defined data type in Haskell is a <em>constructed<\/em> data type.<br \/>\nConstructed types are types with one or more <em>variants<\/em>, each of which has an associated<br \/>\n*constructor function*. The names of constructor functions must be unique to the type. Let&#8217;s start<br \/>\nwith a very simple version of a binary search tree containing integers. We&#8217;ll build it so that there<br \/>\nare empty nodes and non-empty nodes. Every non-empty node of the tree has an integer value, a left<br \/>\nchild, and a right child. A declaration for that would look like the following:<\/p>\n<pre>\n&gt;data SimpleIntBST = SIBNode Integer SimpleIntBST SimpleIntBST\n&gt;                  | SIBEmpty\n<\/pre>\n<p>What this says is: we&#8217;re defining a data type named <code>SimpleIntBST<\/code>. There are two kinds of <code>SimpleIntBST<\/code> values. The first kind is an <code>SIBNode<\/code>, which contains an integer value, a left child, and a right child. The other kind is an <code>SIBEmpty<\/code>, which contains no values. <code>SIBNode<\/code> and <code>SIBEmpty<\/code> are the names of the <em>constructor functions<\/em> that are used to create values. To create an<br \/>\n<code>SIBNode<\/code>, you call it like a normal function with parameters of the correct type: for example, to create an <code>SIBNode<\/code> containing the value &#8220;4&#8221; with empty left and right children, you&#8217;d write:<\/p>\n<pre>\nSIBNode 4 SIBEmpty SIBEmpty\n<\/pre>\n<p> Let&#8217;s try that in GHCI:<\/p>\n<pre>\n*Main&gt; SIBEmpty\n&lt;interactive&gt;:1:0:\nNo instance for (Show SimpleIntBST)\narising from use of `print' at &lt;interactive&gt;:1:0-7\nPossible fix: add an instance declaration for (Show SimpleIntBST)\nIn the expression: print it\nIn a 'do' expression: print it\n<\/pre>\n<p> What&#8217;s wrong? The interpreter tried to print out the value returned by the expression.<br \/>\nThe interpreter outputs values using the <code>print<\/code> function. In general in Haskell, when we<br \/>\nhave a problem like this, we look at types. What&#8217;s the type of the <code>print<\/code> function?<\/p>\n<pre>\n*Main&gt; :type print\nprint :: (Show a) =&gt; a -&gt; IO ()\n*Main&gt;\n<\/pre>\n<p> The parameter to <code>print<\/code> must be a value of some type that is an instance of the <code>Show<\/code> class. But we didn&#8217;t make our binary tree an instance of <code>Show<\/code>. Fortunately, since <code>Show<\/code> is such a fundamental interface, Haskell provides a nifty shorthand to make the compiler *generate* the code necessary to provide an implementation of whatever is needed for <code>Show<\/code>, called a *deriving* clause. Let&#8217;s try again, this time with a deriving clause. We&#8217;ll call the new version &#8220;IntBST&#8221; rather than &#8220;SimpleIntBST&#8221;, because treating<br \/>\nthis post as a literate file, we can&#8217;t re-declare &#8220;SimpleIntBST&#8221;.<\/p>\n<pre>\n&gt;data IntBST = IBNode Integer IntBST IntBST\n&gt;                  | IBEmpty deriving (Show)\n<\/pre>\n<p>Now, loading that into GHC, we&#8217;ll see:<\/p>\n<pre>\n*Main&gt; IBNode 4 IBEmpty IBEmpty\nIBNode 4 IBEmpty IBEmpty\n*Main&gt;\n<\/pre>\n<p>Which is exactly what we want: our binary search tree is now printable.<\/p>\n<p>A data structure like this isn&#8217;t useful without some operations to work with it. So let&#8217;s<br \/>\ngo ahead and do some implementing! We&#8217;ll start with an insert operator:<\/p>\n<pre>\n&gt;ibstInsert :: IntBST -&gt; Integer -&gt; IntBST\n&gt;ibstInsert (IBNode i left right) v\n&gt;        | i &gt; v = IBNode i (ibstInsert left v) right\n&gt;        | otherwise = IBNode i left (ibstInsert right v)\n&gt;ibstInsert IBEmpty v = (IBNode v IBEmpty IBEmpty)\n<\/pre>\n<p> So let&#8217;s look at that a bit. It starts off with an explicit type declaration. It&#8217;s not strictly<br \/>\nnecessary, but it&#8217;s a good practice to put it there. Then we get to the real definition, which is<br \/>\nwhere things get interesting. I&#8217;m sure you noticed that there was something a bit strange about the<br \/>\ntype declaration for <code>IntBST<\/code>. In most non-functional languages, you define a new data type as some kind of a <em>record<\/em> or <em>structure<\/em>, which contains a list of <em>named<\/em> fields; when you use the structure, you access its fields by derefencing an instance of the data structure using the name of a field. But in our Haskell binary search tree, the values belonging to an instance of the data structure <em>don&#8217;t have names<\/em>. Like most functional languages, Haskell uses <em>pattern matching<\/em> to access the pieces of a data structure. <\/p>\n<p> A <em>pattern<\/em> looks like a value expression, except that it may contain some <em>unbound<\/em> variables. The pattern can also be followed by a <em>guard expression<\/em> which<br \/>\nis a boolean expression using variables bound in the pattern. If the guard is included, it&#8217;s separated from the patterns by a <code>|<\/code> character, which is generally read as &#8220;such that&#8221;.<br \/>\nSo the first declaration line of <code>ibstInsert<\/code> has two patterns: <code>(IBNode i left right)<\/code>, and <code>v<\/code>. It&#8217;s followed by a guard. So that full declaration line can be read as &#8220;ibstInsert of an IntBST matching &#8216;<code>(IBNode i left right)<\/code>&#8216; and a value matching &#8216;<code>v<\/code>&#8216; <em>such that<\/em> &#8216;<code>i &gt; v<\/code>&#8216; returns &#8216;<code>(IBNode i left right)<\/code>&#8216;.<\/p>\n<p> As a shorthand, if you have a guard expression on an indented line following a pattern<br \/>\nwith a guard, the pattern is reused with the second guard. So the second line<br \/>\nstarting with a &#8220;<code>|<\/code>&#8221; is used if the first guard fails. As a guard expression,<br \/>\n&#8220;<code>otherwise<\/code>&#8221; can only be used as the <em>last<\/em> guard for a pattern, and<br \/>\nit&#8217;s always true.<\/p>\n<p> So the first case of the declaration of <code>ibstInsert<\/code>, in full, does the<br \/>\nfollowing:<\/p>\n<ol>\n<li> If the tree parameter is an <code>IBNode<\/code>, then bind &#8220;i&#8221; to the value<br \/>\nelement of the node, &#8220;left&#8221; to the left subtree, and &#8220;right&#8221; to the right subtree; and<br \/>\nbind the variable &#8220;v&#8221; to the second parameter, which is the value to be inserted to the tree.<\/p>\n<li> If the value in the tree node is larger than the value being inserted, then return a tree that&#8217;s the same as this, except that the value is inserted into the <em>left<\/em> subtree.\n<li> Otherwise, return a tree like this one, except that the new value is inserted<br \/>\ninto the <em>right<\/em> subtree.\n<\/ol>\n<p> The second case says that if the node is an empty, then return a new non-empty node with<br \/>\nthe new value as its value, and with empty left and right children.<\/p>\n<p> If you look at the code now that you know how to read it, one thing that should hopefully be rather striking about it is just how much it looks like the explanatory pseudo-code that you&#8217;d<br \/>\nfind in an algorithms textbook. It&#8217;s very straightforward code, written in a very easy to read style. (As a brief aside, this is an example of one of the things that really attracts me to Haskell programming. Code in Haskell tends to very naturally decompose into very small definitions that<br \/>\nlook and read like explanatory code from a textbook.)<\/p>\n<p> So let&#8217;s look at a quick example of using this to build a tree.<\/p>\n<pre>\n&gt;treeone = ibstInsert (ibstInsert (ibstInsert (ibstInsert (ibstInsert IBEmpty 3) 5) 2) 17) 10\n&gt;treetwo = (ibstInsert (ibstInsert (ibstInsert (ibstInsert treeone 4) 6) 1) 9)\n<\/pre>\n<p>That gives us two trees, one of which contains a subset of the values in the other. We can ask ghc to show us the values of them:<\/p>\n<pre>\n*Main&gt; treeone\nIBNode 3 (IBNode 2 IBEmpty IBEmpty) (IBNode 5 IBEmpty (IBNode 17 (IBNode 10 IBEmpty IBEmpty) IBEmpty))\n*Main*gt; treetwo\nIBNode 3 (IBNode 2 (IBNode 1 IBEmpty IBEmpty) IBEmpty) (IBNode 5 (IBNode 4 IBEmpty IBEmpty) (IBNode 17 (IBNode 10 (IBNode 6 IBEmpty (IBNode 9 IBEmpty IBEmpty)) IBEmpty) IBEmpty))\n*Main&gt;\n<\/pre>\n<p> The other thing that we want to be able to do with a binary search tree is find nodes in it. Since our current BST only contains integers, we&#8217;ll just write a function that returns a boolean value indicating whether or not a particular value is in the tree.<\/p>\n<pre>\n&gt;\n&gt;ibstSearch :: IntBST -&gt; Integer -&gt; Bool\n&gt;ibstSearch (IBNode k left right) v\n&gt; \t    | k == v = True\n&gt;        | k &gt; v = ibstSearch left v\n&gt;        | otherwise = ibstSearch right v\n&gt;ibstSearch IBEmpty _ = False\n&gt;\n<\/pre>\n<p> This one should be pretty easy to read without much explanation. To be pedantic, I&#8217;ll walk through it quickly. If it&#8217;s a non-empty node, and the value in the node equals the value being searched for, return true. If the value in the node is bigger, then return the result of searching the left subtree; otherwise, return the result of searching the right subtree. If it&#8217;s an empty node, then the value isn&#8217;t there, so return false.<\/p>\n<hr \/>\n<p> A binary search tree isn&#8217;t particularly useful if it can only hold integers. So as a first step<br \/>\ntowards making it more real, we&#8217;d like to be able to store not just integers in the tree,<br \/>\nbut any value for which the necessary comparisons (<code>&gt;<\/code> and <code>==<\/code>) are<br \/>\ndefined. There&#8217;s a typeclass for that, called &#8220;<code>Ord<\/code>&#8220;, for <em>Ordered<\/em>. So now let&#8217;s just rewrite <code>IntBST<\/code> as a generic BST using typeclasses:<\/p>\n<pre>\n&gt;data (Ord a) =&gt; GenBST a = GBNode a (GenBST a) (GenBST a)\n&gt;                  | GBEmpty deriving (Show)\n<\/pre>\n<p>The variable on the <em>left<\/em> of the equal in the data type definition is<br \/>\na type parameter. The type-class constraint preceding the name of the<br \/>\nnew datatype means that the type parameter &#8220;<code>a<\/code>&#8221; can <em>only<\/em> be<br \/>\ninstantiated by a type that is an instance of the type-class &#8220;<code>Ord<\/code>&#8220;. The<br \/>\nmethod implementations will barely change:<\/p>\n<pre>\n&gt;gbstInsert :: (Ord a) =&gt; GenBST a -&gt; a -&gt; GenBST a\n&gt;gbstInsert (GBNode i left right) v\n&gt;        | i &gt; v = GBNode i (gbstInsert left v) right\n&gt;        | otherwise = GBNode i left (gbstInsert right v)\n&gt;gbstInsert GBEmpty v = (GBNode v GBEmpty GBEmpty)\n&gt;\n&gt;gbstSearch :: (Ord a) =&gt; GenBST a -&gt; a -&gt; Bool\n&gt;gbstSearch (GBNode k left right) v\n&gt; \t    | k == v = True\n&gt;        | k &gt; v = gbstSearch left v\n&gt;        | otherwise = gbstSearch right v\n&gt;gbstSearch GBEmpty _ = False\n<\/pre>\n<p> With that implementation of <code>gbstInsert<\/code>, we can pretty much reuse the code to<br \/>\ngenerate a sample instance:\t<\/p>\n<pre>\n&gt;gtreeone = gbstInsert (gbstInsert (gbstInsert (gbstInsert (gbstInsert GBEmpty 3) 5) 2) 17) 10\n&gt;gtreetwo = (gbstInsert (gbstInsert (gbstInsert (gbstInsert gtreeone 4) 6) 1) 9)\n<\/pre>\n<p> Look what ghci says about those:<\/p>\n<pre>\n*Main*gt; gtreetwo\nGBNode 3 (GBNode 2 (GBNode 1 GBEmpty GBEmpty) GBEmpty) (GBNode 5 (GBNode 4 GBEmpty GBEmpty) (GBNode 17 (GBNode 10 (GBNode 6 GBEmpty (GBNode 9 GBEmpty GBEmpty)) GBEmpty) GBEmpty))\n*Main&gt; :type gtreetwo\ngtreetwo :: GenBST Integer\n*Main&gt; gbstSearch gtreetwo 17\nTrue\n*Main&gt; gbstSearch gtreetwo 18\nFalse\n*Main&gt; gbstSearch gtreetwo 1\nTrue\n*Main&gt; gbstSearch gtreeone 1\nFalse\n*Main&gt;\n<\/pre>\n<p> We can use the new generic tree in <em>exactly<\/em> the same way that we used the integer<br \/>\nspecific one. We don&#8217;t need to declare types of expressions that use it. The compiler<br \/>\nwill happily infer the type parameter, and just make everything work. But now we&#8217;re not<br \/>\nrestricted to integers:<\/p>\n<pre>\n*Main&gt; gbstInsert (gbstInsert (gbstInsert GBEmpty \"hi\") \"hello\") \"Salutations\"\nGBNode \"hi\" (GBNode \"hello\" (GBNode \"Salutations\" GBEmpty GBEmpty) GBEmpty) GBEmpty\n*Main&gt; gbstInsert (gbstInsert (gbstInsert (gbstInsert GBEmpty \"hi\") \"hello\") \"S\nalutations\") \"Greetings\"\nGBNode \"hi\" (GBNode \"hello\" (GBNode \"Salutations\" (GBNode \"Greetings\" GBEmpty GBEmpty) GBEmpty) GBEmpty) GBEmpty\n*Main&gt;\n<\/pre>\n<p> I think that&#8217;s enough for one day. Next time, we&#8217;ll take our binary search tree, and look at using it for key\/value pairs, and maintaining tree balance.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>For this post, I&#8217;m doing a bit of an experiment. Haskell includes a &#8220;literate&#8221; syntax mode. In literate mode, and text that doesn&#8217;t start with a &#8220;&gt;&#8221; sign is considered a comment. So this entire post can be copied and used as a source file; just save it with a name ending in `&#8221;.lhs&#8221;`. If [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[89],"tags":[],"class_list":["post-232","post","type-post","status-publish","format-standard","hentry","category-haskell"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_shortlink":"https:\/\/wp.me\/p4lzZS-3K","jetpack_sharing_enabled":true,"jetpack_likes_enabled":true,"_links":{"self":[{"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/posts\/232","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/comments?post=232"}],"version-history":[{"count":0,"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/posts\/232\/revisions"}],"wp:attachment":[{"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/media?parent=232"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/categories?post=232"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/tags?post=232"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}