{"id":806,"date":"2009-09-21T12:55:37","date_gmt":"2009-09-21T12:55:37","guid":{"rendered":"http:\/\/scientopia.org\/blogs\/goodmath\/2009\/09\/21\/information-vs-meaning\/"},"modified":"2009-09-21T12:55:37","modified_gmt":"2009-09-21T12:55:37","slug":"information-vs-meaning","status":"publish","type":"post","link":"http:\/\/www.goodmath.org\/blog\/2009\/09\/21\/information-vs-meaning\/","title":{"rendered":"Information vs. Meaning"},"content":{"rendered":"<p> If you regularly follow comments on this blog, you&#8217;ll know that I&#8217;ve been<br \/>\nhaving a back-and-forth with a guy who doesn&#8217;t know much about information<br \/>\ntheory, and who&#8217;s been using his ignorance to try to assemble arguments against the<br \/>\nKolmogorov-Chaitin information-theoretic measure of information in a string.<br \/>\nIn the course of doing that, I came up with what I think are some interesting ways<br \/>\nof explaining bits of it, so I thought I&#8217;d promote it to the top-level to share<br \/>\nwith more readers.<\/p>\n<p> To be absolutely clear up front: I&#8217;m <em>far<\/em> from an expert on K-C theory. It&#8217;s something that I find incredibly fascinating, and I&#8217;ve read a lot of Chaitin&#8217;s work. I&#8217;ve been fortunate enough to meet Greg Chaitin a bunch of  times when we both worked at<br \/>\nIBM, and I&#8217;ve attended a bunch of his lectures. But this isn&#8217;t my main area of expertise,<br \/>\nnot by a long-shot.<\/p>\n<p> If you don&#8217;t know any K-C information theory, then you can look at my<br \/>\nintroduction <a href=\"http:\/\/scientopia.org\/blogs\/goodmath\/2006\/06\/an-introduction-to-information-theory-updated-from-blogspot\">here<\/a>. The rest is beneath the fold.<\/p>\n<p><!--more--><\/p>\n<h2> Information vs. Meaning<\/h2>\n<p> Kolmogorov-Chaitin information theory is focused on<br \/>\n<em>quantification<\/em> &#8211; that is, on being able to measure the quantity of<br \/>\ninformation in a string of characters. It doesn&#8217;t care what the string<br \/>\n<em>means<\/em>. Information theory cares about <em>how much<\/em> information<br \/>\nis in a string; it doesn&#8217;t care what the string means. In fact, you can say<br \/>\nsomething much stronger about information theory: if you consider meaning<br \/>\n<em>at all<\/em>, then you&#8217;re <em>not<\/em> doing information theory. A truly<br \/>\nabstract string could mean many different things: information theory&#8217;s<br \/>\nmeasurement of its information content <em>must<\/em> include everything that<br \/>\ncould be used relative to <em>any possible system of meaning<\/em> to determine<br \/>\nthe information content of a string.<\/p>\n<p> I&#8217;ll borrow a tiny bit of formality from denotational semantics. In<br \/>\ndenotational semantics, we take a domain of objects and functions that range<br \/>\nover those objects, and we map <em>statements<\/em> &#8211; sentences that have some<br \/>\nsyntactic structure &#8211; onto the objects and functions in the domain.<br \/>\nTo assign meaning, what you&#8217;re basically doing is creating a system where<br \/>\nyou can <em>map<\/em> from syntax &#8211; that is, from strings of symbols &#8211; to<br \/>\nobjects and functions. The string of characters &#8220;Mark Chu-Carroll&#8221; has<br \/>\nabsolutely <em>no<\/em> intrinsic meaning. But by interpreting in a domain<br \/>\nof meaning, we can say that the domain <em>maps<\/em> that string of symbols<br \/>\nin a way that makes it a representation of a reference to me. But there&#8217;s<br \/>\nnothing about that string that makes it <em>necessarily<\/em> be a reference<br \/>\nto me. It could mean many different things, depending on domain of meaning<br \/>\nwhich we use to interpret it.<\/p>\n<p> To be a bit more specific, consider the string<br \/>\n&#8220;0000000000000000000000000000000000000000&#8221;. That&#8217;s a string of 40 &#8220;0&#8221;s. If you<br \/>\ninterpret that as a number &#8211; that is, you understand the string of digits as<br \/>\n<em>having a meaning<\/em> as a number, it&#8217;s the number 0. But a string of 40<br \/>\nzeros <em>is not<\/em> equivalent to the number 0 in terms of information<br \/>\ntheory. That string could also mean the number 40 understood as a unary<br \/>\nnumber. Or it could mean a sequence of 5 bytes containing nothing but zeros in<br \/>\na computer memory. Or it could mean sequence of 41 bytes, the first forty<br \/>\ncontaining the ASCII code for the character &#8220;0&#8221;, and one containing a<br \/>\nterminator-mark. The string contains enough information to allow you to<br \/>\n<em>interpret<\/em> it as any of those things. But if you choose any of those<br \/>\nas <em>the<\/em> meaning of it, and work out a representation of that meaning,<br \/>\nyou&#8217;ll wind up with a <em>different<\/em> answer to the quantity of information<br \/>\ncontained in that string than you would for the string considered as an<br \/>\nabstract. Choosing a framework in which to assign it meaning can make a string<br \/>\nappear to have either more or less information than the<br \/>\nstring-as-abstract.<\/p>\n<p> You can make it appear to have more information by choosing a system of<br \/>\nmeaning in which the system itself contains extra information that it assigns<br \/>\nto the string; for example, when I say &#8220;the complete table of printable<br \/>\nunicode code-points in order&#8221;, if I interpret that string in the right<br \/>\nframework of meaning, it <em>means<\/em> a sequence of over 1100 characters.<br \/>\nBut those 1100 characters aren&#8217;t containing in the string; they&#8217;re part of the<br \/>\nframework of meaning that&#8217;s used to interpret the string. <\/p>\n<p> You can also choose a framework in which to assign meaning which makes a<br \/>\nstring appear to have <em>less<\/em> information content than the<br \/>\nstring-as-abstract. For example, the string of 0s above. If we interpret a<br \/>\nstring of 40 &#8220;0&#8221;s as <em>meaning<\/em> zero, then in terms of meaning, that<br \/>\nstring has <em>one character<\/em> of information.<\/p>\n<h2> What&#8217;s information?<\/h2>\n<p> Intuitively, it&#8217;s hard for us to separation information from meaning. Informally,<br \/>\nwe consider them to be the same thing. But in information theory, they&#8217;re very<br \/>\ndifferent. As I said above, meaning is what happens when you <em>interpret<\/em><br \/>\na piece of information in some context. But absent any context, how can<br \/>\nyou quantify its information content?\n<\/p>\n<p> To be strict, you can&#8217;t. Without any context of any kind, you&#8217;ve got no<br \/>\nway to quantify anything. But you can create a sort of minimal context, which<br \/>\ngives you a really simple way to quantify the amount of information in the<br \/>\nstring.<\/p>\n<p> The basic, fundamental idea is <em>description<\/em>. The amount of<br \/>\ninformation contained in a string is the length of the <em>shortest<\/em><br \/>\ndescription that uniquely identifies that string.<\/p>\n<p> And that&#8217;s the essence of information theory: <em>description<\/em>. The<br \/>\namount of information in something is the <em>shortest description<\/em> that<br \/>\ncan uniquely reproduce that thing. It doesn&#8217;t matter what the thing means. You<br \/>\ndon&#8217;t need to understand it. You don&#8217;t need to interpret it. All you need is a<br \/>\ndescription that allows you to reproduce it. To produce the string<br \/>\n10101101, you don&#8217;t need to know that it means the number ten million, one hundred and<br \/>\none thousand, one hundred and one interpreted as a number written<br \/>\nin decimal notation. You don&#8217;t need to know that it means the number<br \/>\n173, interpreted as a binary number. All you need to know is that<br \/>\nit&#8217;s a sequence of symbols.<\/p>\n<p> That definition might seem like a trick &#8211; it just replaces &#8220;quantity of<br \/>\ninformation&#8221; with &#8220;length of the shortest description&#8221;. How can we<br \/>\n<em>describe<\/em> something in a way that doesn&#8217;t involve meaning? <\/p>\n<p> The answer is: by <em>computation<\/em>. We can define a simple,<br \/>\n<em>minimal<\/em> universal computing device. A description is a program that<br \/>\ngenerates a string. The shortest description of a string is the shortest<br \/>\ninput-free program that can generate that string.<\/p>\n<p> Of course, there&#8217;s another copout there. What&#8217;s <em>minimal<\/em>? And<br \/>\nwon&#8217;t the information content of a string be different using different computing<br \/>\ndevices?<\/p>\n<p> Minimality is a tricky topic to tackle formally. You could easily write<br \/>\na PhD dissertation (and someone probably already has) on a precise<br \/>\ndefinition of a minimal computing device. But informally, the idea that<br \/>\nwe&#8217;re getting at is very simple: <em>no magic instructions<\/em>. That is,<br \/>\neach instruction performs a single action &#8211; so you can&#8217;t output<br \/>\nmore than one symbol with a single instruction.<\/p>\n<p> For the second question: yes, that&#8217;s true &#8211; to a limited extent. But<br \/>\ngiven any pair of computing machines, the difference between the<br \/>\nlength of the shortest programs to generate a specific string will vary<br \/>\nby no more than a single fixed constant.<\/p>\n<p> That last point is easy (and fun) to prove. First, we need to<br \/>\nformalize the statement. Take two computing<br \/>\nmachines, P and Q. Let the information content of a string S relative<br \/>\nto P be written as I(S, P), and relative to Q be I(S,Q).<br \/>\nFor all strings S, the absolute value of I(S,P) &#8211; I(S,Q) will<br \/>\nbe less than or equal to a constant C.<\/p>\n<p> Here&#8217;s the proof:<\/p>\n<ol>\n<li> if P and Q are both universal computing devices, then<br \/>\nthere is a <em>program<\/em> for P which allows it to emulate Q;<br \/>\nand there is a program for Q which allows it to emulate P. <\/li>\n<li> Let the program for P to emulate Q have size P<sub>q<\/sub>,<br \/>\nand let the program for Q to emulate P be Q<sub>p<\/sub>.<\/li>\n<li> Consider a string S. Let the program for P to generate S<br \/>\nbe P<sub>S<\/sub>, and the program for Q to generate S be<br \/>\nQ<sub>S<\/sub>. If P<sub>S<\/sub> is shorter than Q<sub>S<\/sub>,<br \/>\nthen the shortest program for Q to generate S cannot possibly<br \/>\nbe longer than Q<sub>S<\/sub> + Q<sub>P<\/sub>. <\/li>\n<li> Reverse the above statement for P and Q.<\/li>\n<li> Now, take the longer of P<sub>Q<\/sub> and Q<sub>P<\/sub>. For<br \/>\nany string S, it should be obvious that information content of<br \/>\nS relative to P or Q cannot possibly differ by more that<br \/>\nmax(len(P<sub>Q<\/sub>), len(Q<sub>P<\/sub>)).<\/li>\n<\/ol>\n<p> No magic there. It&#8217;s very straightforward.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>If you regularly follow comments on this blog, you&#8217;ll know that I&#8217;ve been having a back-and-forth with a guy who doesn&#8217;t know much about information theory, and who&#8217;s been using his ignorance to try to assemble arguments against the Kolmogorov-Chaitin information-theoretic measure of information in a string. In the course of doing that, I came [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[30],"tags":[],"class_list":["post-806","post","type-post","status-publish","format-standard","hentry","category-information-theory"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_shortlink":"https:\/\/wp.me\/p4lzZS-d0","jetpack_sharing_enabled":true,"jetpack_likes_enabled":true,"_links":{"self":[{"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/posts\/806","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/comments?post=806"}],"version-history":[{"count":0,"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/posts\/806\/revisions"}],"wp:attachment":[{"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/media?parent=806"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/categories?post=806"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/tags?post=806"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}