{"id":765,"date":"2009-04-15T13:28:43","date_gmt":"2009-04-15T13:28:43","guid":{"rendered":"http:\/\/scientopia.org\/blogs\/goodmath\/2009\/04\/15\/the-return-of-the-compression-idiot\/"},"modified":"2009-04-15T13:28:43","modified_gmt":"2009-04-15T13:28:43","slug":"the-return-of-the-compression-idiot","status":"publish","type":"post","link":"http:\/\/www.goodmath.org\/blog\/2009\/04\/15\/the-return-of-the-compression-idiot\/","title":{"rendered":"The Return of the Compression Idiot"},"content":{"rendered":"<p> Remember <a href=\"http:\/\/scientopia.org\/blogs\/goodmath\/2009\/03\/i-get-mail-iterative-compression\">a while back<\/a>, I wrote about a crackpot who pestered me both about<br \/>\nconverting to Christianity, and his wonderful, miraculous compression system? He<br \/>\nclaimed to be able to repeatedly compress any file, making it smaller each time.<\/p>\n<p> Well, he&#8217;s back pestering me again. Repeatedly asking him to leave me alone,<br \/>\nshouting at him, etc., hasn&#8217;t worked. (His current claim is that he doesn&#8217;t know how<br \/>\nto delete me from his gmail contacts list.) So I&#8217;m resorting to another round of<br \/>\npublic humiliation, which I hope will be informative and entertaining as well.<\/p>\n<p><!--more--><\/p>\n<p> To remind you of the relevant background about compression: what compression does<br \/>\nis reduce the size of a string by eliminating redundancy in that string. A trivial example<br \/>\nof that is that most text strings use a very limited range of characters &#8211; a string<br \/>\nmade up of pure ASCII text will typically need less that 7 bits per character &#8211; but the<br \/>\ncommon representation uses a minimum of 8 bits per character. The high bit is always zero. So<br \/>\nin a string like that, without considering anything else, 1\/8th of the size of the document<br \/>\nis completely redundant.<\/p>\n<p> Looked at in terms of information theory, for a given system, any string contains a particular<br \/>\namount of information. That quantity of information is typically <em>much<\/em> smaller than any<br \/>\ncommon representation &#8211; the representation is full of padding and redundancy. So what compression<br \/>\ndoes is reduce the size of an input string to something closer to the minimum length required by<br \/>\nthe information it contains.<\/p>\n<p> But <em>most<\/em> strings are not compressible <em>at all<\/em> &#8211; which is really<br \/>\neasy to prove. A compression function <em>must<\/em> be invertible &#8211; that is, a given<br \/>\nfunction f can only be a compression function if there is a function f<sup>-1<\/sup>, such<br \/>\nthat f<sup>-1<\/sup>(f(n)) = n.<\/p>\n<p> Think of the strings of N bits. Suppose we want to compress them to one-half<br \/>\nof their original length. How many of those strings can be compressed that small?<\/p>\n<p> There are 2<sup>n\/2<\/sup> strings of length n\/2; and there are 2<sup>n<\/sup> uncompressed<br \/>\nstrings. So, at best, 1\/2<sup>n\/2<\/sup> strings are compressible to<br \/>\na string of length n\/2 &#8211; any more, and the compression function isn&#8217;t invertible &#8211; you can&#8217;t<br \/>\nuncompress!<\/p>\n<p> Even a trivial compressor &#8211; one that removes exactly one bit from the input string &#8211; <em>can&#8217;t<\/em> compress half of its inputs. If it could, it wouldn&#8217;t be invertible. And<br \/>\neven that analysis has a problem: it assumes that all inputs have length N. If you were looking<br \/>\nat strings whose length is <em>less than or equal to<\/em> n, then it gets even worse.<\/p>\n<p> The problem is very simple to see &#8211; the number of compressed strings is smaller than the number of input strings &#8211; but to uncompress, you need to know <em>exactly<\/em> which input string produced a compressed string.<\/p>\n<p> My persistently dimwitted correspondent is apparently incapable of understanding this. In<br \/>\nhis latest missive, he sent me a list of documents, which are the result of repeatedly<br \/>\nrunning bzip2 on an input document. In between runs of bzip2, he runs his own magic program,<br \/>\nwhich takes the uncompressible output from bzip2, and renders it compressible. According to him,<br \/>\nusing his system, he can compress <em>any<\/em> input document down to about 12K. As a<br \/>\n&#8220;proof&#8221; of this, he provides a script that runs his program, and shows the file sizes. This<br \/>\nis pretty long, but it&#8217;s amusing. What he&#8217;s doing is iteratively running bzip2, then running his<br \/>\nmagic program, and then running bzip2 again.<\/p>\n<pre>\n7160799 -rw-r--r--  1 user  wheel  415241 Apr 14 10:24 in\n7160799 -rw-r--r--  1 user  wheel  415241 Apr 14 10:24 in\n7160797 -rw-r--r--  1 user  wheel  375688 Apr 14 10:24 in\n7160799 -rw-r--r--  1 user  wheel  340370 Apr 14 10:24 in\n7160798 -rw-r--r--  1 user  wheel  307278 Apr 14 10:24 in\n7160799 -rw-r--r--  1 user  wheel  277451 Apr 14 10:24 in\n7160798 -rw-r--r--  1 user  wheel  251925 Apr 14 10:24 in\n7160799 -rw-r--r--  1 user  wheel  228574 Apr 14 10:24 in\n7160798 -rw-r--r--  1 user  wheel  208202 Apr 14 10:24 in\n7160799 -rw-r--r--  1 user  wheel  189791 Apr 14 10:24 in\n7160798 -rw-r--r--  1 user  wheel  172300 Apr 14 10:24 in\n7160799 -rw-r--r--  1 user  wheel  155808 Apr 14 10:24 in\n7160798 -rw-r--r--  1 user  wheel  140951 Apr 14 10:24 in\n7160799 -rw-r--r--  1 user  wheel  128431 Apr 14 10:24 in\n7160798 -rw-r--r--  1 user  wheel  116992 Apr 14 10:24 in\n7160799 -rw-r--r--  1 user  wheel  107075 Apr 14 10:24 in\n7160798 -rw-r--r--  1 user  wheel  97124 Apr 14 10:24 in\n7160799 -rw-r--r--  1 user  wheel  89283 Apr 14 10:24 in\n7160798 -rw-r--r--  1 user  wheel  82238 Apr 14 10:24 in\n7160799 -rw-r--r--  1 user  wheel  74710 Apr 14 10:24 in\n7160798 -rw-r--r--  1 user  wheel  68219 Apr 14 10:24 in\n7160799 -rw-r--r--  1 user  wheel  63097 Apr 14 10:24 in\n7160798 -rw-r--r--  1 user  wheel  58134 Apr 14 10:24 in\n7160799 -rw-r--r--  1 user  wheel  53304 Apr 14 10:24 in\n7160798 -rw-r--r--  1 user  wheel  48433 Apr 14 10:24 in\n7160799 -rw-r--r--  1 user  wheel  45429 Apr 14 10:24 in\n7160798 -rw-r--r--  1 user  wheel  41524 Apr 14 10:24 in\n7160799 -rw-r--r--  1 user  wheel  38161 Apr 14 10:24 in\n7160798 -rw-r--r--  1 user  wheel  35385 Apr 14 10:24 in\n7160799 -rw-r--r--  1 user  wheel  33316 Apr 14 10:24 in\n7160798 -rw-r--r--  1 user  wheel  30773 Apr 14 10:24 in\n7160799 -rw-r--r--  1 user  wheel  29004 Apr 14 10:24 in\n7160798 -rw-r--r--  1 user  wheel  26714 Apr 14 10:24 in\n7160799 -rw-r--r--  1 user  wheel  25371 Apr 14 10:24 in\n7160798 -rw-r--r--  1 user  wheel  23605 Apr 14 10:24 in\n7160799 -rw-r--r--  1 user  wheel  22728 Apr 14 10:24 in\n7160798 -rw-r--r--  1 user  wheel  21723 Apr 14 10:24 in\n7160799 -rw-r--r--  1 user  wheel  20478 Apr 14 10:24 in\n7160798 -rw-r--r--  1 user  wheel  20092 Apr 14 10:24 in\n7160799 -rw-r--r--  1 user  wheel  19682 Apr 14 10:24 in\n7160798 -rw-r--r--  1 user  wheel  19247 Apr 14 10:24 in\n7160799 -rw-r--r--  1 user  wheel  18686 Apr 14 10:24 in\n7160798 -rw-r--r--  1 user  wheel  18102 Apr 14 10:24 in\n7160799 -rw-r--r--  1 user  wheel  17418 Apr 14 10:24 in\n7160798 -rw-r--r--  1 user  wheel  16543 Apr 14 10:24 in\n7160799 -rw-r--r--  1 user  wheel  15252 Apr 14 10:24 in\n7160798 -rw-r--r--  1 user  wheel  15123 Apr 14 10:24 in\n7160799 -rw-r--r--  1 user  wheel  14958 Apr 14 10:24 in\n7160798 -rw-r--r--  1 user  wheel  14779 Apr 14 10:24 in\n7160799 -rw-r--r--  1 user  wheel  14558 Apr 14 10:24 in\n7160798 -rw-r--r--  1 user  wheel  14313 Apr 14 10:24 in\n7160799 -rw-r--r--  1 user  wheel  14044 Apr 14 10:24 in\n7160798 -rw-r--r--  1 user  wheel  13705 Apr 14 10:24 in\n7160799 -rw-r--r--  1 user  wheel  13283 Apr 14 10:24 in\n7160798 -rw-r--r--  1 user  wheel  12838 Apr 14 10:24 in\n7160799 -rw-r--r--  1 user  wheel  12242 Apr 14 10:24 in\n7160798 -rw-r--r--  1 user  wheel  12469 Apr 14 10:24 in\n7160799 -rw-r--r--  1 user  wheel  11729 Apr 14 10:24 in\n7160798 -rw-r--r--  1 user  wheel  11882 Apr 14 10:24 in\n7160799 -rw-r--r--  1 user  wheel  12086 Apr 14 10:24 in\n7160798 -rw-r--r--  1 user  wheel  12295 Apr 14 10:24 in\n7160799 -rw-r--r--  1 user  wheel  11394 Apr 14 10:24 in\n7160798 -rw-r--r--  1 user  wheel  11543 Apr 14 10:24 in\n7160799 -rw-r--r--  1 user  wheel  11729 Apr 14 10:24 in\n7160798 -rw-r--r--  1 user  wheel  11910 Apr 14 10:24 in\n7160799 -rw-r--r--  1 user  wheel  12105 Apr 14 10:24 in\n7160798 -rw-r--r--  1 user  wheel  12334 Apr 14 10:24 in\n7160799 -rw-r--r--  1 user  wheel  11441 Apr 14 10:24 in\n7160798 -rw-r--r--  1 user  wheel  11565 Apr 14 10:24 in\n7160799 -rw-r--r--  1 user  wheel  11772 Apr 14 10:24 in\n7160798 -rw-r--r--  1 user  wheel  12041 Apr 14 10:24 in\n7160799 -rw-r--r--  1 user  wheel  12235 Apr 14 10:24 in\n7160798 -rw-r--r--  1 user  wheel  12442 Apr 14 10:24 in\n7160799 -rw-r--r--  1 user  wheel  11580 Apr 14 10:24 in\n7160798 -rw-r--r--  1 user  wheel  11754 Apr 14 10:24 in\n7160799 -rw-r--r--  1 user  wheel  11918 Apr 14 10:24 in\n7160798 -rw-r--r--  1 user  wheel  12120 Apr 14 10:24 in\n7160799 -rw-r--r--  1 user  wheel  12369 Apr 14 10:24 in\n7160798 -rw-r--r--  1 user  wheel  11564 Apr 14 10:24 in\n7160799 -rw-r--r--  1 user  wheel  11718 Apr 14 10:24 in\n7160798 -rw-r--r--  1 user  wheel  11883 Apr 14 10:24 in\n7160799 -rw-r--r--  1 user  wheel  12047 Apr 14 10:24 in\n7160798 -rw-r--r--  1 user  wheel  12250 Apr 14 10:24 in\n7160799 -rw-r--r--  1 user  wheel  12481 Apr 14 10:24 in\n7160798 -rw-r--r--  1 user  wheel  11672 Apr 14 10:24 in\n7160799 -rw-r--r--  1 user  wheel  11879 Apr 14 10:24 in\n7160798 -rw-r--r--  1 user  wheel  12093 Apr 14 10:24 in\n7160799 -rw-r--r--  1 user  wheel  12300 Apr 14 10:24 in\n7160798 -rw-r--r--  1 user  wheel  11361 Apr 14 10:24 in\n7160799 -rw-r--r--  1 user  wheel  11510 Apr 14 10:24 in\n7160798 -rw-r--r--  1 user  wheel  11710 Apr 14 10:24 in\n7160799 -rw-r--r--  1 user  wheel  11880 Apr 14 10:24 in\n7160798 -rw-r--r--  1 user  wheel  12109 Apr 14 10:24 in\n7160799 -rw-r--r--  1 user  wheel  12371 Apr 14 10:24 in\n7160798 -rw-r--r--  1 user  wheel  11584 Apr 14 10:24 in\n7160799 -rw-r--r--  1 user  wheel  11790 Apr 14 10:24 in\n7160798 -rw-r--r--  1 user  wheel  12017 Apr 14 10:24 in\n7160799 -rw-r--r--  1 user  wheel  12219 Apr 14 10:24 in\n7160798 -rw-r--r--  1 user  wheel  12431 Apr 14 10:24 in\n7160799 -rw-r--r--  1 user  wheel  11619 Apr 14 10:24 in\n7160798 -rw-r--r--  1 user  wheel  11777 Apr 14 10:24 in\n7160799 -rw-r--r--  1 user  wheel  11977 Apr 14 10:24 in\n7160798 -rw-r--r--  1 user  wheel  12175 Apr 14 10:24 in\n7160799 -rw-r--r--  1 user  wheel  12400 Apr 14 10:24 in\n7160798 -rw-r--r--  1 user  wheel  11576 Apr 14 10:24 in\n7160799 -rw-r--r--  1 user  wheel  11781 Apr 14 10:24 in\n7160798 -rw-r--r--  1 user  wheel  12014 Apr 14 10:24 in\n7160799 -rw-r--r--  1 user  wheel  12203 Apr 14 10:24 in\n7160798 -rw-r--r--  1 user  wheel  12461 Apr 14 10:24 in\n7160799 -rw-r--r--  1 user  wheel  11675 Apr 14 10:24 in\n7160798 -rw-r--r--  1 user  wheel  11854 Apr 14 10:24 in\n7160799 -rw-r--r--  1 user  wheel  12082 Apr 14 10:24 in\n7160798 -rw-r--r--  1 user  wheel  12341 Apr 14 10:24 in\n7160799 -rw-r--r--  1 user  wheel  11489 Apr 14 10:24 in\n7160798 -rw-r--r--  1 user  wheel  11641 Apr 14 10:24 in\n7160799 -rw-r--r--  1 user  wheel  11850 Apr 14 10:24 in\n7160798 -rw-r--r--  1 user  wheel  12047 Apr 14 10:24 in\n7160799 -rw-r--r--  1 user  wheel  12277 Apr 14 10:24 in\n7160798 -rw-r--r--  1 user  wheel  12495 Apr 14 10:24 in\n7160799 -rw-r--r--  1 user  wheel  11692 Apr 14 10:24 in\n7160798 -rw-r--r--  1 user  wheel  11904 Apr 14 10:24 in\n7160799 -rw-r--r--  1 user  wheel  12114 Apr 14 10:24 in\n7160798 -rw-r--r--  1 user  wheel  12296 Apr 14 10:24 in\n7160799 -rw-r--r--  1 user  wheel  11348 Apr 14 10:24 in\n7160798 -rw-r--r--  1 user  wheel  11447 Apr 14 10:24 in\n7160799 -rw-r--r--  1 user  wheel  11564 Apr 14 10:24 in\n7160798 -rw-r--r--  1 user  wheel  11736 Apr 14 10:24 in\n7160799 -rw-r--r--  1 user  wheel  11950 Apr 14 10:24 in\n7160798 -rw-r--r--  1 user  wheel  12142 Apr 14 10:24 in\n7160799 -rw-r--r--  1 user  wheel  12379 Apr 14 10:24 in\n7160798 -rw-r--r--  1 user  wheel  11523 Apr 14 10:24 in\n7160799 -rw-r--r--  1 user  wheel  11642 Apr 14 10:24 in\n7160798 -rw-r--r--  1 user  wheel  11849 Apr 14 10:24 in\n7160799 -rw-r--r--  1 user  wheel  12026 Apr 14 10:24 in\n7160798 -rw-r--r--  1 user  wheel  12249 Apr 14 10:24 in\n7160799 -rw-r--r--  1 user  wheel  12472 Apr 14 10:24 in\n7160798 -rw-r--r--  1 user  wheel  11710 Apr 14 10:24 in\n7160799 -rw-r--r--  1 user  wheel  11891 Apr 14 10:24 in\n7160798 -rw-r--r--  1 user  wheel  12126 Apr 14 10:24 in\n7160799 -rw-r--r--  1 user  wheel  12341 Apr 14 10:24 in\n7160798 -rw-r--r--  1 user  wheel  11403 Apr 14 10:24 in\n7160799 -rw-r--r--  1 user  wheel  11563 Apr 14 10:24 in\n7160798 -rw-r--r--  1 user  wheel  11765 Apr 14 10:24 in\n7160799 -rw-r--r--  1 user  wheel  12004 Apr 14 10:24 in\n7160798 -rw-r--r--  1 user  wheel  12225 Apr 14 10:24 in\n7160799 -rw-r--r--  1 user  wheel  12435 Apr 14 10:24 in\n7160798 -rw-r--r--  1 user  wheel  11563 Apr 14 10:24 in\n7160799 -rw-r--r--  1 user  wheel  11774 Apr 14 10:24 in\n7160798 -rw-r--r--  1 user  wheel  11926 Apr 14 10:24 in\n7160799 -rw-r--r--  1 user  wheel  12152 Apr 14 10:24 in\n7160798 -rw-r--r--  1 user  wheel  12395 Apr 14 10:24 in\n7160799 -rw-r--r--  1 user  wheel  11541 Apr 14 10:24 in\n7160798 -rw-r--r--  1 user  wheel  11744 Apr 14 10:24 in\n7160799 -rw-r--r--  1 user  wheel  11920 Apr 14 10:24 in\n7160798 -rw-r--r--  1 user  wheel  12135 Apr 14 10:24 in\n7160799 -rw-r--r--  1 user  wheel  12320 Apr 14 10:24 in\n7160798 -rw-r--r--  1 user  wheel  11493 Apr 14 10:24 in\n7160799 -rw-r--r--  1 user  wheel  11624 Apr 14 10:24 in\n7160798 -rw-r--r--  1 user  wheel  11788 Apr 14 10:24 in\n7160799 -rw-r--r--  1 user  wheel  11988 Apr 14 10:24 in\n7160798 -rw-r--r--  1 user  wheel  12216 Apr 14 10:24 in\n7160799 -rw-r--r--  1 user  wheel  12461 Apr 14 10:24 in\n7160798 -rw-r--r--  1 user  wheel  11620 Apr 14 10:24 in\n7160799 -rw-r--r--  1 user  wheel  11838 Apr 14 10:24 in\n7160798 -rw-r--r--  1 user  wheel  12017 Apr 14 10:24 in\n7160799 -rw-r--r--  1 user  wheel  12232 Apr 14 10:24 in\n7160798 -rw-r--r--  1 user  wheel  12468 Apr 14 10:24 in\n7160799 -rw-r--r--  1 user  wheel  11759 Apr 14 10:24 in\n7160798 -rw-r--r--  1 user  wheel  11946 Apr 14 10:24 in\n7160799 -rw-r--r--  1 user  wheel  12180 Apr 14 10:24 in\n7160798 -rw-r--r--  1 user  wheel  12385 Apr 14 10:24 in\n7160799 -rw-r--r--  1 user  wheel  11498 Apr 14 10:24 in\n7160798 -rw-r--r--  1 user  wheel  11661 Apr 14 10:24 in\n7160799 -rw-r--r--  1 user  wheel  11869 Apr 14 10:24 in\n7160798 -rw-r--r--  1 user  wheel  12077 Apr 14 10:24 in\n7160799 -rw-r--r--  1 user  wheel  12273 Apr 14 10:24 in\n7160798 -rw-r--r--  1 user  wheel  12515 Apr 14 10:24 in\n7160799 -rw-r--r--  1 user  wheel  11752 Apr 14 10:24 in\n7160798 -rw-r--r--  1 user  wheel  11972 Apr 14 10:24 in\n7160799 -rw-r--r--  1 user  wheel  12146 Apr 14 10:24 in\n7160798 -rw-r--r--  1 user  wheel  12343 Apr 14 10:24 in\n7160799 -rw-r--r--  1 user  wheel  11474 Apr 14 10:24 in\n7160798 -rw-r--r--  1 user  wheel  11593 Apr 14 10:24 in\n7160799 -rw-r--r--  1 user  wheel  11792 Apr 14 10:24 in\n7160798 -rw-r--r--  1 user  wheel  12024 Apr 14 10:24 in\n7160799 -rw-r--r--  1 user  wheel  12223 Apr 14 10:24 in\n7160798 -rw-r--r--  1 user  wheel  12486 Apr 14 10:24 in\n7160799 -rw-r--r--  1 user  wheel  11745 Apr 14 10:24 in\n7160798 -rw-r--r--  1 user  wheel  11932 Apr 14 10:24 in\n7160799 -rw-r--r--  1 user  wheel  12121 Apr 14 10:24 in\n7160798 -rw-r--r--  1 user  wheel  12318 Apr 14 10:24 in\n7160799 -rw-r--r--  1 user  wheel  11396 Apr 14 10:24 in\n7160798 -rw-r--r--  1 user  wheel  11573 Apr 14 10:24 in\n7160799 -rw-r--r--  1 user  wheel  11716 Apr 14 10:24 in\n7160798 -rw-r--r--  1 user  wheel  11949 Apr 14 10:24 in\n7160799 -rw-r--r--  1 user  wheel  12175 Apr 14 10:24 in\n7160798 -rw-r--r--  1 user  wheel  12384 Apr 14 10:24 in\n7160799 -rw-r--r--  1 user  wheel  11565 Apr 14 10:24 in\n7160798 -rw-r--r--  1 user  wheel  11738 Apr 14 10:24 in\n7160799 -rw-r--r--  1 user  wheel  11914 Apr 14 10:24 in\n7160798 -rw-r--r--  1 user  wheel  12123 Apr 14 10:24 in\n7160799 -rw-r--r--  1 user  wheel  12307 Apr 14 10:24 in\n7160798 -rw-r--r--  1 user  wheel  11370 Apr 14 10:24 in\n7160799 -rw-r--r--  1 user  wheel  11521 Apr 14 10:24 in\n7160798 -rw-r--r--  1 user  wheel  11706 Apr 14 10:24 in\n7160799 -rw-r--r--  1 user  wheel  11923 Apr 14 10:24 in\n7160798 -rw-r--r--  1 user  wheel  12112 Apr 14 10:24 in\n7160799 -rw-r--r--  1 user  wheel  12369 Apr 14 10:24 in\n7160798 -rw-r--r--  1 user  wheel  11539 Apr 14 10:24 in\n7160799 -rw-r--r--  1 user  wheel  11672 Apr 14 10:24 in\n7160798 -rw-r--r--  1 user  wheel  11883 Apr 14 10:24 in\n7160799 -rw-r--r--  1 user  wheel  12111 Apr 14 10:24 in\n7160798 -rw-r--r--  1 user  wheel  12383 Apr 14 10:24 in\n7160799 -rw-r--r--  1 user  wheel  11552 Apr 14 10:24 in\n7160798 -rw-r--r--  1 user  wheel  11740 Apr 14 10:24 in\n7160799 -rw-r--r--  1 user  wheel  11893 Apr 14 10:24 in\n7160798 -rw-r--r--  1 user  wheel  12124 Apr 14 10:24 in\n7160799 -rw-r--r--  1 user  wheel  12349 Apr 14 10:24 in\n7160798 -rw-r--r--  1 user  wheel  11547 Apr 14 10:24 in\n7160799 -rw-r--r--  1 user  wheel  11752 Apr 14 10:24 in\n7160798 -rw-r--r--  1 user  wheel  11989 Apr 14 10:24 in\n7160799 -rw-r--r--  1 user  wheel  12233 Apr 14 10:24 in\n7160798 -rw-r--r--  1 user  wheel  12452 Apr 14 10:24 in\n7160799 -rw-r--r--  1 user  wheel  11637 Apr 14 10:24 in\n7160798 -rw-r--r--  1 user  wheel  11830 Apr 14 10:24 in\n7160799 -rw-r--r--  1 user  wheel  12010 Apr 14 10:24 in\n7160798 -rw-r--r--  1 user  wheel  12225 Apr 14 10:24 in\n7160799 -rw-r--r--  1 user  wheel  12469 Apr 14 10:24 in\n7160798 -rw-r--r--  1 user  wheel  11731 Apr 14 10:24 in\n7160799 -rw-r--r--  1 user  wheel  11912 Apr 14 10:24 in\n7160798 -rw-r--r--  1 user  wheel  12120 Apr 14 10:24 in\n7160799 -rw-r--r--  1 user  wheel  12355 Apr 14 10:24 in\n7160798 -rw-r--r--  1 user  wheel  11500 Apr 14 10:25 in\n7160799 -rw-r--r--  1 user  wheel  11664 Apr 14 10:25 in\n7160798 -rw-r--r--  1 user  wheel  11876 Apr 14 10:25 in\n7160799 -rw-r--r--  1 user  wheel  12066 Apr 14 10:25 in\n7160798 -rw-r--r--  1 user  wheel  12297 Apr 14 10:25 in\n7160799 -rw-r--r--  1 user  wheel  11397 Apr 14 10:25 in\n7160798 -rw-r--r--  1 user  wheel  11558 Apr 14 10:25 in\n7160799 -rw-r--r--  1 user  wheel  11750 Apr 14 10:25 in\n7160798 -rw-r--r--  1 user  wheel  11935 Apr 14 10:25 in\n7160799 -rw-r--r--  1 user  wheel  12151 Apr 14 10:25 in\n7160798 -rw-r--r--  1 user  wheel  12367 Apr 14 10:25 in\n7160799 -rw-r--r--  1 user  wheel  11495 Apr 14 10:25 in\n7160798 -rw-r--r--  1 user  wheel  11676 Apr 14 10:25 in\n7160799 -rw-r--r--  1 user  wheel  11879 Apr 14 10:25 in\n7160798 -rw-r--r--  1 user  wheel  12113 Apr 14 10:25 in\n7160799 -rw-r--r--  1 user  wheel  12341 Apr 14 10:25 in\n7160798 -rw-r--r--  1 user  wheel  11458 Apr 14 10:25 in\n7160799 -rw-r--r--  1 user  wheel  11687 Apr 14 10:25 in\n<\/pre>\n<p> For <em>any<\/em> arbitrary input file, he can do the same process &#8211; so that for any<br \/>\npossible input, his process will, eventually, converge on a file size around 12K. Sounds<br \/>\nabsolutely amazing, doesn&#8217;t it?<\/p>\n<p> But you know what? I can do <em>better<\/em>. I have here a script which you can<br \/>\nrun on any input, and which will render it recompressible &#8211; and it works until the file<br \/>\nreaches around 40 <em>bytes<\/em>. And it usually does it in very few steps. For example,<br \/>\nI took my local copy of plan9 from userspace &#8211; source code and binaries &#8211; and<br \/>\ntarred it up. The resulting file contains 149,852,160 bytes. Running gzip on it<br \/>\nshrinks it to 52,824,151. Pretty good compression &#8211; around 60%. Interestingly,<br \/>\nrerunning gzip once does actually compress it a tad more &#8211; reducing it to 52,563,667 bytes. Doing it again increases the size by a small amount &#8211; repeating just cycles aronud 52,600,000 bytes.<\/p>\n<p> Now, I&#8217;ll run my magic program on it. The output of my magic program is <em>exactly<\/em><br \/>\nthe same as the length of its input &#8211; only now it&#8217;s compressible! So I&#8217;ll run gzip on<br \/>\nit again. The result is 29,918,882. Wow!<\/p>\n<p> Now I&#8217;ll repeat that! 17013250! Again: 9674608. Again: 5502157. And again, and again, and again, until I get down to 43 bytes.<\/p>\n<p> What&#8217;s the catch? It should be obvious &#8211; it&#8217;s exactly what I said when I talked about<br \/>\nthe proof. I can&#8217;t <em>uncompress<\/em>. I can repeatedly remove information from the<br \/>\nfile, making it possible to compress it more &#8211; but I can&#8217;t put that information back,<br \/>\nbecause <em>it&#8217;s not in the compressed file<\/em>. <\/p>\n<p> And that&#8217;s exactly what the jerk did: he can compress, and compress, and compress to his hearts content. But he can&#8217;t <em>uncompress<\/em>.<\/p>\n<p> For folks who are interested, here&#8217;s my magic recompressibleizer program:<\/p>\n<pre>\n#include\nint main(int argc, char** argv) {\nfor (int c = getchar(); c != -1; c = getchar()) {}\nc = c &amp; 170;\nputchar(c);\n}\n}\n<\/pre>\n<p> What does it do? It changes every other bit in the input file to 0. That removes a heck<br \/>\nof a lot of information &#8211; now every byte of the file contains 4 meaningful bits instead<br \/>\nof 8. So gzip can then compress it. And then I can run my program on it again. And again. And<br \/>\nagain. Until it reduces the file to a basic minimum space overhead needed by<br \/>\ngzip.<\/p>\n<p> Of course, that &#8220;recompressibleizer&#8221; script is silly. But the basic idea of it &#8211; reducing<br \/>\nthe amount of information contained in a compressed string string in an irreversible way &#8211;<br \/>\nis what <em>any<\/em> program that produces iterative compressibility <em>must<\/em> do. You can&#8217;t<br \/>\never get around the basic facts of compressibility: most strings aren&#8217;t compressible. They<br \/>\ncan&#8217;t be.<\/p>\n<p> The simple fact, from the proof up towards the top of this post, is that the set of<br \/>\nstrings of length N is <em>larger<\/em> than the set of strings of length less than N. You can&#8217;t<br \/>\nget around that.<\/p>\n<p> Any idiot can produce a script which renders uncompressible documents compressible &#8211; if they don&#8217;t have to be able to <em>uncompress<\/em> them afterwards. If you want to be able to<br \/>\nuncompress them, then you&#8217;ve got a problem: you can&#8217;t reinsert the information you<br \/>\nextracted without knowing what it is. The <em>only<\/em> way of making this scheme work<br \/>\nat all is by adding information to the decompression program &#8211; making it larger. As I<br \/>\nexplained last time, you <em>can<\/em> do that, and it can even have a very nice payoff<br \/>\nin the size of the compressed document. For example, using a pre-populated dictionary<br \/>\nof common n-grams in english text can have a great effect on the compressed size of english<br \/>\ndocuments. But your compressor and decompressor need to be considerably larger to store that dictionary, and they&#8217;ll perform worse than a standard LZW compressor on non-english text inputs. And it only works once &#8211; once you&#8217;ve used that trick to strip out the english-language pattern<br \/>\ndata, using that trick again <em>won&#8217;t work<\/em> &#8211; the patterns that it takes advantage of<br \/>\nare gone.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Remember a while back, I wrote about a crackpot who pestered me both about converting to Christianity, and his wonderful, miraculous compression system? He claimed to be able to repeatedly compress any file, making it smaller each time. Well, he&#8217;s back pestering me again. Repeatedly asking him to leave me alone, shouting at him, etc., [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[7],"tags":[],"class_list":["post-765","post","type-post","status-publish","format-standard","hentry","category-bad-software"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_shortlink":"https:\/\/wp.me\/p4lzZS-cl","jetpack_sharing_enabled":true,"jetpack_likes_enabled":true,"_links":{"self":[{"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/posts\/765","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/comments?post=765"}],"version-history":[{"count":0,"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/posts\/765\/revisions"}],"wp:attachment":[{"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/media?parent=765"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/categories?post=765"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/www.goodmath.org\/blog\/wp-json\/wp\/v2\/tags?post=765"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}