Retrograde Dynamic Tablature Assembly


Algorithm compression
April 10, 2009, 3:40 am
Filed under: algorithms, complexity, computer science | Tags: ,

Information theory has a very developed and precise grasp on what it means for data to be “compressed.” Specifically, the data has to be essentially random. If there is any way to describe a pattern in the data that would allow the overall description to be shorter than the data itself, then it is possible to compress it further. When compressing however, one must consider both the data and the decompresser.For instance, you could pass a message to your pal that simply said “2″ and that would mean, “Read message number 2 from the code book.” The message could have any arbitrary amount of complexity to it, but this doesn’t mean you’ve compressed the long message down to simply the number 2. You have to consider the entire codebook as part of the overall message. And if you are training a computer to look up these coded messages, you have to include the rules that the computer follows to look things up as well.

The point of all this is that it was long ago discovered that there is a minimum theoretical level to which a message can be compressed. You can’t keep compressing things down forever, it will eventually reach some minimum size, and further attempts to compress it do nothing, and can even make it larger. This is why zipping your folder of mp3’s does little to reduce the overall disk size. The idea is that there is some basic information that can’t really be done away with or squeezed any further.

Algorithms are like data in a certain way. Specifically, they can be efficient or inefficient, similar to the way data can be compressed to fill space efficiently or inefficiently. Data that has a lot of redundancies and patterns that can be abstracted out can be considered an inefficient use of space. The reason is that a lot of the space goes towards storing data that contains information that is already stored elsewhere. The same thing can be said of algorithms that are inefficient, though I’ll be specifically referring to inefficient in running time. I believe the reason that some algorithms for certain tasks are faster than others is because they are more efficient in that they do not make redundant calculations or calculations that result in information that has already been obtained.

Any given algorithm is composed of many individual steps. Each of these steps can be grouped into one of three large categories:

  1. Steps that obtain information that makes the correct result more likely to be the result of the overall algorithm.
  2. Steps that obtain information that makes the correct result less likely to be the result of the algorithm.
  3. Steps that do not change the likelihood that the correct result will be obtained by the algorithm.

Theoretically, we would like to get rid of any steps that fall into the categories number 2 or 3. We certainly don’t want to have any steps that make it more likely that we will obtain an incorrect answer. And of course, steps that do not help the overall progress of the algorithm in obtaining the correct answer we will want to skip, in order to save time. Among the steps in category 1, we can break them down even farther into these three categories:

  1. Steps that contain no shared information with any previous result obtained by the algorithm. In other words, the step produces wholly new information towards the result.
  2. Steps that contain some shared information with a previous result. These steps contain some new information and some information that has already been obtained.
  3. Steps that produce information that has already been obtained. These steps are completely redundant.

Again, steps in categories 2 and 3 should be avoided. If a step produces some shared information, then the goal should be to produce that new information without the redundant information. And if a step is producing completely redundant information it should be skipped entirely.


No Comments Yet so far
Leave a comment



Leave a comment
Line and paragraph breaks automatic, e-mail address never displayed, HTML allowed: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <pre> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>