Filed under: algorithms, complexity, computer science | Tags: algorithms, computational complexity
Information theory has a very developed and precise grasp on what it means for data to be “compressed.” Specifically, the data has to be essentially random. If there is any way to describe a pattern in the data that would allow the overall description to be shorter than the data itself, then it is possible to compress it further. When compressing however, one must consider both the data and the decompresser.For instance, you could pass a message to your pal that simply said “2″ and that would mean, “Read message number 2 from the code book.” The message could have any arbitrary amount of complexity to it, but this doesn’t mean you’ve compressed the long message down to simply the number 2. You have to consider the entire codebook as part of the overall message. And if you are training a computer to look up these coded messages, you have to include the rules that the computer follows to look things up as well.
The point of all this is that it was long ago discovered that there is a minimum theoretical level to which a message can be compressed. You can’t keep compressing things down forever, it will eventually reach some minimum size, and further attempts to compress it do nothing, and can even make it larger. This is why zipping your folder of mp3′s does little to reduce the overall disk size. The idea is that there is some basic information that can’t really be done away with or squeezed any further.
Algorithms are like data in a certain way. Specifically, they can be efficient or inefficient, similar to the way data can be compressed to fill space efficiently or inefficiently. Data that has a lot of redundancies and patterns that can be abstracted out can be considered an inefficient use of space. The reason is that a lot of the space goes towards storing data that contains information that is already stored elsewhere. The same thing can be said of algorithms that are inefficient, though I’ll be specifically referring to inefficient in running time. I believe the reason that some algorithms for certain tasks are faster than others is because they are more efficient in that they do not make redundant calculations or calculations that result in information that has already been obtained.
Any given algorithm is composed of many individual steps. Each of these steps can be grouped into one of three large categories:
- Steps that obtain information that makes the correct result more likely to be the result of the overall algorithm.
- Steps that obtain information that makes the correct result less likely to be the result of the algorithm.
- Steps that do not change the likelihood that the correct result will be obtained by the algorithm.
Theoretically, we would like to get rid of any steps that fall into the categories number 2 or 3. We certainly don’t want to have any steps that make it more likely that we will obtain an incorrect answer. And of course, steps that do not help the overall progress of the algorithm in obtaining the correct answer we will want to skip, in order to save time. Among the steps in category 1, we can break them down even farther into these three categories:
- Steps that contain no shared information with any previous result obtained by the algorithm. In other words, the step produces wholly new information towards the result.
- Steps that contain some shared information with a previous result. These steps contain some new information and some information that has already been obtained.
- Steps that produce information that has already been obtained. These steps are completely redundant.
Again, steps in categories 2 and 3 should be avoided. If a step produces some shared information, then the goal should be to produce that new information without the redundant information. And if a step is producing completely redundant information it should be skipped entirely.
Filed under: programming
There are two different kinds of hacking. Of course, that’s a pretty general statement, and we can certainly divide the category of hacking into two parts many different ways (white hat vs black hat, hardware vs. software, etc), but the way in which I am speaking today is not a functional split. It is a split in environment in which the action is undertaken, rather than a difference in what task is actually performed. So far, I’m being pretty opaque, so let’s get to specifics.
I refer to hacking in probably something like the original sense of the term, which is doing something clever and resourceful with something that wasn’t intended for the purposes it has been put to by the hacker. This could mean infiltrating a remote computer using a buffer overflow attack, or writing new software for your iPod so it can play Doom. The key element here is constraint. The actions of the hacker are constrained in some way, and he must use intelligence and experience and whatever else is at his disposal to accomplish some goal despite those constraints. If my definition is allowed to be stretched, any work done around constraints can be viewed as hacking (For example, see the excellent Dinosaur Comics, in which the author uses the same image but different text each day to make something new).
If hacking is about constraints, how about we divide up hacking into two different types based on whether the constraints are self-imposed or imposed by some external agent. In the case of the hacker trying to gain access to a remote system he is not allowed into, the constraint is imposed by the admins of the remote system. They put into place security measures to prevent the hacker from gaining access, and if he wants to gain access he must twist around these constraints and find weaknesses that can be put to his use. In the case of the hardware hacker putting Doom on his iPod, the constraints are externally determined by the architecture of the device, but the choice of device is at the discretion of the hacker. While the hacker into a remote system may only be able to get what he wants from that specific server, if the hardware hacker decides it’s too hard to work on an iPod, he can do what he wants on some other device.
The far end of the self-constraint/external-constraint spectrum is the hacker who is not constrained externally at all. This is where we come to the crux of the “vs.” mentioned in the title. The hacker who places constraints on himself can be either lauded as a genius, or despised for implementing a terrible solution. The difference is the environment in which the “hack” is executed. If the hacker is on his own time and chooses constraints to make something challenging, and comes up with a solution that is unexpected, it doesn’t affect others. The pejorative “hack” comes when the hacker was simply too lazy to come up with a proper solution, and his only “constraint” was that he didn’t want to take the time to do something the “right way.”
So let’s focus the discussion on professional programmers who are employed for a purpose, or to open source programmers working with others toward a common goal. On the one hand, the externally constrained hacker is the one who the night before the expected launch of a program finds a massive bug that would require rewriting 40% of the code base to fix “the right way.” He implements a hack, and is a hero the next day. On the other hand is the programmer whose code is just one long string of hacks, completely unmaintainable and unreadable by anyone but the original hacker. This man is either dropped for incompetence, or kept on forever because he’s the only one who can maintain the code. Either way no one (save perhaps himself) likes what he’s done.
In the former example, we may raise issues with how the project was structured that such a major fundamental flaw was found only at the last minute, but in general most people would agree what the hacker did was right. There was no time available, the constraints were placed on him. In the latter case, it’s a safe bet everyone would blame the “hacker” who couldn’t be bothered to stop hacking. In both cases, the hacked code probably looks similarly incomprehensible, or may not scale, or may make the code base more fragile, but the circumstances are what decide whether these weaknesses are acceptable or not.
The good hacking of the first example can turn into the bad hacking of the second example if the architectural problems are not rectified. The time constraint of the first example evaporates at some point, and now you just have bad code sitting in there. There might be some other constraints (such as budget), but if the goal is quality, that hack done in the nick of time needs to be rectified. Here is the reason why: Hacking is a response, not a strategy. Hacking is a riposte, not an attack. If you are not backed into a corner, you shouldn’t be risking your life to get out.
If there is no constraint, you shouldn’t be hacking.
Filed under: Uncategorized
This morning I got addicted to a crazy little flash game called Cursor*10
that is pretty mind boggling. It’s fun once you start to realize what you need to do. So in order to suck all the fun out of it for anyone who is stuck on it, here is my 100% walkthrough of the game to get all the triangles and finish the game with the maximum of 188 points.
Guide *Spoilers obviously*
- Cursor #1: Go immediately to floor 4, find the box, then go down and clear out the bottom 3 floors of triangles.[EXTREMELY IMPORTANT! Remember which box the staircase was in since it is randomly placed every game.] If you have time to spare, go back to the box room and memorize where the box with the stairs is.
- Cursor #2: Go as quickly as possible to floor 8, then hold the button until 400. After that, go back down skipping the triangles on 8, and clear out 7, then 6, then 5.
- Cursor #3: Go quickly to floor 9, clear it out, then go back down to floor 8 and clear it out.
- Cursor #4: Go quickly to to floor 10, and click on the box with the “100″ as fast as you can until you have 200 cursor life. (Don’t worry about finishing off the box, your future cursors will do it.) Then race back down to floor 6 and hold the button for your
remaining life. You should be on the button before 75 cursor life. - Cursor #5: Go quickly to floor 10, click furiously on the “100″ box until it opens, then clear out the triangles from floor 10.
- Cursor #6: Go to floor 10, click furiously, then clear floor 11 and 12.
- Cursor #7: Go to floor 10, click like crazy, go to floor 13, clear it of triangles, then go to 15 and hold the leftmost button until the end.
- Cursor #8: Go to floor 10, click a lot, go to floor 14, clear it of triangles. Then go to 15 and hold the center button until theend.
- Cursor #9: Go to floor 10, click a little, saunter up to 15, clear it of triangles and hold the rightmost button until the end.
- Cursor #10: Go to floor 10, it should take care of itself, go to 15 and wait until your score says 88, then click on the staircase to floor 16.
Hope this is useful to somebody! I am friggen exhausted after doing all the research for this little guide…

How’s that for a post title that gives me a lot to talk about? Obviously, I can only talk about things I know I am ignorant of. Also, I will restrict the topic to the vague category of computer science, so that will narrow down the list. And what the hell, I’ll make it a top ten list, since those seem to be so popular these days. So, without further ado, here is the list of 10 things that finish the sentence, “I don’t know… “
#1 “What monads are.” — I have tried and tried to get a straight answer to this one. I have not tried terribly hard, but that’s how these things happen. I know monads are some kind of tricky way to avoid side effects for certain things like IO in functional languages. I know that they’re a transplant from category theory in mathematics, but to date monads are a black magic as far as I’m concerned.
#2 “What currying is.” — Listen. I will just tell you that I have not investigated this one. I keep seeing it thrown around as a feature, or mechanism certain languages have, but haven’t looked into it. A quick trip to Wikipedia would probably cure me, but I need to pad this list a bit.
#3 “What the minimal set of features for a Turing complete language are.” — All modern programming languages are Turing complete, sure. SQL is not. Regular expressions are not. I have an intuitive sense of the ways these things are different from real programming languages. Currently though, if you gave me a brand new programming language you just invented, I couldn’t tell you with certainty whether or not it was Turing complete.
#4 “How to program in Smalltalk, Haskell, Scheme, Objective-C, OCaml, Tcl/Tk, Brainfuck, Befunge, and Python.” — This list only includes languages that A: I could think of off the top of my head, and B: languages I’m interested in learning eventually. Scala almost made this list, but I don’t like the JVM enough to get excited. .NET can likewise go away. I won’t list all the reasons I am interested in these languages, but Befunge is worth learning specifically because it is 2D.
#5 “How to program x86 with all the neat extensions like MMX, and SSE 1,2,3 etc. “ — I have programmed assembly for the Motorola 68000 for a class on computer organisation, but the huge instruction set available on a modern Intel machine intrigues and mystifies me.
#6 “Whether P = NP “ — Luckily, I am in good company on this one. Personally, I don’t think they are equal, but the optimist inside me always holds onto hope. The rate at which mankind would progress following a constructive proof that P=NP would be something to marvel at.
#7 “The exact relationship between time complexity and space complexity.” — Everyone seems to have the intuition that there is always a tradeoff between the amount of computation that must be done and the amount of storage space required. Store a bunch of lookup tables and you can reduce your computational load. Increase the complexity of your algorithm and skip making up a bunch of tables. It seems very fundamental, but is there an exact mathematical theory for a law of “conservation of complexity” or something similar? I don’t know. And I promised myself I wouldn’t go to Wikipedia while making this list. (Actually I already cheated, because I looked up Befunge to make sure it was the 2D language I was thinking of. It was, so I claim it doesn’t count!)
#8 ” enough about reversible computation.” — This one is stretching the format a bit, but essentially, I know a little about reversible computing, but not anywhere near as much as I would like to. The basic idea is really neat: theoretically cool processors. Information isn’t destroyed in the process of computing, so no heat is generated. The sad part is that I have told you basically everything I know about reversible computing in this little paragraph.
#9 “How to write a compiler.” — I am sorely lacking in the skills to write compilers of computer languages. I am more specifically lacking in the skills to write compilers for 2D languages…
#10 “How to parallelize quicksort in Erlang.” — In theory, it should be easy to make divide and conquer algorithms parallel, but I cannot for the life of me figure out how to do this one. It is probably more a result of my lack of competency with Erlang than with any real barrier to writing such an algorithm… But there you go: number 10.
There are a lot more things I could write in here, like: “How long sandwich meat keeps in the fridge.” and, “When it’s finally going to cool down in Florida. It’s still 70 in the middle of December.” and “Who let the Dogs out?” But I think I will leave it this length for the time being.