But what is Occam's razor really?


“Almost every problem that you come across is befuddled with all kinds of extraneous data of one sort or another; and if you can bring this problem down into the main issues, you can see more clearly what you are trying to do and perhaps find a solution. Now in so doing you may have stripped away the problem you're after. You may have simplified it to the point that it doesn't even resemble the problem that you started with; but very often if you can solve this simple problem, you can add refinements to the solution of this until you get back to the solution of the one you started with.”

Claude Shannon, lecture: Creative Thinking

The simplest explanation

I have a group chat of local friends that we use to coordinate our weekend events, among other things. One of the things we do is wish each other happy birthday when the day rolls around. I share my birthday with a guy we'll call Nick (not his real name). Every year on my birthday, messages ring through the group chat, "Happy birthday Tyler and Nick" (except in German). But this year, it was only "Happy birthday Nick."

What happened? Was the whole group mad at me for some reason? I hadn't done anything different. None of the conditions had changed since last year. At baseline, they are not mad at me. As scientists, we are trained to look for the simplest explanation that accounts for a phenomenon, because it tends to be the most likely hypothesis. This is known as Occam's razor.

For example: where does lightning come from? One explanation is that it's an electric discharge from the clouds, and another is that a god of lightning in the sky throws lightning bolts everywhere all the time. Even if we had never done the Ben Franklin kite experiment, we would at least know that the second explanation requires us to first validate the existence of a whole class of life form, the "superhuman supernatural being," that no one has ever actually seen. The first explanation requires only known physics. So we would shift more weight to the hypothesis based on known physics than to the one based on a superhuman supernatural being. Do this at your own risk, depending on the ideology your culture is steeped in.

But "forgetting" versus "mad at me" is a bit more subtle. I've seen both. There's no supernatural component. We have to get a bit more technical. The spoiler alert is that they just forgot, and remembered later. I'm at an age now where I would rather forget my birthday when it rolls around, too. That being said, let's go into the details of the different scenarios and show why forgetting is the simpler explanation, so you can get some intuition around Occam's razor.

The simplest computer program

To formalize Occam's razor, we have to think computationally. Let's look at the "mad at me" and "forgot" explanations in more detail. For now, we will think at the level of Python, as opposed to the level of Turing machines, which we would have to do if we really wanted to formalize this.

Forgetting.

Someone remembers it's Nick's birthday. Says happy birthday to Nick.

For each person:
    See that it's Nick's birthday given the group chat birthday wishes.
    Don't cross check.
    Say happy birthday to Nick.

Mad at me, individual level.

For each person:
    See that it's Nick and Tyler's birthday.
    Be mad at Tyler.
    Decide to exclude him from birthday wishes.

Mad at me, group level.

Everyone gets together and decides that they're all mad at Tyler.

They decide that they're just mad enough at Tyler to exclude him from the birthday wishes, after mulling over removal from the group and confrontation.

For each person:
    Follow the rules, and exclude Tyler from the birthday wishes.

We can see that high-level programs 2 and 3 (the two "mad at me" programs) require a lot of additional functions: being mad when not-mad is the baseline (not common with my group), planning a passive-aggressive activity (not common with my group at all), or everyone being mad at me individually, for the same or different reasons. Forgetting, because we're busy and two birthdays on the same day is a bit of an edge case, is a much simpler computer program to run on my social graph.

So far, for the sake of simplifying the problem, we have made the assumption that we're dealing with a closed system. Me and my friends. But the reality is we're actually dealing with an open system. Your friends are connected to more people who are connected to more people and so on until Kevin Bacon. So it would be very hard to write the set of all computer programs across the broader social graph that could lead to the phenomenon you're observing. This is where figuring out probabilities and doing Bayesian updates as you gain more information comes in. While we'll talk about probabilities here, the topic of Bayesian epistemology will be for another time.

Anyway, the computational formalism of Occam's razor involves looking at a phenomenon and all computer programs that could produce it, and giving the shortest programs the most weight, so that the simplest one counts as the most probable. This is known as Solomonoff induction. As I said earlier, this is intractable (strictly speaking, it is uncomputable), especially if we start thinking about literal Turing machines producing bits that represent whatever phenomenon we're looking at. But it's still a nice way to think a bit more rigorously about Occam's razor, a mental model to add to your epistemic toolkit.
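To make the weighting idea concrete, here is a toy sketch in Python. It borrows one ingredient from Solomonoff's prior, namely that a program of length L bits gets a weight proportional to 2^-L, and applies it to a tiny, hand-picked list of candidates instead of all possible programs. The function name and the bit lengths below are my own assumptions for illustration; I did not measure them from anything.

```python
def simplicity_weights(lengths_in_bits):
    """Map {name: program length in bits} to normalized 2**-length weights."""
    raw = {name: 2.0 ** -length for name, length in lengths_in_bits.items()}
    total = sum(raw.values())
    return {name: weight / total for name, weight in raw.items()}


# Hypothetical program lengths, chosen only for illustration.
toy_lengths = {"forgot": 10, "mad_individually": 14, "mad_as_group": 18}

weights = simplicity_weights(toy_lengths)
for name, weight in sorted(weights.items(), key=lambda item: -item[1]):
    print(f"{name}: {weight:.4f}")
```

Every extra bit of program length halves the weight, so in this toy setup a program that is 4 bits longer ends up 16 times less probable.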

How simple = how much can you compress it

Probabilistic thinking and Solomonoff induction do intersect. To see how, consider the sequence [1, 2, 3, 4, 5]. Let's think of all the possible computer programs in existence that could produce this sequence. Let's drill down into two of them.

Computer program 1:

Set x to 0.
loop, 5 times:
    add 1 to x.
    print x.

Computer program 2:

print 1
print 2
print 3
print 4
print 5

But let's pretend that we're dealing with the sequence [1, 2, 3, …, 1 billion]. Program 1 compresses this sequence: it is still only 4 lines (only the loop count changes), and it outputs the whole thing. Program 2 cannot be compressed any further, and comes in at 1 billion lines. Program 1 is the simpler program, so it is the most likely explanation by Solomonoff induction. But importantly, we can now think in terms of how much simpler program 1 (4 lines) is than program 2 (1 billion lines). Note that if we really wanted to be rigorous here, we'd have to look at a mathematical model of the implementation of each of these computer programs at the bit string level, but thinking in lines of code at least gives us some intuition here.
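If you want to poke at this yourself, here is a minimal Python sketch of the same comparison. It builds the source text of a loop-style program and of a spelled-out program for a given n, checks that both print the same thing, and compares how long the two sources are. The helper names are mine, and character count is only a rough stand-in for the bit-level program length that a rigorous treatment would use.

```python
import contextlib
import io


def loop_source(n):
    """Source of a program that counts from 1 to n with a loop."""
    return f"for x in range(1, {n} + 1):\n    print(x)\n"


def explicit_source(n):
    """Source of a program that spells out every print statement."""
    return "".join(f"print({i})\n" for i in range(1, n + 1))


def run(source):
    """Execute the source text and return everything it printed."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(source, {})
    return buffer.getvalue()


for n in (5, 1_000):
    compact, verbose = loop_source(n), explicit_source(n)
    assert run(compact) == run(verbose)  # same output, different descriptions
    print(f"n={n}: loop program {len(compact)} chars, explicit program {len(verbose)} chars")
```

The loop program only grows by the number of digits in n, while the spelled-out program grows with n itself. That gap is the "how much can you compress it" intuition.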

Probabilities, assumptions, and the conjunction fallacy

Ok, so we have an idea that the simplest explanation is most likely true, and a formulation that says that the simplest possible computer program that can output the phenomenon is most likely true. But that doesn't tell us why this is the case. Why are simpler explanations more likely to be true than complex explanations? To understand why, we next have to think in probabilities.

Let's think back to my friends again. We will simplify so we're only talking about one friend. We have two scenarios.

1. My friend didn't say happy birthday to me.
2. My friend is mad at me and didn't say happy birthday to me.

Which of these is more likely? If we simplify further, we have:

1. A.
2. A and B.

Let's look at it from a more sinister angle so you can see where Occam's razor, as a heuristic, comes in. We're going to look at a related problem. Consider the following description:

Description: Steve is a physically fit person in his 30s who did cross country in high school and college. He likes working out, and in particular doing endurance sports. He doesn't eat dessert and he doesn't drink alcohol. He gets up at 5am every morning to run at least 8 miles.

What is more likely?

1. Steve is a personal trainer.
2. Steve is a personal trainer who has run at least one marathon.

If you said 2, then you fell for the conjunction fallacy. What is this? Again, let's simplify.

Description: Blah blah blah.

What is more likely?

1. A.
2. A and B.

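Even if the probability of Steve being a personal trainer (A) is 0.9, and the probability of Steve having run at least one marathon (B) is also 0.9, the probability of both, treating the two as independent, is P(A) × P(B) = 0.81, which is less than the probability of just being a personal trainer. More generally, the probability of "A and B" is the probability of A times the probability of B given A, so it can never be greater than the probability of A alone, and it is strictly smaller whenever that second factor is less than 1.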
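Here is the arithmetic from the Steve example as a small Python sketch. It uses exact fractions so that floating-point rounding doesn't muddy the picture. The function name is mine, and the independence assumption (the chance of a marathon is 0.9 whether or not Steve is a trainer) is a simplification for the sake of the example.

```python
from fractions import Fraction


def conjunction(p_a, p_b_given_a):
    """Product rule: P(A and B) = P(A) * P(B given A)."""
    return p_a * p_b_given_a


p_trainer = Fraction(9, 10)

# Assumption: marathon running is independent of being a trainer,
# so P(marathon given trainer) is just P(marathon) = 0.9.
p_trainer_and_marathon = conjunction(p_trainer, Fraction(9, 10))

print(float(p_trainer))               # 0.9
print(float(p_trainer_and_marathon))  # 0.81

# The conjunction can tie with A only if B is certain given A.
print(conjunction(p_trainer, Fraction(1)) == p_trainer)  # True
```

However you set the second number, the product never climbs above the first one.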

Going back to the initial case of my birthday, we have one thing (not saying happy birthday) versus two things (not saying happy birthday and being mad at me). The one with the fewest conjunctions is the most likely, because every added conjunct decreases the probability, assuming that conjunct has a probability of less than one.

A practical way to think of Occam's razor in light of the conjunction fallacy is that the simplest explanation makes the fewest assumptions. Each assumption has some probability of being true. Fewer assumptions mean fewer probabilities being multiplied together, and therefore fewer opportunities to shrink the overall probability of the explanation being true. So I'm going to go with the explanation that has one embedded assumption as opposed to the one that has ten embedded assumptions.
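To put rough numbers on one assumption versus ten, here is a short Python sketch. The 0.9 per assumption is a made-up figure, and multiplying the probabilities straight together assumes the assumptions are independent of each other, which real explanations rarely are. The function name is mine.

```python
from fractions import Fraction
from math import prod


def explanation_probability(assumption_probs):
    """Probability that every assumption holds, assuming independence."""
    return prod(assumption_probs, start=Fraction(1))


# Hypothetical: each assumption is 90% likely on its own.
one_assumption = [Fraction(9, 10)]
ten_assumptions = [Fraction(9, 10)] * 10

print(round(float(explanation_probability(one_assumption)), 4))
print(round(float(explanation_probability(ten_assumptions)), 4))
```

Even when each individual assumption looks quite safe, ten of them stacked together come out at roughly a one-in-three chance of all holding at once.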

Of course, we can make all of this more complicated if we start thinking in terms of not saying happy birthday GIVEN being mad at me. But this moves us toward Bayes' theorem, which needs another article altogether.

Conclusion

We started with Occam's razor. But I'm a computational biologist by training and these days I see the world in code. Naturally, I discovered and took kindly to Solomonoff induction, the computational formulation of Occam's razor, and I have stuck with it ever since. What it has done for me is help me think computationally about what Occam's razor really is, and especially what a simple explanation really is. What I find is that if I'm trying to weigh explanations for a particular thing in my life, it helps to turn the explanations into computer programs, even if it's just lines of pseudo-code as I did in this article, and to up-weight the smaller programs in terms of their probability of being true. I then used the conjunction fallacy to connect "simple" with "probable." In sum, I hope you'll be able to approach Occam's razor through a computational and probabilistic lens too the next time you have to make sense of something.