# Cognitive Toolkit Must-Haves

## Heuristics and mental models (that you'll thank me for later)

These past few days I’ve been joining friends at the latest Effective Altruism meetup: EA Global x Boston (or at least the most recent one that’s within subway distance). For those of you who have heard of Effective Altruism, you’re aware that it’s a movement centered around figuring out how to maximize the amount of good one can do in the world. Topics at EA events range from poverty alleviation and mass vaccination to animal welfare and even mitigating existential risks (though I sometimes feel like too many engineers jump on the AGI bandwagon because it’s fashionable).

One of the more interesting ideas I’ve seen emphasized is epistemic responsibility. In short, we have a responsibility to think rationally about our beliefs and make sure they stand up to scrutiny. Even if we do not vocalize our beliefs and opinions, these influence our actions and how we behave in the world (and by extension influence others). There’s a lot we can do to take stock of our beliefs, but many of us might be looking for quicker ways to do this than René Descartes’ approach of devoting an entire career to this problem. While there are plenty of cliff-notes on things like logical fallacies (things to be removed from our reasoning), there aren’t many equivalent cliff-notes on concepts to add to our toolbox instead.

If you haven’t seen it already, Peter McIntyre’s Conceptually is currently one of the closest things I’ve seen to such cliff-notes, though at the time of this writing it’s still in its very early stages. Since I was inspired by Peter McIntyre’s page (and partly because I’m too impatient to wait for Peter’s updates), I decided to jump on the bandwagon and list a few concepts, general rules/laws, heuristics, and other mental models that I think should be spread around. It’s worth noting that some of these may be contradictory. That’s because while listing them here is advocating for their usefulness in some cases, they aren’t all useful in all possible contexts (I will try to mention exceptions where possible).

If you’ve seen my recent works, you may be aware that I’ve been (temporarily) turning away from Biotech and Aging research and dabbling more in machine learning. You may be asking, “Is this geared towards biologists, or towards ML engineers?”, to which my answer is “yes”.

Here are my top 20 picks for useful concepts (why 20? These were just the ones off the top of my head when I wrote this). Let’s go…

• General Logic and Reasoning
  • [1] Occam’s Razor
  • [2] George Box’s “All Models Are Wrong”
  • [3] Ex Ante and Ex Post
  • [4] The Pareto Principle (a.k.a. the 80/20 Rule)
• Networks and Groups of People
  • [5] Dunbar’s Number
  • [6] Reed’s Law
  • [7] Metcalfe’s Law
• Building things
  • [8] Gall’s Law
  • [9] Kernighan’s Law
  • [10] Goodhart’s Law
• People and motivations
  • [11] Zero, Positive, and Negative Sum Games
  • [12] Amara’s Law
  • [13] Hanlon’s Razor
  • [14] Hick’s Law
• Project Management
  • [15] Hofstadter’s Law
  • [16] Parkinson’s Law
  • [17] Brooks’ Law
  • [18] The Law of Triviality
• General Approaches to Life
  • [19] Foxes and Hedgehogs
  • [20] Wheaton’s Law

## General Logic and Reasoning

### 1. Occam’s Razor

You’ve probably heard this one before in the form “the simplest explanation is usually the correct one”. This isn’t exactly accurate (definitely not in the form used by conspiracy theorists everywhere). In its original form, Occam’s Razor sounded more like this:

“Entities should not be multiplied without necessity.”

—William of Ockham

What this says is that among several possible explanations, the most likely is the one with the fewest concepts and assumptions. The key word is unnecessary complexity: the preferred solution solves only the given problem, without introducing accidental complexity and possible negative consequences. One common trend among conspiracy theorists is to take the first, incorrect interpretation (“Evil agent X being secretly responsible for event Y is a simple statement”), which runs counter to the second, correct interpretation (“Evil agent X successfully pulling off the plan for event Y requires a lot of unproven assumptions, compared to more benign explanations for event Y”). In statistics and machine learning, there are more formal versions of this (i.e., among models that perform comparably well, the one using the fewest parameters is usually the most robust).

### 2. George Box’s “All Models Are Wrong”

“All models are wrong, but some are useful.”

—George Box

This is pretty straightforward. All models of systems are flawed, simply due to the fact that the models are simpler than the more complex systems in question. Despite that, the models are still useful as long as they are not too flawed. If the model is too simple, then it might be as useless as no model at all. If the model is too detailed, it might be too complex to use. Like with Ockham’s Razor, this can be applied to statistics, scientific research, and computing models.

Also like Ockham’s Razor, there is a long history of this showing up in literature and philosophy. In fact, “All models are wrong” is the likely successor to “confusing the map with the territory”. Jorge Luis Borges’ short story “On Exactitude in Science” takes this idea to its extreme. It describes a fictional empire where cartography becomes so exact that its map-makers seek to build a perfect 1:1 scale map of the empire. Unfortunately for everyone, they succeed, and the entire empire is buried in a map too large and cumbersome to use (not that it has any use, since mapping every rock and puddle resulted in the map not even doing the information compression you’d expect from a map).

### 3. Ex Ante and Ex Post

Ex ante and ex post reasoning (ex post is also known as post hoc reasoning) are distinguished simply by whether you make a hypothesis/prediction before or after an event (ex ante is making a prediction before the event, while ex post refers to reasoning after the event). While this difference might seem inconsequential, it is critical to how we define science vs. pseudo-science. Karl Popper famously made the distinction between the two in the frame of ex ante and ex post reasoning. For example, in comparing Einstein’s theory of relativity to Freud’s theories of psychoanalysis, Popper noted that Einstein made testable predictions in advance (such as the precession of Mercury’s orbit, a prediction that famously proved incredibly accurate), while psychoanalysis could be made to fit almost any observation after the fact.

Ex ante vs ex post is also relevant to many cognitive biases. You’ve probably heard the phrase “Hindsight is 20/20”. That perfectly describes the appeal of ex post reasoning: it’s easy to come up with hypotheses that fit past data, but it’s much harder to do this for future events. It’s worth taking this into account whenever you hear advice from anyone, from successful startup founders to lottery winners. Both may extol starting a company or playing the lottery as a great decision, and may even go as far as to explain why they succeeded. Most of these explanations are far less useful when trying to predict which people will go on to found successful companies in the future (this is why VC firms hedge their bets, and try to use a few big successes to offset the losses from bad bets) or which people will win the jackpot (even more futile).

### 4. The Pareto Principle (a.k.a. the 80/20 Rule)

Oddly enough, the Pareto principle didn’t come from Italian economist Vilfredo Pareto directly. Rather, it was named after him by management consultant Joseph M. Juran, who noticed that in Pareto’s work, Cours d’économie politique, Pareto showed that approximately 80% of the land in Italy was owned by 20% of the population.

The Pareto Principle suggests that in some cases, the majority of results come from a minority of inputs. This principle is also known as: The 80/20 Rule, The Law of the Vital Few, and The Principle of Factor Sparsity. Some examples:

• 80% of a certain piece of software can be written in 20% of the total allocated time
• The hardest 20% of a project takes 80% of the time.
• 20% of the effort produces 80% of the result (see Warren Buffett’s 5/25 rule)
• 20% of a business’s customers will create 80% of the profit.
• 20% of the bugs cause 80% of the crashes (this happened almost exactly at Microsoft)
• 20% of the features cause 80% of the usage
• 80% of what Pareto is known for comes from less than 20% of his work (this is more of a joke, but still technically a case of unequal distribution)

While it’s not always exact (sometimes it’s more like a 90/10 or 99/1 rule), the general idea of lopsided outcomes is always important to remember.
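
As a rough illustration, here’s a minimal Python sketch (the lognormal distribution and its parameters are assumptions chosen purely for illustration, not a claim about any real dataset) that draws 10,000 “contributions” from a heavy-tailed distribution and checks what share of the total comes from the top 20% of contributors:

```python
import random

# Draw "contributions" from a heavy-tailed (lognormal) distribution;
# the sigma parameter controls how lopsided the distribution is.
random.seed(42)
contributions = sorted(
    (random.lognormvariate(0, 1.5) for _ in range(10_000)),
    reverse=True,
)

# Share of the total produced by the top 20% of contributors.
top_20_percent = contributions[: len(contributions) // 5]
share = sum(top_20_percent) / sum(contributions)
print(f"Top 20% of contributors produce {share:.0%} of the total")
```

With these assumed parameters the top 20% account for roughly three-quarters of the total; tweaking sigma pushes the split toward 80/20, 90/10, or 99/1.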

## Networks and Groups of People

### 5. Dunbar’s Number

If you’ve ever struggled to keep up with all the people in your Facebook friends list, contacts list, or even the people you knew from high school, you’re not the first to notice this. “Dunbar’s number” is a rough estimate of how many people one can maintain stable social relationships with, bounded by limitations of human cognition (as guessed by observations of maximum hunter-gatherer tribe sizes, for example).

This might sound kind of vague, because it is kind of vague. Definitions of these “relationships” have ranged from each person simply knowing who the other is, to Dunbar’s inexact definition of “the number of people you would not feel embarrassed about joining uninvited for a drink if you happened to bump into them in a bar.” Similarly, the exact number has ranged (depending on who is estimating) from 100 to 250, though 150 is the commonly cited number.

Regardless of the precise number, it’s worth keeping in mind for the sake of recognizing how group dynamics change once groups reach a certain inflection point. For example, a few sociologists have even tried using Dunbar’s number as an explanation of why socialism (yes, I’m aware there are many and even competing definitions of that loaded term) is difficult to maintain at national levels. There are many examples of tribes operating under “gift economies”, where tribe members may gift excess resources to others. Motivation for this behavior may range from establishing trust (so as to receive aid when in need in the future), to avoiding seeming like a free-loader and risking getting kicked out of the tribe. Regardless of the exact motivation, most share in common the fact that they become much weaker motivators when the group becomes large enough for “strangers” to exist (i.e., members that are known personally by a fraction of the group, but not the whole group).

A more relevant example might be how companies change when they grow. When a startup starts to grow beyond 100 people, how well it can grow beyond this point will depend on the kinds of formal rules and processes it sets now that it’s easier for some employees to be relatively anonymous to each other.

### 6. Reed’s Law

“The utility of large networks, particularly social networks, scales exponentially with the size of the network.”

—David P. Reed

If Dunbar’s number imposes limits on human networks, then Reed’s law motivates larger networks. Reed’s law is often phrased in terms of graph theory, where utility is proportional to the number of possible sub-groups, which grows exponentially with the number of participants (nodes), far faster than the number of pairwise connections (edges).

This might be an over-simplification (after all, if we take Dunbar’s number seriously, there are real limits to the number of possible sub-groups). Despite that, this offers a clear motivation behind creating large networks of people.

### 7. Metcalfe’s Law

“In network theory, the value of a system grows as approximately the square of the number of users of the system.”

—Robert Metcalfe

Similar to, though more general than, Reed’s Law, Metcalfe’s law can be applied beyond human social networks to telecommunications, distributed computing, and even government.

While this interpretation is less common, it’s also worth noting that not all value is positive, and that this law can refer to some problems that scale super-linearly with network size as well (e.g., contagiousness of harmful memes on social media, how easily communicable diseases can be spread through shipping and air-travel hubs, etc.).
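
To make the comparison concrete, here’s a small Python sketch comparing three classic back-of-the-envelope estimates of network value (proportionality constants ignored): Sarnoff’s ~n for broadcast networks, Metcalfe’s ~n(n−1)/2 pairwise connections, and Reed’s ~2^n − n − 1 possible sub-groups:

```python
def sarnoff(n: int) -> int:
    """Broadcast value: one sender, n receivers."""
    return n

def metcalfe(n: int) -> int:
    """Number of distinct pairwise connections among n participants."""
    return n * (n - 1) // 2

def reed(n: int) -> int:
    """Number of non-trivial sub-groups (subsets of size >= 2)."""
    return 2**n - n - 1

for n in (5, 10, 20, 30):
    print(f"n={n:3d}  sarnoff={sarnoff(n):>4}  "
          f"metcalfe={metcalfe(n):>4}  reed={reed(n):>12}")
```

Even at 30 participants, the sub-group count dwarfs the pairwise count, which is why Reed’s law implies such aggressive returns to network growth.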

## Building things

### 8. Gall’s Law

“A complex system that works is invariably found to have evolved from a simple system that worked. A complex system designed from scratch never works and cannot be patched up to make it work. You have to start over with a working simple system.”

—John Gall

Gall’s Law might be to engineering complex systems what natural selection is to intelligent design: working complex systems almost always grow out of simpler working systems rather than springing into existence fully formed. Because complex systems have many more possible points of failure, it’s exceedingly difficult to build them all at once. Whether it’s circuit design, city transportation networks, or even the global internet, complex designs usually develop by iteratively building newer additions onto simpler designs.

The internet is the classic example. Many private companies tried creating their own contained networks. The idea was that they would have admin access, as well as advertising rights. The problem was that growing the content on such networks with only a few content-makers at the helm became intractable very quickly. By comparison, the TCP/IP protocols that form the backbone of the internet were shared with academics, and then everybody, allowing a simple system shared by just a few research labs to grow to encompass the entire world.

### 9. Kernighan’s Law

“Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it.”

—Brian Kernighan

Kernighan’s Law is named for Brian Kernighan and derived from a quote from Kernighan and Plauger’s book The Elements of Programming Style:

“Everyone knows that debugging is twice as hard as writing a program in the first place. So if you’re as clever as you can be when you write it, how will you ever debug it?”

While hyperbolic, Kernighan’s Law makes the argument that simple code is to be preferred over complex code, because debugging any issues that arise in complex code may be costly or even infeasible.

If you really need to remember this, you can remember this in the form of its alternate formulation: KISS

“Keep it simple, stupid”

The KISS principle states that most systems work best if they are kept simple rather than made complicated; therefore, simplicity should be a key goal in design, and unnecessary complexity should be avoided. Originating in the U.S. Navy in 1960, the phrase has been associated with aircraft engineer Kelly Johnson.

The principle is best exemplified by the story of Johnson handing a team of design engineers a handful of tools, with the challenge that the jet aircraft they were designing must be repairable by an average mechanic in the field under combat conditions with only these tools. Hence, the “stupid” refers to the relationship between the way things break and the sophistication of the tools available to repair them, not the capabilities of the engineers themselves.

### 10. Goodhart’s Law

“Any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes.”

—Charles Goodhart

Also commonly referenced as:

“When a measure becomes a target, it ceases to be a good measure.”

—Marilyn Strathern

If you build an optimization around only a small number of metrics, you might get unintended outcomes. One of the most common examples is standardized tests. While scores were originally intended to act as stand-ins for aptitude, over time they become blurred by those who are extremely talented at gaming the system.
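
As a toy illustration (the scoring functions and numbers here are invented, not taken from any real testing system), the Python sketch below shows an optimizer that can only see the proxy metric pouring its entire effort budget into gaming it:

```python
def proxy_score(aptitude_effort: int, gaming_effort: int) -> int:
    """The measured 'test score': gaming is cheaper per point than learning."""
    return aptitude_effort + 3 * gaming_effort

def true_aptitude(aptitude_effort: int, gaming_effort: int) -> int:
    """What the test was meant to measure: gaming adds nothing real."""
    return aptitude_effort

# Split a fixed effort budget to maximize the proxy alone.
budget = 10
best = max(
    ((a, budget - a) for a in range(budget + 1)),
    key=lambda split: proxy_score(*split),
)
print("proxy-optimal split (aptitude, gaming):", best)
print("proxy score:", proxy_score(*best))
print("true aptitude:", true_aptitude(*best))
```

The proxy-maximizing split puts zero effort into actual aptitude: once the measure became the target, it stopped measuring anything.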

## People and motivations

### 11. Zero, Positive, and Negative Sum Games

These terms all refer to different types of competitions, arenas, or ‘games’ where people and organizations compete. They are defined by how the rewards for victory (or costs of defeat) are distributed, often using analogies such as the “size of the pie”. The “pie” could correspond to anything from monetary gain/loss, power, or land, to points in an actual game for amusement.

• positive-sum games: It is possible to expand the size of the pie, allowing all participants to gain more at the game’s end than they had at the beginning (i.e., “win-win situations”, “win-win-win situations”, or “wins” for however many players there are). Examples include negotiations where both parties gain more than they concede, and idealized free market economies.
• zero-sum games: A player can only win if another player loses. The size of the “pie” is constant (i.e., the net change is zero). Examples include games such as chess or go, pre-industrial societies that could only gain more resources by invading one another, and classes where all scores are graded “on a curve”.
• negative-sum games: Even if there is a winner, the overall size of the pie may shrink. Examples include spiteful divorce proceedings (where both parties’ goals are to make sure the other walks away with less, rather than preserving possessions for themselves), numerous historical examples of Pyrrhic victories or “scorched-earth tactics”, and scenarios such as global thermonuclear war (even the “victor” would suffer irreparable losses).
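
The distinction can be made concrete with a few toy payoff tables (all numbers invented for illustration): the “sum” of a game is just the total payoff across all players for a given outcome.

```python
# Each entry maps an outcome to the (player_1, player_2) payoffs.
positive_sum = {("trade", "trade"): (3, 3)}    # both gain: the pie grows
zero_sum     = {("win", "lose"): (1, -1)}      # one's gain is the other's loss
negative_sum = {("fight", "fight"): (-2, -2)}  # everyone walks away with less

def game_sum(payoffs: dict) -> dict:
    """Total payoff across all players for each outcome."""
    return {outcome: sum(p) for outcome, p in payoffs.items()}

print(game_sum(positive_sum))   # pie expands
print(game_sum(zero_sum))       # pie unchanged
print(game_sum(negative_sum))   # pie shrinks
```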

### 12. Amara’s Law

“We tend to overestimate the effect of a technology in the short run and underestimate the effect in the long run.”

—Roy Amara

The Hype Cycle is an inexact visual representation of the excitement and development of technology over time (as the name suggests, originally produced by Gartner).

In general, this pattern suggests that any new technology, from radio to electricity to recombinant DNA to blockchain, goes through an initial phase of ramping up interest (“Oh, technology X could be useful for something”), followed by a peak of inflated expectations (“Oh my god! Technology X is going to solve everything!”), then a trough of disillusionment (“Technology X hasn’t lived up to the hype! It’s useless!”), followed by a more gradual ramping back up of expectations (“Okay, we can see now where Technology X is appropriate and where it’s not”).

If a technology hasn’t encountered one of these stages, this law suggests it might in the near-term future.

### 13. Hanlon’s Razor

“Never attribute to malice that which is adequately explained by stupidity.”

—Robert Hanlon

This principle suggests that actions resulting in a negative outcome are usually not the result of ill will. More likely, whoever took those actions did not fully understand them or their impact.

This is somewhat similar, but not entirely identical, to the concept of Chesterton’s Fence:

“Reforms should not be made until the reasoning behind the existing state of affairs is understood.”

The name of this principle comes from a story by G.K. Chesterton. A man comes across a fence crossing the middle of the road. He complains to the mayor that this useless fence is getting in the way, and asks to remove it. The mayor asks why the fence is there in the first place. When the man says he doesn’t know, the mayor says, “If you don’t know its purpose, I certainly won’t let you remove it. Go and find out the use of it, and then I may let you destroy it.”

This principle is relevant in software engineering when removing technical debt. Each line of a program was originally written by someone for some reason. Chesterton’s Fence suggests that one should try to understand the context and meaning of the code fully before changing or removing it, even if at first glance it seems redundant or incorrect.

The difference between Hanlon’s Razor and Chesterton’s Fence is that, while both can be applied to situations of perceived malice, the first involves not immediately assigning malice as the motivator while the second refers to taking the time to understand the motivations behind a decision or design in the first place.

### 14. Hick’s Law

“Decision time grows logarithmically with the number of options you can choose from.”

—William Edmund Hick

This law is an attempt at quantifying something we’ve all encountered at some point: choice paralysis. In the equation below, $T$ is the time to make a decision, $n$ is the number of options, and $b$ is a constant (determined by something like reaction time or processing speed).

$T = b \cdot \log_2 (n+1)$

This is a simplified version. This formulation assumes the options are ordered (implying that the decision-maker would be doing some kind of binary search variant), while other researchers have tried pegging the constants of this equation to IQ.

Nonetheless, it’s always important to keep this constraint in mind when presenting anyone with choices: if you’re looking for a quick decision, go with fewer choices; if you present larger choice selections, expect diminishing returns, since each doubling of the options adds only a roughly constant amount of decision time.
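
The diminishing returns are easy to see numerically. In the sketch below, the constant b = 0.2 seconds per step is an arbitrary illustrative value (real values vary by person and task):

```python
import math

def decision_time(n_options: int, b: float = 0.2) -> float:
    """Hick's Law estimate of decision time (seconds) for n equally likely options."""
    return b * math.log2(n_options + 1)

# Each doubling of (options + 1) adds a constant amount of time.
for n in (1, 3, 7, 15, 31):
    print(f"{n:2d} options -> {decision_time(n):.2f}s")
```

Going from 1 option to 3 costs as much extra time as going from 15 to 31: each doubling buys less and less deliberation per added option.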

## Project Management

### 15. Hofstadter’s Law

“It always takes longer than you expect, even when you take into account Hofstadter’s Law.”

—Douglas Hofstadter

You might hear this law referred to when looking at estimates for how long something will take. You may even see “loading time” pages give incorrect estimates of how long something will take. If a project has ever taken longer than you expected, at least take solace in the fact that you’re not alone.

It’s not completely hopeless, though. As time goes on, you can still get better at making your timeline predictions match up with how they play out in real life.

### 16. Parkinson’s Law

“Work expands so as to fill the time available for its completion.”

—Cyril Northcote Parkinson

Some teams, if they ever do make progress ahead of schedule, might still finish a given objective on the intended date simply by filling up the excess time with less critical tasks, or by becoming less efficient in countless ways. This law was first applied to bureaucracies, but it can also apply to projects ranging from construction to writing to software. If this law were combined with Hofstadter’s Law from above, we would get an even more dire outcome: work expanding to fill the available time and still taking longer than expected.

Some organizations go far beyond the Parkinson-Hofstadter combo with the Shirky Principle:

“Institutions will try to preserve the problem to which they are the solution.”

—Clay Shirky

This suggests that complex solutions (companies, industries, non-profits, or technologies) can inadvertently (or even intentionally) perpetuate the problems that demanded their existence in the first place.

As Upton Sinclair put it:

“It is difficult to get a man to understand something, when his salary depends upon his not understanding it.”

—Upton Sinclair

### 17. Brooks’ Law

“Adding human resources to a late software development project makes it later.”

—Fred Brooks

If you find yourself on a project that is taking much longer than anticipated, especially a software or data science project, adding more people to the project won’t make it faster. Usually, it will just make it worse.

While there is a benefit to having people work on tasks in parallel, plenty of time will also be lost to ramping up the newcomers, establishing communication rules, correcting mistakes from miscommunication, and slack owing to the fact that some processes aren’t parallelizable (“Nine women can’t make a baby in one month”).
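
One way to see why is to count communication paths, which grow quadratically with team size. The toy model below is a sketch under invented assumptions (the 0.05 overhead-per-path constant is arbitrary), not a real productivity formula:

```python
def comm_paths(team_size: int) -> int:
    """Distinct pairwise communication channels on a team."""
    return team_size * (team_size - 1) // 2

def effective_output(team_size: int, overhead_per_path: float = 0.05) -> float:
    """Raw capacity (one unit per person) minus a fixed cost per channel."""
    return team_size - overhead_per_path * comm_paths(team_size)

for size in (5, 10, 15, 20, 25):
    print(f"{size:2d} people, {comm_paths(size):3d} paths, "
          f"effective output {effective_output(size):.1f}")
```

Past a certain team size, each new hire adds more coordination cost than capacity, and effective output starts shrinking.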

### 18. The Law of Triviality

Sometimes, groups will give far more time and attention to trivial or cosmetic issues than to the serious and substantial ones.

I remember hearing a story about how Google Glass was first conceived. The meeting started with some of the most famous thinkers at Google gathered in a Google X conference room. Larry Page floated the idea of being able to access all of Google’s information instantly. The discussion quickly landed on a concept for a wearable interface that would allow users to instantly get information in their field of vision no matter where they were requesting it from.

That was the first 20-30 minutes of the meeting.

The next ~45 minutes were spent arguing over what color the interface should be (red was ultimately chosen, but this was abandoned after the first practical tests in favor of blue).

More generally, this is sometimes the reason why passing certain rules or legislation can take so long. Most of the debating parties may agree on the majority of the content, but more than half of the debate might be focused on a few small details or additions.

## General Approaches to Life

### 19. Foxes and Hedgehogs

One of the more confusing entries in this list, Hedgehogs vs. Foxes refers to two ways of viewing the world. Fox-like thinking refers to making conclusions from widely varied pieces of information from different viewpoints and angles (i.e., getting multiple perspectives on a topic). Hedgehog-like thinking is the opposite, as it involves world-models built according to some overarching theme or principle. “Foxes” are typically more willing to admit uncertainty in their worldviews, while “Hedgehogs” are typically more confident. Even if a “hedgehog” does correctly predict some event, they might be right for the wrong reasons.

### 20. Wheaton’s Law

There are all kinds of formulations of the “Golden Rule”, which has resulted in countless variations (some contradictory, some deliberately confusing, some turned into legalese). The addition of a “Silver Rule” generally hasn’t made much of an impact on a lot of people either. Rather than beat that dead horse, here’s an alternative:

“Don’t be a dick.”

—Wil Wheaton

If you take to heart any of the laws, principles, or rules from this list, perhaps it should be this one.

Cited as:

```
@article{mcateer2018cogtoolkit,
  title   = "Cognitive Toolkit Must-haves",
  author  = "McAteer, Matthew",
  journal = "matthewmcateer.me",
  year    = "2018",
  url     = "https://matthewmcateer.me/blog/cognitive-toolkit-must-haves/"
}
```

If you notice mistakes and errors in this post, don’t hesitate to contact me at [contact at matthewmcateer dot me] and I will be very happy to correct them right away! Alternatively, you can follow me on Twitter and reach out to me there.

See you in the next post 😄

I write about AI, Biotech, and a bunch of other topics. Subscribe to get new posts by email!