# Cognitive Toolkit Must-Haves

## Heuristics and mental models (that you'll thank me for later)

These past few days I’ve been joining friends at the latest Effective Altruism meetup: EA Global x Boston (or at least the most recent one within subway distance). Effective Altruism, for those unfamiliar, is a movement centered on figuring out how to maximize the amount of good one can do in the world. Topics at EA events range from poverty alleviation, mass vaccination, and animal welfare to mitigating existential risks (though I sometimes feel like too many engineers jump on the AGI bandwagon because it’s fashionable).

One of the more interesting ideas I’ve seen emphasized is epistemic responsibility. In short, we have a responsibility to think rationally about our beliefs and make sure they stand up to scrutiny. Even if we never voice our beliefs and opinions, they influence our actions and how we behave in the world (and, by extension, influence others). There’s a lot we can do to take stock of our beliefs, but many of us are probably looking for quicker ways to do this than René Descartes’ approach of devoting an entire career to the problem. While there are plenty of cliff-notes on things like logical fallacies (things to remove from our reasoning), there aren’t many equivalent cliff-notes on concepts to add to our toolbox.

If you haven’t seen it already, Peter McIntyre’s Conceptually is one of the closest things I’ve seen to such cliff-notes, though at the time of this writing it’s still in its very early stages. Since I was inspired by Peter’s page (and partly because I’m too impatient to wait for his updates), I decided to jump on the bandwagon and list a few concepts, general rules/laws, heuristics, and other mental models that I think deserve to be spread around. Some of these may contradict each other. That’s because, while listing them here means I think each one is useful in some cases, none of them is useful in every possible context (I’ll try to mention exceptions where possible).

If you’ve seen my recent work, you may know that I’ve been (temporarily) turning away from biotech and aging research and dabbling more in machine learning. You may be asking, “Is this geared towards biologists or towards ML engineers?”, to which my answer is “yes”.

Here are my top 20 picks for useful concepts (why 20? These were just the ones off the top of my head when I wrote this). Let’s go…

## General Logic and Reasoning

### 1. Occam’s Razor

You’ve probably heard this one before in the form “the simplest explanation is usually the correct one”. That isn’t quite accurate (and definitely not in the form used by conspiracy theorists everywhere). In its original form, Occam’s Razor sounded more like this:

“Entities should not be multiplied without necessity.”

——William of Ockham

What this says is that among several possible explanations, the one to prefer (and often the most likely) is the one with the fewest concepts and assumptions. The key word is “unnecessary”: extra entities are only justified when the evidence actually requires them. The simplest adequate explanation solves only the given problem, without introducing accidental complexity and its possible negative consequences.
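To make the “without necessity” part concrete, here’s a toy sketch in Python. The `Explanation` type and the debugging scenario are hypothetical, made up purely for illustration: first throw out any explanation that fails to account for all the observations, then, among the ones left, prefer the one that posits the fewest assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Explanation:
    name: str
    assumptions: frozenset  # things the explanation asks us to posit
    explains: frozenset     # observations it accounts for


def occams_razor(candidates, observations):
    """Among explanations that account for every observation,
    return the one with the fewest assumptions (None if none qualify)."""
    adequate = [c for c in candidates if observations <= c.explains]
    if not adequate:
        return None
    return min(adequate, key=lambda c: len(c.assumptions))


# Hypothetical debugging scenario.
observations = frozenset({"build fails", "error mentions a missing module"})
candidates = [
    Explanation("dependency not installed",
                frozenset({"missing dependency"}), observations),
    Explanation("compiler bug plus corrupted cache",
                frozenset({"compiler bug", "corrupted cache"}), observations),
    Explanation("flaky network",
                frozenset({"network outage"}), frozenset({"build fails"})),
]

print(occams_razor(candidates, observations).name)
```

Note that “flaky network” has just as few assumptions as the winner, but it gets thrown out because it doesn’t explain the error message. Simplicity only breaks ties between explanations that actually work.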

### 2. George Box’s “All Models Are Wrong”

“All models are wrong, but some are useful.”

——George Box

This principle suggests that all models of systems are flawed, but that as long as they are not too flawed they may be useful. This principle has its roots in statistics but applies to scientific and computing models as well.

A fundamental requirement of most software is to model a system of some kind. Regardless of whether the system being modeled is a computer network, a library, a graph of social connections or any other kind of system, the designer will have to decide an appropriate level of detail to model. Excessive detail may lead to too much complexity; too little detail may prevent the model from being functional.
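Here’s a quick illustration, using an example I picked from physics rather than anything Box wrote: the small-angle approximation sin(θ) ≈ θ (with θ in radians) is a model that is always technically wrong, but useful within a range. This Python sketch measures how wrong it is at a few arbitrarily chosen angles.

```python
import math


def small_angle_model(theta):
    """A deliberately 'wrong' model: approximate sin(theta) by theta (radians)."""
    return theta


def relative_error(theta):
    """How far the model is from the exact value, as a fraction of the exact value."""
    exact = math.sin(theta)
    return abs(small_angle_model(theta) - exact) / abs(exact)


for degrees in (1, 5, 10, 30, 60):
    theta = math.radians(degrees)
    print(f"{degrees:>3} deg: relative error {relative_error(theta):.2%}")
```

The error is around 0.5% at 10 degrees but roughly 21% at 60 degrees, so the same model is perfectly usable for small angles and misleading for large ones. Knowing where that boundary sits is most of what “not too flawed” means in practice.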

### 3. Ex Ante and Ex Post

(Ex post reasoning is also known as post hoc reasoning.)

What is ex ante and ex post thinking? Ex ante means “before the event”: when you’re making a prediction, you’re doing so ex ante. The opposite is ex post, which means “after the event”. This is a useful framework because people often conflate the two in their reasoning. Buying a lottery ticket is a bad idea in terms of expected value, but that judgment depends on when you make it: the ticket loses you money ex ante (in expectation), yet if you win, it was the right decision ex post. This distinction means we should focus on making the right decision with the information available to us, not on making the perfect decision.

For example, if a fund manager substantially outperforms the market, ex post they made the right decisions. However, if they picked their stocks by having a monkey throw darts, then ex ante they were borderline-negligent, and you probably shouldn’t trust them with your money.

Hindsight bias, also known as the ‘knew-it-all-along effect’, is the inclination, after an event has occurred, to see the event as having been predictable, despite there having been little or no objective basis for predicting it. This is why in science we make testable predictions: to guard against our tendency to invent plausible-sounding but wrong post hoc explanations.

As Philip Tetlock has shown in Superforecasting, we need to look at a lot of predictions to track whether people were right ex ante. Anyone can get lucky, and to our brains an event with a 40% chance and one with a 20% chance don’t seem that different, so we need to average over many predictions (for example, with a Brier score) to figure out whether someone was actually right.
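For the curious, the Brier score is just the mean squared difference between the probabilities someone forecast and what actually happened (1 if the event occurred, 0 if not); lower is better. A minimal Python sketch, with made-up forecasts for two hypothetical forecasters:

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes.

    0.0 is a perfect score; always saying 50% earns 0.25."""
    if not forecasts or len(forecasts) != len(outcomes):
        raise ValueError("need the same, non-zero number of forecasts and outcomes")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


outcomes      = [1, 0, 0, 1, 0]            # what actually happened
calibrated    = [0.8, 0.2, 0.1, 0.7, 0.3]  # hedged forecasts
overconfident = [1.0, 1.0, 0.0, 1.0, 0.0]  # always certain

print(brier_score(calibrated, outcomes))
print(brier_score(overconfident, outcomes))
```

The overconfident forecaster was “right” on four of five calls, yet scores worse (0.2 vs about 0.054) because the one miss was made with total certainty.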

### 4. The Pareto Principle (a.k.a. the 80/20 Rule)

Oddly enough, the Pareto principle didn’t come from Italian economist Vilfredo Pareto directly. Rather, it was named after him by management consultant Joseph M. Juran, who noticed that in his Cours d’économie politique, Pareto showed that approximately 80% of the land in Italy was owned by 20% of the population.

The Pareto Principle suggests that in some cases, the majority of results come from a minority of inputs. This principle is also known as: The 80/20 Rule, The Law of the Vital Few, and The Principle of Factor Sparsity. Some examples:

• 80% of a certain piece of software can be written in 20% of the total allocated time.
• The hardest 20% of a project takes 80% of the time.
• 20% of the effort produces 80% of the result (see Warren Buffett’s 5/25 rule).
• 20% of a business’s customers create 80% of the profit.
• 20% of the bugs cause 80% of the crashes (Microsoft reported almost exactly this).
• 20% of the features account for 80% of the usage.
• 80% of what Pareto is known for comes from less than 20% of his work (this is more of a joke, but still technically a case of unequal distribution).

While it’s not always exact (sometimes it’s more like a 90/10 or 99/1 split), you can gauge how useful a rule is by how many variations of it pop up.
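If you want to check how Pareto-like your own data is, one quick measure is the share of the total that comes from the top 20% of items. A Python sketch (the crash counts below are invented for illustration):

```python
def top_share(values, fraction=0.2):
    """Fraction of the total contributed by the top `fraction` of items."""
    if not values or sum(values) == 0:
        raise ValueError("need at least one non-zero value")
    ranked = sorted(values, reverse=True)
    k = max(1, round(len(ranked) * fraction))  # how many items count as "the top"
    return sum(ranked[:k]) / sum(ranked)


# Hypothetical crash counts attributed to each of 10 bugs.
crashes_per_bug = [50, 30, 5, 4, 3, 3, 2, 1, 1, 1]
print(f"Top 20% of bugs cause {top_share(crashes_per_bug):.0%} of crashes")
```

A perfectly even distribution would give 20%; the further above that you get, the more lopsided things are.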

## Networks and Groups of People

### 5. Dunbar’s Number

“Dunbar’s number is a suggested cognitive limit to the number of people with whom one can maintain stable social relationships— relationships in which an individual knows who each person is and how each person relates to every other person.” There is some disagreement about the exact number. “… [Dunbar] proposed that humans can comfortably maintain only 150 stable relationships.” He put the number into a more social context, “the number of people you would not feel embarrassed about joining uninvited for a drink if you happened to bump into them in a bar.” Estimates for the number generally lie between 100 and 250.

Like stable relationships between individuals, a developer’s relationship with a codebase takes effort to maintain. When faced with large, complicated projects, or with ownership of many projects, we lean on convention, policy, and modeled procedure to scale. Dunbar’s number is important to keep in mind not only as an office grows, but also when setting the scope for team efforts or deciding when a system should invest in tooling to help model and automate logistical overhead. Putting the number into an engineering context, it is the number of projects (or the normalized complexity of a single project) for which you would feel confident joining an on-call rotation.

### 6. Reed’s Law

“The utility of large networks, particularly social networks, scales exponentially with the size of the network.”


This law is based on graph theory, where the utility scales as the number of possible sub-groups, which is faster than the number of participants or the number of possible pairwise connections. Odlyzko and others have argued that Reed’s Law overstates the utility of the system by not accounting for the limits of human cognition on network effects; see Dunbar’s Number.
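To see why subgroups outrun everything else, here’s a small Python sketch counting possible pairwise connections (the basis of Metcalfe’s Law) and possible subgroups of two or more members (the basis of Reed’s Law). Subgroups are counted as 2^n − n − 1: all subsets, minus the empty set and the n single-member “groups”. The brute-force cross-check is just there for reassurance.

```python
from math import comb


def pairwise_connections(n):
    """Possible one-to-one links among n participants: n(n-1)/2."""
    return n * (n - 1) // 2


def possible_subgroups(n):
    """Subgroups with at least two members: all 2**n subsets,
    minus the empty set and the n single-person 'groups'."""
    return 2 ** n - n - 1


# Cross-check the closed form by summing "n choose k" for k = 2..n.
for n in range(0, 12):
    assert possible_subgroups(n) == sum(comb(n, k) for k in range(2, n + 1))

for n in (5, 10, 20, 30):
    print(f"{n:>3} people | {pairwise_connections(n):>4} pairs | "
          f"{possible_subgroups(n):>13,} subgroups")
```

Those last numbers are exactly why Odlyzko and others are skeptical: no human is going to take part in a billion subgroups.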

### 7. Metcalfe’s Law

“In network theory, the value of a system grows as approximately the square of the number of users of the system.”

This law is based on the number of possible pairwise connections within a system and is closely related to Reed’s Law. Odlyzko and others have argued that both Reed’s Law and Metcalfe’s Law overstate the value of the system by not accounting for the limits of human cognition on network effects; see Dunbar’s Number.

## Building things

### 8. Gall’s Law

“A complex system that works is invariably found to have evolved from a simple system that worked. A complex system designed from scratch never works and cannot be patched up to make it work. You have to start over with a working simple system.”

—John Gall

Gall’s Law implies that attempts to design highly complex systems are likely to fail. Highly complex systems are rarely built in one go; instead, they evolve from simpler systems.

The classic example is the World Wide Web. In its current state, it is a highly complex system. However, it was initially defined as a simple way to share content between academic institutions. It was very successful in meeting these goals and evolved to become more complex over time.

### 9. Kernighan’s Law

“Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it.”

—Brian Kernighan

Kernighan’s Law is named for Brian Kernighan and derived from a quote from Kernighan and Plauger’s book The Elements of Programming Style:

Everyone knows that debugging is twice as hard as writing a program in the first place. So if you’re as clever as you can be when you write it, how will you ever debug it?

While hyperbolic, Kernighan’s Law makes the argument that simple code is to be preferred over complex code, because debugging any issues that arise in complex code may be costly or even infeasible.
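Here’s a small (and only mildly clever) Python illustration: two palindrome checks that do the same thing. The first packs everything into one expression using an assignment expression (Python 3.8+); the second names each step, so when something goes wrong you can print or set a breakpoint on each intermediate value. The function names are made up for this example.

```python
def is_palindrome_clever(s):
    # One expression: filter, normalize, bind, reverse, and compare all at once.
    return (t := [c for c in s.lower() if c.isalnum()]) == t[::-1]


def is_palindrome_simple(s):
    # Each step has a name, so each can be inspected while debugging.
    letters = [c for c in s.lower() if c.isalnum()]
    reversed_letters = letters[::-1]
    return letters == reversed_letters


for text in ("A man, a plan, a canal: Panama", "Kernighan"):
    print(text, is_palindrome_clever(text), is_palindrome_simple(text))
```

Both are correct here; the point is that the second one leaves you somewhere to look when it isn’t.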

If you need something easier to remember, there’s a closely related principle: KISS

“Keep it simple, stupid”

The KISS principle states that most systems work best if they are kept simple rather than made complicated; therefore, simplicity should be a key goal in design, and unnecessary complexity should be avoided. Originating in the U.S. Navy in 1960, the phrase has been associated with aircraft engineer Kelly Johnson.

The principle is best exemplified by the story of Johnson handing a team of design engineers a handful of tools, with the challenge that the jet aircraft they were designing must be repairable by an average mechanic in the field under combat conditions with only these tools. Hence, the “stupid” refers to the relationship between the way things break and the sophistication of the tools available to repair them, not the capabilities of the engineers themselves.

### 10. Goodhart’s Law

“Any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes.”

—Charles Goodhart

Also commonly referenced as:

“When a measure becomes a target, it ceases to be a good measure.”

—Marilyn Strathern

The law states that measure-driven optimization can end up devaluing the measurement itself. An overly selective set of measures (KPIs), blindly applied to a process, distorts that process. People tend to optimize locally by “gaming” the system to satisfy particular metrics instead of paying attention to the overall outcome of their actions.

Real-world examples:

• Assert-free tests satisfy the code coverage expectation, even though the intent of the metric was to create well-tested software.
• Scoring developer performance by the number of lines committed leads to an unjustifiably bloated codebase.

## People and motivations

### 11. Zero, Positive, and Negative Sum Games

Zero- and positive-sum situations can be framed as ‘games’ involving the size of a pie and how that pie is distributed (for example, land, profit, timeshare of a condo, or political power).

In a zero-sum game we’re fighting over how the pie is distributed. It’s impossible for someone to advance their position without the other person losing out: if one side gets a thousand dollars more, the other side gets a thousand dollars less, so the gains and losses add up to zero. In a zero-sum game, a rational actor seeking the greatest gain for himself or herself is necessarily seeking the maximum loss for the other actor.

In positive-sum games we’re adding to the size of the pie, meaning there are more spoils for everyone to share. So it’s possible for everyone to benefit in a “win-win situation”.

Some examples of zero- vs positive-sum games:

Good negotiating is working out how to grow the size of the pie for everyone, not just fighting over who gets what (so-called ‘distributive bargaining’). This shift in mindset fosters trust (crucial for negotiating), helps you look for win-wins, and decreases the likelihood the deal will die on the table.

A couple getting a divorce can, instead of fighting to get the better of each other while lining their lawyers’ pockets, try to keep as much money as possible for the two of them (and out of the lawyers’ billable hours).

Immigration’s effect on the economy is positive-sum, because immigration also increases the number of jobs available. The evidence shows that if we let refugees in and allow them to work, it drastically improves their standard of living while at least maintaining (and possibly increasing) natives’ standard of living. See the lump of labor fallacy.

Similarly, international trade doesn’t benefit one country to the detriment of the other; it benefits them both. For this reason, we would all be better off turning away from protectionism toward open economies, which make everyone richer and discourage war.
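One way to make the distinction concrete is to add up how much each party’s position changes in a given outcome: a total of zero is zero-sum, above zero is positive-sum, and below zero is negative-sum. A Python sketch, with invented dollar figures loosely based on the examples in this section:

```python
def classify_outcome(payoffs):
    """Label an outcome by the sum of every party's gain (+) or loss (-)."""
    total = sum(payoffs)
    if total > 0:
        return "positive-sum"
    if total < 0:
        return "negative-sum"
    return "zero-sum"


# Hypothetical figures, in dollars.
haggling_over_a_fixed_price = [1_000, -1_000]       # one side's gain is the other's loss
mutually_beneficial_trade = [500, 300]              # both countries come out ahead
divorce_fought_through_lawyers = [-5_000, -25_000]  # both spouses pay billable hours

for name, payoffs in [("fixed-price haggling", haggling_over_a_fixed_price),
                      ("trade", mutually_beneficial_trade),
                      ("contested divorce", divorce_fought_through_lawyers)]:
    print(f"{name}: {classify_outcome(payoffs)}")
```

In this toy version, counting the lawyers’ fees as a third party’s gain would bring the divorce total back to zero, which is a reminder that the label depends on whose payoffs you count.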

### 12. Amara’s Law

“We tend to overestimate the effect of a technology in the short run and underestimate the effect in the long run.”

—Roy Amara

The Hype Cycle is a visual representation of the excitement and development of technology over time, originally produced by Gartner. It is best shown with a visual:

(Figure: the Gartner Hype Cycle. Image by Jeremykemp at English Wikipedia, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=10547051)

In short, this cycle suggests that there is typically a burst of excitement around a new technology and its potential impact. Teams often jump into these technologies quickly, and sometimes find themselves disappointed with the results. This might be because the technology is not yet mature enough, or because real-world applications are not yet fully realised. After a certain amount of time, the capabilities of the technology and the practical opportunities to use it increase, and teams can finally become productive. Roy Amara’s quote sums this up most succinctly: “We tend to overestimate the effect of a technology in the short run and underestimate the effect in the long run.”

### 13. Hanlon’s Razor

“Never attribute to malice that which is adequately explained by stupidity.”

—Robert Hanlon

This principle suggests that actions resulting in a negative outcome were not necessarily the result of ill will. Instead, the negative outcome is more likely explained by those actions and/or their impact not being fully understood.

This is somewhat similar, but not identical, to the concept of Chesterton’s Fence:

Reforms should not be made until the reasoning behind the existing state of affairs is understood.

The name of this principle comes from a story by G. K. Chesterton. A man comes across a fence crossing the middle of the road. He complains to the mayor that this useless fence is getting in the way and asks to remove it. The mayor asks why the fence is there in the first place. When the man says he doesn’t know, the mayor says, “If you don’t know its purpose, I certainly won’t let you remove it. Go and find out the use of it, and then I may let you destroy it.”

This principle is relevant in software engineering when removing technical debt. Each line of a program was originally written by someone for some reason. Chesterton’s Fence suggests that one should try to fully understand the context and meaning of the code before changing or removing it, even if at first glance it seems redundant or incorrect.
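Here’s a small Python example of a fence in code (the order data and function names are hypothetical). The `sorted` call looks redundant next to `itertools.groupby`, but `groupby` only groups consecutive items with equal keys, so removing the sort quietly breaks the totals instead of raising an error.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical (customer, amount) order records.
orders = [("alice", 30), ("bob", 12), ("alice", 5), ("carol", 7), ("bob", 1)]


def totals_by_customer(rows):
    # The "useless" fence: groupby only merges *adjacent* equal keys,
    # so rows must be sorted by customer first.
    rows = sorted(rows, key=itemgetter(0))
    return {name: sum(amount for _, amount in group)
            for name, group in groupby(rows, key=itemgetter(0))}


def totals_without_sort(rows):
    # Fence removed: non-adjacent rows for the same customer form separate
    # groups, and the dict keeps only the last group seen for each key.
    return {name: sum(amount for _, amount in group)
            for name, group in groupby(rows, key=itemgetter(0))}


print(totals_by_customer(orders))
print(totals_without_sort(orders))
```

Someone who “cleaned up” the sort would get plausible-looking but wrong numbers, which is exactly the situation Chesterton’s mayor is guarding against.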

The difference between Hanlon’s Razor and Chesterton’s Fence is that, while both can be applied to situations of perceived malice, the first involves not immediately assigning malice as the motivator while the second refers to taking the time to understand the motivations behind a decision or design in the first place.

### 14. Hick’s Law

“Decision time grows logarithmically with the number of options you can choose from.”

—William Edmund Hick

In the equation below, T is the time to make a decision, n is the number of options, and b is a constant which is determined by analysis of the data.

$$T = b \log_2(n + 1)$$

(Equation source: https://en.wikipedia.org/wiki/Hick%27s_law, Creative Commons Attribution-Share Alike 3.0 Unported.)

This law only applies when the options are ordered, for example alphabetically. This is implied by the base-two logarithm, which suggests the decision maker is essentially performing a binary search. If the options are not well ordered, experiments show the time taken is linear.

This has a significant impact on UI design: ensuring that users can easily search through options leads to faster decision making.
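As a rough Python sketch of what the logarithm buys you, here is T = b · log2(n + 1) with a made-up constant b = 0.2 seconds, next to a crude linear model for unordered options (the per-option time is also made up, and reuses the same 0.2 seconds for easy comparison):

```python
import math


def hick_decision_time(n_options, b):
    """Hick's law: T = b * log2(n + 1), where b is fitted from experimental data."""
    return b * math.log2(n_options + 1)


def linear_scan_time(n_options, seconds_per_option):
    """Crude model for unordered options: check them one at a time."""
    return n_options * seconds_per_option


B = 0.2  # hypothetical constant, in seconds
for n in (3, 7, 15, 31, 63):
    print(f"{n:>2} options: Hick {hick_decision_time(n, B):.1f}s, "
          f"linear {linear_scan_time(n, B):.1f}s")
```

Going from 7 to 15 options (roughly doubling) adds just one more b to the Hick’s-law time, while the linear scan roughly doubles.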

A correlation between IQ and reaction time in Hick’s Law tasks has also been shown, as described in Speed of Information Processing: Developmental Change and Links to Intelligence.

## Project Management

### 15. Hofstadter’s Law

“It always takes longer than you expect, even when you take into account Hofstadter’s Law.”

You might hear this law referred to when looking at estimates for how long something will take. It seems a truism in software development that we tend not to be very good at accurately estimating how long something will take to deliver.

This law comes from the book ‘Gödel, Escher, Bach: An Eternal Golden Braid’.

### 16. Parkinson’s Law

“Work expands so as to fill the time available for its completion.”

In its original context, this Law was based on studies of bureaucracies. It may be pessimistically applied to software development initiatives, the theory being that teams will be inefficient until deadlines near, then rush to complete work by the deadline, thus making the actual deadline somewhat arbitrary.

If this law were combined with Hofstadter’s Law, an even more pessimistic viewpoint is reached - work will expand to fill the time available for its completion and still take longer than expected.

Some organizations go far beyond combining Parkinson’s Law and Hofstadter’s Law, with what’s known as the Shirky Principle:

“Institutions will try to preserve the problem to which they are the solution.”

—Clay Shirky

“It is difficult to get a man to understand something, when his salary depends upon his not understanding it!”

—Upton Sinclair

The Shirky Principle suggests that complex solutions - a company, an industry, or a technology - can become so focused on the problem that they are solving, that they can inadvertently perpetuate the problem itself. This may be deliberate (a company striving to find new nuances to a problem which justify continued development of a solution), or inadvertent (being unable or unwilling to accept or build a solution which solves the problem completely or obviates it).

### 17. Brooks’ Law

Adding human resources to a late software development project makes it later.

This law suggests that, in many cases, attempting to accelerate the delivery of a project that is already late by adding more people will make the delivery even later. Brooks is clear that this is an oversimplification; however, the general reasoning is that, given the ramp-up time of new people and the added communication overhead, velocity decreases in the immediate short term. Also, many tasks may not be divisible (i.e. easily distributed among more people), meaning the potential velocity increase is also lower.

The common phrase in delivery “Nine women can’t make a baby in one month” relates to Brooks’ Law, in particular, the fact that some kinds of work are not divisible or parallelisable.

This is a central theme of the book ‘The Mythical Man-Month’.
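Part of Brooks’ argument in the book is about communication: if everyone on a team may need to coordinate with everyone else, the number of communication channels grows as n(n − 1)/2. A Python sketch of how many new channels a hypothetical rescue effort adds:

```python
def communication_channels(team_size):
    """Pairwise communication channels in a team where everyone may talk to everyone."""
    return team_size * (team_size - 1) // 2


def channels_added(current_size, newcomers):
    """Extra channels created by growing the team."""
    return (communication_channels(current_size + newcomers)
            - communication_channels(current_size))


# Hypothetical: a late project with 5 developers gets 5 more.
print(communication_channels(5))   # channels before the rescue
print(channels_added(5, 5))        # new channels everyone now has to keep in sync
```

Doubling the team from 5 to 10 people more than quadruples the channels, from 10 to 45, before any of the newcomers has finished ramping up.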

### 18. The Law of Triviality

This law suggests that groups will give far more time and attention to trivial or cosmetic issues than to serious and substantial ones.

The common fictional example is that of a committee approving plans for a nuclear power plant, which spends the majority of its time discussing the structure of the bike shed rather than the far more important design of the power plant itself. It can be difficult to give valuable input on discussions about very large, complex topics without a high degree of subject matter expertise or preparation. However, people want to be seen to be contributing valuable input. Hence the tendency to spend too much time on small details, which can be reasoned about easily but are not necessarily of particular importance.

The fictional example above led to the usage of the term ‘Bike Shedding’ as an expression for wasting time on trivial details. A related term is ‘Yak Shaving’, which connotes a seemingly irrelevant activity that is part of a long chain of prerequisites to the main task.

## General Approaches to Life

### 19. Foxes and Hedgehogs

There are two types of people in the world: foxes and hedgehogs. Huh? Let me explain. The fox vs hedgehog dichotomy describes two contrasting ways of viewing the world. If you adopt fox-like thinking, you rely on various pieces of information to form your view on an issue and think about it from different angles. You’re also willing to admit when you’re uncertain. But if you have more of a hedgehog mindset, you develop your worldview and predictions around a central, overarching principle and talk about your views with more confidence.

So you’re either a fox or a hedgehog. So what? Well, it turns out that which style you adopt may influence how good you are at predicting the future.

Foxy forecasters:

The fox vs hedgehog mindset has proven relevant to the business of predicting, or ‘forecasting’, future events. In forecasting competitions, people are asked to rate the probabilities of various global trends and geopolitical events occurring. During these competitions, researchers found that people with more fox-like characteristics performed better than their hedgehog counterparts. When it comes to making accurate predictions, traits such as being liberal vs conservative, or optimistic vs pessimistic, matter much less than how you weigh each piece of information and integrate it into your final judgment. The best forecasters put their own theories aside, embraced uncertainty, and used multiple ways of looking at a problem to reach a more accurate prediction. Hedgehogs, who were more sure of their big-picture grasp of how the world worked, performed worse than their more modest fox colleagues.

What causes this difference in performance? It could be partly due to the conjunction fallacy, a cognitive bias in which we intuitively feel that more specific conditions are more probable than general ones. Hedgehogs like to embellish their worldview with more and more intricate detail, but each embellishment makes it less likely that their particular view is true.

See also: the conjunction fallacy (Wikipedia) and Isaiah Berlin’s ‘The Hedgehog and the Fox’.
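The arithmetic behind the conjunction fallacy fits in a few lines. In this Python sketch, the probabilities are invented and treating the details as independent is a simplifying assumption: each extra detail multiplies in another probability below 1, so the full story can only get less likely.

```python
from math import prod


def story_probability(detail_probabilities):
    """Probability that every detail in a story is true,
    assuming (for simplicity) the details are independent."""
    return prod(detail_probabilities)


core_thesis = [0.7]
embellished = core_thesis + [0.8, 0.8, 0.8]  # three plausible-sounding extra details

print(f"{story_probability(core_thesis):.2f}")  # bare thesis
print(f"{story_probability(embellished):.2f}")  # thesis plus embellishments
```

Even without the independence assumption, P(A and B) can never exceed P(A), so adding detail never makes a story more probable, no matter how much more convincing it sounds.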

### 20. Wheaton’s Law

There are all kinds of formulations of the “Golden Rule”, resulting in countless variations (some contradictory, some deliberately confusing, some turned into legalese). The addition of a “Silver Rule” generally hasn’t made much of an impact on a lot of people. Rather than beat that dead horse, here’s an alternative:

“Don’t be a dick.”

—Wil Wheaton

If you take to heart any of the laws, principles, or rules from this list, perhaps it should be this one.

Cited as:

@article{mcateer2018cogtoolkit,
title   = "Cognitive Toolkit Must-haves",
author  = "McAteer, Matthew",
journal = "matthewmcateer.me",
year    = "2018",
url     = "https://matthewmcateer.me/blog/cognitive-toolkit-must-haves/"
}

If you notice mistakes and errors in this post, don’t hesitate to contact me at [contact at matthewmcateer dot me] and I will be very happy to correct them right away! Alternatively, you can follow me on Twitter and reach out to me there.

See you in the next post 😄
