Regression to the Mean Versus Causal Thinking

Regression to the mean is the idea that outcomes vary around an expected average, and that an unusually extreme outcome will tend to be followed by one closer to that average. On its own it is a boring phenomenon. Think about it in the context of driving to work and counting your red lights, and you can see why. If you normally hit 5 red lights, and one day you get to work with just a single red light, you probably expect that the following day you won’t have as much luck and will hit more red lights than on your lucky one-red-light commute. Conversely, if one day you hit every possible red light, you would probably expect better traffic luck the next day, somewhere closer to your average. This is regression to the mean. Hitting only one red light, or every red light, on one day doesn’t cause the next day’s count to be any different, but you know the next day will probably be closer to average. There is no causal explanation involved, just random traffic light luck.
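The red-light example can be checked with a small simulation. The sketch below is only an illustration under assumed numbers: a route with 10 lights, each independently red half the time (so about 5 reds on a typical day). The function names are invented for this example.

```python
import random

def simulate_commutes(days=50_000, lights=10, p_red=0.5, seed=0):
    """Return the number of red lights hit on each simulated day.

    Assumption: every light is independent of the others and of previous days.
    """
    rng = random.Random(seed)
    return [sum(rng.random() < p_red for _ in range(lights)) for _ in range(days)]

def mean_next_day(counts, condition):
    """Average red-light count on the day after any day that satisfies `condition`."""
    following = [counts[i + 1] for i in range(len(counts) - 1) if condition(counts[i])]
    return sum(following) / len(following)

counts = simulate_commutes()
overall = sum(counts) / len(counts)
after_lucky = mean_next_day(counts, lambda c: c <= 1)    # days with 0 or 1 reds
after_unlucky = mean_next_day(counts, lambda c: c >= 9)  # days with 9 or 10 reds

print(f"average day:           {overall:.2f} reds")
print(f"day after a lucky day: {after_lucky:.2f} reds")
print(f"day after a bad day:   {after_unlucky:.2f} reds")
```

Under these assumptions all three averages should land near 5. Nothing about the lucky or unlucky day carries over; the next day simply looks like any other day.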

 

But for some reason this idea is both fascinating and hard to grasp in other areas, especially when we think we have some control over the outcome. In Thinking, Fast and Slow, Daniel Kahneman helps explain why, in some settings, it is so difficult for us to accept this otherwise rather boring concept. He writes,

 

“Our mind is strongly biased toward causal explanations and does not deal well with mere statistics. When our attention is called to an event, associative memory will look for its cause – more precisely, activation will automatically spread to any cause that is already stored in memory. Causal explanations will be evoked when regression is detected, but they will be wrong because the truth is that regression to the mean has an explanation but does not have a cause.”

 

Unless you truly believe that there is a god of traffic lights who rules over your morning commute, you probably don’t assign any causal mechanism to your luck with red lights. But when you are considering how well a professional golfer played on the second day of a tournament compared to the first day, or when you are considering whether intelligent women marry equally intelligent men, you are likely to have some causal idea come to mind. The golfer was more complacent, or less, on the second day. The highly intelligent women have to settle for less intelligent men because the highly intelligent men don’t want an intellectual equal. These are examples that Kahneman uses in the book, and each presents a plausible causal mechanism, but as Kahneman shows, the simpler, if more boring, answer is regression to the mean. A golfer who performs spectacularly on day one probably had some good luck that day and is likely to be less lucky on day two. A highly intelligent woman is likely to marry a man with intelligence closer to average just by statistical chance, since the intelligence of spouses is not perfectly correlated.
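The golfer example can be sketched as a simple "skill plus luck" model. This is an assumed toy model, not Kahneman's data: each golfer has a fixed skill, each day adds independent luck of the same size, and higher numbers mean better play (unlike real golf scores). All names are hypothetical.

```python
import random
import statistics

def two_day_scores(n_golfers=20_000, seed=1):
    """Return (day1, day2) performance pairs; higher means better play.

    Assumption: performance = fixed skill + independent daily luck,
    both drawn from a standard normal distribution.
    """
    rng = random.Random(seed)
    scores = []
    for _ in range(n_golfers):
        skill = rng.gauss(0, 1)
        day1 = skill + rng.gauss(0, 1)
        day2 = skill + rng.gauss(0, 1)  # same skill, fresh luck
        scores.append((day1, day2))
    return scores

scores = two_day_scores()
scores.sort(key=lambda s: s[0], reverse=True)
top = scores[: len(scores) // 10]  # the best 10% on day one

top_day1 = statistics.mean(d1 for d1, _ in top)
top_day2 = statistics.mean(d2 for _, d2 in top)

print(f"day-one leaders, day one: {top_day1:.2f}")
print(f"day-one leaders, day two: {top_day2:.2f}")
```

In this model the day-one leaders still play better than average on day two, because their skill is real, but they keep only about half of their day-one lead, because the luck that helped put them on top does not repeat. No complacency or pressure is needed to produce the drop.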

 

When regression to the mean violates our causal expectations, it becomes an interesting and important concept. It reveals that our minds don’t simply observe an objective reality; they see causal structures that fit preexisting narratives. Our causal conclusions can be quite inaccurate, especially when they are shaped by unwarranted biases and prejudices. If we keep regression to the mean in mind, we might lose some of our exciting narratives, but our thinking will be sounder and our judgments clearer.

Praise, Punishment, & Regression to the Mean

Regression to the mean is seriously underrated. In sports, stock market funds, and biological trends like generational height differences, it is a powerful yet misunderstood phenomenon. A rookie athlete may have a standout first year, only to perform less spectacularly the following year. An index fund may outperform all others one year, only to see other funds catch up the next. And a tall man may have a son who is shorter than he is. In each instance regression to the mean is at play, but because we underrate it, we assume some causal factor made our athlete play worse (it went to his head!), made our fund earn less (they didn’t rebalance the portfolio correctly!), and made the son shorter (his father must have married a short woman).

 

In Thinking, Fast and Slow, Daniel Kahneman looks at the consequences that arise when we fail to understand regression to the mean and invent causal connections between events where none exist. Kahneman describes an exercise he ran with Israeli Air Force flight instructors, asking them to turn their backs to a target on the floor and toss coins at it without looking. Those who had a good first throw typically did worse on their second. Those who did poorly on their first throw usually did better the next time. There was essentially no skill involved; the outcome was mostly luck, so if someone landed close one time, you would expect their next throw to land a little farther out, just by random chance. This is regression to the mean in an easy-to-understand example.

 

But what happens when we don’t recognize regression to the mean outside of a random and simplified experiment? Kahneman used the exercise to demonstrate how random deviations from the mean during cadets’ flight maneuvers translate into praise or punishment. Cadets who performed well were often praised, only to regress to the mean on their next flight and perform worse. Cadets who performed poorly also regressed to the mean, but in an upward direction, improving on the next flight. Because those poor performers were punished (perhaps just with a verbal reprimand) between the poor flight and the improved one, the punishment looked like the cause of the improvement. Kahneman describes the takeaway this way:

 

“The feedback to which life exposes us is perverse. Because we tend to be nice to other people when they please us and nasty when they do not, we are statistically punished for being nice and rewarded for being nasty.”

 

Praise a cadet who performed well, and they will tend to perform worse next time. Criticize a cadet who performed poorly, and they will tend to do better. Our minds overfit patterns and start to see one causal link between praise and subsequent poor performance and another between criticism and subsequent improvement. All that is really happening is that we are misunderstanding regression to the mean and creating a causal model where there shouldn’t be one.
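The illusion can be reproduced in a simulation where feedback has, by construction, no effect at all. The sketch below is an assumed toy model: every flight is an independent draw around the same average, a flight more than one standard deviation above average earns praise, and one more than one standard deviation below earns criticism. The names and the threshold are hypothetical.

```python
import random

def feedback_illusion(n_pairs=50_000, threshold=1.0, seed=2):
    """Return (share worse after praise, share better after criticism).

    Assumption: the second flight is independent of the first flight
    and of any feedback given in between.
    """
    rng = random.Random(seed)
    praised = worse_after_praise = 0
    criticized = better_after_criticism = 0
    for _ in range(n_pairs):
        first = rng.gauss(0, 1)
        second = rng.gauss(0, 1)  # feedback changes nothing here
        if first > threshold:
            praised += 1
            worse_after_praise += second < first
        elif first < -threshold:
            criticized += 1
            better_after_criticism += second > first
    return worse_after_praise / praised, better_after_criticism / criticized

worse, better = feedback_illusion()
print(f"praised cadets who then did worse:     {worse:.0%}")
print(f"criticized cadets who then did better: {better:.0%}")
```

Under these assumptions, roughly nine out of ten praised flights are followed by a worse one, and roughly nine out of ten criticized flights by a better one, even though the feedback does nothing. An instructor watching only these sequences would have every reason to believe that criticism works and praise backfires.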

 

If we better understood regression to the mean, we wouldn’t be so shocked when a standout rookie sports star appears to have a sophomore slump. We wouldn’t jump on the bandwagon when an index fund has an exceptional year, and we wouldn’t be surprised by phenotypical regression to the mean from one generation to the next. Our brains are phenomenal pattern-recognizing machines, but sometimes they see the wrong pattern, and sometimes that gives us perverse incentives for how we behave and interact with each other. The solution is to step back from individual cases and look at averages over time. By gathering more data and looking for longer-lasting trends, we can better distinguish regression to the mean from real changes in performance.
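The value of averaging over time can be illustrated with the same kind of toy model. The sketch below assumes each player has a fixed skill plus independent game-to-game luck of equal size. It picks the top 10% of players by their average over some number of games, then measures how much of their lead survives in a fresh set of games. The function name and all parameters are invented for this example.

```python
import random
import statistics

def retention(n_games, n_players=5_000, seed=3):
    """Share of the top group's lead that survives into a fresh set of games.

    Assumption: each game result = fixed skill + independent luck,
    both drawn from a standard normal distribution.
    """
    rng = random.Random(seed)
    records = []
    for _ in range(n_players):
        skill = rng.gauss(0, 1)
        before = statistics.fmean(skill + rng.gauss(0, 1) for _ in range(n_games))
        after = statistics.fmean(skill + rng.gauss(0, 1) for _ in range(n_games))
        records.append((before, after))
    records.sort(key=lambda r: r[0], reverse=True)
    top = records[: n_players // 10]  # best 10% by the "before" average
    return statistics.fmean(a for _, a in top) / statistics.fmean(b for b, _ in top)

single_game = retention(n_games=1)
twenty_games = retention(n_games=20)
print(f"lead retained when judged on 1 game:   {single_game:.0%}")
print(f"lead retained when judged on 20 games: {twenty_games:.0%}")
```

In this model, players judged on a single game keep only about half of their apparent lead, while players judged on a 20-game average keep nearly all of it. The longer the record, the less of what we see is luck, and the less regression there is left to mistake for a real change.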