Regression to the mean, the idea that there is an average outcome we can expect and that, over time, individual outliers will drift back toward that average, is a boring phenomenon on its own. If you think about it in the context of driving to work and counting your red lights, you can see why it is a rather boring idea. If you normally hit 5 red lights, and one day you manage to get to work with just a single red light, you probably expect that you won’t have as much luck with the lights the following day and will hit more reds than on your lucky one-red-light commute. Conversely, if you have a day where you hit every possible red light, you would probably expect better traffic luck the next day and a count somewhere closer to your average. This is regression to the mean. Having only one red light, or hitting every red, on one day does not cause the next day’s stops to be any different, but you know you will probably see a more average mix of reds and greens. No causal explanation is involved, just random traffic light luck.
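To make the commute example concrete, here is a small Python simulation. The numbers are assumptions for illustration, not from the text: a hypothetical route with 10 lights, each independently red half the time, so an average day has 5 reds. Because every day is drawn independently, the day after an unusually lucky or unlucky commute should average about 5 reds again. Nothing about the extreme day causes that.

```python
import random
from statistics import mean

# Hypothetical setup: 10 lights on the route, each independently red
# with probability 0.5, so the long-run average is 5 red lights a day.
NUM_LIGHTS = 10
P_RED = 0.5


def red_lights_on_commute(rng: random.Random) -> int:
    """Count how many of the lights are red on one commute."""
    return sum(rng.random() < P_RED for _ in range(NUM_LIGHTS))


def next_day_after(days: list[int], condition) -> float:
    """Average red-light count on the day after every day meeting `condition`."""
    return mean(days[i + 1] for i in range(len(days) - 1) if condition(days[i]))


rng = random.Random(42)  # fixed seed so the run is repeatable
days = [red_lights_on_commute(rng) for _ in range(100_000)]

overall = mean(days)
after_lucky = next_day_after(days, lambda reds: reds <= 1)    # 0 or 1 reds
after_unlucky = next_day_after(days, lambda reds: reds >= 9)  # 9 or 10 reds

print(f"average reds per day:     {overall:.2f}")
print(f"day after a 0-1 red day:  {after_lucky:.2f}")
print(f"day after a 9-10 red day: {after_unlucky:.2f}")
```

All three printed values should land close to 5. The exact decimals depend on the seed, but the day after an extreme commute looks like any other day.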
But for some reason this idea is both fascinating and hard to grasp in other areas, especially when we think we have some control over the outcome. In Thinking, Fast and Slow, Daniel Kahneman helps explain why, in some settings, it is so difficult for us to accept regression to the mean, an otherwise rather boring concept. He writes,
“Our mind is strongly biased toward causal explanations and does not deal well with mere statistics. When our attention is called to an event, associative memory will look for its cause – more precisely, activation will automatically spread to any cause that is already stored in memory. Causal explanations will be evoked when regression is detected, but they will be wrong because the truth is that regression to the mean has an explanation but does not have a cause.”
Unless you truly believe that there is a god of traffic lights who rules over your morning commute, you probably don’t assign any causal mechanism to your luck with red lights. But when you are considering how well a professional golfer played on the second day of a tournament compared to the first, or whether intelligent women marry equally intelligent men, a causal idea is likely to come to mind. Perhaps the golfer grew complacent after a strong first day; perhaps highly intelligent women have to settle for less intelligent men because highly intelligent men don’t want an intellectual equal. Kahneman uses these examples in the book, and each presents a plausible causal mechanism, but as he shows, the simpler, if boring, answer is regression to the mean. A golfer who performs spectacularly on day one is likely to be less lucky on day two. A highly intelligent woman is likely to marry a man whose intelligence is closer to average just by statistical chance.
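The golfer example can be sketched the same way. This is a hypothetical model, not Kahneman’s data: each golfer has a fixed skill level, each day’s performance is that skill plus independent random luck (higher is better, not actual golf strokes), and nothing lets day one influence day two. The golfers who look best on day one are partly skilled and partly lucky, so on day two they fall back toward their true skill. The same logic covers the marriage example: whenever two measurements are imperfectly correlated, extremes on one tend to be less extreme on the other.

```python
import random
from statistics import mean

# Hypothetical model: performance = fixed skill + fresh random luck each day.
# Skill and luck are assumed to vary by the same amount (standard deviation 1).
rng = random.Random(7)  # fixed seed so the run is repeatable
NUM_GOLFERS = 10_000

skills = [rng.gauss(0, 1) for _ in range(NUM_GOLFERS)]
day1 = [s + rng.gauss(0, 1) for s in skills]
day2 = [s + rng.gauss(0, 1) for s in skills]  # new luck; day one plays no role

# Select the golfers who performed best on day one (roughly the top 10%).
cutoff = sorted(day1)[int(0.9 * NUM_GOLFERS)]
top = [i for i in range(NUM_GOLFERS) if day1[i] >= cutoff]

top_day1 = mean(day1[i] for i in top)
top_day2 = mean(day2[i] for i in top)
top_skill = mean(skills[i] for i in top)

print(f"top group, day one:    {top_day1:.2f}")
print(f"top group, day two:    {top_day2:.2f}")
print(f"top group, true skill: {top_skill:.2f}")
```

Under these assumptions the top group’s day-two average drops to about its true skill level, roughly halfway back to the overall average of 0, because skill and luck were given equal spread. The model contains no complacency or pressure at all; the drop comes entirely from luck not repeating.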
When regression to the mean violates our causal expectations, it becomes an interesting and important concept. It reveals that our minds don’t simply observe an objective reality; they perceive causal structures that fit preexisting narratives. Our causal conclusions can be quite inaccurate, especially when they are shaped by unwarranted biases and prejudices. If we keep regression to the mean in mind, we might lose some of our exciting narratives, but our thinking will be sounder and our judgments clearer.