I have an XKCD comic taped to the door of my office. The comic is about the mantra of statistics: correlation is not causation. I taped it there because I loved learning statistics in graduate school and thinking deeply about associations and about why mere correlations cannot demonstrate that one thing causes another. Two events can correlate yet have nothing to do with each other, or a third factor may influence both, making them correlate without any causal link between the two.
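The third-factor case can be made concrete with a small simulation. What follows is only a minimal sketch under made-up assumptions: the variable names, the noise model, and the sample size are all hypothetical and are not taken from the comic or from Pearl. A hidden factor z drives both x and y, while x and y never influence each other.

```python
import math
import random


def pearson(a, b):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)


def simulate_confounding(n=5000, seed=0):
    """Hypothetical setup: z influences x and y; x and y do not touch each other."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]      # the hidden third factor
    x = [zi + rng.gauss(0, 1) for zi in z]       # x = z + independent noise
    y = [zi + rng.gauss(0, 1) for zi in z]       # y = z + independent noise

    raw = pearson(x, y)

    # Assumption: we happen to know z's exact contribution, so we subtract it.
    x_without_z = [xi - zi for xi, zi in zip(x, z)]
    y_without_z = [yi - zi for yi, zi in zip(y, z)]
    adjusted = pearson(x_without_z, y_without_z)
    return raw, adjusted


if __name__ == "__main__":
    raw, adjusted = simulate_confounding()
    print(f"correlation of x and y:        {raw:.2f}")
    print(f"after removing the z component: {adjusted:.2f}")
```

Under these assumptions the raw correlation should land near 0.5 and the adjusted one near zero, even though x never causes y. In real data we rarely know the third factor, let alone its exact contribution, which is part of why the mantra exists.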
But Judea Pearl thinks that science and researchers have fallen into a trap laid by statisticians and their endlessly repeated mantra that correlation does not imply causation. Of this statistical perspective he writes, “it tells us that correlation is not causation, but it does not tell us what causation is.”
Pearl seems to suggest in The Book of Why that there was a time when there was too much data, too much that humans didn’t know, and too many people ready to offer incomplete assessments based on anecdote and partial information. From this time sprouted the idea that correlation does not imply causation. We started to see that statistics could describe relationships and could be used to pull apart entangled causal webs, identifying each individual component and assessing its contribution to a given outcome. However, as Pearl’s quote shows, this approach never actually answered what causation is. It never told us when we can conclude that a causal structure and a causal mechanism are in place.
“Over and over again,” writes Pearl, “in science and in business, we see situations where mere data aren’t enough.”
To demonstrate the shortcomings of our high regard for statistics and our mantra that correlation is not causation, Pearl walks us through the congressional testimonies and trials of big tobacco companies in the United States. The data told us there was a correlation between smoking and lung cancer. There was overwhelming statistical evidence that smoking was associated with lung cancer, but statistics alone could not give us 100% certainty that smoking caused lung cancer. The companies themselves muddied the waters with misleading studies and cherry-picked results. They hid behind the claim that correlation is not causation, and behind the confusion around causation that statistics could never fully clarify.
Failing to develop a real sense of causation, failing to move beyond big data, and failing to get beyond statistical correlations can cause real harm. We need to be able to recognize causation, even without relying on randomized controlled trials, and we need to be able to make decisions that save lives. The lesson of the comic taped to my door is helpful when we are trying to be scientific and accurate in our thinking, but it can also lead us astray when we fail to trust a causal structure that we can see but can’t definitively prove via statistics.