I like statistics. I like to think statistically, to recognize that there is a percent chance of one outcome that can be influenced by other factors. I enjoy looking at best fit lines, seeing that there are correlations between different variables, and seeing how trend-lines change if you control for different variables. However, statistics and trend lines don’t actually tell us anything about causality.
In The Book of Why Judea Pearl writes, “the slope (after scaling) is the same no matter whether you plot X against Y or Y against X. In other words, the slope is completely agnostic as to cause and effect. One variable could cause the other, or they could both be effects of a third cause; for the purpose of prediction, it does not matter.”
In statistics we all know that correlation is not causation, but this quote helps us remember important information when we see a statistical analysis and a plot with linear regression line running through it. The regression line is like the owl that Pearl had described earlier in the book. The owl is able to predict where a mouse is likely to be and able to predict which direction it will run, but the owl does not seem to know why a mouse is likely to be in a given location or why it is likely to run in one direction over another. It simply knows from experience and observation what a mouse is likely to do.
The regression line is a best fit for numerous observations, but it doesn’t tell us whether one variable causes another or whether both are influenced in a similar manner by another variable. The regression line knows where the mouse might be and where it might run, but it doesn’t know why.
In statistics courses we end at this point of correlation. We might look for other variables that are correlated or try to control for third variables to see if the relationship remains, but we never answer the question of causality, we never get to the why. Pearl thinks this is a limitation we do not need to put on ourselves. Humans, unlike owls, can understand causality, we can recognize the various reasons why a mouse might be hiding under a bush, and why it may chose to run in one direction rather than another. Correlations can help us start to see where relationships exist, but it is the ability of our mind to understand causal pathways that helps us determine causation.
Pearl argues that statisticians avoid these causal arguments out of caution, but that it only ends up creating more problems down the line. Important statistical research in areas of high interest or concern to law-makers, business people, or the general public are carried beyond the cautious bounds that causality-averse statisticians place on their work. Showing correlations without making an effort to understand the causality behind it makes scientific work vulnerable to the epistemically malevolent who would like to use correlations to their own ends. While statisticians rigorously train themselves to understand that correlation is not causation, the general public and those struck with motivated reasoning don’t hold themselves to the same standard. Leaving statistical analysis at the level of correlation means that others can attribute the cause and effect of their choice to the data, and the proposed causal pathways can be wildly inaccurate and even dangerous. Pearl suggests that statisticians and researchers are thus obligated to do more with causal structures, to round off their work and better develop ideas of causation that can be defended once their work is beyond the world of academic journals.