Large sample sizes are important. At this moment, the world is racing as quickly as possible toward a vaccine that would allow us to move forward from the COVID-19 pandemic. People across the globe are anxious for a way to resume normal life and to reduce the risk of death from the new virus and the disease it causes. One thing standing in the way of the super quick solution that everyone wants is basic statistics. For any vaccine or treatment, we need a large sample size to be confident in the effects of anything we offer to people as a cure for, or as prevention of, COVID-19. We want to make sure we don’t make decisions based on extreme outcomes, and that what we produce is safe and effective.
Statistics and probability are frequent parts of our lives, and many of us probably feel as though we have a basic and sufficient grasp of both. The reality, however, is that we are often terrible at thinking statistically. We are much better at thinking in narrative, and often we substitute a narrative interpretation for a statistical interpretation of the world without even recognizing it. It is easy to change our behavior based on anecdote and narrative, but not always so easy to change our behavior based on statistics. This is why we have the saying often attributed to Stalin: “One death is a tragedy, a million deaths is a statistic.”
The danger with anecdotal and narrative interpretations of the world is that they are drawn from small sample sizes. Daniel Kahneman explains the danger of small sample sizes in his book Thinking, Fast and Slow: “extreme outcomes (both high and low) are more likely to be found in small than in large samples. This explanation is not causal.”
In his book, Kahneman explains that when you look at the counties in the United States with the highest rates of cancer, you find that they include some of the smallest counties in the nation. However, if you look at which counties have the lowest rates of cancer, you will also find the smallest counties in the nation. While you could drive across the nation looking for explanations for the high and low cancer rates in small, rural counties, you likely wouldn’t find a compelling causal explanation. You might be able to string a narrative together, and if you try really hard you might start to see a causal chain, but your interpretation is likely to be biased and based on flimsy evidence. The fact that our small counties are the ones with the highest and lowest rates of cancer is an artifact of small sample sizes. When you have small sample sizes, as Kahneman explains, you are likely to see more extreme outcomes. A few random chance events can dramatically change the rate of cancer per thousand residents when a county only has a few thousand residents. In larger, more populated counties, rates stay closer to the overall average, and a few extreme chance outcomes are less likely to influence the overall statistics.
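A rough simulation can make this concrete. The sketch below is only an illustration under made-up assumptions: every county shares the same true cancer rate (a hypothetical 2 cases per thousand residents), small counties have 1,000 residents, and large counties have 50,000. None of these numbers come from Kahneman’s book, and the function name is invented for this example.

```python
import random


def simulate_county_rates(population, n_counties, true_rate, rng):
    """Return the observed cases per thousand residents for each county.

    Every resident independently becomes a case with probability
    `true_rate`, so any difference between counties is pure chance.
    """
    rates = []
    for _ in range(n_counties):
        cases = sum(1 for _ in range(population) if rng.random() < true_rate)
        rates.append(1000 * cases / population)
    return rates


# All values below are hypothetical, chosen only for illustration.
TRUE_RATE = 0.002          # same underlying risk everywhere: 2 per thousand
rng = random.Random(2020)  # fixed seed so the run is repeatable

small_rates = simulate_county_rates(1_000, 100, TRUE_RATE, rng)
large_rates = simulate_county_rates(50_000, 100, TRUE_RATE, rng)

print(f"Small counties: lowest {min(small_rates):.2f}, "
      f"highest {max(small_rates):.2f} per thousand")
print(f"Large counties: lowest {min(large_rates):.2f}, "
      f"highest {max(large_rates):.2f} per thousand")
```

Because the underlying risk is identical in every simulated county, any gap between the small-county and large-county ranges comes from sample size alone, with no causal story behind it.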
To prevent our decision-making from being overly influenced by extreme outcomes, we have to move past our narrative and anecdotal thinking. To ensure that a vaccine for the coronavirus or a cure for COVID-19 is safe and effective, we must allow the statistics to play out. We have to have large sample sizes so that we are not influenced by the extreme outcomes, either positive or negative, that we see when only a few patients are treated. We need the data to ensure that the outcomes we see are statistically sound, and not an artifact of chance within a small sample.