When to Stop Counting

Yesterday I wrote about the idea of scientific versus political numbers. Scientific numbers are the ones we rely on for decision-making. They are not always better or more accurate than political numbers, but they are generally based on some sort of standardized methodology and have a concrete, agreed-upon backing. Political numbers are more or less guesstimates, or come from sources that are not confirmed to be reliable. While they can end up being more accurate than scientific figures, they are harder to accept and justify in decision-making processes. In the end, the default is scientific numbers, but scientific numbers have a flaw that keeps them from ever becoming what they purport to be. How do we know when it is time to stop counting, and when we are ready to move forward with a scientific number rather than fall back on a political number?
Christopher Jencks explores this idea in his book The Homeless by looking at a survey conducted by Martha Burt at the Urban Institute. Jencks writes, “Burt’s survey provides quite a good picture of the visible homeless. It does not tell us much about those who avoid shelters, soup kitchens, and the company of other homeless individuals. I doubt that such people are numerous, but I can see no way of proving this. It is hard enough finding the proverbial needle in a haystack. It is far harder to prove that a haystack contains no more needles.” The quote shows that Burt’s survey was good at identifying the visibly homeless, but that at some point a decision was made to stop attempting to count the less visible homeless. It is entirely reasonable to stop counting at a certain point; as Jencks notes, it is hard to prove there are no more needles left to find. But that means there will always be a measure of uncertainty in your counting and results. Your numbers will always come with a margin of error, because there is almost no way to be certain you didn’t miss something.
Where we choose to stop counting can influence whether we should consider our numbers scientific or political. I would argue that the decision about where to stop our count is both a scientific and a political decision itself. We can make political decisions to stop counting in a way that deliberately excludes hard-to-count populations. Alternatively, we can continue our search, expand the count, and change the end results. Choosing how scientifically accurate to be with our count is still a political decision at some level.
However, choosing to stop counting can also be a rational and economic decision. We may have limited funding and resources for our counting and be forced to stop at a reasonable point, one that still allows us to make scientifically appropriate estimates about the remaining uncounted population. Diminishing marginal returns also mean that at a certain point we are putting far more effort into counting than we gain from finding one more item in any given survey. This demonstrates how our numbers can rest on scientific or political motivations, or both. These are all important considerations whether we are the counter or studying the results of the counting. Where we choose to stop matters, because we likely can’t prove we have found every needle in the haystack, or that no more needles exist. No matter what, we have to face the reality that the numbers we get are not perfect, no matter how scientific we try to make them.
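To make the diminishing-returns point concrete, here is a minimal sketch in Python of a count where each search pass finds a fixed fraction of whoever is still uncounted. The population size and detection rate are invented for illustration, not drawn from any real survey.

```python
# A toy model of a count with diminishing returns. Assume a true
# population of 10,000 and that each search pass finds 40% of
# whoever is still uncounted; both numbers are invented.
true_population = 10_000
detection_rate = 0.40

remaining = true_population
for sweep in range(1, 9):
    found = round(remaining * detection_rate)
    remaining -= found
    print(f"pass {sweep}: found {found:>5}, still uncounted {remaining:>5}")

# Each pass costs about the same but finds roughly 40% fewer people
# than the last, so eventually one more pass is not worth its cost.
# Yet the residual never reaches zero: we can never prove the
# haystack holds no more needles.
```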
Political and Scientific Numbers

I am currently reading a book about the beginnings of the Industrial Revolution, and the author has recently been comparing the development of textile mills, steam engines, and chemical production in Britain in the 1800s to the same developments on the European continent. It is clear that Britain developed new technologies and adopted larger factories much more quickly than the continent, but exactly how much quicker is hard to determine. One of the biggest challenges is finding reliable and accurate information to compare the number of textile factories, the horsepower of steam engines, or how many chemical products were exported in a given decade. In the 1850s, collecting good data and preserving it for historians to sift through and analyze more than a century and a half later was not an easy task. Many of the numbers the author references are generalized estimates and ranges, not well-defined statistical figures. Nevertheless, this doesn’t mean the data are not useful; they can still help us understand general trends of the Industrial Revolution in Britain and on the continent.
Our ability to obtain and store numbers, information, and data is much better today than in the 1800s, but that doesn’t mean all of our numbers are now perfect and that we have everything figured out. Sometimes our data come from fairly reliable sources, like the GPS map data on Strava that shows where lots of people like to exercise and where very few people do. Other data are pulled from surveys, which can be unreliable or influenced by word choice and response order. Some data come from observational studies that might be flawed in one way or another. Other data may simply be incomplete, drawn from small sample sizes, or too messy to understand. Getting good information out of such data is almost impossible. As the saying goes: garbage in, garbage out.
Consequently, we end up with political numbers and scientific numbers. Christopher Jencks wrote about the role that both have played in how we understand and think about homelessness in his book The Homeless. He writes, “one needs to distinguish between scientific and political numbers. This distinction has nothing to do with accuracy. Scientific numbers are often wrong, and political numbers are often right. But scientific numbers are accompanied by enough documentation so you can tell who counted what, whereas political numbers are not.”
It is interesting to think about the accuracy (or inaccuracy) of the numbers we use to understand our world. Jencks explains that censuses of homeless individuals need to be conducted early in the morning or late at night to capture the full number of people sleeping in parks or leaving from and returning to overnight shelters. He also notes the difficulty of contacting people to confirm their homeless status and the challenge of simply asking people whether they have a home. People use different definitions of having a home, being homeless, or having a fixed address, and those differences can influence the count of how many homeless people live within a city or state. The numbers are backed by a scientific process, but they may be inaccurate and unrepresentative of reality. By contrast, political numbers might be based on an advocate’s rough average of meals provided at a homeless shelter, or on other informal estimates. These estimates may end up being just as accurate as the scientific numbers, or more so, but how the numbers are used and understood can be very different.
Advocacy groups, politicians, and concerned citizens can use non-scientific numbers to advance their cause or their point of view. They can rely on general estimates to demonstrate that something is or is not a problem. But they can’t necessarily drive actual action by governments, charities, or private organizations with only political numbers. Decisions look bad when made based on rough guesses and estimates. They look much better when they are backed by scientific numbers, even if those numbers are flawed. When it is time to actually vote, when policies have to be written and enacted, and when a check needs to be signed, having some sort of scientific backing to a number is crucial for self-defense and for (at least an attempt at) rational thinking.
Today we are a long way from the pen-and-paper (quill-and-scroll?) days of the 1800s. We can collect far more data than we could have ever imagined, but the numbers we end up with are not always much better than rough estimates and guesses. We may use the data in a way that shows we trust the science and the numbers, but the information may ultimately be useless. These are some of the frustrations so many people have today with the way we talk about politics and policy. Political numbers may suggest we live in one reality while scientific numbers suggest another. Figuring out which is correct and which we should trust is almost impossible, and the end result is confusion and frustration. We will probably solve this with time, but it is a hard problem that will hang around and worsen as misinformation spreads online.
Poverty - $2.00 A Day - Kathryn Edin & H. Luke Shaefer

Who Experiences Deep Poverty

The image of deep poverty in the United States is unfairly and inaccurately racialized. For many people, it is hard to avoid associating words like poverty, ghetto, or poor with black and minority individuals and communities. The default mental image for such terms is unavoidably non-white, and white poverty ends up taking on qualifiers to distinguish it from that default. We use terms like white trash, or references to trailer parks, to mark white poverty as something different from general poverty, which is coded as black and minority.
This distinction, default, and mental image of poverty as a black and minority problem creates a lot of misconceptions about who is truly poor in America. In the book $2.00 A Day, Kathryn Edin and H. Luke Shaefer write, “the phenomenon of $2-a-day poverty among households with children [has] been on the rise since the nation’s landmark welfare reform legislation was passed in 1996. … although the rate of growth [is] highest among African Americans and Hispanics, nearly half of the $2-a-day poor [are] white.” (Tense changed from past to present by blog author)
Poverty, in public discourse and public policy, is often presented as a racial problem because we do not recognize how many white people in the United States live in poverty. The quote above shows that the racialized elements of our general view of poverty reflect real differences in the changing rates of poverty among minority groups, but it also reveals that nearly half of the $2-a-day poor are white.
The consequence is that policy and public opinion often approach poverty from a race-based standpoint, not from an economic and class-based standpoint. Policy is not well designed when it doesn’t reflect the reality of the situation, and public discourse is misplaced when it fails to accurately address the problems society faces. Biases, prejudices, and discriminatory practices can be propped up when we misunderstand the nature of reality, especially when it comes to extreme poverty. Additionally, by branding only minorities as poor and carving out a special space for white poverty, we reduce the scope and seriousness of the problem, insisting that it is a cultural failure of inferior and deficient groups rather than a by-product of an economic system or a manifestation of shortcomings in our economic and social models. It is important that we recognize that poverty is not exclusive to black and minority groups.
Data Mining is a First Step

From big tech companies to sci-fi movies to policy entrepreneurs, data mining is presented as a solution to many of our problems. With traffic apps collecting mountains of movement data, governments collecting vast amounts of tax data, and health-tech companies recording every step we take, the promise of data mining is that our sci-fi fantasies will be realized here on earth in the coming years. However, data mining is only a first step on a long road to the kind of real knowledge that will make our world a better place. The data alone are interesting, and our computing power for working with big data is astounding, but data mining can’t give us answers, only interesting correlations and statistics.
In The Book of Why Judea Pearl writes:
“It’s easy to understand why some people would see data mining as the finish rather than the first step. It promises a solution using available technology. It saves us, as well as future machines, the work of having to consider and articulate substantive assumptions about how the world operates. In some fields our knowledge may be in such an embryonic state that we have no clue how to begin drawing a model of the world. But big data will not solve this problem. The most important part of the answer must come from such a model, whether sketched by us or hypothesized and fine-tuned by machines.”
Big data can give us insights and help us identify unexpected correlations and associations, but identifying unexpected correlations and associations doesn’t tell us what is causing the observations we make. The message of massive data mining is that we will suddenly understand the world and make it a better place. The reality is that we have to develop hypotheses about how the world works based on causal understandings of the interactions between various factors of reality. This is crucial; without it we can’t take meaningful action based on what comes out of our data mining. Without causal hypotheses we cannot experiment with associations and continue to learn, we can only observe what correlations emerge from big data. Using the vast amounts of data we are collecting is important, but we have to have a goal to work toward, and a causal hypothesis of how to reach that goal, for data mining to be meaningful.
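As a toy illustration of why mined correlations are not answers, here is a short simulation (all numbers invented) in which a hidden common cause produces a strong correlation between two variables that have no causal connection to each other.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hidden common cause: hot weather drives both ice cream sales and
# swimming accidents. Neither variable causes the other.
temperature = rng.normal(25, 5, n)
ice_cream_sales = 2.0 * temperature + rng.normal(0, 5, n)
drownings = 0.5 * temperature + rng.normal(0, 2, n)

# A naive mining pass flags a strong association (about 0.7)...
print(np.corrcoef(ice_cream_sales, drownings)[0, 1])

# ...but holding the confounder roughly fixed makes it vanish: within
# a narrow temperature band the correlation is near zero.
band = np.abs(temperature - 25) < 0.5
print(np.corrcoef(ice_cream_sales[band], drownings[band])[0, 1])
```

A data-mining pass over sales and accident records would surface this association, but only a causal model, one that includes temperature, tells us what would actually happen if we intervened on ice cream sales.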
Stories from Big Data

Dictionary.com describes datum (the singular of data) as “a single piece of information; any fact assumed to be a matter of direct observation.” So when we think about big data, we are thinking about massive amounts of individual pieces of information or individual facts from direct observation. Data simply are what they are, facts and individual observations in isolation.
On the other hand, Dictionary.com defines information as “knowledge communicated or received concerning a particular fact or circumstance.” Information is the knowledge, the story, and the ideas we build from the data. These two definitions are important for thinking about big data. We never talk about big information, but the reality is that big data matters less than the knowledge we generate from it, and that knowledge isn’t as objective as the individual datum.
In The Book of Why Judea Pearl writes, “a generation ago, a marine biologist might have spent months doing a census of his or her favorite species. Now the same biologist has immediate access online to millions of data points on fish, eggs, stomach contents, or anything else he or she wants. Instead of just doing a census, the biologist can tell a story.” Science has become contentious and polarizing recently, and part of the reason has to do with the stories we are generating based on the big data we are collecting. We can see new patterns, new associations, new correlations, and new trends in data from across the globe. As we have collected this information, our impact on the planet, our understanding of reality, and how we think about ourselves in the universe have changed. Science is not simply facts; that is to say, it is not just data. Science is information; it is knowledge and stories that have continued to challenge the narratives we have held onto as a species for thousands of years.
Judea Pearl thinks it is important to recognize the story aspect of big data. He thinks it is crucial that we understand the difference between data and information, because without doing so we turn to the data blindly and can generate an inaccurate story based on what we see. He writes,
“In certain circles there is an almost religious faith that we can find the answers to … questions in the data itself, if only we are sufficiently clever at data mining. However, readers of this book will know that this hype is likely to be misguided. The questions I have just asked are all causal, and causal questions can never be answered from data alone.”
Big data presents us with huge numbers of observations and facts, but those facts alone don’t represent causal structures or deeper interactions within reality. We have to generate information from the data and combine that new knowledge with existing knowledge and causal hypotheses to truly learn something new from big data. If we don’t, then we are simply identifying meaningless correlations without understanding what they mean or imply.
Data Driven Methods

In the world of big data, scientists today have a real opportunity to push the limits of scientific inquiry in ways that were never before possible. We have the collection methods and computing power to analyze huge datasets and make observations in minutes that would have taken decades just a few years ago. However, many areas of science are not being strategic with this new power. Instead, they simply seem to be plugging variables into huge datasets and haphazardly looking for correlations and associations. Judea Pearl is critical of this approach in The Book of Why and uses genome-wide association studies (GWAS) to demonstrate its shortcomings.
Pearl writes, “It is important to notice the word association in the term GWAS. This method does not prove causality; it only identifies genes associated with a certain disease in the given sample. It is a data-driven rather than hypothesis-driven method, and this presents problems for causal inference.”
In the 1950s and 1960s, Pearl explains, R. A. Fisher was skeptical that smoking caused cancer and argued that the correlation between smoking and cancer could simply be the result of a hidden variable. He suggested a gene might exist that predisposed people both to smoke and to develop lung cancer. Pearl writes that such a smoking gene was indeed discovered in 2008 through GWAS, but he also notes that the existence of such a gene doesn’t actually give us a causal mechanism connecting people’s genes to smoking behavior or cancer development. The smoking gene was discovered not by a hypothesis-driven method but by a data-driven one. Researchers simply scanned massive genomic datasets for genes that correlated with both smoking and lung cancer, and the smoking gene stood out.
Pearl goes on to say that causal investigations have shown the gene in question is important for nicotine receptors in lung cells, suggesting a causal pathway between the gene and a predisposition to smoking. However, causal studies also indicate that the gene less than doubles your risk of developing lung cancer. “This is serious business, no doubt, but it does not compare to the danger you face if you are a regular smoker,” writes Pearl. Smoking is associated with a roughly tenfold increase in the risk of developing lung cancer, while the smoking gene accounts for less than a twofold increase. GWAS tells us that the gene is involved in cancer, but we can’t draw causal conclusions from an association alone. We have to go deeper to understand the causality and relate it to other factors we can study. That is what lets us put the GWAS finding in context.
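To put those two relative risks side by side, here is some back-of-the-envelope arithmetic. The 1% baseline risk is an assumed placeholder, not a figure from the book; only the rough relative risks (less than 2x for the gene, about 10x for smoking) come from Pearl's discussion.

```python
# Back-of-the-envelope comparison of the two risk factors Pearl cites.
# The baseline lifetime risk is an assumed placeholder; only the
# relative risks (<2x for the gene, ~10x for smoking) come from the text.
baseline_risk = 0.01   # nonsmoker without the gene (assumption)
gene_rr = 1.8          # "less than doubling" the risk
smoking_rr = 10.0      # regular smoker

print(f"gene carrier:   {baseline_risk * gene_rr:.1%} lifetime risk")
print(f"regular smoker: {baseline_risk * smoking_rr:.1%} lifetime risk")
```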
Much of science is still like GWAS, looking for associations and hoping a causal pathway can later be identified, as happened with the smoking gene. In some cases these data-driven methods pay off by pointing researchers toward hypothesis-driven follow-up work, but we should recognize that data-driven methods don’t answer our questions by themselves; they only surface correlations, not underlying causal structures. This matters because studies and findings based on bare associations can be misleading. Discovering a smoking gene without explaining the actual causal relationship or its size could harm people’s health, especially if carriers concluded they were certain to develop cancer because they had the gene. Association studies can be misleading, misused, misunderstood, and dangerous, and that is part of why Pearl argues we need to move beyond them.

Mediating Variables

Mediating variables stand between the actions we take and the outcomes we observe. They are often entangled with both the action and the outcome, which makes their direct impact hard to pull apart from other factors. They play an important role in determining causal structures, and ultimately in shaping discourse and public policy about good and bad actions.
Judea Pearl writes about mediating variables in The Book of Why. He uses cigarette smoking, tar, and lung cancer as an example of the confounding nature of mediating variables. He writes, “if smoking causes lung cancer only through the formation of tar deposits, then we could eliminate the excess cancer risk by giving smokers tar-free cigarettes, such as e-cigarettes. On the other hand, if smoking causes cancer directly or through a different mediator, then e-cigarettes might not solve the problem.”
The mediator problem of tar has still not been fully disentangled, but it is an excellent example of the importance, the challenges, and the public health consequences of mediating variables. Mediators can contribute directly to the final outcome we observe (lung cancer), but they may not be the only variable at play; other aspects of smoking may cause lung cancer directly. An experiment comparing cigarette and e-cigarette smokers can get us closer, though we won’t be able to rule out a self-selection effect between traditional and e-cigarette smokers that plays into cancer development. Still, closely studying both groups will help us better understand the direct role of tar in the causal chain.
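A small simulation can make the structure concrete. This sketch invents all of its coefficients; it simply encodes a world where smoking raises cancer risk both through tar and through a direct path, and then applies the tar-free intervention Pearl describes.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Toy structural model (all coefficients invented): smoking raises
# cancer risk through tar (the mediator) and through a direct path.
smoker = rng.random(n) < 0.5
tar = smoker.astype(float)  # tar deposits form only if you smoke

def cancer_risk(smoker, tar):
    # 1% baseline, +4% through tar, +3% through the direct path
    return 0.01 + 0.04 * tar + 0.03 * smoker

cancer = rng.random(n) < cancer_risk(smoker, tar)
print("smokers:", cancer[smoker].mean(), "nonsmokers:", cancer[~smoker].mean())

# Intervention: a tar-free cigarette sets tar to 0 but leaves the
# direct path intact, so smokers' excess risk shrinks without vanishing.
cancer_tarfree = rng.random(n) < cancer_risk(smoker, np.zeros(n))
print("tar-free smokers:", cancer_tarfree[smoker].mean())
```

In this invented world, removing the mediator eliminates only part of the excess risk, which is exactly why knowing whether smoking acts only through tar matters for public health.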
Mediating variables like this pop up when we talk about the effectiveness of schools, the role of democratic norms, and the pros and cons of traditional gender roles. Often, mediating variables are what drive our concerns about larger actions and behaviors. We want all children to go to school, but we argue about the many mediating variables within the educational environment that may or may not directly contribute to the outcomes we want to see. It is hard to say which piece is most important, because so many mediating variables contribute, directly or indirectly, to the educational outcomes we see and imagine.
Counterfactuals

I have written a lot lately about the incredible human ability to imagine worlds that don’t exist. An important way we understand the world is by imagining what would happen if we did something we have not yet done, or what would have happened had we done something different in the past. We are able to use our experience of the world and our intuitions about causality to imagine a different state of affairs from the one that exists. Innovation, scientific advancement, and social cooperation all depend on our ability to imagine different worlds and to intuit causal chains between our current world and the imagined reality we desire.
In The Book of Why Judea Pearl writes, “counterfactuals are an essential part of how humans learn about the world and how our actions affect it. While we can never walk down both the paths that diverge in a wood, in a great many cases we can know, with some degree of confidence, what lies down each.”
A criticism of modern science and statistics is their reliance on randomized controlled trials and the fact that we cannot run an RCT on many of the things we study. We cannot run RCTs on our planet to determine the role of meteor impacts or lightning strikes in the emergence of life. We cannot run RCTs on the toxicity of snake venoms in human subjects. We cannot run RCTs on giving stimulus checks to Americans during the COVID-19 pandemic. Due to physical limitations and ethical considerations, RCTs are not always possible. Nevertheless, we can still study the world and use counterfactuals to think about the role of specific interventions.
If we only accepted knowledge based on RCTs, we would not be able to study the areas mentioned above. We cannot go down both paths in randomized experiments with those choices; we either ethically cannot administer an RCT or we are stuck with the way history played out. We can, however, employ counterfactuals, imagining different worlds in our heads to think about what would have happened had we gone down another path. In this process we might make errors, but we can continually learn and improve our mental models. We can study what did happen, think about what we can observe given causal structures, and better understand what would have happened had we done something different. This is how much of human progress has moved forward: without RCTs and with counterfactuals, imagining how the world could be different, and how people, places, societies, and molecules would have reacted under different actions and conditions.
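Pearl formalizes counterfactual reasoning as three steps: abduction, action, and prediction. Here is a minimal sketch of that recipe in a toy structural model; the equation and all the numbers are invented for illustration.

```python
# Pearl's three-step counterfactual in a toy structural model. The
# structural equation and numbers are invented for illustration.

def outcome(treatment, noise):
    # e.g., a recovery score: patient-specific baseline plus 2 if treated
    return noise + 2.0 * treatment

# We observed an untreated patient whose outcome was 5.0.
observed_treatment, observed_outcome = 0, 5.0

# Step 1, abduction: recover the noise consistent with the observation.
noise = observed_outcome - 2.0 * observed_treatment  # 5.0

# Steps 2 and 3, action and prediction: same patient, the other path.
counterfactual = outcome(treatment=1, noise=noise)
print(f"Had we treated this patient, the outcome would have been {counterfactual}")
```

The key move is that the noise term stands in for everything specific to this patient, which is what lets us replay the same individual down the path not taken.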
Dose-Response Curves

One limitation of linear regression models, explains Judea Pearl in The Book of Why, is that they cannot accurately model relationships that are not linear. This lesson was hammered into my head by a statistics professor at the University of Nevada, Reno when discussing binomial variables. For variables with only two possible outcomes, such as yes or no, a linear regression model doesn’t work. When the Challenger shuttle’s O-ring failed, it was because the team had run a linear regression model to predict a binomial variable: the O-ring fails or its integrity holds. There are other situations where a linear regression becomes problematic as well.
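Here is a quick sketch of the problem with invented O-ring-style data: fit a straight line to a 0/1 failure outcome and it will happily predict "probabilities" outside the 0-to-1 range, exactly where the predictions matter most.

```python
import numpy as np

rng = np.random.default_rng(2)

# Invented O-ring-style data: failure is binary (1 = fail) and far
# more likely at low launch temperatures.
temps = rng.uniform(50, 80, 200)                # degrees Fahrenheit
p_fail = 1 / (1 + np.exp(0.4 * (temps - 60)))   # true logistic curve
failed = (rng.random(200) < p_fail).astype(float)

# A straight-line fit treats the 0/1 outcome as if it were continuous...
slope, intercept = np.polyfit(temps, failed, 1)

# ...and predicts "probabilities" outside [0, 1] at the temperatures
# that matter most, like the roughly 31F forecast on launch morning.
for t in (31, 55, 75):
    print(f"{t}F: linear model says P(fail) = {slope * t + intercept:.2f}")
```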
In the book, Pearl writes, “linear models cannot represent dose-response curves that are not straight lines. They cannot represent threshold effects, such as a drug that has increasing effects up to a certain dosage and then no further effect.”
Linear models become problematic when the effect of a variable is not constant across doses. In the field I was trained in, political science, this isn’t a big deal; simply demonstrating a mostly consistent connection between, say, ratings of trust in public institutions and receipt of GI benefits is usually sufficient. In fields like medicine or nuclear physics, however, it is important to recognize that a linear regression model might be ill-suited to the actual behavior of the variable.
A drug that is ineffective at small doses, becomes effective at moderate doses, but quickly becomes deadly at high doses shouldn’t be modeled with linear regression. This is the kind of drug the general public needs to be especially careful with, since so many people approach medicine with an “if some is good, more is better” mindset. Within physics, as the Challenger example showed, the outcomes can also be a matter of life and death. If a tire rubber holds its strength but fails past a given threshold, if a rubber seal fails at a low temperature, or if a nuclear cooling pool flash boils at a certain heat, then linear regression models will be inadequate for predicting the true behavior of those variables.
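A sketch of the threshold case, with an invented dose-response curve: a straight line fit to the moderate-dose range extrapolates to more than double the drug's maximum possible effect at a high dose.

```python
import numpy as np

# Invented threshold dose-response: no effect below 10mg, rising
# benefit from 10mg to 50mg, then a plateau with nothing extra beyond.
def true_response(dose):
    return np.clip((dose - 10) / 40, 0, 1)  # fraction of maximum effect

# Fit a straight line using only the moderate doses a trial might test...
doses = np.linspace(10, 50, 20)
slope, intercept = np.polyfit(doses, true_response(doses), 1)

# ...then extrapolate to a high dose, where "more is better" fails.
d = 100
print("linear model predicts:", slope * d + intercept)  # 2.25x the max effect
print("actual response:      ", true_response(d))       # 1.0, the plateau
```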
This is important to keep in mind when we consider how science is used in general discussion. We should recognize that people often assume a linear relationship based on an experimental study, and we should look for binomial variables or potential non-linear relationships when thinking about a study and its conclusions. Improving our thinking about linear regression and dose-response curves can make us smarter about things that matter, like global pandemics, and even about more general discussions of what the government should or should not do.

Ignorability

The idea of ignorability helps us in science by playing a role in randomized trials. In the real world, there are too many potential variables to be able to comprehensively predict exactly how a given intervention will play out in every case. We almost always have outliers that have wildly different outcomes compared to what we would have predicted. Quite often some strange factor that could not be controlled or predicted caused the individual case to differ dramatically from the norm.
Thanks to the concept of ignorability, we don’t have to spend too much time worrying about the causal structures that created a single outlier. In The Book of Why, Judea Pearl does his best to define ignorability for those who need to assess whether it holds in a given case. He writes, “the assignment of patients to either treatment or control is ignorable if patients who would have one potential outcome are just as likely to be in the treatment or control group as the patients who would have a different potential outcome.”
What Pearl means is that ignorability applies when there is not a determining factor that makes people with any given outcome more likely to be in a control or treatment group. When people are randomized into control versus treatment, then there is not likely to be a commonality among people in either group that makes them more or less likely to have a given reaction. So a random outlier in one group can be expected to be offset by a random outlier in the other group (not literally a direct opposite, but we shouldn’t see a trend of specific outliers all in either treatment or control).
Ignorability does not apply in situations where there is a self-selection effect into control or treatment. In the world of the COVID-19 pandemic, this applies to situations like human challenge trials: people who know they are at risk of a bad reaction to a vaccine are unlikely to self-select into one. The same sort of thing happens with corporate health-benefit initiatives, smartphone beta tests, and general inadvertent errors in scientific studies. Outliers may not be ignorable if there is a self-selection effect, and the outcomes we observe may reflect something other than what we are studying, meaning we cannot invoke ignorability in a way that lets us draw conclusions specifically about our intervention.
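To see what randomization buys us, here is a toy simulation with invented potential outcomes. Under random assignment the treatment-effect estimate is unbiased; under self-selection, where healthier people enroll, the same comparison is badly skewed.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000

# Invented potential outcomes: frailer people do worse either way,
# and the true treatment effect is +3 for everyone.
frailty = rng.normal(0, 1, n)
y_control = 10 - 2 * frailty + rng.normal(0, 1, n)
y_treated = y_control + 3

# Randomized assignment: ignorability holds, the estimate is close to 3.
assign = rng.random(n) < 0.5
print(y_treated[assign].mean() - y_control[~assign].mean())

# Self-selection: frail people stay out of the trial, so ignorability
# fails and the naive comparison badly overstates the effect.
select = frailty + rng.normal(0, 0.5, n) < 0
print(y_treated[select].mean() - y_control[~select].mean())
```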