Poverty - $2.00 A Day - Kathryn Edin & H. Luke Shaefer

Who Experiences Deep Poverty

The image of deep poverty in the United States is unfairly and inaccurately racialized. For many people, it is hard to avoid associating words like poverty, ghetto, or poor with black and minority individuals and communities. For many, the default mental image for such terms is unavoidably non-white, and white poverty ends up taking on qualifiers to distinguish it as something separate from the default image for poverty. We use terms like white trash, or references to trailer parks, to mark white poverty as something different from general poverty, which is coded as black and minority.
This distinction, default, and mental image of poverty being a black and minority problem creates a lot of misconceptions about who is truly poor in America. In the book $2.00 A Day Kathryn Edin and H. Luke Shaefer write, “the phenomenon of $2-a-day poverty among households with children [has] been on the rise since the nation’s landmark welfare reform legislation was passed in 1996. … although the rate of growth [is] highest among African Americans and Hispanics, nearly half of the $2-a-day poor [are] white.” (Tense changed from past to present by blog author)
Poverty, in public discourse and public policy, is often presented as a racial problem because we do not recognize how many white people in the United States live in poverty. The quote above shows that the racialized elements of our general view of poverty do reflect real differences in changing rates of poverty among minority groups, but also reveals that almost half – nearly a majority – of people in poverty are white.
The consequence is that policy and public opinion often approach poverty from a race-based standpoint, and not from an economic and class-based standpoint. Policy is not well designed when it doesn’t reflect the reality of the situation, and public discourse is misplaced when it fails to accurately address the problems society faces. Biases, prejudices, and discriminatory practices can be propped up and supported when we misunderstand the nature of reality, especially when it comes to extreme poverty. Additionally, by branding only minorities as poor and carving out a special space for white poverty, we reduce the scope and seriousness of the problem, insisting that it is a cultural problem of inferior and deficient groups, rather than a by-product of an economic system or a manifestation of shortcomings of economic and social models. It is important that we recognize that poverty is not something exclusive to black and minority groups.
Data Mining is a First Step

Big tech companies, sci-fi movies, and policy entrepreneurs all present data mining as a solution to many of our problems. With traffic apps collecting mountains of movement data, governments collecting vast amounts of tax data, and health-tech companies collecting data for every step we take, the promise of data mining is that our sci-fi fantasies will be realized here on earth in the coming years. However, data mining is only a first step on a long road to the development of real knowledge that will make our world a better place. The data alone is interesting and our computing power to work with big data is astounding, but data mining can’t give us answers, only interesting correlations and statistics.
In The Book of Why Judea Pearl writes:
“It’s easy to understand why some people would see data mining as the finish rather than the first step. It promises a solution using available technology. It saves us, as well as future machines, the work of having to consider and articulate substantive assumptions about how the world operates. In some fields our knowledge may be in such an embryonic state that we have no clue how to begin drawing a model of the world. But big data will not solve this problem. The most important part of the answer must come from such a model, whether sketched by us or hypothesized and fine-tuned by machines.”
Big data can give us insights and help us identify unexpected correlations and associations, but identifying unexpected correlations and associations doesn’t actually tell us what is causing the observations we make. The messaging of massive data mining is that we will suddenly understand the world and make it a better place. The reality is that we have to develop hypotheses about how the world works based on causal understandings of the interactions between various factors of reality. This is crucial, or we won’t be able to take meaningful action based on what comes from our data mining. Without developing causal hypotheses, we cannot experiment with associations and continue to learn; we can only observe what correlations come from big data. Using the vast amounts of data we are collecting is important, but we have to have a goal to work toward and a causal hypothesis of how we can reach that goal in order for data mining to be meaningful.
Stories from Big Data

Dictionary.com describes datum (the singular of data) as “a single piece of information; any fact assumed to be a matter of direct observation.” So when we think about big data, we are thinking about massive amounts of individual pieces of information or individual facts from direct observation. Data simply are what they are, facts and individual observations in isolation.
On the other hand Dictionary.com defines information as “knowledge communicated or received concerning a particular fact or circumstance.” Information is the knowledge, story, and ideas we have about the data. These two definitions are important for thinking about big data. We never talk about big information, but the reality is that big data is less important than the knowledge we generate from the data, and that isn’t as objective as the individual datum.
In The Book of Why Judea Pearl writes, “a generation ago, a marine biologist might have spent months doing a census of his or her favorite species. Now the same biologist has immediate access online to millions of data points on fish, eggs, stomach contents, or anything else he or she wants. Instead of just doing a census, the biologist can tell a story.” Science has become contentious and polarizing recently, and part of the reason has to do with the stories that we are generating based on the big data we are collecting. We can see new patterns, new associations, new correlations, and new trends in data from across the globe. As we have collected this information, our view of our impact on the planet, our understanding of reality, and how we think about ourselves in the universe have all changed. Science is not simply facts; that is to say, it is not just data. Science is information: it is knowledge and stories that have continued to challenge the narratives we have held onto as a species for thousands of years.
Judea Pearl thinks it is important to recognize the story aspect of big data. He thinks it is crucial that we understand the difference between data and information, because without doing so we turn to the data blindly and can generate an inaccurate story based on what we see. He writes,
“In certain circles there is an almost religious faith that we can find the answers to … questions in the data itself, if only we are sufficiently clever at data mining. However, readers of this book will know that this hype is likely to be misguided. The questions I have just asked are all causal, and causal questions can never be answered from data alone.”
Big data presents us with huge numbers of observations and facts, but those facts alone don’t represent causal structures or deeper interactions within reality. We have to generate information from the data and combine that new knowledge with existing knowledge and causal hypotheses to truly learn something new from big data. If we don’t, then we will simply be identifying meaningless correlations without truly understanding what they mean or imply.
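To make the point about meaningless correlations concrete, here is a minimal sketch of what happens when we mine data that contains no structure at all. Everything in it is an assumption for illustration: the function names, the 30 observations, and the 500 candidate variables are made up, and every variable is pure random noise.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def strongest_noise_correlation(n_obs=30, n_vars=500, seed=0):
    """Mine pure noise: return the largest |r| between a random target
    and n_vars unrelated random variables (hypothetical setup)."""
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_obs)]
    best = 0.0
    for _ in range(n_vars):
        candidate = [rng.gauss(0, 1) for _ in range(n_obs)]
        best = max(best, abs(pearson_r(target, candidate)))
    return best


if __name__ == "__main__":
    r = strongest_noise_correlation()
    print(f"strongest correlation found in pure noise: {r:.2f}")
```

With enough candidate variables, some of them will line up with the target by chance alone. The search finds a correlation, but there is no story behind it, which is why the causal hypothesis has to come from somewhere other than the data.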
Data Driven Methods

In the world of big data, scientists today have a real opportunity to push the limits of scientific inquiry in ways that were never before possible. We have the collection methods and computing power available to analyze huge datasets and make observations in minutes that would have taken decades just a few years ago. However, many areas of science are not being strategic with this new power. Instead, many areas of science simply seem to be plugging variables into huge datasets and haphazardly looking for correlations and associations. Judea Pearl is critical of this approach to science in The Book of Why and uses the genome-wide association study (GWAS) to demonstrate its shortcomings.
 
 
Pearl writes, “It is important to notice the word association in the term GWAS. This method does not prove causality; it only identifies genes associated with a certain disease in the given sample. It is a data-driven rather than hypothesis-driven method, and this presents problems for causal inference.”
 
 
In the 1950s and 1960s, Pearl explains, R. A. Fisher was skeptical that smoking caused cancer and argued that the correlation between smoking and cancer could simply have been the result of a hidden variable. He suggested it was possible for a gene to exist that both predisposed people to smoke and predisposed people to develop lung cancer. Pearl writes that such a smoking gene was indeed discovered in 2008 through a GWAS, but Pearl also notes that the existence of such a gene doesn’t actually provide us with any causal mechanism between people’s genes and smoking behavior or cancer development. The smoking gene was not discovered by a hypothesis-driven method but rather by data-driven methods. Researchers simply looked at massive genomic datasets to see if any genes were associated with both smoking and developing lung cancer. The smoking gene stood out in that study.
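Fisher's hidden-variable argument can be sketched as a small simulation. This is a hypothetical world, not a description of real smoking data: the probabilities and function names are invented, and smoking is deliberately given no effect on cancer so that any association has to come from the gene.

```python
import random


def simulate_hidden_gene(n=200_000, seed=1):
    """Simulate a world where a gene raises both smoking and cancer rates,
    while smoking itself has NO effect on cancer (Fisher's hypothetical).
    All probabilities are made-up illustration values."""
    rng = random.Random(seed)
    # counts[(gene, smokes)] = [number of people, number with cancer]
    counts = {(g, s): [0, 0] for g in (0, 1) for s in (0, 1)}
    for _ in range(n):
        gene = 1 if rng.random() < 0.3 else 0
        smokes = 1 if rng.random() < (0.8 if gene else 0.2) else 0
        # Cancer depends on the gene only; smoking is ignored on purpose.
        cancer = 1 if rng.random() < (0.30 if gene else 0.05) else 0
        counts[(gene, smokes)][0] += 1
        counts[(gene, smokes)][1] += cancer
    return counts


def cancer_rate(counts, smokes, gene=None):
    """Cancer rate among smokers or non-smokers, optionally within one gene group."""
    genes = (0, 1) if gene is None else (gene,)
    people = sum(counts[(g, smokes)][0] for g in genes)
    cases = sum(counts[(g, smokes)][1] for g in genes)
    return cases / people


counts = simulate_hidden_gene()
overall_gap = cancer_rate(counts, 1) - cancer_rate(counts, 0)
gap_with_gene = cancer_rate(counts, 1, gene=1) - cancer_rate(counts, 0, gene=1)
gap_without_gene = cancer_rate(counts, 1, gene=0) - cancer_rate(counts, 0, gene=0)

print(f"smoker vs non-smoker gap, everyone:      {overall_gap:.3f}")
print(f"smoker vs non-smoker gap, gene carriers: {gap_with_gene:.3f}")
print(f"smoker vs non-smoker gap, non-carriers:  {gap_without_gene:.3f}")
```

In this made-up world smokers show much more cancer overall, yet the gap disappears once we compare people with the same gene status. The association alone cannot tell us whether we live in this world or in one where smoking does the damage.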
 
 
Pearl goes on to say that causal investigations have shown that the gene in question is important for nicotine receptors in lung cells, which suggests a causal pathway from the gene to a predisposition to smoke. However, causal studies also indicate that the gene less than doubles your chance of developing lung cancer. “This is serious business, no doubt, but it does not compare to the danger you face if you are a regular smoker,” writes Pearl. Smoking is associated with a tenfold increase in the risk of developing lung cancer, while the smoking gene accounts for less than a doubling of risk. The GWAS tells us that the gene is associated with cancer, but we can’t draw any causal conclusions from just an association. We have to go deeper to understand its causality and to relate that to other factors that we can study. This helps us contextualize the information from the GWAS.
 
 
Much of science is still like the GWAS, looking for associations and hoping to be able to identify a causal pathway as was done with the smoking gene. In some cases these data-driven methods can pay off by pointing the way for researchers to start looking for hypothesis-driven methods, but we should recognize that data-driven methods themselves don’t answer our questions and only represent correlations, not underlying causal structures. This is important because studies and findings based on just associations can be misleading. Discovering a smoking gene and not explaining the actual causal relationship or impact could harm people’s health, especially if they decided that they would surely develop cancer because they had the gene. Association studies ultimately can be misleading, misused, misunderstood, and dangerous, and that is part of why Pearl suggests a need to move beyond simple association studies.

Mediating Variables

Mediating variables stand in the middle of the actions and the outcomes that we can observe. They are often tied together and hard to separate from the action and the outcome, making their direct impact hard to pull apart from other factors. They play an important role in determining causal structures, and ultimately in shaping discourse and public policy about good and bad actions.
Judea Pearl writes about mediating variables in The Book of Why. He uses cigarette smoking, tar, and lung cancer as an example of the tricky nature of mediating variables. He writes, “if smoking causes lung cancer only through the formation of tar deposits, then we could eliminate the excess cancer risk by giving smokers tar-free cigarettes, such as e-cigarettes. On the other hand, if smoking causes cancer directly or through a different mediator, then e-cigarettes might not solve the problem.”
The mediator problem of tar still has not been fully disentangled, but it is an excellent example of the importance, challenges, and public health consequences of mediating variables. Mediators can contribute directly to the final outcome we observe (lung cancer), but they may not be the only variable at play. In this instance, other aspects of smoking may directly cause lung cancer. A study comparing cigarette and e-cigarette smokers can help us get closer, but we won’t be able to rule out a self-selection effect between traditional and e-cigarette smokers that plays into cancer development. However, closely studying both groups will help us start to better understand the direct role of tar in the causal chain.
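Pearl's two scenarios can be written down as a toy risk model. This is only a sketch under invented assumptions: the baseline risk, the tar coefficient, and the size of the direct effect are hypothetical numbers chosen to make the arithmetic easy to follow.

```python
def cancer_risk(smokes, tar, direct_effect):
    """Toy risk model: baseline + effect of tar + any direct effect of smoking.
    All coefficients are hypothetical illustration values."""
    baseline = 0.01
    tar_effect = 0.10
    return baseline + tar_effect * tar + direct_effect * smokes


def excess_risk_with_tar_free(direct_effect):
    """Excess risk of a smoker on tar-free cigarettes compared with a non-smoker."""
    smoker_tar_free = cancer_risk(smokes=1, tar=0, direct_effect=direct_effect)
    non_smoker = cancer_risk(smokes=0, tar=0, direct_effect=direct_effect)
    return smoker_tar_free - non_smoker


# World A: smoking harms only through tar (smoking -> tar -> cancer).
world_a = excess_risk_with_tar_free(direct_effect=0.0)
# World B: smoking also harms through a path that bypasses tar.
world_b = excess_risk_with_tar_free(direct_effect=0.04)

print(f"excess risk with tar-free cigarettes, world A: {world_a:.2f}")
print(f"excess risk with tar-free cigarettes, world B: {world_b:.2f}")
```

Removing the mediator wipes out the excess risk only in the world where tar is the sole pathway. Which world we actually live in is exactly the question the data on tar alone cannot settle.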
Mediating variables like this pop up when we talk about the effectiveness of schools, the role of democratic norms, and the pros or cons of traditional gender roles. Often, mediating variables are driving the concerns we have for larger actions and behaviors. We want all children to go to school, but argue about the many mediating variables within the educational environment that may or may not directly contribute to specific outcomes that we want to see. It is hard to say which specific piece is the most important, because there are so many mediating variables all contributing directly or possibly indirectly to the education outcomes we see and imagine.
Counterfactuals

I have written a lot lately about the incredible human ability to imagine worlds that don’t exist. An important way that we understand the world is by imagining what would happen if we did something that we have not yet done or if we imagine what would have happened had we done something different in the past. We are able to use our experiences about the world and our intuition on causality to imagine a different state of affairs from what currently exists. Innovation, scientific advancements, and social cooperation all depend on our ability to imagine different worlds and intuit causal chains between our current world and the imagined reality we desire.
In The Book of Why Judea Pearl writes, “counterfactuals are an essential part of how humans learn about the world and how our actions affect it. While we can never walk down both the paths that diverge in a wood, in a great many cases we can know, with some degree of confidence, what lies down each.”
A criticism of modern science and statistics is the reliance on randomized controlled trials and the fact that we cannot run an RCT on many of the things we study. We cannot run RCTs on our planet to determine the role of meteor impacts or lightning strikes on the emergence of life. We cannot run RCTs on the toxicity of snake venoms in human subjects. We cannot run RCTs on giving stimulus checks to Americans during the COVID-19 Pandemic. Due to physical limitations and ethical considerations, RCTs are not always possible. Nevertheless, we can still study the world and use counterfactuals to think about the role of specific interventions.
If we forced ourselves to only accept knowledge based on RCTs then we would not be able to study the areas I mentioned above. We cannot go down both paths in randomized experiments with those choices. We either ethically cannot administer an RCT or we are stuck with the way history played out. We can, however, employ counterfactuals, imagining different worlds in our heads to think about what would have happened had we gone down another path. In this process we might make errors, but we can continually learn and improve our mental models. We can study what did happen, think about what we can observe based on causal structures, and better understand what would have happened had we done something different. This is how much of human progress has moved forward, without RCTs and with counterfactuals, imagining how the world could be different, how people, places, societies, and molecules could have reacted differently with different actions and conditions.
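One way to see how a counterfactual can be reasoned through rather than run as an experiment is a toy model with a single equation. The sketch below assumes we already trust a causal structure of the form y = effect_of_x * x + u, where u stands for everything else about the individual. The function name and the numbers are hypothetical.

```python
def counterfactual_outcome(observed_x, observed_y, effect_of_x, alternative_x):
    """Three-step counterfactual in the toy model  y = effect_of_x * x + u.

    1. Abduction: recover the unit's unobserved background factor u
       from what actually happened.
    2. Action: replace x with the alternative we want to imagine.
    3. Prediction: recompute y with the same background factor u.
    """
    u = observed_y - effect_of_x * observed_x  # abduction
    x = alternative_x                          # action
    return effect_of_x * x + u                 # prediction


# Someone received the intervention (x = 1) and we observed y = 7.
# If the assumed effect of x is 2, what would y have been without it?
print(counterfactual_outcome(observed_x=1, observed_y=7, effect_of_x=2, alternative_x=0))
```

The answer is only as good as the assumed model, which is the point of the passage above: we can err, but we can also revise the model as we learn.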
Dose-Response Curves

One limitation of linear regression models, explains Judea Pearl in The Book of Why, is that they are unable to accurately model relationships that are not linear. This lesson was hammered into my head by a statistics professor at the University of Nevada, Reno when discussing binary variables. For variables where there are only two possible options, such as yes or no, a linear regression model doesn’t work well. The Challenger Shuttle’s O-ring is the classic example: the outcome was a two-option variable, the O-ring fails or its integrity holds, and a linear regression model is the wrong tool for predicting it. However, there are other situations where a linear regression becomes problematic.
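A quick sketch shows why a straight line struggles with a yes-or-no outcome. The data below are invented for illustration and are not the Challenger record; the variable names and the helper function are assumptions as well.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical data: x is a stress level, y is 1 if the part failed, 0 if it held.
stress = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
failed = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

intercept, slope = fit_line(stress, failed)
lowest = intercept + slope * stress[0]
highest = intercept + slope * stress[-1]

print(f"predicted failure 'probability' at lowest stress:  {lowest:.2f}")
print(f"predicted failure 'probability' at highest stress: {highest:.2f}")
```

The fitted line predicts a failure chance below zero at one end and above one at the other, neither of which can be a probability. Models built for two-outcome variables, such as logistic regression, avoid this by keeping predictions between zero and one.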
 
 
In the book, Pearl writes, “linear models cannot represent dose-response curves that are not straight lines. They cannot represent threshold effects, such as a drug that has increasing effects up to a certain dosage and then no further effect.”
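The threshold effect Pearl describes can be sketched with a made-up drug whose effect rises until a dose of 5 units and then stays flat. The response function, the dose range, and the helper names are assumptions for illustration only.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def true_response(dose):
    """Hypothetical drug: effect rises with dose up to 5 units, then plateaus."""
    return 2 * min(dose, 5)


doses = list(range(11))  # doses 0 through 10
responses = [true_response(d) for d in doses]
intercept, slope = fit_line(doses, responses)


def linear_prediction(dose):
    """What the fitted straight line claims the response will be."""
    return intercept + slope * dose


for dose in (0, 5, 10, 20):
    print(f"dose {dose:>2}: true response {true_response(dose):>2}, "
          f"straight-line prediction {linear_prediction(dose):.2f}")
```

The straight line overstates the effect at a dose of zero, understates it around the threshold, and keeps climbing after the true effect has stopped. The further we extrapolate past the plateau, the worse the prediction gets.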
 
 
Linear relationship models become problematic when the effect of a variable is not constant over dosage. In the field of study that I was trained in, political science, this isn’t a big deal. In my field, simply demonstrating that there is a mostly consistent connection between ratings of trust in public institutions and receipt of GI benefits, for example, is usually sufficient. However, in fields like medicine or nuclear physics, it is important to recognize that a linear regression model might be ill suited to the actual reality of the variable.
 
 
A drug that is ineffective at small doses, becomes effective at moderate doses, but quickly becomes deadly at high doses shouldn’t be modeled with a linear regression model. This type of drug is one that the general public needs to be especially careful with, since so many individuals approach medicine with an “if some is good then more is better” mindset. Within physics, as was seen in the Challenger example, the outcomes can also be a matter of life and death. If a particular rubber for tires holds its strength but fails at a given threshold, if a rubber seal fails at a low temperature, or if a nuclear cooling pool will flash boil at a certain heat, then linear regression models will be inadequate for making predictions about the true nature of variables.
 
 
This is an important thing for us to think about when we consider the way that science is used in general discussion. We should recognize that people often assume a linear relationship based on an experimental study, and we should look for binary variables or potential non-linear relationships when thinking about a study and its conclusions. Improving our thinking about linear regression and dose-response curves can help us be smarter when it comes to things that matter like global pandemics and even more general discussions about what we think the government should or should not do.

Ignorability

The idea of ignorability helps us in science by playing a role in randomized trials. In the real world, there are too many potential variables to be able to comprehensively predict exactly how a given intervention will play out in every case. We almost always have outliers that have wildly different outcomes compared to what we would have predicted. Quite often some strange factor that could not be controlled or predicted caused the individual case to differ dramatically from the norm.
Thanks to concepts of ignorability, we don’t have to spend too much time worrying about the causal structures that created a single outlier. In The Book of Why Judea Pearl tries his best to provide a definition of ignorability for those who need to assess whether ignorability holds in a given outlier decision. He writes, “the assignment of patients to either treatment or control is ignorable if patients who would have one potential outcome are just as likely to be in the treatment or control group as the patients who would have a different potential outcome.”
What Pearl means is that ignorability applies when there is not a determining factor that makes people with any given outcome more likely to be in a control or treatment group. When people are randomized into control versus treatment, then there is not likely to be a commonality among people in either group that makes them more or less likely to have a given reaction. So a random outlier in one group can be expected to be offset by a random outlier in the other group (not literally a direct opposite, but we shouldn’t see a trend of specific outliers all in either treatment or control).
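The difference between randomized and self-selected assignment can be sketched with a simulated treatment that truly does nothing. The frailty variable, the probabilities, and the function name below are all hypothetical, chosen only to show how assignment can become tied to potential outcomes.

```python
import random


def bad_outcome_gap(self_selected, n=100_000, seed=2):
    """Difference in bad-outcome rates (treatment minus control) for a
    treatment with no real effect. All probabilities are hypothetical."""
    rng = random.Random(seed)
    people = {0: 0, 1: 0}
    bad = {0: 0, 1: 0}
    for _ in range(n):
        frail = rng.random() < 0.3
        # The potential outcome is fixed by frailty alone; treatment changes nothing.
        bad_outcome = rng.random() < (0.5 if frail else 0.1)
        if self_selected:
            # Frail patients mostly opt out of the treatment group.
            treated = rng.random() < (0.2 if frail else 0.7)
        else:
            # Coin flip: assignment cannot depend on frailty.
            treated = rng.random() < 0.5
        group = 1 if treated else 0
        people[group] += 1
        bad[group] += bad_outcome
    return bad[1] / people[1] - bad[0] / people[0]


if __name__ == "__main__":
    print(f"gap with random assignment: {bad_outcome_gap(self_selected=False):+.3f}")
    print(f"gap with self-selection:    {bad_outcome_gap(self_selected=True):+.3f}")
```

Under the coin flip, people headed for a bad outcome land in both groups at the same rate, so the useless treatment correctly looks useless. Under self-selection the treatment group looks healthier even though the treatment did nothing, because assignment is no longer ignorable.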
Ignorability does not apply in situations where there is a self-selection effect for control or treatment. In the world of the COVID-19 Pandemic, this applies in situations like human challenge trials. It is unlikely that people who know they are at risk of bad reactions to a vaccine would self-select into a human challenge trial. This same sort of thing happens with corporate health benefits initiatives, smart phone beta-testers, and general inadvertent errors in scientific studies. Outliers may not be outliers we can ignore if there is a self-selection effect, and the outcomes that we observe may reflect something other than what we are studying, meaning that we can’t apply ignorability in a way that allows us to draw a conclusion specifically on our intervention.
Alternative, Nonexistent Worlds - Judea Pearl - The Book of Why - Joe Abittan

Alternative, Nonexistent Worlds

Judea Pearl’s The Book of Why hinges on a unique ability that human animals have. Our ability to imagine alternative, nonexistent worlds is what has set us on new pathways and allowed us to dominate the planet. We can think of what would happen if we acted in a certain manner, used a tool in a new way, or if two objects collided together. We can visualize future outcomes of our actions and of the actions of other bodies and predict what can be done to create desired future outcomes.
In the book he writes, “our ability to conceive of alternative, nonexistent worlds separated us from our protohuman ancestors and indeed from any other creature on the planet. Every other creature can see what is. Our gift, which may sometimes be a curse, is that we can see what might have been.”
Pearl argues that our ability to see different possibilities, to imagine new worlds, and to be able to predict actions and behaviors that would realize that imagined world is not something we should ignore. He argues that this ability allows us to move beyond correlations, beyond statistical regressions, and into a world where our causal thinking helps drive our advancement toward the worlds we want.
It is important to note that he is not advocating for holding a belief and setting out to prove it with data and science, but rather that we use data and science combined with our ability to think causally to better understand the world. We do not have to be stuck in a state where we understand statistical techniques but deny plausible causal pathways. We can identify and define causal pathways, even if we cannot fully define causal mechanisms. Our ability to reason through alternative, nonexistent worlds is what allows us to think causally and apply this causal reasoning to statistical relationships. Doing so, Pearl argues, will save lives, help propel technological innovation, and push science to new frontiers to improve life on our planet.
Regression Coefficients

Statistical regression is a great thing. We can generate a scatter plot, generate a line of best fit, and measure how well that line describes the relationship between the individual points within the data. The better the line fits (the more that individual points stick close to the line) the better the line describes the relationships and trends in our data. However, this doesn’t mean that the regression coefficients tell us anything about causality. It is tempting to say that a causal relationship exists when we see a trend line with lots of tight fitting dots around and two different variables on an X and Y axis, but this can be misleading.
In The Book of Why Judea Pearl writes, “Regression coefficients, whether adjusted or not, are only statistical trends, conveying no causal information in themselves.” It is easy to forget this, even if you have had a statistics class and know that correlation does not imply causation. Humans are pattern recognition machines, but we go a step beyond simply recognizing a pattern: we instantly set about trying to understand what is causing the pattern. However, our regression coefficients and scatter plots don’t always hold clear causal information. Quite often there is a third hidden variable that cannot be measured directly that is influencing the relationship we discover in our regression coefficients.
Pearl continues, “sometimes a regression coefficient represents a causal effect, and sometimes it does not – and you can’t rely on the data alone to tell you the difference.” Imagine a graph with a regression line running through a plot of force applied by a hydraulic press and fracture rates for ceramic mugs. One axis may be pressure, and the other axis may be thickness of the ceramic mug. The individual points represent the point at which individual mugs fractured. We would be able to generate a regression line by testing the fracture strength of mugs of different thickness, and from this line we would be able to develop pretty solid causal inferences about thickness and fracture rates. A clear causal link could be identified by the regression coefficients in this scenario.
However, we could also imagine a graph that plotted murder rates in European cities and the spread of Christianity. With one axis being the number of years a city has had a Catholic bishop and the other axis being the number of murders, we may find that murders decrease the longer a city has had a bishop. From this, we might be tempted to say that Christianity (particularly the location of a bishop in a town) reduces murder. But what would we point to as the causal mechanism? Would it be religious beliefs adopted by people interacting with the church? Would it be that marriage rules that limited polygamy ensured more men found wives and became less murderous as a result? Would it be that some divinity smiled upon the praying people and made them less murderous? A regression like the one I described above wouldn’t tell us anything about the causal mechanism in effect in this instance. Our causal-thinking minds, however, would still generate causal hypotheses, some of which would be reasonable but others less so (this example comes from the wonderful The WEIRDest People in the World by Joseph Henrich).
Regression coefficients can be helpful, but they are less helpful when we cannot understand the causal mechanisms at play. Understanding the causal mechanisms can help us better understand the relationship represented by the regression coefficients, but the coefficient itself only represents a relationship, not a causal structure. Approaching data and simply looking for trends doesn’t, on its own, help us generate useful information. We must first have a sense of a potential causal mechanism, then examine the data to see if our proposed causal mechanism has support or not. This is how we can use data and find support for causal hypotheses within regression coefficients.
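Pearl's warning that the data alone cannot tell a causal coefficient from a non-causal one can be sketched with two simulated worlds. Both are hypothetical: in one, x really drives y, and in the other a hidden variable z drives both. The function names and the numbers are assumptions chosen so that the two worlds produce roughly the same regression slope.

```python
import random


def slope(xs, ys):
    """Least-squares regression coefficient of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def sample(world, intervene=False, n=50_000, seed=3):
    """Draw (xs, ys) from one of two hypothetical worlds.

    'causal':     x really drives y        (y = 0.5 * x + noise)
    'confounded': hidden z drives both     (x = z + noise, y = z + noise)
    With intervene=True we set x ourselves, independently of z.
    """
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        z = rng.gauss(0, 1)  # hidden variable; only matters in the confounded world
        if world == "causal" or intervene:
            x = rng.gauss(0, 1)
        else:
            x = z + rng.gauss(0, 1)
        if world == "causal":
            y = 0.5 * x + rng.gauss(0, 1)
        else:
            y = z + rng.gauss(0, 1)
        xs.append(x)
        ys.append(y)
    return xs, ys


if __name__ == "__main__":
    for world in ("causal", "confounded"):
        observed = slope(*sample(world))
        experiment = slope(*sample(world, intervene=True))
        print(f"{world:>10}: observed slope {observed:.2f}, "
              f"slope when we set x ourselves {experiment:.2f}")
```

The observed slopes are about the same in both worlds, so a scatter plot could not tell them apart. Only when we set x ourselves does the difference show: the slope survives in the causal world and collapses toward zero in the confounded one.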