Science and Facts

Science helps us understand the world and answer questions about how and why things are the way they are. But this doesn’t mean science always gives us the most accurate answers possible. Quite often science only seems to suggest an answer, sometimes the answer we get doesn’t really address the question we wanted to ask, and sometimes there is just too much noise to gain any real understanding. This inability to perfectly answer every question, especially when we present science as providing clear facts while teaching it to young children, is a source of confusion and dismissal among those who don’t want to believe the answers that science gives us.
In Spook: Science Tackles the Afterlife, Mary Roach writes, “Of course, science doesn’t dependably deliver truths. It is as fallible as the men and women who undertake it. Science has the answer to every question that can be asked. However, science reserves the right to change that answer should additional data become available.” The science of the afterlife (really the science of life, living, death, and dying), Roach explains, has been a science of revision. What we believe, how we conduct experiments, and how we interpret scientific results have shifted as our technology and scientific methods have progressed. The science of life and death has given us many different answers over the years as our own biases have shifted and as our data and computer processing have evolved.
The reality is that all of our scientific fields of study are incomplete. There are questions we still don’t have great answers to, and as we seek those answers, we have to reconsider older answers and beliefs. We have to study contradictions and try to understand what might be wrong with the way we have interpreted the world. What we bring to science impacts what we find, and that means that sometimes we don’t find truths, but conveniently packaged answers that reinforce what we always wanted to be true. Over time, however, the people doing the science change, the background knowledge brought to science changes, and the way we understand the answers from science changes. This can be frustrating to those of us on the outside who want clear answers and don’t want to be misled by people who deliberately exploit incomplete scientific knowledge. But over time, science revises itself to become more accurate and to better describe the world around us.
When to Stop Counting

Yesterday I wrote about the idea of scientific versus political numbers. Scientific numbers are those that we rely on for decision-making. They are not always better or more accurate than political numbers, but they are generally based on some sort of standardized methodology and have a concrete, agreed-upon backing. Political numbers are more or less guesstimates, or are formed from sources that are not confirmed to be reliable. While they can end up being more accurate than scientific figures, they are harder to accept and justify in decision-making processes. In the end, the default is scientific numbers, but scientific numbers have a flaw that keeps them from ever becoming what they purport to be. How do we know when it is time to stop counting, and when we are ready to move forward with a scientific number rather than fall back on a political number?
Christopher Jencks explores this idea in his book The Homeless by looking at a survey conducted by Martha Burt at the Urban Institute. Jencks writes, “Burt’s survey provides quite a good picture of the visible homeless. It does not tell us much about those who avoid shelters, soup kitchens, and the company of other homeless individuals. I doubt that such people are numerous, but I can see no way of proving this. It is hard enough finding the proverbial needle in a haystack. It is far harder to prove that a haystack contains no more needles.” The quote shows that Burt’s survey was good at identifying visibly homeless people, but that at some point a decision was made to stop attempting to count the less visible homeless. It is entirely reasonable to stop counting at a certain point; as Jencks notes, it is hard to prove there are no more needles left to find. But stopping always means there will be a measure of uncertainty in your counting and results. Your numbers will always come with a margin of error, because there is almost no way to be certain that you didn’t miss something.
Where we choose to stop counting can influence whether we should consider our numbers scientific or political. I would argue that the decision about where to stop our count is itself both a scientific and a political decision. We can make political decisions to stop counting in ways that deliberately exclude hard-to-count populations. Alternatively, we can continue our search, expand the count, and change the end results. Choosing how scientifically accurate to be with our count is still, at some level, a political decision.
However, choosing to stop counting can also be a rational and economic decision. We may have limited funding and resources for our counting, and be forced to stop at a reasonable point that still allows us to make scientifically appropriate estimates about the remaining uncounted population. Diminishing marginal returns also mean that at a certain point we are putting far more effort into counting than the benefit of counting one more item is worth. This demonstrates how our numbers can be based on scientific or political motivations, or both. These are all important considerations whether we are the counter or are studying the results of the counting. Where we choose to stop matters, because we likely can’t prove that we have found every needle in the haystack or that no more needles exist. No matter what, we will have to face the reality that the numbers we get are not perfect, no matter how scientific we try to make them.
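The diminishing returns of counting can be illustrated with a small simulation. The population size and per-sweep find rate below are invented numbers for illustration, not figures from Jencks or Burt: each sweep finds a fixed fraction of the people not yet counted, so every additional sweep finds fewer than the last, and the count never provably reaches the true total.

```python
# Hypothetical illustration of diminishing returns in a count.
# The true population and find rate are made-up numbers.
true_population = 10_000   # the real total, unknown to the counters
find_rate = 0.5            # fraction of the still-uncounted group found per sweep

counted = 0
uncounted = true_population
for sweep in range(1, 8):
    found = int(uncounted * find_rate)
    counted += found
    uncounted -= found
    print(f"sweep {sweep}: found {found}, running total {counted}")

# Even after many sweeps, an uncounted remainder is left, and nothing
# in the count itself can prove that the remainder is zero.
print(f"uncounted remainder: {uncounted}")
```

Each sweep costs roughly the same effort but yields half the people of the previous one, which is exactly the economic argument for stopping at some point and estimating the rest.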
Who Are the Homeless?

In the United States we have many housing-insecure individuals. We have many people who are chronically homeless and unlikely to ever get off the streets. We have many people who experience homelessness only transiently, perhaps during an unexpected layoff or economic downturn. And we also have many people who cycle in and out of homelessness. The needs and desires of each of these groups differ. However, when we think about homelessness in America, we typically only think about one version of it: the visibly homeless man or woman living on the streets.
In his book Tell Them Who I Am, Elliot Liebow writes, “an important fact about these dramatically visible homeless persons on the street is that, their visibility notwithstanding, they are at best a small minority, tragic caricatures of homelessness rather than representatives of it.” When we think about the homeless, we think about men and women who don’t work, who are smelly and dirty, and who appear to have mental disorders or drug addictions. This means that public policy geared toward homelessness is a reaction to this visible minority, not policy geared to help the many people who may experience homelessness in a less visible way.
People do not like the visibly homeless who live on the street. They feel ashamed to see them begging, feel frustrated by their panhandling, and are often frightened of them. The visibly homeless are not a sympathetic group, and are not likely to be the targets of public policy that supports them.
The less visibly homeless, however, are a population we are less afraid of and less likely to strongly dislike. But because we don’t see them, we don’t think of them when we consider policies and programs designed to assist the homeless. Their needs, their concerns, and the things that could help them find more stable housing are forgotten or simply unknown to the general public and the policymakers they elect. We are often unaware of the individuals who are homeless but still managing to work a job. We don’t think about those who experience temporary homelessness, sleeping in a car for a couple of weeks at a time between gig work. We don’t consider those who live in shelters until a friend or family member can take them in and support them until they can find work. Without acknowledging this less visible side of poverty, we don’t take steps to improve public policy and public support for those working to maintain a place to live. We allow the most visible elements of homelessness to be all we know about homelessness, and as a result our policy and attitudes toward the homeless fail to reflect the reality that the majority of the homeless experience.
Causal Illusions - The Book of Why

In The Book of Why Judea Pearl writes, “our brains are not wired to do probability problems, but they are wired to do causal problems. And this causal wiring produces systematic probabilistic mistakes, like optical illusions.” This can create problems for us when no causal link exists and when data correlate without any causal connection between outcomes. According to Pearl, our causal thinking “neglects to account for the process by which observations are selected.” We don’t always realize that we are taking a sample, that our sample could be biased, and that structural factors independent of the phenomenon we are trying to observe could greatly impact the observations we actually make.
Pearl continues, “We live our lives as if the common cause principle were true. Whenever we see patterns, we look for a causal explanation. In fact, we hunger for an explanation, in terms of stable mechanisms that lie outside the data.” When we see a correlation, our brains instantly start looking for a causal mechanism that can explain it. We don’t often look at the data itself and ask whether some part of the data collection process led to the outcomes we observed. Instead, we assume the data is correct and that it reflects an outside, real-world phenomenon. This is the source of many of the causal illusions that Pearl describes in the book. Our minds are wired for causal thinking, and we will invent causality when we see patterns, even if there truly isn’t a causal structure linking the patterns we see.
It is in this spirit that we attribute negative personality traits to people who cut us off on the freeway. We assume they don’t like us, that they are terrible people, or that they are rushing to the hospital with a sick child, so that our being cut off has a satisfying causal explanation. When a particular type of car stands out and we start seeing that car everywhere, we misattribute our increased attention to the car itself and assume that there really are more of those cars on the road now. We assume that people find them more reliable or more appealing and purposely bought them, a causal mechanism that explains why we now see them everywhere. In both cases we are creating causal pathways in our minds that in reality are little more than causal illusions, but we want to find a cause for everything, and we don’t always realize that we are doing so. It is important to be aware of these causal illusions when making important decisions, to think about how the data came to mind, and to consider whether a causal illusion or cognitive error may be at play.
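Pearl’s point about neglecting “the process by which observations are selected” can be shown with a tiny simulation. The variables and the selection threshold below are invented for illustration: two quantities generated with no causal link between them appear correlated once we keep only the observations that pass a selection filter.

```python
import random

random.seed(42)

def corr(xs, ys):
    """Pearson correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

# Two independent quantities: no causal link, no real correlation.
n = 50_000
a = [random.random() for _ in range(n)]
b = [random.random() for _ in range(n)]
r_full = corr(a, b)

# Now suppose we only observe cases whose combined score passes a bar,
# e.g. only records that were noticeable enough to be collected at all.
selected = [(x, y) for x, y in zip(a, b) if x + y > 1.2]
r_selected = corr([x for x, _ in selected], [y for _, y in selected])

print(f"full sample r = {r_full:.3f}, selected sample r = {r_selected:.3f}")
```

The full sample shows essentially zero correlation, while the selected sample shows a strong negative one. The pattern is manufactured entirely by how the observations were selected, and our causal wiring then eagerly supplies an explanation for it.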
Stories from Big Data

Dictionary.com describes datum (the singular of data) as “a single piece of information; any fact assumed to be a matter of direct observation.” So when we think about big data, we are thinking about massive amounts of individual pieces of information or individual facts from direct observation. Data simply are what they are, facts and individual observations in isolation.
On the other hand, Dictionary.com defines information as “knowledge communicated or received concerning a particular fact or circumstance.” Information is the knowledge, story, and ideas we have about the data. These two definitions are important for thinking about big data. We never talk about big information, but the reality is that big data is less important than the knowledge we generate from it, and that knowledge isn’t as objective as the individual datum.
In The Book of Why Judea Pearl writes, “a generation ago, a marine biologist might have spent months doing a census of his or her favorite species. Now the same biologist has immediate access online to millions of data points on fish, eggs, stomach contents, or anything else he or she wants. Instead of just doing a census, the biologist can tell a story.” Science has become contentious and polarizing recently, and part of the reason has to do with the stories that we are generating based on the big data we are collecting. We can see new patterns, new associations, new correlations, and new trends in data from across the globe. As we have collected this information, our impact on the planet, our understanding of reality, and how we think about ourselves in the universe has changed. Science is not simply facts, that is to say it is not just data. Science is information, it is knowledge and stories that have continued to challenge the narratives we have held onto as a species for thousands of years.
Judea Pearl thinks it is important to recognize the story aspect of big data. He thinks it is crucial that we understand the difference between data and information, because without doing so we turn to the data blindly and can generate an inaccurate story based on what we see. He writes,
“In certain circles there is an almost religious faith that we can find the answers to … questions in the data itself, if only we are sufficiently clever at data mining. However, readers of this book will know that this hype is likely to be misguided. The questions I have just asked are all causal, and causal questions can never be answered from data alone.”
Big data presents us with huge numbers of observations and facts, but those facts alone don’t represent causal structures or deeper interactions within reality. We have to generate information from the data and combine that new knowledge with existing knowledge and causal hypotheses to truly learn something new from big data. If we don’t, then we will simply be identifying meaningless correlations without truly understanding what they mean or imply.
Complex Causation Continued

Our brains are good at interpreting and detecting causal structures, but often, the real causal structures at play are more complicated than what we can easily see. A causal chain may include a mediator, such as citrus fruit providing vitamin C to prevent scurvy. A causal chain may have a complex mediator interaction, as in the example of my last post where a drug leads to the body creating an enzyme that then works with the drug to be effective. Additionally, causal chains can be long-term affairs.
In The Book of Why Judea Pearl discusses long-term causal chains writing, “how can you sort out the causal effect of treatment when it may occur in many stages and the intermediate variables (which you might want to use as controls) depend on earlier stages of treatment?”
This is an important question within medicine and occupational safety. Pearl writes about the fact that factory workers are often exposed to chemicals over a long period, not just in a single instance. If it was repeated exposure to chemicals that caused cancer or another disease, how do you pin that on the individual exposures themselves? Was the individual safe with 50 exposures but as soon as a 51st exposure occurred the individual developed a cancer? Long-term exposure to chemicals and an increased cancer risk seems pretty obvious to us, but the actual causal mechanism in this situation is a bit hazy.
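One simple way to see why no single exposure can be singled out is a cumulative-risk sketch. The per-exposure probability below is a made-up number for illustration: if each exposure independently carries a tiny chance of triggering disease, risk accumulates smoothly, and the 51st exposure is no more special than the 1st.

```python
# Hypothetical: each exposure independently carries a small chance of harm.
p = 0.001  # invented per-exposure probability, for illustration only

def cumulative_risk(n_exposures):
    """Probability of at least one harmful event after n independent exposures."""
    return 1 - (1 - p) ** n_exposures

for n in (1, 50, 51, 1000):
    print(f"{n:5d} exposures -> cumulative risk {cumulative_risk(n):.4f}")
```

Under this toy model the jump from 50 to 51 exposures changes the risk by less than a tenth of a percent, even though the risk after 1,000 exposures is substantial. That smooth accumulation is precisely what makes it hard to pin the disease on any individual exposure.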
The same can apply in the other direction within the field of medicine. Some cancer drugs or immune therapy treatments work for a long time, stop working, or require changes in combinations based on how the disease has progressed or how side effects have manifested. Additionally, as we have all learned over the past year with vaccines, some medical combinations work better with boosters or time-delayed components. Thinking about causality in these kinds of situations is difficult because the differing time scales and combinations make it hard to understand exactly what is affecting what, and when. I don’t have any deep answers or insights into these questions, but simply highlight them to again demonstrate complex causation and how much work our minds must do to fully understand a causal chain.
Complex Causation

In linear causal models the total effect of an action is equal to the direct effect of that action and its indirect effect. We can think of an oversimplified anti-tobacco public health campaign to conceptualize this equation. A campaign could be developed to use famous celebrities in advertisements against smoking. This approach may have a direct effect on teen smoking rates if teens see the advertisements and decide not to smoke as a result of the influential messaging from their favorite celebrity. This approach may also have indirect effects. Imagine a teen who didn’t see the advertising, but their best friend did see it. If their best friend was influenced, then they may adopt their friend’s anti-smoking stance. This would be an indirect effect of the advertising campaign in the positive direction. The total effect of the campaign would then be the kids who were directly deterred from smoking combined with those who didn’t smoke because their friends were deterred.
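In a linear model this decomposition can be written as total effect = direct effect + indirect effect, where the indirect effect is the product of the path coefficients through the mediator. A minimal numeric sketch of the advertising example, with invented coefficients:

```python
# Invented path coefficients for the hypothetical anti-smoking campaign.
c = 0.30   # direct effect: campaign -> teen decides not to smoke
a = 0.50   # campaign -> friend adopts an anti-smoking stance
b = 0.40   # friend's stance -> teen decides not to smoke

direct_effect = c
indirect_effect = a * b              # product-of-paths rule for linear models
total_effect = direct_effect + indirect_effect

print(f"direct={direct_effect}, indirect={indirect_effect}, total={total_effect}")
```

The product-of-paths rule and the additive decomposition are guaranteed only in linear models, which is exactly the limitation the drug-enzyme example that follows exposes.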
However, linear causal models don’t capture all of the complexity that can exist within causal models. As Judea Pearl explains in The Book of Why, there can be complex causal models where the equation that I started this post with doesn’t hold. Pearl uses a drug used to treat a disease as an example of a situation where the direct and indirect effects of a drug don’t sum to the total effect. He says that in situations where a drug causes the body to release an enzyme that then combines with the drug to treat a disease, we have to think beyond the equation above. In this case he writes, “the total effect is positive but the direct and indirect effects are zero.”
The drug itself doesn’t do anything to combat the disease. It stimulates the release of an enzyme and without that enzyme the drug is ineffective against the disease. The enzyme also doesn’t have a direct effect on the disease. The enzyme is only useful when combined with the drug, so there is no indirect effect that can be measured as a result of the original drug being introduced. The effect is mediated between the interaction of both the drug and enzyme together. In the model Pearl shows us, there is only the mediating effect, not a direct or indirect effect.
This model helps us see just how complicated ideas and conceptions of causation are. Most of the time we think about direct effects, and we don’t always get to thinking about indirect effects combined with direct effects. Good scientific studies are able to capture the direct and indirect effects, but to truly understand causation today, we have to be able to include mediating effects in complex causation models like the one Pearl describes.
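Pearl’s drug-enzyme example can be sketched as a tiny structural model. The functions below are an invented formalization of the example, not code from the book: recovery requires the drug and the enzyme together, the enzyme is produced only in response to the drug, and the natural direct and indirect effects both come out to zero while the total effect is positive.

```python
def enzyme(drug):
    """The enzyme is released only when the drug is taken (1 = yes, 0 = no)."""
    return drug

def recovery(drug, enz):
    """Recovery happens only when drug and enzyme are present together."""
    return 1 if (drug and enz) else 0

# Total effect: compare taking the drug (enzyme responds) with not taking it.
total = recovery(1, enzyme(1)) - recovery(0, enzyme(0))      # positive

# Natural direct effect: take the drug, but hold the enzyme at the level
# it would have had without the drug.
direct = recovery(1, enzyme(0)) - recovery(0, enzyme(0))     # zero

# Natural indirect effect: withhold the drug, but let the enzyme behave
# as if the drug had been taken.
indirect = recovery(0, enzyme(1)) - recovery(0, enzyme(0))   # zero

print(total, direct, indirect)
```

Neither path alone moves the outcome; only the interaction of the two does, which is why the mediating effect cannot be recovered from the direct and indirect effects separately.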
Counterfactuals

I have written a lot lately about the incredible human ability to imagine worlds that don’t exist. An important way that we understand the world is by imagining what would happen if we did something that we have not yet done or if we imagine what would have happened had we done something different in the past. We are able to use our experiences about the world and our intuition on causality to imagine a different state of affairs from what currently exists. Innovation, scientific advancements, and social cooperation all depend on our ability to imagine different worlds and intuit causal chains between our current world and the imagined reality we desire.
In The Book of Why Judea Pearl writes, “counterfactuals are an essential part of how humans learn about the world and how our actions affect it. While we can never walk down both the paths that diverge in a wood, in a great many cases we can know, with some degree of confidence, what lies down each.”
A criticism of modern science and statistics is the reliance on randomized controlled trials and the fact that we cannot run an RCT on many of the things we study. We cannot run RCTs on our planet to determine the role of meteor impacts or lightning strikes in the emergence of life. We cannot run RCTs on the toxicity of snake venoms in human subjects. We cannot run RCTs on giving stimulus checks to Americans during the COVID-19 pandemic. Due to physical limitations and ethical considerations, RCTs are not always possible. Nevertheless, we can still study the world and use counterfactuals to think about the role of specific interventions.
If we forced ourselves to only accept knowledge based on RCTs then we would not be able to study the areas I mentioned above. We cannot go down both paths in randomized experiments with those choices. We either ethically cannot administer an RCT or we are stuck with the way history played out. We can, however, employ counterfactuals, imagining different worlds in our heads to think about what would have happened had we gone down another path. In this process we might make errors, but we can continually learn and improve our mental models. We can study what did happen, think about what we can observe based on causal structures, and better understand what would have happened had we done something different. This is how much of human progress has moved forward, without RCTs and with counterfactuals, imagining how the world could be different, how people, places, societies, and molecules could have reacted differently with different actions and conditions.
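A counterfactual query can be made concrete with a toy structural model. The scenario and functions below are invented for illustration (loosely echoing the stimulus-check example above): having observed one world, we hold everything else fixed and re-run the model with a single change to ask what lay down the other path.

```python
# Toy structural model (invented example): a stimulus payment affects whether
# a household covers rent, which determines whether it keeps its housing.
def covers_rent(got_stimulus, had_savings):
    return got_stimulus or had_savings

def keeps_housing(rent_covered):
    return rent_covered

# Observed world: the household got the stimulus, had no savings, kept housing.
factual = keeps_housing(covers_rent(got_stimulus=True, had_savings=False))

# Counterfactual: the same household (still no savings), but no stimulus.
counterfactual = keeps_housing(covers_rent(got_stimulus=False, had_savings=False))

print(f"factual: {factual}, counterfactual: {counterfactual}")
```

No RCT was run, yet given the assumed causal structure we can answer what would have happened on the untraveled path. The honesty of the answer depends entirely on how good the assumed structure is, which is why such models must be continually revised.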
Alternative, Nonexistent Worlds - Judea Pearl - The Book of Why - Joe Abittan

Judea Pearl’s The Book of Why hinges on a unique ability that human animals have. Our ability to imagine alternative, nonexistent worlds is what has set us on new pathways and allowed us to dominate the planet. We can think of what would happen if we acted in a certain manner, used a tool in a new way, or if two objects collided together. We can visualize future outcomes of our actions and of the actions of other bodies and predict what can be done to create desired future outcomes.
In the book he writes, “our ability to conceive of alternative, nonexistent worlds separated us from our protohuman ancestors and indeed from any other creature on the planet. Every other creature can see what is. Our gift, which may sometimes be a curse, is that we can see what might have been.”
Pearl argues that our ability to see different possibilities, to imagine new worlds, and to be able to predict actions and behaviors that would realize that imagined world is not something we should ignore. He argues that this ability allows us to move beyond correlations, beyond statistical regressions, and into a world where our causal thinking helps drive our advancement toward the worlds we want.
It is important to note that he is not advocating for holding a belief and setting out to prove it with data and science, but rather that we use data and science, combined with our ability to think causally, to better understand the world. We do not have to be stuck in a state where we understand statistical techniques but deny plausible causal pathways. We can identify and define causal pathways, even if we cannot fully define causal mechanisms. Our ability to reason through alternative, nonexistent worlds is what allows us to think causally and apply this causal reasoning to statistical relationships. Doing so, Pearl argues, will save lives, help propel technological innovation, and push science to new frontiers to improve life on our planet.
Laboratory Proof

“If the standard of laboratory proof had been applied to scurvy,” writes Judea Pearl in The Book of Why, “then sailors would have continued dying right up until the 1930’s, because until the discovery of vitamin C, there was no laboratory proof that citrus fruits prevented scurvy.” Pearl’s quote shows that high scientific standards for definitive and exact causality are not always for the greater good. Sometimes modern science will spurn clear statistical relationships and evidence because statistical relationships alone cannot be counted on as concrete causal evidence. A clear answer will not be given because some marginal unknowns may still exist, and this can have its own costs.
Sailors did not know why or how citrus fruits prevented scurvy, but observations demonstrated that citrus fruits managed to prevent scurvy. There was no clear understanding of what scurvy was or why citrus fruits were helpful, but it was commonly understood that a causal relationship existed. People acted on these observations and lives were saved.
On two episodes, the Don’t Panic Geocast has talked about journal articles in the British Medical Journal that make the same point as Pearl. As a critique of the need for randomized controlled trials, the two journal articles highlight the troubling reality that there have not been any randomized controlled trials on the effectiveness of parachute usage when jumping from airplanes. The articles are hilarious and clearly satirical, but ultimately come to the same point that Pearl does with the quote above – laboratory proof is not always necessary, practical, or reasonable when lives are on the line.
Pearl argues that we can rely on our abilities to identify causality even without laboratory proof when we have sufficient statistical analysis and understanding of relationships. Statisticians always tell us that correlation is not causation and that observational studies are not sufficient to determine causality, yet the citrus fruit and parachute examples highlight that this mindset is not always appropriate. Sometimes a more realistic, common-sense understanding of causation – even if supported with just correlational relationships and statistics – is more important than laboratory proof.