Scientific Observations & Math

My last post was about science and newness. Modern science values new information more than existing information and rewards research that pushes forward into new territories. What unites new science in any field with the historical information that the new science rests on is mathematics. As Yuval Noah Harari writes in his book Sapiens, “mere observations, however, are not knowledge. In order to understand the universe, we need to connect observations into comprehensive theories. Earlier traditions usually formulated their observations into stories. Modern science uses mathematics.”
 
 
Mathematics is used to communicate observations because it can be objective, precise, and evaluated for accuracy. My experience of reality, and how I interpret and communicate that reality, is not likely to match the way someone in New York City, Tokyo, or Kabul experiences, interprets, and communicates their immediate reality. However, if we choose to measure our worlds through data and agree on the scales to use, we can begin to bring our subjective experiences of reality into a unified and consistent framework. A lot of how we understand the world is subjective. For example, I run a lot and many of my friends run, so a three-mile run sounds short to me. For someone who doesn’t run often and doesn’t have friends who run often, however, a three-mile jog may as well be a 26-mile marathon. Mathematics escapes the subjective and goes beyond the stories and narratives that we develop from our subjective experiences. It ties our collective experiences together into something more objective. Mathematics allows us to go from stories to real theories.
 
 
That still doesn’t mean we all understand and interpret the numbers the same way. In his recent book How to Make the World Add Up, Tim Harford shares an example of national statistics in the UK showing that the average rail car carries only 100 passengers. In Harford’s experience, however, traveling at rush hour, the average rail car is packed with far more than 100 people. Both descriptions can be true because they use different reference points: the average passenger, who is most likely traveling at rush hour, or the average rail car, which runs throughout the day, including its quiet off-peak trips. Without mathematics we could never describe this reality in a consistent and unified way, and our descriptions of the world would rest on narrative and story. Mathematics gives us a grounding through which we can understand the universe in a more comprehensive and generalizable manner.
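The gap between those two reference points can be sketched in a few lines of Python. The passenger counts below are hypothetical, chosen only so that the per-car average comes out to 100 as in Harford’s example; they are not real UK rail data. Averaging over cars counts every car trip once, while averaging over passengers weights each car by how many people are in it, so a packed car is counted many times.

```python
# Hypothetical passenger counts for six rail-car trips over one day:
# four quiet off-peak trips and two packed rush-hour trips (made-up numbers).
car_loads = [40, 40, 40, 40, 220, 220]

def mean_per_car(loads):
    """Average passengers per car, counting each car trip once."""
    return sum(loads) / len(loads)

def mean_seen_by_passenger(loads):
    """Average car occupancy experienced by a randomly chosen passenger.

    Each car is weighted by its own load, because a car with 220 riders
    is experienced by 220 people.
    """
    return sum(n * n for n in loads) / sum(loads)

print(mean_per_car(car_loads))            # 100.0
print(mean_seen_by_passenger(car_loads))  # 172.0
```

With these made-up numbers the average car carries 100 passengers, yet the average passenger rides in a car holding 172 people, because 440 of the 600 riders are on the two rush-hour trips.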
Remembering Numbers

A common theme throughout Yuval Noah Harari’s book Sapiens is the argument that, thanks to our brains, Homo sapiens changed our way of life so quickly that our evolution, both physiological and psychological, couldn’t keep up. Evolution is a slow process, but human technological and sociological change has been incredibly rapid. Our minds and bodies are still adapted to a world that Homo sapiens no longer inhabits.
 
 
As an example, Harari writes, “no forager needed to remember, say, the number of fruit on each tree in the forest. So human brains did not adapt to storing and processing numbers.” Math is hard, and part of the reason it is so hard is that our minds didn’t evolve to do lots of math. Our foraging ancestors had incredible brains (as we still do), capable of keeping track of the social and political alliances within groups of 50 to 250 individuals – a huge number of potential combinations of friends, enemies, or frenemies. But foragers were not collecting taxes, were not trying to hang multiple pictures of different sizes evenly on a wall, and were not trying to remember which basketball player made a jump shot at the same moment another player committed a foul while also tabulating a final score.
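Harari’s point about group size can be made concrete with a little arithmetic. This is a minimal sketch, assuming we count only one-on-one relationships (every pair of people), which is the binomial coefficient n choose 2, or n(n - 1)/2; alliances among three or more people would add far more combinations. The group sizes are the 50 and 250 mentioned above, plus 150 as a midpoint.

```python
from math import comb

def pairwise_relationships(group_size):
    """Count distinct one-on-one relationships in a group (n choose 2)."""
    return comb(group_size, 2)

for size in (50, 150, 250):
    print(f"{size} people -> {pairwise_relationships(size):,} one-on-one relationships")
```

This gives 1,225 pairs for 50 people, 11,175 for 150, and 31,125 for 250: a great deal of social bookkeeping, even though none of it involves counting fruit.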
 
 
The human mind did not evolve to remember numbers, and that is why storing and calculating numbers is so difficult. It is why we can be so easily confused by graphs and charts that are poorly organized. It is part of why it is so hard to save money now to retire later, and why credit card debt can be such an easy problem to fall into. We can hold about 7 digits at once in short-term memory, but beyond that we easily become confused and start to lose track of information. The Agricultural Revolution made numbers more important beginning about 12,000 years ago, but our brains have not caught up. To make up for the difficulty of storing numbers in our heads, we write numbers down on paper (or stone tablets in the distant past), use calculators to crunch numbers faster than we can by hand, and rely on tools that save numbers and data so that we don’t have to hold them all in our heads. Our brains simply are not up to the task of holding all the numbers we need to remember, so we have developed tools to do it for us. Don’t feel bad if you can’t remember tons of numbers, and don’t make fun of others who can’t either.
Words and Formulas

Scientific journal articles today are all about formulas, and in The Book of Why Judea Pearl suggests that there is a clear reason why formulas have come to dominate the world of academic studies. In his book he writes, “to a mathematician, or a person who is adequately trained in the mathematical way of thinking …. a formula reveals everything: it leaves nothing to doubt or ambiguity. When reading a scientific article, I often catch myself jumping from formula to formula, skipping the words altogether. To me, a formula is a baked idea. Words are ideas in the oven.”
Formulas are scary and hard to sort out. They use Greek letters, and even in fields like education, political science, or hospitality management, formulas make their way into academic study. Nevertheless, if you can understand what a formula is saying, then you can understand the model that the researcher is trying to demonstrate. If you can understand the numbers that come out of a formula, you can understand something about the relationship between the variables measured in the study.
Once you write a formula, you are defining the factors that you are going to use in an analysis. You are expressing your hypothesis in concrete terms, and establishing specific values that can be analyzed in the form of percentages, totals, ratios, or statistical coefficients.
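As a hypothetical illustration, a study of the connection between trust in public institutions and receipt of GI benefits might commit to a simple linear model like the one below. The variable names are invented for this sketch: Trust_i is respondent i’s trust rating, GI_i equals 1 if the respondent received GI benefits and 0 otherwise, and ε_i is an error term. Writing it down forces the researcher to name exactly which factors enter the analysis and to stake the hypothesis on a single coefficient, β_1, which the data can show to be close to zero or clearly different from it.

```latex
\text{Trust}_i = \beta_0 + \beta_1 \, \text{GI}_i + \varepsilon_i,
\qquad \text{GI}_i \in \{0, 1\}
```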
Words, on the other hand, can be fuzzy. We can debate all day long about specific words, their definitions, registers, and implications in ways that we cannot argue over a formula. The data that go into a formula and the information that comes out are less subjective than the language we use to describe the data and the conclusions we draw from the information.
I like the metaphor that Pearl uses, comparing formulas to baked ideas and words to ideas within an oven. Words allow us to work our way through what we know, to tease apart small factors and attempt to attach significance to each factor. A formula requires that we cut through the potentialities and possibilities to make specific definitions that can be proven false. Words help us work our way toward a specific idea, and a formula either repudiates that idea or lets it live on to face another, more specific and nuanced formula in the future, so our ideas become crisper over time.
Mediating Variables

Mediating variables sit between the actions and the outcomes that we can observe. They are often bound up with both the action and the outcome, which makes their direct impact hard to separate from other factors. They play an important role in determining causal structures, and ultimately in shaping discourse and public policy about good and bad actions.
Judea Pearl writes about mediating variables in The Book of Why. He uses cigarette smoking, tar, and lung cancer as an example of how hard mediating variables can be to untangle. He writes, “if smoking causes lung cancer only through the formation of tar deposits, then we could eliminate the excess cancer risk by giving smokers tar-free cigarettes, such as e-cigarettes. On the other hand, if smoking causes cancer directly or through a different mediator, then e-cigarettes might not solve the problem.”
The role of tar as a mediator still has not been fully disentangled or understood, but it is an excellent example of the importance, challenges, and public health consequences of mediating variables. Mediators can contribute directly to the final outcome we observe (lung cancer), but they may not be the only variable at play. In this instance, other aspects of smoking may directly cause lung cancer. Comparing cigarette smokers with e-cigarette smokers can help us get closer, but because people choose which product they use, we can’t rule out a self-selection effect between the two groups that plays into cancer development. However, closely studying both groups will help us start to better understand the direct role of tar in the causal chain.
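Pearl’s tar-free-cigarette thought experiment can be sketched as a tiny linear structural model. Everything here is an assumption for illustration: the three path coefficients are made up, and real smoking risk is neither linear nor this simple. The total effect of smoking splits into a direct part and an indirect part that flows through tar; a “tar-free cigarette” corresponds to smoking while holding tar at its non-smoking level, which leaves only the direct effect.

```python
# Hypothetical linear structural model (made-up coefficients, not real
# epidemiology): smoking -> tar -> cancer risk, plus a possible direct path.
TAR_PER_SMOKE = 0.8  # how much tar smoking produces
RISK_PER_TAR = 0.5   # how much risk each unit of tar adds
DIRECT_RISK = 0.1    # risk from smoking that does NOT pass through tar

def tar(smoke):
    """Tar deposits as a function of smoking (0 = no, 1 = yes)."""
    return TAR_PER_SMOKE * smoke

def risk(smoke, tar_level):
    """Excess cancer risk from the direct path plus the tar path."""
    return DIRECT_RISK * smoke + RISK_PER_TAR * tar_level

def effects():
    """Return (total, direct, indirect) effects of smoking on risk."""
    baseline = risk(0, tar(0))
    total = risk(1, tar(1)) - baseline
    # Direct effect: smoke, but hold tar at its non-smoking level
    # (the "tar-free cigarette" scenario).
    direct = risk(1, tar(0)) - baseline
    indirect = total - direct
    return total, direct, indirect

total, direct, indirect = effects()
print(f"total={total:.2f} direct={direct:.2f} indirect={indirect:.2f}")
# total=0.50 direct=0.10 indirect=0.40
```

With these made-up numbers, removing tar eliminates 0.40 of the 0.50 excess risk, and the remaining 0.10 is the direct effect. Setting DIRECT_RISK to 0 represents Pearl’s first scenario, in which tar-free cigarettes would remove all of the excess risk.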
Mediating variables like this pop up when we talk about the effectiveness of schools, the role of democratic norms, and the pros or cons of traditional gender roles. Often, mediating variables drive the concerns we have about larger actions and behaviors. We want all children to go to school, but we argue about the many mediating variables within the educational environment that may or may not directly contribute to the specific outcomes we want to see. It is hard to say which specific piece is most important, because so many mediating variables contribute, directly or indirectly, to the educational outcomes we see and imagine.
Dose-Response Curves

One limitation of linear regression models, explains Judea Pearl in The Book of Why, is that they cannot accurately model relationships that are not linear. This lesson was hammered into my head by a statistics professor at the University of Nevada, Reno when discussing binary variables. For variables with only two possible outcomes, such as yes or no, a linear regression model doesn’t work well, because a straight line can predict “probabilities” below 0 or above 1. The Space Shuttle Challenger disaster is the classic cautionary tale: whether an O-ring fails or its seal holds is a binary outcome, and the pre-launch analysis of O-ring problems against temperature failed to capture how sharply the risk rose in cold weather. However, there are other situations where a linear regression becomes problematic.
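A minimal sketch of the problem with binary outcomes, using made-up data (not the actual O-ring records): six observations coded 0 (holds) or 1 (fails) at hypothetical stress levels 1 through 6, fit with an ordinary least-squares line. Because nothing constrains a straight line to stay between 0 and 1, the fitted model produces impossible “probabilities” as soon as you move away from the observed data.

```python
# Hypothetical binary outcomes (0 = holds, 1 = fails) at six made-up
# stress levels. Not real O-ring data.
xs = [1, 2, 3, 4, 5, 6]
ys = [0, 0, 0, 1, 1, 1]

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope

intercept, slope = fit_line(xs, ys)
for x in (0, 3.5, 10):
    p = intercept + slope * x
    print(f"x={x}: predicted 'probability' = {p:.2f}")
```

Here the fitted line predicts about -0.40 at x = 0 and about 2.17 at x = 10, neither of which is a valid probability. Models built for binary outcomes, such as logistic regression, pass the linear predictor through an S-shaped curve so every prediction stays between 0 and 1.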
 
 
In the book, Pearl writes, “linear models cannot represent dose-response curves that are not straight lines. They cannot represent threshold effects, such as a drug that has increasing effects up to a certain dosage and then no further effect.”
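Pearl’s threshold example can be illustrated with a hypothetical drug whose effect grows with dose up to 5 units and then stops increasing. The curve and the doses are invented for this sketch. Fitting a straight line to doses 0 through 10 and then extrapolating shows how a linear model keeps promising more effect from more drug.

```python
def true_effect(dose):
    """Hypothetical threshold curve: effect rises one-for-one up to dose 5, then plateaus."""
    return min(dose, 5.0)

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope

doses = list(range(11))  # observed doses 0, 1, ..., 10
effects = [true_effect(d) for d in doses]
intercept, slope = fit_line(doses, effects)

for dose in (2, 5, 10, 20):
    linear = intercept + slope * dose
    print(f"dose={dose:>2}: linear model {linear:5.2f} vs actual {true_effect(dose):.2f}")
```

The fitted slope is 0.5, so the line overshoots at low doses, undershoots right at the threshold, and at a dose of 20 predicts an effect of about 11.6 even though the curve never exceeds 5.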
 
 
Linear models become problematic when the effect of a variable is not constant across doses. In the field I was trained in, political science, this isn’t a big deal. In my field, simply demonstrating that there is a mostly consistent connection between ratings of trust in public institutions and receipt of GI benefits, for example, is usually sufficient. However, in fields like medicine or nuclear physics, it is important to recognize that a linear regression model might be ill-suited to how the variable actually behaves.
 
 
A drug that is ineffective at small doses, becomes effective at moderate doses, but quickly becomes deadly at high doses shouldn’t be modeled with a linear regression model. The general public needs to be especially careful with this type of drug, since so many individuals approach medicine with an “if some is good, then more is better” mindset. Within physics and engineering, as the Challenger example shows, the outcomes can also be a matter of life and death. If a particular rubber for tires holds its strength up to a given threshold and then fails, if a rubber seal fails at a low temperature, or if a nuclear cooling pool will flash boil at a certain heat, then linear regression models will be inadequate for predicting how those variables actually behave.
 
 
This is an important thing to think about when we consider the way science is used in general discussion. We should recognize when people assume a linear relationship based on an experimental study, and we should look for binary variables or potential non-linear relationships when thinking about a study and its conclusions. Improving our thinking about linear regression and dose-response curves can help us be smarter about things that matter, like global pandemics, and even about more general discussions of what we think the government should or should not do.

Teaching Statistical Thinking

“Statistical thinking is the most useful branch of mathematics for life,” writes Gerd Gigerenzer in Risk Savvy, “and the one that children find most interesting.” I don’t have kids and I don’t teach or tutor children today, but I remember my own math classes, from elementary school lessons to AP Calculus in high school. Most of my math education was solving isolated equations and memorizing formulas, with an occasional word problem tossed in. While I was generally good at math, it was boring, and I, like others, questioned when I would ever use most of the math I was learning. Gerd Gigerenzer wants to change this, and he wants to do so in a way that focuses on teaching statistical thinking.
Gigerenzer continues, “teaching statistical thinking means giving people tools for problem solving in the real world. It should not be taught as pure mathematics. Instead of mechanically solving a dozen problems with the help of a particular formula, children and adolescents should be asked to find solutions to real-life problems.” 
We view statistics as incredibly complicated and too advanced for most children (and for most of us adults as well!). But if Gigerenzer is right that statistical thinking and problem solving are what many children find most exciting, then we should lean into teaching statistical thinking rather than hiding it away and saving it for advanced students. I found math classes to be all right, but I questioned how often I would need to use math, and that was before smartphones became ubiquitous. Today, most of the math I do professionally is calculated with a spreadsheet formula. I’m glad I understand the math behind the formulas I use in spreadsheets, but learning mathematical concepts through real-world examples might have served me better than learning them in isolation through essentially rote memorization.
Engaging with what kids really find interesting will spur learning. And doing so with statistical thinking will do more than just help kids make smart decisions on the Las Vegas Strip. Improving statistical thinking will help people understand how to respond appropriately to future pandemics, how to plan for retirement, and how to think about risk in other health and safety contexts. Lots of mathematical concepts can be built into real-world lessons that teach statistical thinking and go beyond the memorization and plug-and-chug lessons that I grew up with.