Words and Formulas

Scientific journal articles today are all about formulas, and in The Book of Why Judea Pearl suggests that there is a clear reason why formulas have come to dominate the world of academic studies. In his book he writes, “to a mathematician, or a person who is adequately trained in the mathematical way of thinking …. a formula reveals everything: it leaves nothing to doubt or ambiguity. When reading a scientific article, I often catch myself jumping from formula to formula, skipping the words altogether. To me, a formula is a baked idea. Words are ideas in the oven.”
Formulas are scary and hard to sort out. They use Greek letters, and they make their way into academic study even in fields like education, political science, or hospitality management. Nevertheless, if you can understand what a formula is saying, then you can understand the model that the researcher is trying to demonstrate. If you can understand the numbers that come out of a formula, you can understand something about the relationship between the variables measured in the study.
Once you write a formula, you are defining the factors that you are going to use in an analysis. You are expressing your hypothesis in concrete terms and establishing specific values that can be analyzed in the form of percentages, totals, ratios, or statistical coefficients.
Words, on the other hand, can be fuzzy. We can debate all day long about specific words, their definitions, registers, and implications in ways that we cannot argue over a formula. The data that go into a formula and the information that comes out are less subjective than the language and words we use to describe the data and the conclusions we draw from the information.
I like the metaphor that Pearl uses, comparing formulas to baked ideas and words to ideas in the oven. Words allow us to work our way through what we know, to tease apart small factors and attempt to attach significance to each factor. A formula requires that we cut through the potentialities and possibilities to make specific definitions that can be proven false. Words help us work our way toward a specific idea, and a formula either repudiates that idea or lets it live on to face another more specific and nuanced formula in the future, with our ideas becoming crisper over time.
Mediating Variables

Mediating variables stand in the middle of the actions and the outcomes that we can observe. They are often tied together and hard to separate from the action and the outcome, making their direct impact hard to pull apart from other factors. They play an important role in determining causal structures, and ultimately in shaping discourse and public policy about good and bad actions.
Judea Pearl writes about mediating variables in The Book of Why. He uses cigarette smoking, tar, and lung cancer as an example of how mediating variables complicate causal questions. He writes, “if smoking causes lung cancer only through the formation of tar deposits, then we could eliminate the excess cancer risk by giving smokers tar-free cigarettes, such as e-cigarettes. On the other hand, if smoking causes cancer directly or through a different mediator, then e-cigarettes might not solve the problem.”
The mediator problem of tar has still not been fully disentangled, but it is an excellent example of the importance, challenges, and public health consequences of mediating variables. Mediators can contribute directly to the final outcome we observe (lung cancer), but they may not be the only variable at play. In this instance, other aspects of smoking may directly cause lung cancer. Comparing cigarette smokers with e-cigarette smokers can help us get closer, but we won’t be able to rule out a self-selection effect between the two groups that plays into cancer development. However, closely studying both groups will help us start to better understand the direct role of tar in the causal chain.
Mediating variables like this pop up when we talk about the effectiveness of schools, the role of democratic norms, and the pros or cons of traditional gender roles. Often, mediating variables are driving the concerns we have for larger actions and behaviors. We want all children to go to school, but we argue about the many mediating variables within the educational environment that may or may not directly contribute to the specific outcomes that we want to see. It is hard to say which specific piece is the most important, because there are so many mediating variables all contributing directly or possibly indirectly to the education outcomes we see and imagine.
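Pearl's two scenarios can be sketched with a toy model. The Python below is only an illustration under loud assumptions: the function names, the linear form, and every number (the baseline risk, the tar effect, the direct effect) are made up for the example and are not estimates from Pearl or from any study. It compares the excess cancer risk from regular and tar-free cigarettes when tar is the only pathway and when smoking also has a direct effect.

```python
# Toy linear model of a mediator. All coefficients are hypothetical.
def cancer_risk(smokes, tar_free, tar_effect, direct_effect, baseline=0.01):
    """Risk = baseline + (effect through tar) + (direct effect of smoking)."""
    # Mediator: tar deposits form only when a smoker uses cigarettes with tar.
    tar = 0.0 if tar_free else float(smokes)
    return baseline + tar_effect * tar + direct_effect * smokes


def excess_risk(tar_free, tar_effect, direct_effect):
    """Extra risk a smoker carries compared with a non-smoker."""
    smoker = cancer_risk(1, tar_free, tar_effect, direct_effect)
    non_smoker = cancer_risk(0, tar_free, tar_effect, direct_effect)
    return round(smoker - non_smoker, 3)


# Scenario A: smoking harms only through tar (no direct path).
only_tar = {"tar_effect": 0.10, "direct_effect": 0.0}
# Scenario B: smoking also harms directly (or through another mediator).
tar_plus_direct = {"tar_effect": 0.10, "direct_effect": 0.05}

for name, params in [("only tar", only_tar), ("tar + direct", tar_plus_direct)]:
    regular = excess_risk(tar_free=False, **params)
    without_tar = excess_risk(tar_free=True, **params)
    print(f"{name}: regular={regular}, tar-free={without_tar}")
```

In the first scenario the tar-free excess risk drops to zero. In the second, a residual risk remains, which is the case where e-cigarettes "might not solve the problem."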
Dose-Response Curves

One limitation of linear regression models, explains Judea Pearl in The Book of Why, is that they cannot accurately model relationships that aren’t linear. This lesson was hammered into my head by a statistics professor at the University of Nevada, Reno when discussing binomial variables. For variables with only two possible outcomes, such as yes or no, a linear regression model doesn’t work. The failure of the Challenger shuttle’s O-ring is a classic illustration: the outcome is binomial (the O-ring fails or its integrity holds), and a linear regression model is the wrong tool for predicting it. However, there are other situations where a linear regression becomes problematic.
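To make the binomial case concrete, here is a minimal Python sketch. The temperatures and fail/hold outcomes are invented for illustration; they are not the actual Challenger launch data, and `fit_line` is just a hand-rolled least-squares helper written for this example. The point is that a straight line fitted to a yes/no outcome will predict "probabilities" above 1 and below 0.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor; returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical data: temperature (degrees F) and whether a seal failed (1) or held (0).
temps = [50, 55, 60, 65, 70, 75, 80]
failed = [1, 1, 1, 0, 0, 0, 0]

slope, intercept = fit_line(temps, failed)


def predicted_failure(temp):
    """The straight line's 'probability' of failure at a given temperature."""
    return intercept + slope * temp


# Inside the observed range the line looks plausible; outside it, it does not.
for temp in (30, 65, 90):
    print(temp, round(predicted_failure(temp), 2))
```

With this made-up data, the line gives a value above 1 at the cold end and a negative value at the warm end, neither of which can be a probability. A model built for two-outcome variables, such as logistic regression, keeps its predictions between 0 and 1.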
In the book, Pearl writes, “linear models cannot represent dose-response curves that are not straight lines. They cannot represent threshold effects, such as a drug that has increasing effects up to a certain dosage and then no further effect.”
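Pearl's threshold example can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the hypothetical `true_response` function rises with dose up to a threshold of 5 and then stays flat, and the doses and response units are arbitrary. The hand-rolled `fit_line` helper fits a straight line to that curve so the two can be compared.

```python
def true_response(dose, threshold=5, gain=10):
    """Hypothetical drug: effect grows with dose, then plateaus at the threshold."""
    return gain * min(dose, threshold)


def fit_line(xs, ys):
    """Ordinary least squares for one predictor; returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


doses = list(range(11))                        # doses 0 through 10
responses = [true_response(d) for d in doses]  # rises until 5, flat afterwards
slope, intercept = fit_line(doses, responses)


def linear_prediction(dose):
    """What the fitted straight line claims the effect is at a given dose."""
    return intercept + slope * dose


# Compare the true plateauing effect with the straight-line prediction.
for dose in (0, 5, 20):
    print(dose, true_response(dose), round(linear_prediction(dose), 1))
```

In this toy setup the fitted line overstates the effect at zero dose, understates it at the threshold, and keeps climbing long after the true effect has leveled off.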
Linear models become problematic when the effect of a variable is not constant across doses. In the field of study that I was trained in, political science, this usually isn’t a big deal. In my field, simply demonstrating that there is a mostly consistent connection between ratings of trust in public institutions and receipt of GI benefits, for example, is usually sufficient. However, in fields like medicine or nuclear physics, it is important to recognize that a linear regression model might be ill-suited to the actual behavior of the variable.
A drug that is ineffective at small doses, becomes effective at moderate doses, but quickly becomes deadly at high doses shouldn’t be modeled with a linear regression model. This type of drug is one that the general public needs to be especially careful with, since so many individuals approach medicine with an “if some is good then more is better” mindset. Within physics, as was seen in the Challenger example, the outcomes can also be a matter of life and death. If a particular rubber for tires holds its strength but fails at a given threshold, if a rubber seal fails at a low temperature, or if a nuclear cooling pool will flash boil at a certain heat, then linear regression models will be inadequate for making predictions about how those variables actually behave.
This is an important thing for us to think about when we consider the way that science is used in general discussion. We should recognize that people often assume a linear relationship based on an experimental study, and we should look for binomial variables or potential non-linear relationships when thinking about a study and its conclusions. Improving our thinking about linear regression and dose-response curves can help us be smarter when it comes to things that matter, like global pandemics, and even in more general discussions about what we think the government should or should not do.

Teaching Statistical Thinking

“Statistical thinking is the most useful branch of mathematics for life,” writes Gerd Gigerenzer in Risk Savvy, “and the one that children find most interesting.” I don’t have kids and I don’t teach or tutor children today, but I remember my own math classes, from elementary school lessons to AP Calculus in high school. Most of my math education was solving isolated equations and memorizing formulas, with an occasional word problem tossed in. While I was generally good at math, it was boring, and I, like others, questioned when I would ever use most of the math I was learning. Gerd Gigerenzer wants to change this, and he wants to do so by teaching statistical thinking.
Gigerenzer continues, “teaching statistical thinking means giving people tools for problem solving in the real world. It should not be taught as pure mathematics. Instead of mechanically solving a dozen problems with the help of a particular formula, children and adolescents should be asked to find solutions to real-life problems.” 
We view statistics as incredibly complicated and too advanced for most children (and for most of us adults as well!). But if Gigerenzer’s assertion is right, and statistical thinking and problem solving are what many children are most excited about, then we should lean into teaching statistical thinking rather than hiding it away and saving it for advanced students. I found math classes to be alright, but I questioned how often I would need to use math, and that was before smartphones became ubiquitous. Today, most math that I have to do professionally is calculated using a spreadsheet formula. I’m glad I understand the math and calculations behind the formulas I use in spreadsheets, but perhaps learning mathematical concepts within real world examples would have been better than learning them in isolation and with essentially rote memorization practice.
Engaging with what kids really find interesting will spur learning. And doing so with statistical thinking will do more than just help kids make smart decisions on the Las Vegas Strip. Improving statistical thinking will help people understand how to appropriately respond to future pandemics, how to plan for retirement, and how to think about risk in other health and safety contexts. Lots of mathematical concepts can be built into real world lessons that teach statistical thinking and go beyond the memorization and plug-and-chug lessons that I grew up with.