
Avoiding Gambles

“Most people dislike risk (the chance of receiving the lowest possible outcome), and if they are offered a choice between a gamble and an amount equal to its expected value they will pick the sure thing,” writes Daniel Kahneman in Thinking Fast and Slow. I don’t want to get too far into expected value, but I think of it as the best outcome of a gamble discounted by the possibility of getting nothing. Rather than the expected value of a $100 bet being $100, the expected value will come in somewhere less than that, maybe around $50, $75, or $85, depending on whether the odds of winning the bet are so-so or pretty good. You will either win $100 or $0, not $50, $75, or $85, but the risk causes us to value the bet at less than the full amount up for grabs.
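
As a rough sketch of that arithmetic, the expected value of a gamble is just the probability-weighted average of its outcomes. The 75% winning probability below is made up purely for illustration:

```python
def expected_value(outcomes):
    """Probability-weighted average of a gamble's (probability, payoff) pairs."""
    return sum(prob * payoff for prob, payoff in outcomes)

# Hypothetical bet: a 75% chance at $100, otherwise nothing.
bet = [(0.75, 100), (0.25, 0)]
print(expected_value(bet))  # 75.0 -- less than the $100 up for grabs
```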

 

What Kahneman describes in his book is an interesting phenomenon: when faced with a betting opportunity, people will mentally (or maybe subjectively is the better way to put it) calculate an expected value. If the expected value they calculate for themselves is not much higher than a guaranteed option, they will pick the guaranteed option. The quote I used to open the post describes a phenomenon you have probably seen if you have watched enough game show TV. As Kahneman continues, “In fact a risk-averse decision maker will choose a sure thing that is less than the expected value, in effect paying a premium to avoid the uncertainty.”

 

On game shows, people who are risk averse, or who feel the odds are stacked against them, will frequently walk away from the chance at a big payoff and take a modest sum of cash instead. What is interesting is that we can study when people make the bet versus when they walk away, and observe patterns in our decision making. It turns out we can predict the situations that drive people toward avoiding gambles and the situations that encourage them. If a certain outcome is pretty close to the expected value of the gamble, people will pick the certain outcome. If there is no certain outcome, people usually need a potential reward of about two times what they might lose before they will be comfortable with a bet. We might like to take chances and gamble from time to time, but we tend to be pretty risk averse, and we tend to prefer guaranteed outcomes, even at a slight discount to the expected value of a bet, over the chance of losing it all.
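
To make that rule of thumb concrete, here is a minimal sketch. The roughly two-to-one weighting of losses against gains is the only number taken from the paragraph above; the specific gambles are hypothetical:

```python
def accepts_gamble(possible_gain, possible_loss, loss_weight=2.0):
    """A 50-50 gamble looks acceptable only when the possible gain
    outweighs the possible loss weighted about twice as heavily."""
    return possible_gain > loss_weight * possible_loss

print(accepts_gamble(150, 100))  # False -- $150 isn't enough to risk losing $100
print(accepts_gamble(250, 100))  # True  -- roughly twice the possible loss
```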

Regression to the Mean Versus Causal Thinking

Regression to the mean, the idea that there is an average outcome we can expect and that over time individual outliers will revert back toward that average, is a boring phenomenon on its own. If you think about it in the context of driving to work and counting your red lights, you can see why it is a rather boring idea. If you normally hit 5 red lights, and one day you manage to get to work with just a single red light, you probably expect that the following day you won’t have as much luck with the lights and will hit more reds than on your lucky one-red-light commute. Conversely, if you have a day where you manage to hit every possible red light, you would probably expect better traffic luck the next day and a count somewhere closer to your average. This is regression to the mean. Having only one red, or hitting every red, one day doesn’t cause the next day’s traffic lights to behave any differently, but you know you will probably see a more average count of reds versus greens – no causal explanation involved, just random traffic light luck.
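
A quick simulation shows why no causal story is needed. If each commute is just five lights that independently come up red half the time (numbers invented for illustration), the day after an unusually lucky commute is, on average, an ordinary one:

```python
import random

random.seed(42)

def red_lights(n_lights=5, p_red=0.5):
    """Count how many lights happen to be red on one commute."""
    return sum(random.random() < p_red for _ in range(n_lights))

next_days = []
for _ in range(100_000):
    today, tomorrow = red_lights(), red_lights()  # independent days
    if today <= 1:                                # an unusually lucky day
        next_days.append(tomorrow)

print(sum(next_days) / len(next_days))  # ~2.5, right back at the average
```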

 

But for some reason this idea is both fascinating and hard to grasp in other areas, especially if we think that we have some control over the outcome. In Thinking Fast and Slow, Daniel Kahneman helps explain why it is so difficult in some settings for us to accept regression to the mean, an otherwise rather boring concept. He writes,

 

“Our mind is strongly biased toward causal explanations and does not deal well with mere statistics. When our attention is called to an event, associative memory will look for its cause – more precisely, activation will automatically spread to any cause that is already stored in memory. Causal explanations will be evoked when regression is detected, but they will be wrong because the truth is that regression to the mean has an explanation but does not have a cause.”

 

Unless you truly believe that there is a god of traffic lights who rules over your morning commute, you probably don’t assign any causal mechanism to your luck with red lights. But when you are considering how well a professional golfer played on the second day of a tournament compared to the first day, or when you are considering whether intelligent women marry equally intelligent men, you are likely to have some causal idea that comes to mind. The golfer was more or less complacent on the second day – the highly intelligent women have to settle for less intelligent men because the highly intelligent men don’t want an intellectual equal. These are examples Kahneman uses in the book, and they present plausible causal mechanisms, but as Kahneman shows, the simpler, though more boring, answer is regression to the mean. A golfer who performs spectacularly on day one is likely to be less lucky on day two. A highly intelligent woman is likely to marry a man with intelligence closer to average just by statistical chance.

 

When regression to the mean violates our causal expectation it becomes an interesting and important concept. It reveals that our minds don’t simply observe an objective reality; they impose causal structures that fit preexisting narratives. Our causal conclusions can be quite inaccurate, especially if they are influenced by biases and prejudices that are unwarranted. If we keep regression to the mean in mind, we might lose some of our exciting narratives, but our thinking will be more sound and our judgments clearer.

Base Rates

When we think about individual outcomes we usually think about independent causal structures. A car accident happened because a person was switching their Spotify playlist and accidentally ran a red light. A person stole from a grocery store because they had poor moral character which came from a poor cultural upbringing. A build-up of electrical potential from the friction of two air masses rushing past each other caused a lightning strike.

 

When we think about larger systems and structures we usually think about more interconnected and somewhat random outcomes – outcomes we don’t necessarily observe on a case by case basis, but instead think about in terms of likelihoods and the conditions which create the possibilities for a set of events and outcomes. Increasing technological capacity in smartphones with lagging technological capacity in vehicles created a tension for drivers who wanted to stream music while operating vehicles, increasing the chances of a driver error accident. A stronger US dollar made it more profitable for companies to employ workers in other countries, leading to a decline in manufacturing jobs in US cities and people stealing food as they lost their paychecks. Earth’s tilt toward the sun led to a difference in the amount of solar energy that northern continental landmasses experienced, creating a temperature and atmospheric gradient which led to lightning producing storms and increased chances of lightning in a given region.

 

What I am trying to demonstrate in the two paragraphs above is a tension between thinking statistically versus thinking causally. It is easy to think causally on a case by case basis, and harder to move up the ladder to think about statistical likelihoods and larger outcomes over entire complex systems. Daniel Kahneman presents these two types of thought in his book Thinking Fast and Slow writing:

 

“Statistical base rates are facts about a population to which a case belongs, but they are not relevant to the individual case. Causal base rates change your view of how the individual case came to be.”

 

It is more satisfying for us to assign agency to a single individual than to consider that individual’s actions as being part of a large and complex system that will statistically produce a certain number of outcomes that we observe. We like easy causes, and dislike thinking about statistical likelihoods of different events.

 

“Statistical base rates are generally underweighted, and sometimes neglected altogether, when specific information about the case at hand is available.
Causal base rates are treated as information about the individual case and are easily combined with other case-specific information.”

 

The base rates that Kahneman describes can be thought of as the category or class to which we assign something. We can use different forms of base rates to support different views and opinions. Shifting from a statistical base rate to a causal base rate may change the way we think about whether a person is deserving of punishment, or aid, or indifference. It may change how we structure society, design roads, and conduct cost-benefit analyses for changing programs or technologies. Looking at the world through a limited causal base rate will give us a certain set of outcomes that might not generalize to the rest of the world, and might cause us to make erroneous judgments about the best ways to organize ourselves to achieve the outcomes we want for society.
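
For a sense of what combining a statistical base rate with case-specific information looks like in practice, Bayes’ rule is the standard tool. This is a generic sketch, not an example from the book, and the 1% base rate and 80% reliability figures are invented:

```python
def posterior(base_rate, hit_rate, false_alarm_rate):
    """P(hypothesis | evidence), starting from the statistical base rate."""
    evidence = hit_rate * base_rate + false_alarm_rate * (1 - base_rate)
    return hit_rate * base_rate / evidence

# A condition with a 1% base rate and evidence that is right 80% of the time.
print(round(posterior(0.01, 0.80, 0.20), 3))  # ~0.039, not the 80% intuition suggests
```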

How We Choose to Measure Risk

Risk is a tricky thing to think about, and how we choose to measure and communicate risk can make it even more challenging to comprehend. Our brains like to categorize things, and categorization is easiest when the categories are binary or represent three or fewer distinct possibilities. Once you start adding options and different possible outcomes, decisions quickly become overwhelmingly complex, and our minds have trouble sorting through the possibilities. In his book Thinking Fast and Slow, Daniel Kahneman discusses the challenges of thinking about risk, and highlights another level of complexity: which measurements we are going to use to communicate and judge risk.

 

Humans are pretty good at estimating coin flips – that is to say that our brains do OK with binary 50-50 outcomes (although as Kahneman shows in his book this can still trip us up from time to time). Once we have to start thinking about complex statistics, like how many people will die from cancer caused by smoking if they smoke X number of packs of cigarettes per month for X number of years, our brains start to have trouble keeping up. However, there is an additional decision that needs to be layered on top of statistics such as cigarette-related death statistics before we can begin to understand them. That decision is how we are going to report the death statistics. Will we choose to report deaths per thousand smokers? Will we choose to report deaths by the number of packs smoked over a number of years? Will we just choose to report deaths among all smokers, regardless of whether they smoked one pack per month or one pack before lunch every day?
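
To see how much the choice of measure matters, here is one hypothetical set of numbers (all invented) reported three different ways:

```python
# The same hypothetical risk, reported three different ways.
smokers = 10_000_000
deaths = 20_000

print(f"Total deaths: {deaths:,}")                                  # 20,000 -- sounds alarming
print(f"Deaths per 1,000 smokers: {deaths / smokers * 1_000:.0f}")  # 2
print(f"Share of smokers affected: {deaths / smokers:.2%}")         # 0.20%
```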

 

Kahneman writes, “the evaluation of the risk depends on the choice of a measure – with the obvious possibility that the choice may have been guided by a preference for one outcome or another.”

 

Political decisions cannot be escaped, even when we are trying to make objective and scientific statements about risk. If we want to convey that something is dangerous, we might choose to report overall death numbers across the country. Those deaths might sound like a large number, even though they may represent a very small fraction of incidents. In our lives today, this may be done with COVID-19 deaths, voter fraud instances, or wildfire burn acreage. Our brains will have a hard time comprehending risk in each of these areas, and adding the complexity of how that risk is calculated, measured, and reported can make it virtually impossible for any of us to comprehend risk. Clear and accurate risk reporting is vital for helping us understand important risks in our lives and in society, but the entire process can be derailed if we choose measures that don’t accurately reflect risk or that muddy the waters of exactly what the risk is.
Cause and Chance

Cause and Chance

Recently I have written a lot about our mind’s tendency toward causal thinking, and how this tendency can sometimes get our minds in trouble. We make associations and predictions based on limited information and we are often influenced by biases that we are not aware of. Sometimes, our brains need to shift out of our causal framework and think in a more statistical manner, but we rarely seem to do this well.

 

In Thinking Fast and Slow, Daniel Kahneman writes, “The associative machinery seeks causes. The difficulty we have with statistical regularities is that they call for a different approach. Instead of focusing on how the event at hand came to be, the statistical view relates it to what could have happened instead. Nothing in particular caused it to be what it is – chance selected it from among its alternatives.”

 

This is hard for us to accept. We want there to be a reason why one candidate won a toss-up election and the other lost. We want there to be a reason why the tornado hit one neighborhood and not the adjacent neighborhood. Our mind wants to find patterns; it wants to create associations between events, people, places, and things. It isn’t happy when there is a large amount of data, unknown variables, and some degree of randomness that can influence exactly what we observe.

 

Statistics, however, isn’t concerned with our need for intelligible causal structures. Statistics is fine with a coin flip coming up heads 9 times in a row, and the 10th flip still having a 50-50 shot of being heads.
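
A short simulation (fair coin, arbitrary number of trials) makes the same point: even after nine heads in a row, the tenth flip still comes up heads about half the time.

```python
import random

random.seed(0)
tenth_flips = []

for _ in range(1_000_000):
    flips = [random.random() < 0.5 for _ in range(10)]
    if all(flips[:9]):                # the first nine flips were all heads
        tenth_flips.append(flips[9])

print(sum(tenth_flips) / len(tenth_flips))  # ~0.5
```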

 

Our minds don’t have the ability to hold multiple competing narratives at one time. In national conversations, we seem to want to split things into two camps (maybe this is just an artifact of the United States having a winner-take-all political system) where we have two sides to an argument and two ways of thinking and viewing the world. I tend to think in triads, and my writing often reflects that, with me presenting a series of three examples of a phenomenon. When we need to hold 7, 15, or 100 different potential outcomes in our mind, we are easily overwhelmed. Accepting strange combinations that don’t fit with a simple this-or-that causal structure is hard for our minds, and in many cases being so nuanced is not very rewarding. We can generalize and make substitutions in these complex settings and usually do just fine. We can trick ourselves into believing that we think statistically, even if we are really only justifying the causal structures and hypotheses that we want to be true.

 

However, sometimes – as in some elections, in understanding cancer risk, and in making cost-benefit analyses of traffic accidents for freeway construction – thinking statistically is important. We have to understand that there is a range of outcomes, and only so many predictions we can make. We can develop aids to help us think through these statistical decisions, but we have to recognize that our brains will struggle. By understanding our causal tendencies and desires, and recognizing the difficulties of accepting statistical information, we can set up structures that enable us to make better decisions.

Statistical Artifacts

When we have good graphs and statistical aids, thinking statistically can feel straightforward and intuitive. Clear charts can help us tell a story, can help us visualize trends and relationships, and can help us better conceptualize risk and probability. However, understanding data is hard, especially if the way that data is collected creates statistical artifacts.

 

Yesterday’s post was about extreme outcomes, and how it is the smallest counties in the United States where we see both the highest per capita instances of cancer and the lowest per capita instances of cancer. Small populations allow for large fluctuations in per capita cancer diagnoses, and thus extreme outcomes in cancer rates. We could graph the per capita rates, model them on a map of the United States, or present the data in unique ways, but all we would really be doing is creating a visual aid influenced by statistical artifacts from the samples we used. As Daniel Kahneman explains in his book Thinking Fast and Slow, “the differences between dense and rural counties do not really count as facts: they are what scientists call artifacts, observations that are produced entirely by some aspect of the method of research – in this case, by differences in sample size.”

 

Counties in the United States vary dramatically. Some counties are geographically huge, while others are pretty small – Nevada is a large state with over 110,000 square miles of land but only 17 counties, compared to West Virginia with under 25,000 square miles of land and 55 counties. Across the US, some counties are exclusively within metropolitan areas, some are completely within suburbs, some are entirely rural with only a few hundred people, and some manage to incorporate major metros, expansive suburbs, and vast rural stretches (shoutout to Clark County, NV). They are convenient for collecting data, but can cause problems when analyzing population trends across the country. The variations in size and other factors create the possibility for the extreme outcomes we see in things like cancer rates across counties. When smoothed out over larger populations, the disparities in cancer rates disappear.

 

Most of us are not collecting lots of important data for analysis each day. Most of us probably don’t have to worry too much on a day-to-day basis about some important statistical sampling problem. But we should at least be aware of how complex information is, and how difficult it can be to display and share information in an accurate manner. We should turn to people like Tim Harford for help interpreting and understanding complex statistics when we can, and we should try to look for factors that might interfere with a convenient conclusion before we simply believe what we would like to believe about a set of data. Statistical artifacts can play a huge role in shaping the way we understand a particular phenomenon, and we shouldn’t jump to extreme conclusions based on poor data.

Extreme Outcomes

Large sample sizes are important. At this moment, the world is racing as quickly as possible toward a vaccine to allow us to move forward from the COVID-19 Pandemic. People across the globe are anxious for a way to resume normal life and to reduce the risk of death from the new virus and disease. One thing standing in the way of the super quick solution that everyone wants is basic statistics. For any vaccine or treatment, we need a large sample size to be certain of the effects of anything we offer to people as a cure or for prevention of COVID-19. We want to make sure we don’t make decisions based on extreme outcomes, and that what we produce is safe and effective.

 

Statistics and probability are frequent parts of our lives, and many of us probably feel as though we have a basic and sufficient grasp of both. The reality, however, is that we are often terrible at thinking statistically. We are much better at thinking in narrative, and often we substitute a narrative interpretation for a statistical interpretation of the world without even recognizing it. It is easy to change our behavior based on anecdote and narrative, but not always so easy to change our behavior based on statistics. This is why we have the saying often attributed to Stalin: one death is a tragedy, a million deaths is a statistic.

 

The danger with anecdotal and narrative interpretations of the world is that they are drawn from small sample sizes. Daniel Kahneman explains the danger of small sample sizes in his book Thinking Fast and Slow, “extreme outcomes (both high and low) are more likely to be found in small than in large samples. This explanation is not causal.”

 

In his book, Kahneman explains that when you look at counties in the United States with the highest rates of cancer, you find that some of the smallest counties in the nation have the highest rates of cancer. However, if you look at which counties have the lowest rates of cancer, you will also find that it is the smallest counties in the nation that have the lowest rates. While you could drive across the nation looking for explanations for the high and low cancer rates in rural and small counties, you likely wouldn’t find a compelling causal explanation. You might be able to string a narrative together, and if you try really hard you might start to see a causal chain, but your interpretation is likely to be biased and based on flimsy evidence. The fact that our small counties are the ones that have the highest and lowest rates of cancer is an artifact of small sample sizes. When you have small sample sizes, as Kahneman explains, you are likely to see more extreme outcomes. A few random chance events can dramatically change the rate of cancer per thousand residents when a county has only a few thousand residents. In larger, more populated counties, you find a reversion to the mean, and a few chance events are far less likely to influence the overall statistics.
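
A simulation along the lines of Kahneman’s county example makes the artifact easy to see. Give every county the same underlying cancer rate (the populations and the 5-per-1,000 rate below are invented) and the small counties still produce both the highest and the lowest observed rates:

```python
import numpy as np

rng = np.random.default_rng(1)
TRUE_RATE = 0.005  # identical underlying cancer rate everywhere

small_pop, large_pop = 1_000, 100_000
small = rng.binomial(small_pop, TRUE_RATE, size=500) / small_pop * 1_000
large = rng.binomial(large_pop, TRUE_RATE, size=500) / large_pop * 1_000

print(small.min(), small.max())  # wide spread -- extremes at both ends
print(large.min(), large.max())  # tightly clustered around 5 per 1,000
```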

 

To prevent our decision-making from being overly influenced by extreme outcomes we have to move past our narrative and anecdotal thinking. To ensure that a vaccine for the coronavirus or a cure for COVID-19 is safe and effective, we must allow the statistics to play out. We have to have large sample sizes, so that we are not influenced by the extreme outcomes, either positive or negative, that we might see when only a few patients have been treated. We need the data to ensure that the outcomes we see are statistically sound, and not an artifact of chance within a small sample.

Probability Judgments

Julia Marcus, an epidemiologist at Harvard Medical School, was on a recent episode of the Ezra Klein show to discuss thinking about personal risk during the COVID-19 Pandemic. Klein and Marcus talked about the ways in which the United States Government has failed to help provide people with structures for thinking about risk, and how this has pushed risk decisions onto individuals. They talked about how this creates pressures on each of us to determine what activities are worthwhile, what is too risky for us, and how we can know if there is a high probability of infection in one setting relative to another.

 

On the podcast they acknowledged what Daniel Kahneman writes about in his book Thinking Fast and Slow – humans are not very good at making probability judgments. Risk is all about probability. It is fraught with uncertainty, with small likelihoods of very bad outcomes, and with conflicting opinions and desires. Our minds, especially our normal operating mode of quick associations and judgments, don’t have the capacity to think statistically in the way that is necessary to make good probability judgments.

 

When we try to think statistically, we often turn to substitutions, as Kahneman explains in his book. “We asked ourselves how people manage to make judgments of probability without knowing precisely what probability is. We concluded that people must somehow simplify that impossible task and we set out to find how they do it. Our answer was that when called upon to judge probability, people actually judge something else and believe they have judged probability.”

 

This is very important when we think about our actions, and the actions of others, during this pandemic. We know it is risky to have family dinners with our loved ones, and we ask ourselves if it is too risky to get together with our parents, whether to see siblings who are at risk due to health conditions, and whether we should be in the same room with a family member who is a practicing medical professional. But in the end, we answer a different question. We ask how much we miss our parents, whether we think it is important to be close to our family, and whether we really, really want some of mom’s famous pecan pie.

 

As Klein and Marcus say during the podcast, it is a lot easier to be angry at people at a beach than to make probability judgments about a small family dinner. When governments, public health officials, and employers fail to establish systems to help us navigate the risk, we place the responsibility back onto individuals, so that we can have someone to blame, some sense of control, and an outlet for the frustrations that arise when our mind can’t process probability. We distort probability judgments and ask more symbolic questions about social cohesion, family love, and isolation. The answer to our challenges would be better and more responsive institutions and structures to manage risk and mediate probability judgments. The individual human mind can only substitute easier questions for complex probability judgments, and it needs visual aids, better structures, and guidance to help think through risk and probability in an accurate and reasonable manner.

Causal Versus Statistical Thinking

Humans are naturally causal thinkers. We observe things happening in the world and begin to apply a causal reason to them, asking what could have led to the observation we made. We attribute intention and desire to people and things, and work out a narrative that explains why things happened the way they did.

 

The problem, however, is that we are prone to lots of mistakes when we think in this way, especially when we start looking at situations that require statistical thinking. In his book Thinking Fast and Slow, Daniel Kahneman writes the following:

 

“The prominence of causal intuitions is a recurrent theme in this book because people are prone to apply causal thinking inappropriately, to situations that require statistical reasoning. Statistical thinking derives conclusions about individual cases from properties of categories and ensembles. Unfortunately, System 1 does not have the capability for this mode of reasoning; System 2 can learn to think statistically, but few people receive the necessary training.”

 

System 1 is our fast brain. It works quickly to identify associations and patterns, but it doesn’t take in a comprehensive set of information and isn’t able to do much serious number crunching. System 2 is our slow brain, able to do the tough calculations, but limited to working on the set of data that System 1 is able to accumulate. Also, System 2 is only active for short periods of time, and only when we consciously make use of it.

 

This leads to our struggles with statistical thinking. We have to take in a wide range of possibilities, categories, and combinations. We have to make predictions and understand that in some set of instances we will see one outcome, but in another set of circumstances we may see a different outcome. Statistical thinking doesn’t pin down a concrete answer the way our causal thinking likes. As a result, we reach conclusions based on incomplete considerations, we ignore some important pieces of information, and we assume that we are correct because our answer feels correct and satisfies some criteria. Thinking causally can be powerful and useful, but only if we fully understand the statistical dimensions at hand, and can fully think through the implications of the causal structures we are defining.

Detecting Simple Relationships

System 1, in Daniel Kahneman’s picture of the mind, is the part of our brain that is always on. It is the automatic part of our brain that detects simple relationships in the world, makes quick assumptions and associations, and reacts to the world before we are even consciously aware of anything. It is contrasted against System 2, which is more methodical, can hold complex and competing information, and can draw rational conclusions from detailed information through energy intensive thought processes.

 

According to Kahneman, we only engage System 2 when we really need to. Most of the time, System 1 does just fine and saves us a lot of energy. We don’t need to think critically about what to do when the stoplight changes from green to yellow to red. Our System 1 can develop an automatic response so that we let off the gas and come to a stop without having to consciously think about every action involved in slowing down at an intersection. However, System 1 has some very serious limitations.

 

“System 1 detects simple relations (they are all alike, the son is much taller than the father) and excels at integrating information about one thing, but it does not deal with multiple distinct topics at once, nor is it adept at using purely statistical information.”

 

When relationships start to get complicated, like say the link between human activities and long term climate change, System 1 will let us down. It also fails us when we see someone who looks like they belong to the Hell’s Angels on a father-daughter date at an ice cream shop, when we see someone who looks like an NFL linebacker in a book club, or when we see a little old lady driving a big truck. System 1 makes assumptions about the world based on simple relationships, and is easily surprised. It can’t calculate unique and edge cases, and it can’t hold complicated statistical information about multiple actors and factors that influence the outcome of events.

 

System 1 is our default, and we need to remember where its strengths and weaknesses lie. It can help us make quick decisions while driving or catching an apple falling off a counter, but it can’t help us determine whether a defendant in a criminal case is guilty. There are times when our intuitive assumptions and reactions are spot on, but there are a lot of times when they can lead us astray, especially in cases that involve more than simple relationships and violate our expectations.