Random Clusters

The human mind is not good at randomness. It is good at identifying and seeing patterns. The mind is so good at pattern recognition and so bad at randomness that we will often perceive a pattern in a situation where no pattern exists. We have trouble accepting that statistics are messy and don’t always follow a set pattern that we can observe and understand.
 
 
Steven Pinker points this out in his book The Better Angels of Our Nature, and I think it is an important point to keep in mind. He writes, “events that occur at random will seem to come in clusters, because it would take a nonrandom process to space them out.” This problem with our perception of randomness comes into play when our music streaming apps shuffle songs at random. If we have a large library of our favorite songs to choose from, some of those songs will be by the same artist. If we hear two or more songs by the same artist back to back, we assume there is some sort of problem with the streaming service’s random shuffle. We should expect to get clusters of songs by the same artist, or even off the same album, but it doesn’t feel random to us when it happens. To solve this problem, music streaming services deliberately add algorithms that stop songs from the same artist from appearing in clusters. This makes the shuffle less random overall, but makes it feel more random to us.
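The effect is easy to see in a quick simulation. Below is a minimal Python sketch, assuming a made-up library of 10 artists with 10 songs each; it counts how often a truly uniform shuffle produces at least one back-to-back pair by the same artist.

    import random

    # Hypothetical library: 10 artists with 10 songs each (100 songs total).
    library = [(artist, song) for artist in range(10) for song in range(10)]

    trials = 10_000
    with_cluster = 0
    for _ in range(trials):
        random.shuffle(library)  # uniform (Fisher-Yates) shuffle
        # Does any song sit directly after another song by the same artist?
        if any(library[i][0] == library[i + 1][0] for i in range(len(library) - 1)):
            with_cluster += 1

    print(f"shuffles containing a same-artist cluster: {with_cluster / trials:.1%}")
    # With this library, virtually 100% of uniform shuffles contain a cluster,
    # which is exactly the behavior listeners report as "not random."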
 
 
Pinker uses lightning to describe the process in more detail. “Lightning strikes are an example of what statisticians call a Poisson process,” he writes. “In a Poisson process, events occur continuously, randomly, and independently of one another. … in a Poisson process the intervals between events are distributed exponentially: there are lots of short intervals and fewer and fewer of them as they get longer and longer.”
 
 
To understand a Poisson process, we have to be able to think about many independent events, and we have to shift our perspective to treat the spaces between events as variables, not just the events themselves. Both of these things are hard to do. It is hard to look at a basketball team and think that their next shot is independent of the previous shot (this is largely true). It is hard to look at customer complaints and see them as independent (also largely true), and it is hard to look at the history of human wars and think that those events are independent as well (Pinker shows this to be largely true, too). We tend to see events as connected even when they are not, a perspective error on our part. We also look only at the events, not at the time between them. If we recognize that the time between events has a statistical distribution that we can analyze, our focus shifts away from the events themselves. We can then think about what shaped the pause, not just what caused the event. This helps us see the independence between events and lets us analyze the statistics of both the events and the intervals that separate them. Shifting our focus in this way can help us see Poisson distributions, random distributions with clusters, and patterns that we might otherwise miss or misinterpret.
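A short simulation makes the exponential-interval claim concrete. This is a minimal Python sketch, assuming an arbitrary rate of one event per unit of time; it draws the waits between events and tallies short versus long gaps.

    import random

    random.seed(42)
    rate = 1.0  # assumed: one event per unit of time, on average

    # In a Poisson process the intervals between events are exponential:
    # lots of short gaps, fewer and fewer long ones.
    gaps = [random.expovariate(rate) for _ in range(1000)]

    short_gaps = sum(g < 0.5 for g in gaps)  # less than half the average wait
    long_gaps = sum(g > 2.0 for g in gaps)   # more than twice the average wait
    print(f"gaps shorter than 0.5: {short_gaps}, gaps longer than 2.0: {long_gaps}")
    # Expect roughly 390 short gaps and 135 long ones. The runs of short
    # gaps are the clusters that look suspiciously nonrandom to us.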
 
 
All of these factors are part of probability and statistics, which our minds have trouble with. We like to see patterns and think causally. We don’t like the larger, more complex, perspective-shifting view that statistics demands. We don’t like to think that there can be a statistical probability without an easily distinguishable pattern that we can attribute to specific causal structures. However, as lightning and other Poisson processes show us, sometimes the statistical perspective is the better perspective to have, and sometimes our brains run amok finding patterns that do not exist in random clusters.

Sociopolitical Hierarchies and Biology

In the book Sapiens, Yuval Noah Harari makes the argument that studying biology is insufficient for understanding human society. We cannot understand the complex human societies and different cultures of the world purely by studying the biology of humans. Testing humans on physiological and psychological metrics does provide us with interesting information, but it doesn’t explain exactly why so many differences are seen across cultures and places. It also doesn’t explain why certain hierarchies exist within different cultures across the globe.
 
 
To understand complex societies, Harari argues, we have to understand history, context and circumstance, and power relations. By doing so, we can begin to understand the structures within societies that shape the institutions that humans have created, and that ultimately shape the behaviors, opportunities, incentives, and motivations for humans. “Since the biological distinctions between different groups of Homo sapiens are, in fact, negligible, biology can’t explain the intricacies of Indian society or American racial dynamics,” writes Harari.
 
The two examples Harari uses to weigh culture and society against biology show how chance historical events created unique circumstances, which in turn shaped institutions that are highly influential within certain societies but unrecognizable outside them. Brahmins and Shudras are not understood as different races, but as different castes within Indian society, with substantial discrimination between the two groups. Racial discrimination has been a driving factor in American economic and political life. However, caste systems are almost completely absent in the United States, and the particular racial dynamics of the United States are not present in India. The explanations for the caste system and for American racial dynamics are not biologically based, but culturally based – dependent on power and institutions.
 
Harari writes, “most sociopolitical hierarchies lack a logical or biological basis – they are nothing but the perpetuation of chance events supported by myths.” We see this when we look at recent challenges in the replication of psychological studies. Many of the findings from the field of psychology have come from studies involving college-age students in the United States. Such individuals represent a very small segment of humanity. Generalizing from studies involving American college students will give us an inaccurate picture of the world – a picture that is not based on true biology, but on chance cultural factors specific to a unique population. We can easily make the mistake of believing that what we observe, either through a psychological study of American college students or through our own experiences with people in our community, state, or country, reflects a biological reality. However, what we observe is often the result of cultural differences or institutions and power structures that we are not consciously aware of.
 
Harari explains that this is what has happened with the Indian caste system and American racial dynamics. Cultural factors, chance historical events, and subsequent policies and institutions have created differences among people that we can observe and measure. However, those differences are not based in biology. It is a mistake to attribute those differences to something innate in Homo sapiens or to assume that the way things are is the way that things should be. Quite often, our sociopolitical hierarchies have no logical or absolute reason for being the way they are.

Nature Answers the Questions We Pose

I have not read The Hitchhiker’s Guide to the Galaxy, but I know there is a point where a character asks for the answer to life, the universe, and everything, and receives the response 42. The answer was certainly not what anyone was expecting, but it was an answer. Much of science is like the answer 42. We ask grand questions of nature and receive answers we didn’t quite expect and can’t always make sense of.
In The Book of Why, Judea Pearl writes, “Nature is like a genie that answers exactly the question we pose, not necessarily the one we intend to ask.” We learn by making observations about the world. We can make predictions about what we think will happen given certain conditions, and we can develop and test hypotheses, but the answers we get may not be answers to the questions we intended to ask. I frequently listen to the Don’t Panic Geocast, and the hosts often talk about scientific studies that go awry because of some unexpected interaction between lights, between an experimental set-up and the sun, or because an animal messed with equipment in the field. Real results are generated, but they don’t always mean what we think they do at first look. The hosts have a frequent line, “any instrument can be a thermometer,” noting how subtle changes in temperature can cause misleading noise in the data.
Pearl’s quote is meant to demonstrate how challenging science can be and why so much of science has taken such a long time to develop. Humans have often thought they were receiving answers to the questions they were asking, only to find out that nature was answering a different question, not the one the scientists thought they had asked. Pearl states that randomness has been one of the ways we have gotten past this problem, but he writes about how counter-intuitive randomized controlled trials were when first developed. No one realized that the right question had to be asked through experimental set-ups that involved randomness. On the benefits of randomness he writes, “first, it eliminates confounder bias (it asks Nature the right question). Second, it enables the researcher to quantify his uncertainty.”
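A toy simulation shows what randomization buys. This is a minimal Python sketch, not Pearl’s own example, and every number in it is an invented assumption: a hidden “health” variable makes treatment more likely, so the naive observational comparison overstates the true effect of 1.0, while random assignment recovers it.

    import random

    random.seed(0)
    TRUE_EFFECT = 1.0  # assumed benefit of treatment

    def outcome(treated, health):
        # Outcome depends on underlying health plus the treatment effect.
        return health + (TRUE_EFFECT if treated else 0.0) + random.gauss(0, 1)

    people = [random.gauss(0, 1) for _ in range(100_000)]  # latent health

    # Observational data: healthier people are more likely to get treated,
    # so health confounds the comparison.
    treated, control = [], []
    for h in people:
        if h > random.gauss(0, 1):          # selection depends on health
            treated.append(outcome(True, h))
        else:
            control.append(outcome(False, h))
    naive = sum(treated) / len(treated) - sum(control) / len(control)
    print(f"observational estimate: {naive:.2f}")  # ~2.1, inflated by the confounder

    # Randomized trial: a coin flip, not health, decides who is treated.
    rct_t, rct_c = [], []
    for h in people:
        if random.random() < 0.5:
            rct_t.append(outcome(True, h))
        else:
            rct_c.append(outcome(False, h))
    rct = sum(rct_t) / len(rct_t) - sum(rct_c) / len(rct_c)
    print(f"randomized estimate: {rct:.2f}")  # ~1.0, the true effect

The coin flip breaks the link between health and treatment status, which is what Pearl means by randomization asking Nature the right question.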
In the book, Pearl takes observations and statistical methods, combined with causal insights, to a level that is honestly beyond my comprehension. What is important to note, however, is that nature is not obligated to answer the questions we intend to ask. It answers questions exactly as we pose them, influenced by seemingly irrelevant factors in our experimental design. The first answer we get may not be very reliable, but randomness and statistical methods, combined, as Pearl would advocate, with a solid understanding of causality, can help us better pose our questions to nature and be more confident that the responses we get answer the questions we meant to ask.
Luck & Success - Joe Abittan

Luck & Success

I am someone who believes that we can all learn from the lessons of others. I believe that we can read books, listen to podcasts, watch documentaries, and receive guidance from good managers and mentors that will help us learn, grow, and become better versions of ourselves. I read Good to Great and Built to Last by Jim Collins, and I have seen value in books that look at successful companies and individuals. I have believed that these books offer insights and lessons that can help me and others improve, and that we can adopt strategies and approaches that will help us become more efficient and productive over time to reach large, sustainable goals.

 

But I might be wrong. In Thinking, Fast and Slow, Daniel Kahneman directly calls into question whether books from authors like Jim Collins are useful for us at all. The problem, as Kahneman sees it, is that such books fail to account for randomness and chance. They fail to recognize the halo effect and see patterns where none truly exist. They ascribe causal mechanisms to randomness, and as a result, we derive lessons that don’t really fit the actual world.

 

Kahneman writes, “because luck plays a large role, the quality of leadership and management practices cannot be inferred reliably from observations of success.” Taking a group of 20 successful companies and looking for shared operations, management styles, leadership traits, and corporate cultures will inevitably turn up commonalities. The mistake is taking those commonalities and ascribing a causal link between the shared practices or traits and the success of the companies or individuals. Without randomized controlled trials or natural experiments, we really cannot identify a strong causal link, and we might just be picking up on random chance within our sample selection, at least as Kahneman would argue.
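A simulation in the spirit of Kahneman’s objection, with every detail invented for illustration: give 1,000 hypothetical companies ten coin-flip “practices” each, let success be pure luck, and then study the top 20. Some practice will still look like a shared secret of the winners.

    import random

    random.seed(7)
    N_COMPANIES, N_PRACTICES = 1000, 10

    # Each company adopts each practice by coin flip; success is pure luck.
    companies = [
        {"practices": [random.random() < 0.5 for _ in range(N_PRACTICES)],
         "success": random.random()}
        for _ in range(N_COMPANIES)
    ]

    winners = sorted(companies, key=lambda c: c["success"], reverse=True)[:20]
    shares = [sum(c["practices"][p] for c in winners) / 20
              for p in range(N_PRACTICES)]
    top = max(range(N_PRACTICES), key=lambda p: shares[p])
    print(f"practice {top} is shared by {shares[top]:.0%} of the top 20")
    # The most common practice among the winners typically lands around
    # 65-70% adoption: a binomial fluctuation, not a recipe for greatness.

Nothing about practice adoption influenced success in this toy world, yet a “study” of the winners would still dutifully report a standout commonality.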

 

I read Good to Great, and I think there is a good chance that Kahneman is correct to a large extent. Circuit City was one of the success stories that Collins touted in the book, but the company barely survived another 10 years after the book’s initial publication. Clearly some of the commonalities identified in books like Good to Great are no more than chance, or might themselves be artifacts of good luck. Perhaps randomness from good timing, fortunate economic conditions, or inexplicably poor decisions by the competition contributes to any given company’s or individual’s success just as much as the factors we identify by studying a group of success stories.

 

If this is the case, then there is not much to learn from case studies of several successful companies. Looking for commonalities among successful individuals and successful companies might just be an exercise in random pattern recognition, not anything specific that we can learn from. This doesn’t fit the reality that I want, but it may be the reality of the world we inhabit. Personally, I will still look to authors like Jim Collins and try to learn lessons that I can apply in my own life and career to help me improve the work I do. Perhaps I don’t have to fully implement everything mentioned in business books, but surely I can learn strategies that will fit my particular situation and needs, even if they are not broad panaceas to solve all productivity hang-ups in all times and places.

Praise, Punishment, & Regression to the Mean

Regression to the mean is seriously underrated. In sports, stock market funds, and biological trends like generational height differences, regression to the mean is a powerful, yet misunderstood phenomenon. A rookie athlete may have a standout first year, only to perform less spectacularly the following year. An index fund may outperform all others one year, only to see other funds catch up the next year. And a tall man may have a son who is shorter. In each instance, regression to the mean is at play, but since we underrate it, we assume there is some causal factor causing our athlete to play worse (it went to his head!), causing our fund to earn less (they didn’t rebalance the portfolio correctly!), and causing our son to be shorter (his father must have married a short woman).

 

In Thinking, Fast and Slow, Daniel Kahneman looks at the consequences that arise when we fail to understand regression to the mean and attempt to create causal connections between events when we shouldn’t. Kahneman describes an experiment he conducted with Air Force cadets, asking them to flip a coin backwards over their heads and try to hit a spot on the floor. Those who had a good first shot typically did worse on their second shot. Those who did poorly on their first shot usually did better the next time. There wasn’t any skill involved; the outcome was mostly just luck and random chance, so if someone was close one time, you might expect their next shot to be a little further out, just by random chance. This is regression to the mean in an easy-to-understand example.
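Here is a minimal Python sketch in the spirit of that demonstration (the distance model is my own invented stand-in, not Kahneman’s exact setup): both attempts are pure noise, yet the best first-round group reliably “gets worse” and the worst group “improves.”

    import random

    random.seed(1)
    N = 1000  # hypothetical cadets

    # Each attempt is pure luck: distance from the target is independent noise.
    first = [abs(random.gauss(0, 1)) for _ in range(N)]
    second = [abs(random.gauss(0, 1)) for _ in range(N)]

    order = sorted(range(N), key=lambda i: first[i])
    best, worst = order[:100], order[-100:]  # closest and farthest first shots

    def mean(idx, shots):
        return sum(shots[i] for i in idx) / len(idx)

    print(f"best 100:  first {mean(best, first):.2f} -> second {mean(best, second):.2f}")
    print(f"worst 100: first {mean(worst, first):.2f} -> second {mean(worst, second):.2f}")
    # Both groups' second attempts land near the overall average (~0.8):
    # regression to the mean, with no praise or punishment involved.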

 

But what happens when we don’t recognize regression to the mean in a random and simplified experiment? Kahneman used the cadets to demonstrate how random performance deviations from the mean during flight maneuvers translate into praise or punishment for the cadets. Those who performed well were often praised, only to regress to the mean on their next flight and perform worse. Those who performed poorly also regressed to the mean, but in an upward direction, improving on the next flight. Those whose initial performance was poor received punishment (perhaps just a verbal reprimand) between their initial poor effort and their follow-up improvement (regression). Kahneman describes the takeaway from the experiment this way:

 

“The feedback to which life exposes us is perverse. Because we tend to be nice to other people when they please us and nasty when they do not, we are statistically punished for being nice and rewarded for being nasty.”

 

Praise a cadet who performed well, and they will then perform worse. Criticize a cadet who performed poorly, and they will do better. Our minds overfit patterns and start to see a causal link between praise and subsequent poor performance, and between castigation and subsequent improvement. All that is really happening is that we are misunderstanding regression to the mean and creating a causal model where we should not.

 

If we better understood regression to the mean, we wouldn’t be so shocked when a standout rookie sports star appears to have a sophomore slump. We wouldn’t jump on the bandwagon when an index fund had an exceptional year, and we wouldn’t be surprised by phenotypical regression to the mean from one generation to the next. Our brains are phenomenal pattern-recognizing machines, but sometimes they see the wrong pattern, and sometimes that gives us perverse incentives for how we behave and interact with each other. The solution is to step back from individual cases and try to look at an average over time. By gathering more data and looking for longer-lasting trends, we can better distinguish regression to the mean from real trends in performance over time.

Detecting Rules

Our brains are built to think causally and look for patterns. We benefit when we can recognize that some foods make us feel sick, when certain types of clouds mean rain is on the way, or when our spouse gives us a certain look that means they are getting frustrated with something. Being able to identify patterns helps us survive and form better social groups, but it can also lead to problems. Sometimes we detect patterns and rules when there aren’t any, and we can adopt strange superstitions, traditions, or behaviors that don’t help us and might have a cost.

 

Daniel Kahneman demonstrates this in his book Thinking, Fast and Slow by showing a series of letters and asking us to think about which series would be more likely in a random draw. If we had a bag of letter tiles and selected tiles at random, we wouldn’t expect to get a word we recognized. However, sometimes through random chance we do get a complete word. If you played thousands of Scrabble games, eventually you might draw 7 tiles that make a complete word on your first turn. The reality is that drawing the letters MGPOVIT is just as statistically likely as drawing the letters MORNING.
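The arithmetic behind that claim is simple. As a minimal sketch, assume each of the seven tiles is drawn uniformly at random from the 26 letters (real Scrabble tile frequencies differ, so this is an idealization):

    # Any specific ordered 7-letter sequence is one outcome among 26**7
    # equally likely sequences, whether it spells a word or not.
    p = (1 / 26) ** 7
    print(f"P(MORNING) = P(MGPOVIT) = {p:.2e}")  # about 1.2e-10
    # MORNING only feels rarer because it looks too orderly to be random.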

 

For our brains, however, seeing a full word feels less likely than a jumble of letters. As Kahneman explains, “We do not expect to see regularity produced by a random process, and when we detect what appears to be a rule, we quickly reject the idea that the process is truly random.”

 

We can go out of our way trying to prove that something is behaving according to a rule when it is truly random. We can be sure that a pattern is taking place, even when there is no pattern occurring. This happens in basketball with the hot hand phenomenon, and in science when researchers search for a significant finding that doesn’t really exist in the data from an experiment. Most of the time, this doesn’t cause a big problem for us. It’s not really a big deal if you believe that you need to eat Pringles for your favorite sports team to win a playoff game. It only adds a small cost if you tackle some part of a house project in an inefficient way because you are sure you have better luck with your long approach than with a more direct way of handling the task.

 

However, once we start to see patterns that don’t exist in social life with other people, there can be serious consequences. The United States saw this in the early days of marijuana prohibition, as prejudice and racial fear overwhelmed the public through inaccurate stories of the drug’s dangers. Ancient people who sacrificed humans to bring about rain were fooled by false pattern recognition. We see our brains looking for rules when we examine how every act of the president influences political polls for the upcoming election. Our powerful pattern- and rule-detecting brains can help us in a lot of ways, but they can also waste our time, make us look foolish, and create huge externalities for society.

Cause and Chance

Recently I have written a lot about our mind’s tendency toward causal thinking, and how this tendency can sometimes get our minds in trouble. We make associations and predictions based on limited information and we are often influenced by biases that we are not aware of. Sometimes, our brains need to shift out of our causal framework and think in a more statistical manner, but we rarely seem to do this well.

 

In Thinking, Fast and Slow, Daniel Kahneman writes, “The associative machinery seeks causes. The difficulty we have with statistical regularities is that they call for a different approach. Instead of focusing on how the event at hand came to be, the statistical view relates it to what could have happened instead. Nothing in particular caused it to be what it is – chance selected it from among its alternatives.”

 

This is hard for us to accept. We want there to be a reason why one candidate won a toss-up election and the other lost. We want there to be a reason why the tornado hit one neighborhood and not the adjacent neighborhood. Our mind wants to find patterns; it wants to create associations between events, people, places, and things. It isn’t happy when there is a large amount of data, unknown variables, and some degree of randomness influencing exactly what we observe.

 

Statistics, however, isn’t concerned with our need for intelligible causal structures. Statistics is fine with a coin flip coming up heads 9 times in a row, and the 10th flip still having a 50-50 shot of being heads.
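A quick simulation confirms that the coin has no memory. This minimal Python sketch (one million flips is an arbitrary choice) finds every flip that follows nine consecutive heads and checks how often it also comes up heads:

    import random

    random.seed(3)
    flips = [random.random() < 0.5 for _ in range(1_000_000)]

    # Collect the flip immediately after every run of 9 straight heads.
    after_streak = [flips[i] for i in range(9, len(flips))
                    if all(flips[i - 9:i])]

    print(f"streaks of 9 heads found: {len(after_streak)}")
    print(f"P(heads on the next flip): {sum(after_streak) / len(after_streak):.3f}")
    # Roughly 1,950 streaks turn up, and the next flip is heads about
    # half the time: the previous nine flips change nothing.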

 

Our minds don’t have the ability to hold multiple competing narratives at one time. In national conversations, we seem to want to split things into two camps (maybe this is just an artifact of the United States having a winner-take-all political system), where we have two sides to an argument and two ways of thinking about and viewing the world. I tend to think in triads, and my writing often reflects that, with me presenting a series of three examples of a phenomenon. When we need to hold 7, 15, or 100 different potential outcomes in our mind, we are easily overwhelmed. Accepting strange combinations that don’t fit a simple this-or-that causal structure is hard for our minds, and in many cases being so nuanced is not very rewarding. We can generalize and make substitutions in these complex settings and usually do just fine. We can trick ourselves into believing that we think statistically, even if we are really only justifying the causal structures and hypotheses that we want to be true.

 

However, sometimes, as in some elections, in understanding cancer risk, and in making cost-benefit analyses of traffic accidents for freeway construction, thinking statistically is important. We have to understand that there is a range of outcomes and only so many predictions we can make. We can develop aids to help us think through these statistical decisions, but we have to recognize that our brains will struggle. By understanding our causal tendencies and desires, and recognizing how hard it is to accept statistical information, we can set up structures that enable us to make better decisions.

Extreme Outcomes

Large sample sizes are important. At this moment, the world is racing as quickly as possible toward a vaccine to allow us to move forward from the COVID-19 pandemic. People across the globe are anxious for a way to resume normal life and to reduce the risk of death from the new virus and disease. One thing standing in the way of the super quick solution that everyone wants is basic statistics. For any vaccine or treatment, we need a large sample size to be certain of the effects of anything we offer to people as a cure for or prevention of COVID-19. We want to make sure we don’t make decisions based on extreme outcomes, and that what we produce is safe and effective.

 

Statistics and probability are frequent parts of our lives, and many of us probably feel as though we have a basic and sufficient grasp of both. The reality, however, is that we are often terrible at thinking statistically. We are much better at thinking in narrative, and we often substitute a narrative interpretation for a statistical interpretation of the world without even recognizing it. It is easy to change our behavior based on anecdote and narrative, but not always so easy to change our behavior based on statistics. This is why we have the saying often attributed to Stalin: one death is a tragedy; a million deaths is a statistic.

 

The danger with anecdotal and narrative interpretations of the world is that they are drawn from small sample sizes. Daniel Kahneman explains the danger of small sample sizes in his book Thinking, Fast and Slow: “extreme outcomes (both high and low) are more likely to be found in small than in large samples. This explanation is not causal.”

 

In his book, Kahneman explains that when you look at the counties in the United States with the highest rates of cancer, you find that some of the smallest counties in the nation have the highest rates. However, if you look at which counties have the lowest rates of cancer, you will also find that it is the smallest counties in the nation that have the lowest rates. While you could drive across the nation looking for explanations for the high and low cancer rates in rural and small counties, you likely wouldn’t find a compelling causal explanation. You might be able to string a narrative together, and if you try really hard you might start to see a causal chain, but your interpretation is likely to be biased and based on flimsy evidence. The fact that our small counties are the ones with both the highest and lowest rates of cancer is an artifact of small sample sizes. When you have small sample sizes, as Kahneman explains, you are likely to see more extreme outcomes. A few chance cases can dramatically change the rate of cancer per thousand residents when a county only has a few thousand residents. In larger, more populated counties, rates revert toward the mean, and a few chance outcomes are unlikely to sway the overall statistics.
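Kahneman’s county example is easy to reproduce. In this minimal Python sketch the populations and the 1% base rate are invented, and every county has exactly the same true cancer rate; the observed extremes still come from the smallest counties.

    import random

    random.seed(5)
    TRUE_RATE = 0.01  # identical underlying cancer rate everywhere (assumed)

    counties = []
    for _ in range(300):
        pop = random.choice([1_000, 10_000, 100_000])  # small, medium, large
        cases = sum(random.random() < TRUE_RATE for _ in range(pop))
        counties.append((pop, cases / pop))

    counties.sort(key=lambda c: c[1])  # order by observed rate
    print("populations of the 10 lowest-rate counties: ",
          [pop for pop, _ in counties[:10]])
    print("populations of the 10 highest-rate counties:",
          [pop for pop, _ in counties[-10:]])
    # Both lists are dominated by the 1,000-person counties: with a small
    # sample, a handful of chance cases swings the rate far from 1%.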

 

To prevent our decision-making from being overly influenced by extreme outcomes we have to move past our narrative and anecdotal thinking. To ensure that a vaccine for the coronavirus or a cure for COVID-19 is safe and effective, we must allow the statistics to play out. We have to have large sample sizes, so that we are not influenced by extreme outcomes, either positive or negative, that we see when a few patients are treated successfully. We need the data to ensure that the outcomes we see are statistically sound, and not an artifact of chance within a small sample.