Hope in Big Data

Most of us probably don’t work with huge data sets, but all of us contribute to huge data sets. We know the world of big data is out there, and we know people are working with big data, but there are not many of us who truly know what it means and how we should think about any of it. In The Book of Why, Judea Pearl argues that even many of those doing research and running companies based on big data don’t fully understand what it all means.
Pearl is critical of researchers and entrepreneurs who lack causal understandings but pursue new knowledge and information by pulling correlations and statistics out of large data sets. Some companies are taking advantage of the fact that huge amounts of computing power can give us insights into data sets that we never before could have generated. However, these insights are not always as meaningful as we are led to believe.
Pearl writes, “The hope – and at present, it is usually a silent one – is that the data themselves will guide us to the right answers whenever causal questions come up.”
My last post was about the overuse of the phrase “correlation is not causation.” Finding correlations and relationships in data is meaningless if we don’t also have causal understandings in mind. This is the critique that Pearl makes with the quote above. If we don’t have a way of understanding basic causal structures, then the phrase is right: correlations don’t mean anything. Many companies and researchers are at a stage where they are finding correlations and unexpected statistical results in big data, but they lack the causal understandings to do anything meaningful with the data. In the world of public policy this feels like the saying “a solution in search of a problem,” and in the world of healthcare it feels like a pay-and-chase scenario.
Pearl argues throughout the book that we are better at identifying causal structures than we are led to believe in our statistics courses. He also argues that understanding causality is key to unlocking the potential of big data and actually getting something useful out of massive datasets. Without a grounding in causality, we are wasting our time with the statistical research we do. We are running around with solutions in the form of big data correlations that don’t have a causal underpinning. It is as if we are paying fraudulent claims, then chasing down some of the money we spent and congratulating ourselves on preventing fraud. The end result is a poor use of data that we prop up as a magnanimous solution.
Correlation and Causation - Judea Pearl - The Book of Why - Joe Abittan

Correlation and Causation

I have an XKCD comic taped to the door of my office. The comic is about the mantra of statistics, that correlation is not causation. I taped the comic to my office door because I loved learning statistics in graduate school and thinking deeply about associations and how mere correlations cannot be used to demonstrate that one thing causes another. Two events can correlate, but have nothing to do with each other, and a third thing may influence both, causing them to correlate without any causal link between the two things.
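To make the third-variable point concrete, here is a minimal sketch in Python. It is only an illustration under assumptions I am making up: the variables `z`, `x`, and `y`, the noise levels, and the sample size are all hypothetical. In the simulation, `x` and `y` never influence each other; both are simply nudged by a shared factor `z`.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(n=10_000, seed=0):
    """Hypothetical data: z drives both x and y; x and y never touch each other."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]
    x = [zi + rng.gauss(0, 1) for zi in z]
    y = [zi + rng.gauss(0, 1) for zi in z]
    return z, x, y


z, x, y = simulate()

# Raw correlation between x and y, ignoring the shared factor.
raw = pearson(x, y)

# Because we built the simulation, we know exactly how z enters x and y,
# so we can subtract it out and look at what is left over.
x_resid = [xi - zi for xi, zi in zip(x, z)]
y_resid = [yi - zi for yi, zi in zip(y, z)]
adjusted = pearson(x_resid, y_resid)

print(f"correlation of x and y:            {raw:.2f}")
print(f"correlation after removing z:      {adjusted:.2f}")
```

The raw correlation comes out clearly positive even though neither variable causes the other, and it drops to roughly zero once the shared factor is removed. In real data we rarely know the third factor this cleanly, which is part of why the statistics alone cannot settle the causal question.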
But Judea Pearl thinks that science and researchers have fallen into a trap laid out by statisticians and the infinitely repeated “correlation does not imply causation” mantra. Regarding this perspective of statistics he writes, “it tells us that correlation is not causation, but it does not tell us what causation is.”
Pearl seems to suggest in The Book of Why that there was a time when there was too much data, too much that humans didn’t know, and too many people ready to offer incomplete assessments based on anecdote and incomplete information. From this time sprouted the idea that correlation does not imply causation. We started to see that statistics could describe relationships and that statistics could be used to pull apart entangled causal webs, identifying each individual component and assessing its contribution to a given outcome. However, as his quote shows, this approach never actually answered what causation is. It never actually told us when we can know and ascertain that a causal structure and causal mechanism are in place.
“Over and over again,” writes Pearl, “in science and in business, we see situations where mere data aren’t enough.”
To demonstrate the shortcomings of our high regard for statistics and our mantra that correlation is not causation, Pearl walks us through the congressional testimonies and trials of big tobacco companies in the United States. The data told us there was a correlation between smoking and lung cancer. There was overwhelming statistical evidence that smoking was related to or associated with lung cancer, but we couldn’t attain 100% certainty just through statistics that smoking caused lung cancer. The companies themselves muddied the water with misleading studies and cherry-picked results. They hid behind a veil that said that correlation was not causation, and hid behind the confusion around causation that statistics could never fully clarify.
Failing to develop a real sense of causation, failing to move beyond big data, and failing to get beyond statistical correlations can have real harms. We need to be able to recognize causation, even without relying on randomized controlled trials, and we need to be able to make decisions to save lives. The lesson of the comic taped to my door is helpful when we are trying to be scientific and accurate in our thinking, but it can also lead us astray when we fail to trust a causal structure that we can see, but can’t definitively prove via statistics.
Statistical Artifacts

When we have good graphs and statistical aids, thinking statistically can feel straightforward and intuitive. Clear charts can help us tell a story, can help us visualize trends and relationships, and can help us better conceptualize risk and probability. However, understanding data is hard, especially if the way that data is collected creates statistical artifacts.

Yesterday’s post was about extreme outcomes, and how it is the smallest counties in the United States where we see both the highest per capita instances of cancer and the lowest per capita instances of cancer. Small populations allow for large fluctuations in per capita cancer diagnoses, and thus extreme outcomes in cancer rates. We could graph the per capita rates, model them on a map of the United States, or present the data in unique ways, but all we would really be doing is creating a visual aid influenced by statistical artifacts from the samples we used. As Daniel Kahneman explains in his book Thinking, Fast and Slow, “the differences between dense and rural counties do not really count as facts: they are what scientists call artifacts, observations that are produced entirely by some aspect of the method of research – in this case, by differences in sample size.”

Counties in the United States vary dramatically. Some counties are geographically huge, while others are pretty small – Nevada is a large state with over 110,000 square miles of land but only 17 counties, compared to West Virginia with under 25,000 square miles of land and 55 counties. Across the US, some counties are exclusively within metropolitan areas, some are completely within suburbs, some are entirely rural with only a few hundred people, and some manage to incorporate major metros, expansive suburbs, and vast rural stretches (shoutout to Clark County, NV). Counties are convenient for collecting data, but they can cause problems when analyzing population trends across the country. The variations in size and other factors create the possibility for the extreme outcomes we see in things like cancer rates across counties. When smoothed out over larger populations, the disparities in cancer rates disappear.
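A small simulation can show how sample size alone produces these extremes. This is only a sketch under made-up assumptions: the underlying rate, the county populations, and the number of counties are hypothetical values chosen for illustration, not real cancer data. Every simulated county has exactly the same underlying risk, so any difference in observed rates comes from chance.

```python
import random

# Hypothetical: every county shares the same underlying rate.
TRUE_RATE = 0.005


def observed_rate(population, rng):
    """Simulate one county: each resident independently has TRUE_RATE odds of a diagnosis."""
    cases = sum(1 for _ in range(population) if rng.random() < TRUE_RATE)
    return cases / population


rng = random.Random(42)

# Hypothetical county sizes: 200 small counties and 50 large ones.
small = [observed_rate(500, rng) for _ in range(200)]
large = [observed_rate(20_000, rng) for _ in range(50)]

small_spread = max(small) - min(small)
large_spread = max(large) - min(large)

print(f"small counties: lowest {min(small):.4f}, highest {max(small):.4f}")
print(f"large counties: lowest {min(large):.4f}, highest {max(large):.4f}")
print(f"spread in small counties: {small_spread:.4f}")
print(f"spread in large counties: {large_spread:.4f}")
```

Both the highest and the lowest observed rates land in the small counties, even though nothing about those counties is actually different. The large counties cluster tightly around the true rate, which is the smoothing effect described above.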

Most of us are not collecting lots of important data for analysis each day. Most of us probably don’t have to worry too much on a day-to-day basis about some important statistical sampling problem. But we should at least be aware of how complex information is, and how difficult it can be to display and share information in an accurate manner. We should turn to people like Tim Harford for help interpreting and understanding complex statistics when we can, and we should try to look for factors that might interfere with a convenient conclusion before we simply believe what we would like to believe about a set of data. Statistical artifacts can play a huge role in shaping the way we understand a particular phenomenon, and we shouldn’t jump to extreme conclusions based on poor data.
Healthcare Safety and Data

One problem with healthcare in the United States is that consumers don’t control their data and the information about them. Even the employers of healthcare consumers, who are paying for the services provided to patients and often responsible for whether patients have healthcare coverage at all, don’t have access to any of the healthcare data of the employees they pay to cover. Healthcare information is protected by providers and guarded by insurers.

A troubling result is that consumers and employers often don’t know much about the quality of care provided at a hospital or from a given provider, and don’t know about the safety record of providers and hospitals. Outcome measures are sometimes protected by law, and are other times hidden behind complex systems that prevent employers and consumers from finding and understanding the information.

Dave Chase compares the problem this creates to airline travel in his book The Opioid Crisis Wake-Up Call, “No corporate travel department would allow an employee to fly on an airline that suppressed its safety records (even if the FAA allowed it). In the same way, it’s unconscionable to blindly send an employee to a hospital with little or no information on its safety record. If the hospital suppresses that information, go elsewhere and tell your employees why.”

There are many ways in which we treat the healthcare system differently than other sectors for no apparent reason. I wrote about the way we don’t consider healthcare brokers’ conflicts of interest in the same way we consider financial advisers’ conflicts of interest. Similarly, we heavily scrutinize any spending by employees for lunches or hotel stays on trips, but we don’t apply the same scrutiny to hospital billing. Our failure to consider safety the way we would for employee travel, even though many employers spend more on their employees’ healthcare than on their travel, is a failure of how we think about the system.

I think that Robin Hanson and Kevin Simler explain a little of why this is in their book The Elephant in the Brain. We don’t know what medical care is effective and we don’t know which systems and providers are safe, but we do know when someone took time off work for care. We can signal our support for that individual with cards, balloons, and messages about how much we value them and hope they recover quickly. Much of our healthcare system and how we treat it is based on signaling. Accessing care shows others that we have resources and powerful allies who care about us. We also use healthcare to signal to others how much we care about them and what a valuable ally we would be to them. The result is costly, in terms of dollars and health and safety problems.

We have to get beyond this signaling mindset and approach to healthcare if we want to rein in prices and have a safe and effective system. If we want our healthcare to be sustainable for the long run, it can’t be built around signaling, but must actually be built around effective solutions. Employers have an important role to play by demanding the information they need to be accountable in providing valuable health benefits to employees. Hospitals, providers, and insurance companies can’t continue to monopolize and hide patient data, preventing employers and patients from making smart and economical healthcare decisions.
Data Liquidity in Healthcare

Another piece of Dave Chase’s Fair Trade for Health Care as outlined in his book The Opioid Crisis Wake-Up Call is what he calls Data Liquidity. It is the idea that you can access your data, see it, contribute to it, and take it someplace else if you want. The idea that you have control over your data – the data you produce in the world, the data which is about you – is a new and growing idea in the world.

Data Liquidity is a problem with all of tech right now, but it is especially important in the healthcare industry. Chase writes, “Care teams do their best work when they have the most complete view of a patient’s health status. Anything less comes with an increased risk of harm. Likewise, your employees should have easy access to their own information in a secure patient-controlled data repository – including the right to contribute their own data or take it elsewhere.”

In the world of social media, people (at least in Europe) have demanded to have the right to see their data and have it completely removed from a company’s server if they desire. In the world of finance, there is increasing pressure on the big three credit rating companies to be more transparent in how they determine an individual’s credit score, and some lawmakers want to push the companies to change what they consider and evaluate when generating a credit score. Within healthcare, the debate is on who owns a patient’s medical records. Does the medical provider own the records? Does the patient own the records? What records does the insurance company own?

Chase argues that patients need to own their medical records and have access to and control over them. Since most people get their insurance through their employers, Chase argues that it is up to businesses and companies to demand data liquidity and transparency within the contracts they establish with insurers and healthcare systems. It is up to the businesses that contract with insurers or health systems to set fair rules related to data, rules that give employees power over their data and the ability to ensure all of their providers have access to all of their pertinent records.

From tech to finance to healthcare, people are starting to see the importance of controlling data, and Chase is hopeful that this revolution will improve healthcare quality, reduce unnecessary procedures, and reduce healthcare costs.
Healthcare Price Transparency

Have you ever tried to figure out what a healthcare procedure is going to cost you before you have the procedure? Almost no one can give you a straight answer, and it takes a long time to get a number at all because the doctor’s office has to check with your insurance to see what their agreement is, what you still have left with your deductible, and where you stand relative to your out of pocket maximum. The result is that consumers have very little insight into what they are actually going to pay or owe when they go to a check-up, when they need a new prescription drug, or when they have a knee operation at a local hospital.

In his book The Opioid Crisis Wake-Up Call, Dave Chase addresses the lack of transparency in healthcare pricing. Specifically looking at the ways that insurance companies hide claims data from employers, Chase writes, “They want to maintain the status quo. This means protecting pricing opacity at all costs. If you could see the prices you actually pay, you might begin to wonder why a hospital with a large market share but mediocre quality outcomes is paid exponentially more than a smaller, high quality provider in the same network.”

Healthcare price transparency reveals disparities in our healthcare system and shows that healthcare costs are often not connected with quality or health outcomes. Cost is somewhat arbitrary and usually negotiated without the person who will actually be receiving the service. If we could see the costs, then we would be more likely to shop around, either for different insurance or for different healthcare providers with more reasonable prices for services and treatments. I think our health spending is generally rather inelastic, but nevertheless, if we better understood pharmacy pricing, basic medical services, and major surgery costs, we could start to move toward options that offered higher value.

Different Angles

In his book How to Win Friends and Influence People, Dale Carnegie quotes Henry Ford on seeing things from another person’s perspective: “If there is any one secret of success, it lies in the ability to get the other person’s point of view and see things from that person’s angle as well as from your own.” 

I am fascinated by the mind, our perception of the universe, and how we interpret the information we take in to make decisions. There is so much data and information about the world, and we will all experience that information and data in different ways, and our brains will literally construct different realities with the different timing and information that we take in. There may be an objective reality underlying our experiences, but it is nearly impossible to pinpoint exactly what that reality is given all of our different perspectives.

What we can realize from the vast amount of data that is out there, and from our limited ability to take it all in and comprehend it, is that our understanding of the universe is woefully inadequate. We need to get the perspectives of others to really understand what is happening and to make sense of the universe. Everyone will see things slightly differently and understand the world in their own unique way.

Carnegie’s book addresses this point in the context of business. When we are trying to make a buck, we often become purely focused on ourselves, on what we want, on how we think it is best to accomplish our goals, and on the narrow set of things that we have identified as necessary steps to get from A to B.

However, we are always going to be working with others, and will need the help of other people and other companies to achieve our goals. We will have to coordinate, negotiate, and come to agreement on what actions we will all take, and we will all bring our own experiences and motivations to the table. If you approach business thinking purely about what you want and what your goals are, you won’t be able to do this successfully. You have to consider the perspectives of the other people that you work with or rely on to complete any given project.

Your employees’ motivation will be different than the motivation of the companies who partner with you. Your goal might be to become or remain the leader in a certain industry, but no one else cares if you are the leader in your space. Everyone wants to achieve their own ends, and adopting multiple perspectives helps you see how each unique goal can align to compound efforts and returns. Remember, your mind is limited and your individual perspective is not going to give you the insight you need to succeed in a complex world. Only by seeing the different angles from which other people approach a given problem or situation can you successfully coordinate with and motivate the team you will be working with.

How to Influence People

“The only way on earth to influence other people,” writes Dale Carnegie in his book How to Win Friends and Influence People, “is to talk about what they want and show them how to get it.”

Carnegie’s book is one that I have heard recommended over and over by successful guests on the various podcasts that I listen to. I was excited to read it to get real insight into how to be a more likable person and how to be a more influential person in the groups and organizations that I participate in. The book, however, doesn’t provide you with any hacks to trick people into being your friend or to slyly convince people to do what you want them to do. The book focuses on relationships and the importance of being sincere and present in your relationships with others in order to develop meaningful connections with the people around you. The quote above is part of that advice.

We don’t influence other people’s decisions by preaching at them, by constantly yelling at them when they do something we consider to be wrong, or by nagging them to do things the way we want. We influence other people by connecting their actions, behaviors, and beliefs to larger outcomes that the other person is aiming for. In the ultimate sense, we show how the other person’s behaviors, actions, and beliefs are either in or out of alignment with their personal values.

In 2016 I started the Master of Public Administration program at the University of Nevada, Reno. For years I had heard my sister tell me about the benefits of universal health care. I had heard my parents and uncles talk about the ways that welfare led to people staying home to play video games instead of working. I had listened to people talk about trickle-down economics and the values of federalism, and I wanted to enter a master’s program where I could learn how to sort out all the arguments people discussed regarding public policy and governance. I wanted concrete facts so I could make rational decisions on all these topics and tell my family members who was empirically correct and who was wrong.

What I learned, however, is that all of these policy discussions hinge on something deeper than the cold hard rational facts. They hinge on values. As I learned what the scientific research showed about universal healthcare, tax rates, and social welfare programs, I told my family members where their ideas seemed to make sense and where they seemed to be in conflict with the actual data. My empirical evidence has meant nothing to my family members and has not changed any minds. The data is only useful when it supports the position that people want to hold based on their values. Changing minds and influencing people, therefore, has to be connected to the values they already hold or that they aspire to.

Carnegie’s quote at the start of this post is all about connecting to values. You have to talk to people about what they want to see in life, why they want to see those things, and what values are driving the ways they hope the world turns out to be. Then you need to show them how the things you support, the ideas you think they should hold, and the actions you hope they will take help get them and society closer to those values.

For whatever reason we don’t like to talk about our values openly. Partly this is because for many of us our number one value is our own self-interest, and we don’t want to say that directly. But we also make up excuses around issues of abortion, healthcare, and taxes where we claim that economics or good health are the values we care about, when really we care much more about identity, self-interest, and whether the world is fair to us. If we could discuss those values directly, rather than hiding behind economic BS, then maybe we could actually compromise or be less hateful of those who don’t agree with us. In the end, we should remember that it is our values which underlie everything we say or do (that includes me, you, and that person on social media you hate). If we want to try to shape the world for the better, we had better understand what values are driving us, what values drive others, and how we communicate our values in terms of how we think the world should operate. We won’t influence people to live better if we are not up front about our values and can’t connect other people’s actions back to the values question.

An Illusion of Security, Stability, and Control

The online world is a very interesting place. While we frequently say that we have concerns about privacy, about how our data is being used, and about what information is publicly available about us, very few people delete their social media accounts or take real action when a data breach occurs. We have been moving more and more of our lives online, and we have been more accepting of internet-connected devices that can either be hacked or be used to tacitly spy on us than we would expect given the amount of time we spend expressing concern for our privacy.

A quick line from Tyler Cowen’s book The Complacent Class may explain the contradiction. “A lot of our contentment or even enthrallment with online practices may be based on an illusion of security, stability, and control.”

I just read Daniel Kahneman’s book Thinking, Fast and Slow, and in it he writes about a common logical fallacy, the substitution principle. When we are asked difficult questions, we often substitute a simpler question that we can answer. However, we rarely realize that we do this. Cowen’s insight suggests that we are using this substitution fallacy when we are evaluating online practices.

Instead of thinking deeply and critically about our privacy, safety, and the security of our personal or financial information in a given context, we substitute. We ask ourselves, does this website intuitively feel legitimate and well put together? If the answer is yes, we are more likely to enter our personal information, allow our online movements to be tracked, enter our preferences, and save our credit card number.

If matching technology works well, if our order is fulfilled, and if we are provided with more content that we can continue to enjoy, we will again substitute. Instead of asking whether our data is safe or whether the value we receive exceeds the risk of having our information available, we will ask if we are satisfied with what was provided to us and if we liked the look and feel of what we received. We can pretend to answer the hard questions with illusory answers to easier questions.

In the end, we land in a place where the companies and organizations operating on the internet have little incentive to improve their systems, to innovate in ways that create disruptive changes, or to pursue big leaps forward. We are already content and we are not actually asking the hard questions which may push innovation forward. This contentment builds stagnation and prevents us from seeing the risks that exist behind the curtain. We live in our illusion that we control our information online, that we know how the internet works, and that things are stable and will continue to work, even if the outside world is chaotic. This could be a recipe for a long-term disaster that we won’t see coming because we believe we are safely in control when we are not.

Avoiding Race

Michelle Alexander, in her book The New Jim Crow, directly addresses inconsistencies and inequities within our criminal justice system. The prison population in the United States has exploded relative to other countries, and minority racial populations have taken the brunt of our unusually high number of arrests. Alexander focuses throughout her book on the unequal levels of policing in white, black, and brown communities in the United States and the ways in which inequality has led to policing patterns that favor white people and disadvantage black and brown people. Alexander also looks at the ways in which people with criminal backgrounds are excluded from society, and how exclusion shapes people’s behavior. She describes the ways that this then feeds back into group behaviors and creates a cycle of continually greater policing and arresting. Despite the evidence that our policing is out of control and unfairly targeting minority populations, our country has trouble addressing the reality of our system, and Alexander has ideas as to why.

In her book she writes, “The language of caste may well seem foreign or unfamiliar to some. Public discussions about racial caste in America are relatively rare. We avoid talking about caste in our society because we are ashamed of our racial history. We also avoid talking about race. We even avoid talking about class.” We believe that today race is not a limiting factor for individuals. White people have an idea in their mind that there are almost no racist individuals in the country. The success of many black and brown individuals in our country seems to demonstrate that we have reached a place beyond racism, where individual effort, not race, determines our success. The election of a black president and the prominence of black sports figures and celebrities are, to many white people, a clear indication that we have reached a post-racial point in society, and this allows for the false view that black people bringing up race is the only thing preventing us from leaving race behind. This view, however, is drawn entirely from individuals, and neglects the way that race is shaped by institutions and larger groups. Individually we may have been able to move beyond racism, but as a larger society and within public and private institutions, we have not been able to eliminate disparate impacts for racial groups.

Policing and our prison populations demonstrate the way that we have not moved beyond racism within our institutions. Policies related to policing do not direct officers to over-police black and brown neighborhoods and do not instruct officers to arrest black and brown men at rates far higher than they arrest white men, but that is what we see happening when we look at the data describing who is arrested and where our police officers spend their time and effort. We find ways to explain the disparate outcomes that black and brown people face in our criminal justice system that have nothing to do with race, but our explanations avoid any discussion of the racial history that these groups have faced in our country. For years our country allowed racial discrimination in employment, education, and housing, and these policies limited the economic mobility of racial groups while favoring and advantaging white groups. Wealth accumulation was far more challenging for black and brown people, and the effects of such discrimination have not completely gone away. Policing those who we placed in ghettos and policing those who we did not allow to grow economically is not a directly racist decision within our criminal justice system; it is just a side effect.

Alexander argues that we should have more discussion about the role that race historically played in our country so that we can better understand our current moment. She argues that we should look at race and at socioeconomic status (SES) as indicators of caste, because race, SES, and caste systems can accurately describe the inequities and realities of our system today. Our discussions avoid race and the idea of caste, but the data supports the reality of the ideas we hide from.