IT World Canada


Yogi Schulz

Published: January 4, 2017



While big data means a lot more information, it also means more false or misleading information.

There are many situations in business, and our civilization more generally, where big data is useless or even dangerous. For example, we’ve all heard the likely apocryphal story about beer and diaper sales. It usually goes like this: [Insert Major Retailer Name] found, through a big data analysis of store sales transactions, that beer and diaper sales were strongly correlated. The analyst concluded that [Pick one]:

  1. Packages of diapers are too heavy for recently pregnant women to handle so they ask their husbands to pick them up coming home from work. Husbands then use that opportunity to pick up beer.
  2. A diaper emergency occurs late in the evening and the husband is sent out while the new mother cares for the baby. Being annoyed, he also picks up a 12-pack to relax.
  3. [Insert some unflattering, stereotypical assumption about working class parents].

The brilliant analyst at [same major retailer as above] intuits that a simple relocation of beer next to diapers will lead to more purchases of beer and beer sales will improve by [insert higher %].

Even though it is almost certainly an urban legend, this story illustrates how useless or misleading data correlations can be.

Here are some situations to watch out for, where big data will not help you and trying to apply big data concepts will lead to failure.

Data struggles with context

Human decisions are not discrete events. They are embedded in sequences of events and in contexts. Our human brain has evolved to account for this reality. We are really good at telling stories that weave together multiple causes and several contexts.

Data analysis cannot produce narrative and emergent thinking. For example, the Facebook acquisition of WhatsApp and the Microsoft acquisition of LinkedIn weren’t data-driven actions. These acquisitions occurred in pursuit of a strategic goal.

Therefore, strategic planning for businesses will always require a blend of data, experience and intuition. In this realm, an over-reliance on data is in fact dangerous.

Big data creates bigger haystacks

As businesses acquire more and more data, data analysts will find many, many more statistically significant correlations. Most of these correlations are spurious and deceive us when we’re trying to understand a situation. The number of spurious correlations grows rapidly as the volume of data we collect grows, because every added variable can be compared against all the others. As the haystacks or databases become larger, the insightful nuggets or needles we are desperately seeking stay buried deep inside, harder than ever to find.

Therefore, as managers, we need to become even more skeptical of recommendations based on correlations where causation is tenuous or non-existent.
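
The haystack effect is easy to reproduce. The following is a minimal Python sketch, not taken from any real data set: it fills a table with purely random, unrelated columns and counts how many pairs of columns look "significantly" correlated anyway. The function names, the 20 observations per column and the cutoff of about 0.444 (the usual two-sided 5 per cent critical value for 20 observations) are assumptions chosen for illustration.

```python
import random
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def count_spurious(n_vars, n_obs=20, r_crit=0.444, seed=0):
    """Return (pairs tested, pairs that pass the cutoff) for random data.

    Every column is independent noise, so any pair that passes
    the cutoff is spurious by construction.
    r_crit=0.444 is an assumed cutoff: roughly the two-sided
    5% critical value of r for 20 observations.
    """
    rng = random.Random(seed)
    columns = [[rng.gauss(0, 1) for _ in range(n_obs)] for _ in range(n_vars)]
    pairs = list(combinations(columns, 2))
    hits = sum(1 for x, y in pairs if abs(pearson(x, y)) > r_crit)
    return len(pairs), hits


if __name__ == "__main__":
    for n_vars in (10, 40, 160):
        pairs, hits = count_spurious(n_vars)
        print(f"{n_vars:>4} variables: {pairs:>6} pairs tested, "
              f"{hits} look significant")
```

In this sketch the number of pairs tested grows with the square of the number of variables, and roughly one pair in 20 passes the cutoff by chance alone. None of those "findings" means anything.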

Big data has trouble with big problems

If we are trying to solve small, contained problems such as which e-mail text produces the most clicks or sales, we can easily and cheaply conduct a randomized control experiment. We can use the results from the experiment to make a data-driven decision.

But if we are trying to solve a big problem like how to make a merger work or how to successfully launch a new product line, big data won’t help much. We won’t have an alternate business to use as a control group in either situation.

Therefore, as managers, we need to use the data we have but not expect it to guide the really big decisions that will need to be made.
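
For the small, contained kind of problem, the randomized control experiment can be sketched in a few lines of Python. This is an illustrative sketch only: the function names and the click counts in the example are hypothetical, and the two-proportion z-test is one common way to compare click rates, not the only one.

```python
import random
from math import erf, sqrt


def assign_groups(recipients, seed=0):
    """Randomly split recipients into two equal-sized groups (A and B)."""
    shuffled = list(recipients)
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def two_proportion_z_test(clicks_a, sent_a, clicks_b, sent_b):
    """Compare two click rates; return (z statistic, two-sided p-value).

    Assumes at least one click and at least one non-click overall,
    otherwise the standard error below would be zero.
    """
    rate_a = clicks_a / sent_a
    rate_b = clicks_b / sent_b
    pooled = (clicks_a + clicks_b) / (sent_a + sent_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / sent_a + 1 / sent_b))
    z = (rate_b - rate_a) / std_err
    # Standard normal CDF expressed with the error function.
    cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    return z, 2 * (1 - cdf)


if __name__ == "__main__":
    group_a, group_b = assign_groups(range(2000))
    # Hypothetical results: e-mail text A got 100 clicks, text B got 150.
    z, p_value = two_proportion_z_test(100, len(group_a), 150, len(group_b))
    print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

The random assignment is what makes the comparison trustworthy: the two groups differ only in the e-mail text they received. That is exactly the control group a merger or a new product line cannot provide.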

Data struggles with social interaction

Our brains are excellent at social cognition. We are really good at reflecting each other’s emotional states, at detecting uncooperative behavior and at assigning value to concepts through emotion. The shift from facts to feelings as dominating factors in political discourse illustrates the power of social influences.

Conversely, data analytics excels at measuring the quantity of social interactions but not their quality. Network scientists can rigorously map your interactions with the six co-workers you see during 76 per cent of your days. However, they can’t capture your devotion to the childhood friends you see only twice a year. Similarly, they can’t recognize Ernie’s love for Beatrice, whom he’s met just twice.

Therefore, when making business decisions with a significant social relationship component, it’s foolish to rely on data analytics.

Data favors memes over predicting success

Data analysis can easily detect memes that occur when large numbers of people take an instant liking to some topic, video or product and share their enthusiasm within a culture. “What’s trending” pages on the web, such as those on CBC, YouTube or Twitter, aggregate what’s resonating with web surfers.

However, many important and subsequently profitable products were initially unsuccessful because they were unfamiliar. For example, Coca-Cola was originally invented as an alternative to morphine and as a treatment for headaches and anxiety. Viagra was originally conceived as a treatment for hypertension, angina and other symptoms of heart disease. Listerine was invented 133 years ago as a surgical antiseptic.

The need to overcome hesitancy with the unfamiliar is what makes product launches so difficult and more often unsuccessful than we care to admit. Data isn’t available to reduce this risk.

Data embeds values

Data is never raw. It’s always collected and structured according to somebody’s predisposition, immediate goal and personal values. The resulting data may look objective, but, in reality, value choices have been made subtly and generally imperceptibly all the way through from design of the data gathering to interpretation of the data collected.

As managers, we need to be on guard against spin, leaps in logic and unstated assumptions. Data will never explain its embedded values.

What is your experience with restraining staff from trying to apply data to situations where it’s not relevant or even dangerously misleading?
