Big Data Challenge Part I – The Digital and Analog Chasm
I am sharing my thoughts on the Big Data challenges that an organization might face. I have divided the challenges into three broad categories. This post discusses the first challenge, Part I – The Digital and Analog Chasm.
We live in a digital world, or do we?
You must be wondering what the meaning of the above statement is. Let me start with a simple experiment.
Tell me what you infer from the following string of numbers:
1207194109112001
This is what they call “data”. What can you infer, anything?
Let me help you out a bit. How about now:
12071941 09112001
Still nothing? Let me try again. How about now:
12/07/1941 and 09/11/2001
Cool. You see that these are dates. But can you remember them? Is there anything significant about these dates? Let me make these dates unforgettable:
- Attack on Pearl Harbor (World War II): 12/07/1941
- Attack on the World Trade Center: 09/11/2001
The long string of numbers had no meaning at first, but with a few modifications it creates an association with something meaningful. The string suddenly became unforgettable! That’s the way the human brain works.
Wasn’t that cool? I borrowed this example from Joshua Foer’s interesting book on the human brain and memory – “Moonwalking with Einstein”. The human brain is a miracle in terms of its memory, reasoning and decision-making capabilities, but it is incapable of deciphering large and unrelated datasets. What it can do is recognize patterns, find connections and interpret new things in terms of old “schemas”. I will explain each in a moment.
Along with the propensity for pattern recognition, finding connections and interpreting “schema”, vision accounts for up to 90% of our perception. The size, color and shape of objects appeal to our subconscious visual ability. Successful visuals can also represent large amounts of data in a small area. When it comes to the brain analyzing data, there is certainly truth to the saying that a “picture is worth a thousand words” (and many thousands of numbers!).
As you can see, the ways the human brain interprets data (patterns, connections, schema and visual perception) are all analog in nature! Let me spend a few minutes describing each.
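For readers who like to see the trick spelled out, here is a minimal Python sketch of the same experiment: it splits a raw digit string into eight-digit chunks, reads each chunk as a month/day/year date, and then attaches a meaning to it. The function names and the small event lookup table are illustrative assumptions, not part of any library.

```python
from datetime import datetime

# Hypothetical lookup table of "meaning"; only the two events from the example.
KNOWN_EVENTS = {
    "12/07/1941": "Attack on Pearl Harbor",
    "09/11/2001": "Attack on the World Trade Center",
}

def chunk_into_dates(digits, width=8):
    """Split a digit string into MMDDYYYY chunks and format them as MM/DD/YYYY."""
    chunks = [digits[i:i + width] for i in range(0, len(digits), width)]
    return [datetime.strptime(c, "%m%d%Y").strftime("%m/%d/%Y") for c in chunks]

def annotate(digits):
    """Pair each recovered date with a known event, or 'unknown' if none matches."""
    return [(d, KNOWN_EVENTS.get(d, "unknown")) for d in chunk_into_dates(digits)]

raw = "1207194109112001"  # sixteen digits with no structure at all
for date, event in annotate(raw):
    print(f"{date}: {event}")
```

The code only works because someone told it the chunk width, the date format and the events that matter. That context is exactly what the human reader supplies without thinking.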
Our ability to recognize patterns and deduce meaning from them is hardwired into us. Even early in our evolution, this was crucial for navigating and surviving a dangerous world. Our brain tells us that a pattern that is thin, long and moving in a zigzag fashion might be a snake. It could be a rope moving in the wind, but the brain does not have time to evaluate and always errs on the safe side. Michael Shermer, author of “The Believing Brain”, describes human brains as “evolved pattern-recognition machines that connect the dots and create meaning out of the patterns we think we see in nature”. Our ability to recognize patterns and find meaning helps us decipher a large amount of data, but it also has many shortcomings that one must be cognizant of. Our pattern recognition ability can help us solve a crime, find a new market or identify a problem with a process. At the same time, it may lead us to put people in groups based on religion or race, see “Mother Mary” in a piece of bread, spot a human face on Mars, and so on. Here are some common examples of visual pattern recognition that we have come to know:
- A red car must go fast!
- Orange sign means danger
- Shape of the dog defines its demeanor.
- Clothes may define a profession, and so on.
In the example at the beginning, we were able to recognize the pattern of dates and also interpret them as the two worst attacks on the US.
Our brain is also good at connecting two or more seemingly unrelated things and creating insight that may have value. A famous example of a useful connection is the invention of the printing press by Johannes Gutenberg. The idea of printing itself was not new. The Chinese had been experimenting with block printing for centuries. What Gutenberg did that revolutionized the world was to join two ideas, the wine press and the coin punch (in reality he combined several ideas but I am keeping it simple). The coin punch was used to leave an image on a small area, while the wine press was used to apply force over a large area to crush grapes. Gutenberg imagined small coin punches arranged in a pattern and pressed by a wine press, and the “movable type” printing press was born. There are several business examples that illustrate connections. A modern example is an insightful executive at an insurance company who realized that many motorcycle owners are middle-aged men with safe driving records who ride their bikes only occasionally. There was no point in combining them with traditional high-risk motorcycle riders. The human brain is pretty good at “connecting” middle age (seemingly calm demeanor), a safe car driving record and occasional bike riding into a new “segment” the company can go after with reduced premiums.
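Once a human has made that connection, expressing it in code is the easy part. Here is a minimal Python sketch of the insurance segment described above. The `Rider` fields, the age range and the rides-per-year cutoff are all hypothetical values chosen for illustration; the original example gives no actual thresholds.

```python
from dataclasses import dataclass

@dataclass
class Rider:
    name: str
    age: int
    clean_car_record: bool   # safe driving record in a car
    rides_per_year: int      # how often the bike actually leaves the garage

def is_low_risk(rider, min_age=40, max_age=60, max_rides=12):
    """Hypothetical rule: middle-aged, safe car driver, occasional rider."""
    return (
        min_age <= rider.age <= max_age
        and rider.clean_car_record
        and rider.rides_per_year <= max_rides
    )

# Made-up sample riders, purely for illustration.
riders = [
    Rider("A", 52, True, 6),
    Rider("B", 23, False, 150),
    Rider("C", 47, True, 40),
]

low_risk = [r.name for r in riders if is_low_risk(r)]
print(low_risk)
```

The filter is trivial. The insight that these three attributes belong together was the hard, analog part.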
Schema and Visual Perception
Another useful brain characteristic to know is “schema”. It is easier for human beings to understand something new in terms of something known. Let me give you an example from Chip and Dan Heath’s delightful book “Made to Stick”. They give an interesting example of defining a “pomelo”. One could describe a pomelo as a “large citrus fruit with a thick, but soft rind”, or as “a pomelo is basically a super-sized grapefruit with a very thick and soft rind.” When you hear the first explanation, that it is a large citrus fruit, you are still struggling to picture what a pomelo is, but as soon as you hear that it is a super-sized grapefruit, you have a good mental picture. In this case we used the grapefruit “schema” to describe the pomelo.
Human beings have been collecting and analyzing data for centuries. From Egyptian hieroglyphs to today’s blogs, human beings have recorded their thoughts, observations, implications, interpretations and everything else under the sun. And, since the beginning of time, we have continued to analyze what was written, be it the Bible, Hammurabi’s code, Machiavelli’s Prince, gold prices, the stock market and so on. Storing and analyzing information (data) is almost second nature to us. But something happened around the turn of the millennium.
The ability to “collect” data has increased exponentially while the ability to “analyze” data has remained more or less the same. It is true that many statistical algorithms, analytical models and visualization techniques have emerged that help us translate huge volumes of data into “patterns”, “connections” and “schemas”, but the fact remains that data collection has gone “digital” while the ability to interpret it is still “analog”.
This is what I call the Digital and Analog Chasm.
It’s true that we live in a digital world. Technology is taking over all aspects of business and all sorts of businesses are becoming digital. In 1996 only 1% of the world’s data was digital; everything else was analog. But by 2007 digital data had skyrocketed to 94% and analog accounted for only 6% of the total volume.
See the graph below and you will see that something happened around the turn of the millennium. Any guesses?
It’s not difficult to guess. Before the year 2000 most data was analog. It is hard to believe, but it’s true. We used to read books on paper, listen to music on cassettes and watch movies on video tapes. TV and cable signals were transmitted in analog, and our photographs were shot on analog film and printed on paper. Sure, there were computers storing data in rows and columns, but compared to the amount of analog data it was nothing. Let’s not forget that there were no smartphones, no Twitter, no Facebook and no social media at all.
But soon after the year 2000, things started to change. The iPod became a sensation and music became digital. Digital cameras became popular and our photographs became 0’s and 1’s. Digital camcorders were introduced and soon movies and videos became digital. TV and cable transmission became digital. Multimedia was now stored on connected hard drives rather than on bookshelves. The Kindle was introduced and it revolutionized the publishing business. Suddenly our books became digital. Google became the premier search engine, generating petabytes of web logs. A Harvard dropout launched Facebook. Twitter was introduced and social media sites exploded all over the net. People started writing blogs. The world got truly connected. The number of devices that constantly collect data and are connected to the internet has been growing steadily for the last several years. We collectively call them the Internet of Things (IoT). See the picture below:
Everything is logged and ready for interpretation. There are various logs – application logs, clickstream data, call logs, system logs, audit logs, sensor logs, blogs, and social media exchanges. This data comes in different formats – audio, video, text, numbers, binary and more. It is generated at tremendous speed. One study estimated the data generated from the beginning of time through 2003 at around 5 exabytes. We now generate that much data every 2 days! See the graph below that highlights the digitization of all data.
The exponential growth in generating and storing digital data has not been matched by our ability to access and analyze it. The analytical methods, visualization techniques and our hardwired ability to recognize patterns, find “connections” and interpret “schema” have remained more or less the same. Sure, new tools and techniques will try to narrow the gap, but our evolutionary limitations in pattern matching will always be the weakest link.
So how do you bridge the Digital and Analog Chasm? It definitely takes more than statistical techniques and promising visualization tools. Granted, Big Data is a digital battle, but it will be won the analog way! Yes, our ability to store data is critical, but our ability to analyze and visualize it is equally important. I will share solutions to overcome the chasm in my upcoming posts.
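To get a feel for that pace, here is a quick back-of-the-envelope calculation in Python. It takes the two figures quoted above (about 5 exabytes through 2003, regenerated every 2 days) at face value and assumes a constant rate over a 365-day year, which is a simplification.

```python
# Figures quoted in the text; treated here as rough estimates.
EXABYTES_THROUGH_2003 = 5.0   # all data generated up to 2003
DAYS_TO_REGENERATE = 2.0      # time now needed to generate the same amount

# Assumption: the generation rate is constant across the year.
eb_per_day = EXABYTES_THROUGH_2003 / DAYS_TO_REGENERATE
eb_per_year = eb_per_day * 365
multiples_per_year = eb_per_year / EXABYTES_THROUGH_2003

print(f"{eb_per_day:.1f} EB per day")
print(f"{eb_per_year:.1f} EB per year")
print(f"{multiples_per_year:.1f} times the pre-2003 total every year")
```

In other words, at that rate a single year produces roughly 180 times everything recorded before 2003, while the brain doing the interpreting has not changed at all.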