Don’t Think Better; Think Different
One of the biggest challenges for organizations developing business strategies that leverage big data is contemplating how to think about big data. Organizations are accustomed to thinking about better – faster, cheaper, bigger – but they struggle when they have to think different, and that’s causing lots of problems.
Let’s examine some instances when organizations don’t need to think about better, but instead need to think different.
Don’t Think “What Happened”; Think “What Will Happen…”
Organizations, especially the business stakeholders, are having a hard time transitioning from “monitoring the business” to “predicting the business.” Business Intelligence (BI) and data warehouse environments provide the queries, reports and dashboards that the business stakeholders use to monitor the business: How many widgets did I sell last month? What were my gross sales last quarter?
But now we need the business stakeholders to start thinking about predictive questions (e.g., What will happen?) and prescriptive questions (e.g., What should I do?). Figure 1 provides an example of transitioning the business stakeholders from the traditional business monitoring questions, to the more predictive and prescriptive questions.
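To make the shift concrete, here is a minimal sketch of the same sales data answering all three kinds of questions. The monthly figures, the function names and the capacity threshold are illustrative assumptions, not taken from Figure 1, and the straight-line trend is only a stand-in for whatever predictive model a real project would use.

```python
# Hypothetical monthly widget sales (units); illustrative numbers only.
monthly_sales = [100, 110, 120, 130, 140, 150]


def what_happened(sales):
    """Descriptive (BI-style) question: how many widgets did I sell last month?"""
    return sales[-1]


def what_will_happen(sales):
    """Predictive question: extrapolate a least-squares linear trend one month ahead."""
    n = len(sales)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(sales) / n
    # Ordinary least-squares slope and intercept for y = intercept + slope * x.
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, sales)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return intercept + slope * n  # x = n is the next, not-yet-observed month


def what_should_i_do(forecast, capacity):
    """Prescriptive question: a toy decision rule (the threshold is an assumption)."""
    return "add capacity" if forecast > capacity else "hold"


last_month = what_happened(monthly_sales)
next_month = what_will_happen(monthly_sales)
action = what_should_i_do(next_month, capacity=155)
```

The point is not the math; it is that the predictive and prescriptive questions reuse the data the monitoring question already has.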
Action Item: Target a key business initiative (e.g., improve customer lifetime value score, reduce teacher attrition, improve wind turbine predictive maintenance) and have your business stakeholders start by capturing the traditional “What happened?” questions. Then have the business stakeholders brainstorm some predictive “What will happen?” questions and some prescriptive “What should I do?” questions. Do this in a facilitated and collaborative environment so that the predictive and prescriptive question generation process feeds off of the ideas of others and fuels the creative brainstorming process. Yeah, lots of Post-it notes, one of my favorite tools.
Don’t Think RDBMS; Think Hadoop/HDFS…
In the world of big data, Hadoop/HDFS is a game-changer; it will forever change the way you think about storing, managing and analyzing data. And I don’t mean Hadoop as yet another data source for your data warehouse as positioned by many of today’s BI and RDBMS vendors:
“…will easily connect and interact with modern big data repositories from practically all vendors as well as Hadoop and other distributed data storage systems.”
I’m talking about Hadoop/HDFS as the foundation for your data warehouse and analytics environments. Leading data and analytic-driven organizations don’t want connectivity to or interoperability with Hadoop/HDFS; they want products that are tightly integrated within and run natively on Hadoop/HDFS. They want to take advantage of the massively parallel processing, cheap scale-out data architecture that can run hundreds, thousands, or even tens of thousands of Hadoop nodes.
These organizations want to leverage new big data technology developments: open source software (e.g., Hadoop, MapReduce, YARN, Mahout, MADlib, R), where software acquisition costs and ongoing licensing fees are a fraction of those for proprietary software, and modern scale-out data architectures built on commodity processors (see Figure 2).
Action Item: Start modernizing your data environment with the addition of new big data capabilities, such as a data lake. The data lake not only frees up expensive data warehouse resources by moving processing-heavy Extract, Transform, Load (ETL) processes off of the data warehouse to the data lake, but also enables an analytics environment where data scientists can self-provision analytic sandboxes to test out the value of new data sources and new analytic models.
Don’t Think of Data As A Cost; Think of Data As An Asset…
Data warehouses, due to the cost and technology challenges, taught business stakeholders to constrain their data desires. Business stakeholders got accustomed to working with 13 months of aggregated data instead of 15 to 20 years of every sales, claim, payment, order, return, admission, subscription, prescription, phone call, and credit card transaction. Why? Because from an economic (and performance) perspective, there were no cost-effective technologies that could handle petabytes of data in a reasonable (life) time.
Big data enables a new data economic model where the cost to store, manage and analyze data is 20x to 50x lower than with traditional data warehouse technologies. My colleague, who was the VP of Analytics at a leading insurance company, stated that his analysis showed that it cost the same for 4TB of data on an enterprise data warehouse as it did for 200TB of data on Hadoop/HDFS. That’s a 50x cost advantage!
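The arithmetic behind that 50x figure is simple enough to check. This is a minimal sketch using only the two volumes quoted above; the function name is a hypothetical helper, and no actual dollar amounts are assumed because none were given.

```python
def cost_advantage(tb_on_warehouse, tb_on_hadoop):
    """Ratio of data volume each platform holds for the same spend.

    Equal spend for more terabytes means the per-TB cost is lower by this factor.
    """
    return tb_on_hadoop / tb_on_warehouse


# Volumes quoted in the text: 4TB on the warehouse vs. 200TB on Hadoop/HDFS.
advantage = cost_advantage(tb_on_warehouse=4, tb_on_hadoop=200)

# Relative per-TB cost on Hadoop/HDFS, with the warehouse normalized to 1.0.
relative_cost_per_tb = 1.0 / advantage
```

In other words, each terabyte on Hadoop/HDFS costs about 2% of what it costs on the enterprise data warehouse in this comparison.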
These big data economics are enabling organizations to integrate their detailed operational data with their wealth of unstructured data, such as consumer comments, email conversations, work orders, physician notes, technician comments, engineering specs, log files, clinical studies, product change notices, social media, newsfeeds, blogs, customer reviews, etc. The combined operational and unstructured data can greatly expand the insights teased out of the data about the organization’s strategic nouns – your customers, products, employees, students, marketing campaigns, stores, trucks, jet engines, wind turbines, etc.
In the end, this is allowing organizations to treat “data as an asset to be gathered and nurtured, versus a necessary evil.”
Action Item: Conduct an envisioning exercise that brings together the business and IT stakeholders to brainstorm what additional sources of data, both internal and external to the organization, could be brought into the analysis process. Don’t limit the thinking to the data that you already have (think external data sources) and don’t limit your thinking to only tabular data (think unstructured data sources).
Don’t Think Business Intelligence; Think Data Science
Data science is different than Business Intelligence (BI); resist the urge to try to make these two different disciplines one and the same. For example:
- BI identifies the questions to ask (How many students were in class last week?); data science identifies the hypotheses to test or the predictions to make (What is the impact of an increase in the value of a parent’s home on the student’s in-classroom performance?)
- BI operates with schema-on-load (you have to pre-build the data schema before you can generate your BI queries and reports); data science deals with schema-on-query (where data scientists custom design the data schema based upon the hypothesis they want to test or the prediction that they want to create).
- BI uses SQL, a declarative language that’s tightly coupled to the underlying data model; data science leverages SQL plus procedural languages (MapReduce or YARN using Java, Ruby, Perl, etc.) that yield the ability to create more complex data transformations (e.g., frequency, recency, sequencing) and build more complex analytic models (using SAS, R, MADlib, Mahout, etc.).
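As a rough illustration of schema-on-query and of the frequency and recency transformations mentioned above, here is a minimal sketch in plain Python. The event log, its field names and the `recency_frequency` helper are all hypothetical; a real environment would read from files on HDFS rather than from an in-memory list.

```python
import json
from collections import defaultdict
from datetime import date

# Hypothetical raw event log, stored as-is with no predefined schema.
# Note that the records do not all carry the same fields.
RAW_EVENTS = [
    '{"customer": "A", "date": "2024-01-05", "amount": 20.0}',
    '{"customer": "B", "date": "2024-01-20", "amount": 35.5}',
    '{"customer": "A", "date": "2024-02-10", "amount": 12.0, "channel": "web"}',
    '{"customer": "A", "date": "2024-03-01", "amount": 50.0}',
]


def recency_frequency(raw_lines, as_of):
    """Apply a schema at query time: pull only the fields this hypothesis needs."""
    purchase_dates = defaultdict(list)
    for line in raw_lines:
        record = json.loads(line)
        purchase_dates[record["customer"]].append(date.fromisoformat(record["date"]))
    return {
        customer: {
            "frequency": len(dates),                    # number of purchases
            "recency_days": (as_of - max(dates)).days,  # days since last purchase
        }
        for customer, dates in purchase_dates.items()
    }


features = recency_frequency(RAW_EVENTS, as_of=date(2024, 3, 31))
```

The schema here lives in the function, not in the storage layer, so a different hypothesis (say, one about the `channel` field) can impose a different schema on the same raw data.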
Figure 3 shows some of the differences between a BI analyst and a data scientist. And it’s more than just tools and techniques; it includes mindset and approach for discovering insights and quantifying relationships buried in data.
Action Item: Hire some true data scientists to complement your BI staff. Let the BI and data science teams collaborate to brainstorm, test and refine new variables that might be better predictors of business performance. Expose your BI staff to the tools, techniques and algorithms that a data scientist uses to identify areas of collaboration and begin the skills sharing process.
Don’t Think HIPPO; Think Collaboration
Unfortunately, the HIPPO in the room – the Highest Paid Person’s Opinion – still rules most important decisions today. We hear reasons such as “We’ve always done things that way” or “My years of experience tell me” for why the HIPPO needs to drive the important decisions.
That type of thinking has led to siloed data fiefdoms and siloed decisions. It doesn’t empower the organization to explore what the data might be telling them about what’s driving the business; instead, it chokes off creative ideas. In the end, it’s a lost business opportunity.
The key to big data success is empowering cross-functional collaboration and exploratory thinking to challenge long-held organizational rules of thumb, heuristics and gut decision making.
Data science does not want to throw out the years of tribal knowledge that has been built up across the management, business and IT teams. Instead, data science wants to challenge those long-held traditions, to see which ones are still relevant in today’s market and see how those traditions can be improved (see Figure 4).
Action Item: Bring in a data science team to explore what might be buried in the data around a specific business initiative (keep it focused on a specific business initiative so it doesn’t morph into a science experiment). Run a vision workshop to brainstorm ideas that might help to improve business performance. Commit to testing these new ideas. This doesn’t mean you throw away the ways that you’ve done things in the past; it means that you augment what you’ve done in the past with new ideas that may yield new insights that lead to better business performance.
Don’t Think 3 Vs; Think 4 Ms
Many organizations are too infatuated with the technical innovations surrounding big data, and the 3 Vs of data volume, variety and velocity. But starting with a technology focus can quickly turn your big data initiative into a science experiment. You don’t want to be a technology in search of a use case.
Instead, focus on the 4 Ms of big data: Make Me More Money. Take a business-centric approach to your big data initiative. Start by identifying and focusing on the organization’s key business initiatives, that is, what the organization is trying to achieve from a business perspective over the next 9 to 12 months – reduce supply chain costs, improve supplier quality and reliability, reduce hospital-acquired infections, improve student performance, etc.
Remember, organizations don’t need a big data strategy; they need a business strategy that incorporates big data (see Figure 5).
Action Item: Identify what the organization is trying to accomplish over the next 9 to 12 months from a business perspective, that is, the organization’s key business initiatives. Bring together the key business stakeholders and IT teams to understand how effective the organization has been at integrating data and analytics into key business initiatives. For a particular business initiative, brainstorm the types of decisions that need to be made and the questions that need to be answered to enable those decisions. Then contemplate how new sources of data (internal and external, structured and unstructured) can be coupled with advanced analytics (descriptive, predictive, prescriptive) to uncover new insights that can be used to advance that business initiative.
Thinking better – faster, bigger, cheaper – is easy. In the IT world, we’ve been doing that for years. The challenge is to throw out your traditional thinking in order to think different; to approach the business opportunity from a different perspective. I’ve laid out six areas where organizations need to think differently about the big data opportunity and stop trying to “pave the cow path” by applying these marvelous new technology and business innovations using the same old approach.