Don’t Think Better; Think Different

By Bill Schmarzo, November 6, 2014

One of the biggest challenges for organizations developing business strategies that leverage big data is contemplating how to think about big data.  Organizations are accustomed to thinking about better – faster, cheaper, bigger – but they struggle when they have to think different, and that’s causing lots of problems.

Let’s examine some instances when organizations don’t need to think about better, but instead need to think different.

Don’t Think “What Happened”; Think “What Will Happen…”

Organizations, especially the business stakeholders, are having a hard time transitioning from “monitoring the business” to “predicting the business.”  Business Intelligence (BI) and data warehouse environments provide the queries, reports and dashboards that the business stakeholders use to monitor the business:  How many widgets did I sell last month?  What were my gross sales last quarter?

But now we need the business stakeholders to start thinking about predictive questions (e.g., What will happen?) and prescriptive questions (e.g., What should I do?).  Figure 1 provides an example of transitioning the business stakeholders from the traditional business monitoring questions, to the more predictive and prescriptive questions.

Figure 1: Transitioning the Business's Thinking
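To make the shift from "What happened?" to "What will happen?" concrete, here is a minimal Python sketch. The monthly sales figures are invented for illustration, and the straight-line trend is only one simple way to produce a forecast; it is not a method the article prescribes.

```python
# Hypothetical monthly widget sales (illustrative numbers, not real data).
monthly_sales = [100, 110, 120, 130, 140, 150]


def what_happened(sales):
    """Descriptive (BI-style) answer: units sold in the most recent month."""
    return sales[-1]


def what_will_happen(sales):
    """Predictive answer: extend a least-squares straight line one month ahead."""
    n = len(sales)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(sales) / n
    # Ordinary least-squares slope and intercept for y = intercept + slope * x.
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, sales)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    # Month index n is the first month not yet observed.
    return intercept + slope * n


print("Last month:", what_happened(monthly_sales))
print("Next month (trend estimate):", what_will_happen(monthly_sales))
```

A prescriptive question ("What should I do?") would go one step further and compare the forecast under different candidate actions.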



Action Item:  Target a key business initiative (e.g., improve customer lifetime value score, reduce teacher attrition, improve wind turbine predictive maintenance) and have your business stakeholders start by capturing the traditional “What happened?” questions.  Then have the business stakeholders brainstorm some predictive “What will happen?” questions and some prescriptive “What should I do?” questions.  Do this in a facilitated and collaborative environment so that the predictive and prescriptive question generation process feeds off the ideas of others and fuels the creative brainstorming process.  Yeah, lots of Post-it notes, one of my favorite tools.

Don’t Think RDBMS; Think Hadoop/HDFS…

In the world of big data, Hadoop/HDFS is a game-changer; it will forever change the way you think about storing, managing and analyzing data.  And I don’t mean Hadoop as yet another data source for your data warehouse as positioned by many of today’s BI and RDBMS vendors:

“…will easily connect and interact with modern big data repositories from practically all vendors as well as Hadoop and other distributed data storage systems.”

I’m talking about Hadoop/HDFS as the foundation for your data warehouse and analytics environments.  Leading data- and analytics-driven organizations don’t want connectivity to or interoperability with Hadoop/HDFS; they want products that are tightly integrated within and run natively on Hadoop/HDFS.  They want to take advantage of a cheap, massively parallel, scale-out data architecture that can run on hundreds, thousands, or even tens of thousands of Hadoop nodes.

These organizations want to leverage two new big data technology developments (see Figure 2): open source software (e.g., Hadoop, MapReduce, YARN, Mahout, MADlib, R), where software acquisition costs and ongoing licensing fees are a fraction of those for proprietary software, and modern scale-out data architectures built on commodity processors.

Figure 2: Modern Big Data Architecture



Action Item:  Start modernizing your data environment with the addition of new big data capabilities, such as a data lake.  The data lake not only frees up expensive data warehouse resources by moving processing-heavy Extract, Transform, Load (ETL) processes off of the data warehouse to the data lake, but also enables an analytics environment where data scientists can self-provision analytic sandboxes to test out the value of new data sources and new analytic models.

Don’t Think of Data As A Cost; Think of Data As An Asset…

Data warehouses, due to the cost and technology challenges, taught business stakeholders to constrain their data desires.  Business stakeholders got accustomed to working with 13 months of aggregated data instead of 15 to 20 years of every sales, claim, payment, order, return, admission, subscription, prescription, phone call, and credit card transaction.  Why?  Because from an economic (and performance) perspective, there were no cost-effective technologies that could handle petabytes of data in a reasonable (life) time.

Big data enables a new data economic model where the cost to store, manage and analyze data is 20x to 50x lower than with traditional data warehouse technologies.  My colleague, who was the VP of Analytics at a leading insurance company, stated that his analysis showed that it cost the same for 4TB of data on an enterprise data warehouse as it did for 200TB of data on Hadoop/HDFS.  That’s a 50x cost advantage!
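The 50x figure follows directly from the two capacities quoted above: if the same spend buys 4TB on one platform and 200TB on the other, the per-terabyte cost ratio equals the capacity ratio. A small Python sketch of that arithmetic (the function and constant names are illustrative, not from the original analysis):

```python
# Capacities quoted in the text for the same total spend.
EDW_TB = 4        # enterprise data warehouse
HADOOP_TB = 200   # Hadoop/HDFS


def cost_advantage(edw_tb, hadoop_tb):
    """For equal spend, the per-terabyte cost ratio equals the capacity ratio."""
    return hadoop_tb / edw_tb


print(f"{cost_advantage(EDW_TB, HADOOP_TB):.0f}x")  # prints "50x"
```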

These big data economics are enabling organizations to integrate their detailed operational data with their wealth of unstructured data, such as consumer comments, email conversations, work orders, physician notes, technician comments, engineering specs, log files, clinical studies, product change notices, social media, newsfeeds, blogs, customer reviews, etc.  The combined operational and unstructured data can greatly expand the insights teased out of the data about the organization’s strategic nouns – your customers, products, employees, students, marketing campaigns, stores, trucks, jet engines, wind turbines, etc.

In the end, this is allowing organizations to treat “data as an asset to be gathered and nurtured, versus a necessary evil.”


Action Item:  Conduct an envisioning exercise that brings together the business and IT stakeholders to brainstorm what additional sources of data, both internal and external to the organization, could be brought into the analysis process.  Don’t limit the thinking to the data that you already have (think external data sources) and don’t limit your thinking to only tabular data (think unstructured data sources).

Don’t Think Business Intelligence; Think Data Science

Data science is different than Business Intelligence (BI); resist the urge to try to make these two different disciplines one and the same.  For example:

  • BI identifies the questions to ask (How many students were in class last week?); data science identifies the hypotheses to test or the predictions to make (What is the impact of an increase in the value of a parent’s home on the student’s in-classroom performance?).
  • BI operates with schema-on-load (you have to pre-build the data schema before you can generate your BI queries and reports); data science deals with schema-on-query (where data scientists custom design the data schema based upon the hypothesis they want to test or the prediction that they want to create).
  • BI uses SQL, a declarative language that’s tightly coupled to the underlying data model; data science leverages SQL plus procedural code (e.g., MapReduce jobs on YARN written in Java, Ruby, Perl, etc.) that yields the ability to create more complex data transformations (e.g., frequency, recency, sequencing) and build more complex analytic models (using SAS, R, MADlib, Mahout, etc.).
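As a rough illustration of the schema-on-query idea and the frequency and recency transformations mentioned above, the following Python sketch derives both variables from raw events at analysis time instead of from a pre-built schema. The event data, customer names, and the `recency_frequency` helper are all hypothetical.

```python
from collections import defaultdict
from datetime import date

# Hypothetical raw purchase events: (customer_id, purchase_date).
events = [
    ("alice", date(2014, 9, 1)),
    ("bob", date(2014, 10, 15)),
    ("alice", date(2014, 10, 30)),
    ("alice", date(2014, 11, 2)),
]


def recency_frequency(events, as_of):
    """Per customer: frequency (event count) and recency (days since last event)."""
    dates_by_customer = defaultdict(list)
    for customer, day in events:
        dates_by_customer[customer].append(day)
    return {
        customer: {
            "frequency": len(days),
            "recency_days": (as_of - max(days)).days,
        }
        for customer, days in dates_by_customer.items()
    }


features = recency_frequency(events, as_of=date(2014, 11, 6))
for customer, values in sorted(features.items()):
    print(customer, values)
```

The shape of the output is chosen by the analyst for the hypothesis at hand; a different question (say, sequencing of purchases) would reshape the same raw events differently.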

Figure 3 shows some of the differences between a BI analyst and a data scientist.  And it’s more than just tools and techniques; it includes mindset and approach for discovering insights and quantifying relationships buried in data.

Figure 3: Difference between Business Intelligence Analysts and Data Scientists



Action Item:  Hire some true data scientists to complement your BI staff.  Let the BI and data science teams collaborate to brainstorm, test and refine new variables that might be better predictors of business performance.  Expose your BI staff to the tools, techniques and algorithms that a data scientist uses to identify areas of collaboration and begin the skills sharing process.

Don’t Think HIPPO; Think Collaboration

Unfortunately, the HIPPO in the room – the Highest Paid Person’s Opinion – still rules most important decisions today.  We hear reasons such as “We’ve always done things that way” or “My years of experience tell me” for why the HIPPO needs to drive the important decisions.

That type of thinking has led to siloed data fiefdoms and siloed decisions.  It chokes off creative ideas instead of empowering the organization to explore what the data might be telling it about what’s driving the business.  In the end, it’s a lost business opportunity.

The key to big data success is empowering cross-functional collaboration and exploratory thinking to challenge long-held organizational rules of thumb, heuristics and gut decision making.

Data science does not want to throw out the years of tribal knowledge that has been built up across the management, business and IT teams.  Instead, data science wants to challenge those long-held traditions, to see which ones are still relevant in today’s market and see how those traditions can be improved (see Figure 4).

Figure 4: Creating a Collaborative Culture



Action Item:  Bring in a data science team to explore what might be buried in the data around a specific business initiative (keep it focused on a specific business initiative so it doesn’t morph into a science experiment).  Run a vision workshop to brainstorm ideas that might help to improve business performance.  Commit to testing these new ideas.  This doesn’t mean you throw away the ways that you’ve done things in the past; it means that you augment them with new ideas that may yield new insights that lead to better business performance.

Don’t Think 3 Vs; Think 4 Ms

Many organizations are too infatuated with the technical innovations surrounding big data, and the 3 Vs of data volume, variety and velocity.  But starting with a technology focus can quickly turn your big data initiative into a science experiment.  You don’t want to be a technology in search of a use case.

Instead, focus on the 4 Ms of big data:  Make Me More Money.  Take a business-centric approach to your big data initiative.  Start by identifying and focusing on the organization’s key business initiatives, that is, what the organization is trying to achieve from a business perspective over the next 9 to 12 months – reduce supply chain costs, improve supplier quality and reliability, reduce hospital-acquired infections, improve student performance, etc.

Remember, organizations don’t need a big data strategy; they need a business strategy that incorporates big data (see Figure 5).

Figure 5: Big Data Business Model Maturity Index



Action Item:  Identify the organization’s key business initiatives, that is, what the organization is trying to accomplish over the next 9 to 12 months from a business perspective.  Bring together the key business stakeholders and IT teams to understand how effective the organization has been at integrating data and analytics into key business initiatives.  For a particular business initiative, brainstorm the types of decisions that need to be made and the questions that need to be answered to enable those decisions.  Then contemplate how new sources of data (internal and external, structured and unstructured) can be coupled with advanced analytics (descriptive, predictive, prescriptive) to uncover new insights that can be used to advance that business initiative.


Thinking better – faster, bigger, cheaper – is easy.  In the IT world, we’ve been doing that for years.  The challenge is to throw out your traditional thinking in order to think different; to approach the business opportunity from a different perspective.  I’ve laid out 6 areas where organizations need to think differently about the big data opportunity and stop trying to “pave the cow path” by applying these marvelous new technology and business innovations using the same old approach.


5 thoughts on “Don’t Think Better; Think Different”

  1. Thank you Sir, agree with you.

    All the marketing fuzz around big data still needs to be translated into “how will this help me to make more money for my business.”


  2. Don’t Think Business Functions; Think Business Initiatives

    In discussions this past week at the University of Iowa Tippie Business School, another important “think different” area was discussed, which is the importance of “Think business initiatives; not business functions.”

    When organizations think business functions (Sales, Marketing, Customer Service, Logistics, Inventory Management, Finance, Procurement, HR), they create siloed data, reporting and analytic solutions that serve only their own function. We see this problem in how data warehouses are built, and in the difficulty of creating cross-organizational views of the business.

    However, most important organizational initiatives are cross-functional; that is, the initiatives require two or more business functions to collaborate in order to achieve success. For example:

    • Customer Acquisition would need to involve Sales, Marketing and Finance
    • Fraud Detection would need to involve Sales, Operations and Finance
    • Predictive Maintenance would need to involve Operations, Inventory Management and Logistics

    We can leverage the need for a cross-functional view to avoid siloed data and analytic solutions. An organization’s key business initiatives are typically cross-functional and usually focused on delivering business value in a 9 to 12 month window; we can use both requirements to drive cross-organization collaboration.