The Elephant In the Room: Big Data Privacy Concerns

By Bill Schmarzo, CTO, Dell EMC Services (aka “Dean of Big Data”), June 17, 2014

By now, we’ve all heard the story. Target, through advanced analysis of website activity, determines with some level of confidence that a visitor is a girl and that she’s likely pregnant. Based upon this insight, Target starts sending baby-related coupons (prenatal care, baby room furniture, nursing products, etc.) to the girl. The girl’s father becomes outraged when he sees the Target coupons being sent to his 16-year-old daughter. He complains to Target about its marketing, only to learn two weeks later that his daughter is pregnant[1].

The tragedy of this story is that many in the data science community perceive this as a huge success: the Target data scientists were able to determine, based upon a person’s website activities, that a girl was pregnant before her father knew! However, the Target executives probably perceive this as a huge public relations failure, because just as Target knew that the girl was pregnant, Target also likely knew that she was underage (again, at some level of confidence). The problem was that Target management had not instituted the data and decision governance policies to guard against a situation where they know something about a visitor but should not act on it. Let me say this in a different way:

Just because you know or suspect something about a customer does NOT necessarily mean that you should act on that knowledge.

There are numerous examples where an organization may suspect (with some level of confidence) something about its customers that it should not act on, including:

  • Customer is researching cancer or some other serious ailment
  • Customer is researching a new job (if they have an existing job)
  • Customer is researching dating sites (if they are married)
  • Customer is researching divorce lawyers (if they got busted visiting dating sites)

All of these situations can likely be determined (with some level of confidence) by mining a customer’s keyword searches, social media postings and exchanges, email communications, and website and blog visits (e.g., time on a site, frequency of visits, recency of visits).

As the recent EMC Privacy Index study revealed, consumers are getting smarter: they intrinsically understand that there is a trade-off between digital convenience (i.e., an organization knowing who I am and providing services personalized to my interests, desires and needs) and privacy. However, consumers are also becoming more discerning about the areas in which, and the reasons for which, they are willing to trade privacy for convenience (see Figure 1).

Figure 1: EMC Privacy Index Study, 2014

Companies are going to need to have a well-described, formal data and decision governance organization that clearly articulates the rules and regulations with respect to how they will and will not use this information. There is a clear and present danger to consumers, and to the organizations that are mining this wealth of consumer data, if improper or unethical use is made of the information.

Golden Rule of Personal Privacy: Do What’s Best for the Customer

Customer loyalty programs thrive because organizations give something to their customers in exchange for gathering information about the customer. I’m a member of several loyalty programs, and these loyalty programs give me discounts on goods, free coffee and pastries, free airline trips, hotel stays, and cash back that I can spend on goods in their stores or on their websites. Starbucks, CVS, United Airlines, Marriott Hotels, and Sports Authority are but a few of them. I give them information about me and my shopping and travel, and they pay me back in goods and services.

However, I’m hesitant to share any additional personal information because 1) they haven’t given me a compelling reason to share additional personal information, and 2) I don’t trust them to use that data in my best interests. Let me walk through an example.

Let’s say that you are a grocery chain, and you would love to know the following as the customer walks into your store:

1)   What’s on their shopping list?

2)   What’s their budget?

3)   Is there a particular event (birthday, barbeque, party) for which they are planning?

With that information, the grocery chain could create a set of recommendations that would allow the customer to optimize his/her budget, as well as recommend other items that might be useful for the upcoming event. That would be a real win for both the customer and the grocer. In fact, I would be willing to share that information with my grocer as long as I could be confident that the grocer was making recommendations that were in my best interest.

The minute they recommend their private label product as a replacement for the branded product that I have used for years (note: grocers make considerably more profit on private label products than they do on branded products), they will have violated that golden rule of always looking out for my best interests.

Decision Governance: Does It Pass The “Mom Test”?

One way to address this issue is via a formalized “decision governance” organization. Much like data governance, decision governance should work with the business to decide what information they are seeking on their customers and clearly define when and where they will use that information. And if there ever is a situation that is not covered by the decision governance policies, then no action should be taken until the decision governance organization has decided what the proper action should be.
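The rule in the last sentence above is essentially a "default deny" policy, and it can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the category names, the confidence thresholds, and the `decide` function are all hypothetical, and a real decision governance organization would define its own policies.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ACT = "act"
    DO_NOT_ACT = "do_not_act"
    ESCALATE = "escalate"  # no policy yet: hold until governance decides


@dataclass(frozen=True)
class Insight:
    category: str      # hypothetical label, e.g. "pregnancy"
    confidence: float  # model confidence between 0.0 and 1.0


# Hypothetical policy table that a decision governance group would maintain.
# Each entry maps an insight category to (allowed_to_act, minimum_confidence).
POLICIES = {
    "grocery_preferences": (True, 0.70),
    "pregnancy": (False, 0.0),
    "serious_illness": (False, 0.0),
}


def decide(insight: Insight, policies=POLICIES) -> Decision:
    """Return what the business may do with an insight about a customer."""
    rule = policies.get(insight.category)
    if rule is None:
        # Situation not covered by policy: take no action, ask governance.
        return Decision.ESCALATE
    allowed, min_confidence = rule
    if not allowed:
        # Knowing (or suspecting) something is not permission to act on it.
        return Decision.DO_NOT_ACT
    if insight.confidence < min_confidence:
        return Decision.DO_NOT_ACT
    return Decision.ACT
```

In this sketch, an insight such as `Insight("pregnancy", 0.95)` is never acted on no matter how confident the model is, and a category nobody has ruled on yet, such as a hypothetical `"job_search"`, is escalated rather than acted upon.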

One simple way to test whether or not you should act on the insight that you gain about a customer would be the “Mom Test.” What would your mom think of your decision to use that information about a customer? In most cases, the “Mom Test” would quickly identify those things that are just not the right thing to do.

However, organizations can’t rely upon the “Mom Test” alone, so we need a more formal decision governance organization. There is much that we can learn from the world of data governance about how to properly and effectively set up a decision governance organization. Many of the data governance policies and procedures can be repurposed specifically to address how data and insight about a company’s customers will or will not be used.

Privacy Issues = Trust Issues

Trust is the heart of the privacy issue from a consumer’s perspective:

1)   Consumers don’t trust the organization to have the guidelines and governance in place to know when they should act, and when they should NOT act, on information that they may learn about us

2)   Consumers don’t trust the organization to focus on our best interests instead of its own best interests

3)   Consumers don’t trust the organization to refrain from selling our personal data to others, for their own gain

This privacy issue is only going to get bigger, especially as organizations become more proficient at mining Big Data and uncovering new information about their customers’ interests, passions, affiliations and associations. Stories like Target’s, where the data science community is so quick to declare victory, scare many executives because they can’t afford to risk the reputation of the organization (not to mention potential financial penalties and lawsuits) on the judgment of a few people. Again, just because you know something about someone does not necessarily mean that you should act on that knowledge.

Be sure to check out the EMC Privacy Index infographic below to see this data in action.

[1] How Target Figured Out A Teen Girl Was Pregnant Before Her Father Did



About Bill Schmarzo

Bill Schmarzo, author of “Big Data: Understanding How Data Powers Big Business” and “Big Data MBA: Driving Business Strategies with Data Science”, is responsible for setting strategy and defining the Big Data service offerings for Dell EMC’s Big Data Practice. As a CTO within Dell EMC’s 2,000+ person consulting organization, he works with organizations to identify where and how to start their big data journeys. He’s written white papers, is an avid blogger and is a frequent speaker on the use of Big Data and data science to power an organization’s key business initiatives. He is a University of San Francisco School of Management (SOM) Executive Fellow where he teaches the “Big Data MBA” course. Bill also just completed a research paper on “Determining The Economic Value of Data”. Onalytica recently ranked Bill as #4 Big Data Influencer worldwide.

Bill has over three decades of experience in data warehousing, BI and analytics. Bill authored the Vision Workshop methodology that links an organization’s strategic business initiatives with their supporting data and analytic requirements. Bill serves on the City of San Jose’s Technology Innovation Board, and on the faculties of The Data Warehouse Institute and Strata.

Previously, Bill was vice president of Analytics at Yahoo where he was responsible for the development of Yahoo’s Advertiser and Website analytics products, including the delivery of “actionable insights” through a holistic user experience. Before that, Bill oversaw the Analytic Applications business unit at Business Objects, including the development, marketing and sales of their industry-defining analytic applications.

Bill holds a Master of Business Administration from the University of Iowa and a Bachelor of Science degree in Mathematics, Computer Science and Business Administration from Coe College.
