Doug Cackett – InFocus Blog | Dell EMC Services

Industrializing the Data Value Creation Process

For organizations to maximize the data value creation process, it’s critical to have a clear line of sight from their business strategy and stakeholders through to the decisions that could be improved by applying machine learning and other techniques to the available data.

In recent months, we’ve increasingly seen Chief Data Officers take a more active role in facilitating that process, focusing on desired business outcomes and value delivery and, in doing so, transforming themselves into Chief Data Monetization Officers. See the related blog, Data Monetization? Cue the Chief Data Monetization Officer.

For those outcomes to be fully realized and to create value on a true industrial scale, organizations need to have a laser focus on the process – automating steps and reducing waste to dramatically reduce the overall cost and time to insight for the production of “analytical widgets” in our “Data Factory”. If you think about it, that’s exactly what we’ve seen happening in the manufacturing world since the very first Model T rolled off the production line at Ford – why should the world of data be any different?

The Data Factory process consists of three key steps. In the rest of this blog, I’ll outline each step and suggest how we might do more to industrialize the process.

Figure 1: Data Value Creation Process

Step 1 – Discover

The first step in the value chain is to explore the data in order to discover a pattern that we might be able to apply in a subsequent step to create value for the business. Without Discovery, all you have in the data lake is lots of data – plenty of cost and no value at all – so this is perhaps the most important step in the process.

Discovery could be just about anything but most often we will be looking to optimize a customer interaction, such as applying personalization elements to an application to make content or an offer more relevant and compelling. Applying the personalization comes in Step 2, but before we get there, we need to uncover the pattern that allows us to be more personal.

To find patterns in Discovery, the data scientist will iterate through a number of steps to prepare data, build a model and then test it until the best one is found. The process is iterative as many factors can be changed such as the way data is prepared, the algorithm used and its parameters. As a model is a representation of knowledge or reality, it is never perfect. The Data Scientist will be looking for the one that performs the best for that specific business problem.

You can think about the value at this stage as personal value. Value to the data scientist in what they have learned, not commercial value to the organization. For that to happen, we need to operationalize the pattern we found by applying the model. See step 2 below.

Testing Models with Machine Learning and Data Science

This isn’t meant to be a data science primer but before we move into the Monetize step, it might be helpful to briefly review some of the basics around Data Science.

To keep it simple, let’s imagine we have a classification problem where we are trying to predict which customers are most likely to respond to a particular marketing campaign and we are going to build a classification model using existing sales and customer data so we can do just that.

To avoid over-fitting and to ensure the model remains accurate when new data is applied in the future, we split our data and hold some back so we can test the model with data it has not seen during training. We can then tabulate the results into a “confusion matrix” and look at the types of error made and the general classification rate. A false positive is where the model predicted a purchase and no purchase was made; a false negative is the other way around.
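
To make that concrete, here is a minimal sketch of the split-and-test step using scikit-learn. The file name, feature columns and choice of classifier are illustrative assumptions rather than a prescription – any algorithm and holdout scheme could be substituted.

```python
# Minimal sketch: hold back a test set and tabulate a confusion matrix.
# File and column names are illustrative placeholders.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix

df = pd.read_csv("campaign_history.csv")      # past campaign outcomes
X = df.drop(columns=["purchased"])            # customer and sales features
y = df["purchased"]                           # 1 = purchased, 0 = did not

# Keep 30% back so the model is judged on data it never saw during training
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)

model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(X_train, y_train)

# Rows are actual outcomes, columns are predicted outcomes
print(confusion_matrix(y_test, model.predict(X_test)))
```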

Whether any model is good or bad is very contextual. In our case, the 200 false positives may be great if the cost of the campaign is low (email) but may be considered poor if the campaign is expensive or these are our best customers and they’re getting fed up with being plagued by irrelevant offers! The situation is similar with the false negatives. If this is your premium gateway product and there is any chance of someone purchasing it, you may decide this misclassification is OK; however, if it’s a fraud problem and you falsely accuse 300 customers, that’s not so great. See the blog Is Data Science Really Science for more on false positives.

Figure 2: Sample Model Prediction (Confusion Matrix)

When we score our testing data, the model makes a prediction of purchase or non-purchase based on a threshold probability, typically 0.5. As well as changing the model algorithm and parameters, one of the other things the Data Scientist might do is to alter the threshold probability or misclassification cost to see how it impacts the errors in the confusion matrix, making adjustments based on required business goals so the best overall model is found.
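
Continuing the hypothetical example above, the sketch below simply re-scores the test set at a few different thresholds so you can see how the mix of false positives and false negatives shifts – the threshold values themselves are arbitrary.

```python
# Sketch: vary the decision threshold and watch the error mix change.
from sklearn.metrics import confusion_matrix

probs = model.predict_proba(X_test)[:, 1]     # probability of purchase

for threshold in (0.3, 0.5, 0.7):
    preds = (probs >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_test, preds).ravel()
    print(f"threshold={threshold}: false positives={fp}, false negatives={fn}")
```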

Another approach to optimizing marketing campaign effectiveness is to rank results using “expected value” which we calculate by multiplying the probability of a purchase by the expected (purchase) value, often using the customer’s average previous purchase value as a proxy.

For example, we might want to mail the top 10,000 prospects and maximize income from the campaign so we rank our customers by expected value and select the top 10,000. In this way, someone with a probability of 0.3 but average purchase value of $1000 would be higher in our list than someone with a much higher probability of 0.8 and lower average value of $100 (expected value of 300 vs 80).
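
A sketch of that ranking, again reusing the hypothetical model above; avg_purchase_value is an assumed feature column holding each customer’s average previous purchase value.

```python
# Sketch: rank prospects by expected value = P(purchase) x average purchase value.
scores = X_test.copy()
scores["p_purchase"] = model.predict_proba(X_test)[:, 1]
scores["expected_value"] = scores["p_purchase"] * scores["avg_purchase_value"]

# Mail the 10,000 prospects with the highest expected value,
# not simply the 10,000 with the highest purchase probability.
mailing_list = scores.sort_values("expected_value", ascending=False).head(10_000)
```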

I’ve just used a simple example here to avoid confusion – the real world is rarely that straightforward, of course. We may need to boost or combine models, or tackle unsupervised modeling techniques, such as clustering, that are non-deterministic and therefore require greater skill on the part of the Data Scientist in order to be effective.

Step 2 – Monetize

It’s worth noting that I’m using the word “monetize” here as shorthand for “creating value” from the data. I’m not suggesting selling your data, although that may be the case for a limited set of companies. It may also have nothing to do with actually making money – in the case of a charity or government organization the focus will be on saving costs or improving service delivery – but the same broad principles apply regardless.

It’s also worth noting that not all of the models coming out of the Discovery step will need to be operationalized into an operational system such as an eCommerce engine. It may be that the insights gained can simply help to refine a manual process. For example, a retailer might benefit from looking at the profile of customers purchasing a particular group of fashion products to see how it aligns to the target customer segment identified by the merchandising team.

Having said that, in most cases, more value is likely to be created from applying AI and machine learning techniques to automated processes given the frequency of those decision points.

We will focus more on this aspect in the remaining part of this blog.

For those problems where we are looking to automate processes, the next thing we need to do is to monetize our model by deploying it into an operational context. That is, we embed it in our business process to optimize it or to create value in some way, such as through personalization. For example, if this were an online shopping application we might be operationalizing a propensity model so we display the most relevant content on pages or return search results ranked in relevance order for our customers. It’s these kinds of data-driven insights that can make a significant difference to the customer experience and profitability.

What we need to do to operationalize the model will depend on a number of factors, such as the type of model, the application that will consume the results of the model and the tooling we’re using. At its simplest, commercial Data Science tooling like Statistica and others have additional scoring capabilities built in. At the other end of the spectrum, the output from the Discovery process may well just land into the agile development backlog for implementation into a new or existing scoring framework and associated application.
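
At that simple end of the spectrum, operationalizing can be little more than wrapping the trained model in a small scoring service that the consuming application calls. The sketch below uses Flask purely as an illustration – the model file, route and payload shape are assumptions rather than a reference to any particular vendor’s scoring framework.

```python
# Sketch: a minimal REST scoring service for a propensity model.
import pickle

import pandas as pd
from flask import Flask, request, jsonify

app = Flask(__name__)
with open("propensity_model.pkl", "rb") as f:   # model exported from Discovery
    model = pickle.load(f)

@app.route("/score", methods=["POST"])
def score():
    # Expects a JSON document of feature name -> value for a single customer
    features = pd.DataFrame([request.get_json()])
    probability = float(model.predict_proba(features)[0, 1])
    return jsonify({"purchase_probability": probability})

if __name__ == "__main__":
    app.run(port=5000)
```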

Step 3 – Optimize

I’ve already mentioned that no machine learning model is perfect and, to further complicate things, its performance will naturally decay over time – like fine wines, some may age delicately, while others will leave a nasty taste before you get them home from the store!

That means we need to monitor our models so we are alerted when performance has degraded beyond acceptable limits. If you have multiple models and decision points in a process, one model may also have a direct impact on another. It is this domino effect of unforeseen events which makes it even more important not to forget this step.
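
A minimal sketch of that monitoring step, assuming predictions and eventual outcomes are logged somewhere they can be joined back together: score the model against recent labelled outcomes on a schedule and raise an alert when performance drops below an agreed limit. The metric, file name and threshold here are placeholders.

```python
# Sketch: alert when a deployed model's performance decays beyond an agreed limit.
import pandas as pd
from sklearn.metrics import roc_auc_score

ACCEPTABLE_AUC = 0.75    # agreed with the business; illustrative value only

recent = pd.read_csv("recent_scored_outcomes.csv")   # predictions joined to actuals
auc = roc_auc_score(recent["actual_purchase"], recent["predicted_probability"])

if auc < ACCEPTABLE_AUC:
    # In practice this would raise a ticket or page the model owner
    print(f"ALERT: model AUC has decayed to {auc:.2f}; schedule a refresh")
```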

Another area where the Data Scientist will have a role to play is in the refinement of model testing to ensure statistical robustness. To fast-track the process, a Data Scientist may combine many branches of a decision tree into a single test to reduce the number of customers needed in the control group when A:B testing to understand model lift.
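
For the A:B test itself, one simple check is whether the conversion rate in the group exposed to the model is significantly better than in the control group. Below is a hedged sketch using a two-proportion z-test from statsmodels; the group sizes and conversion counts are made up for illustration.

```python
# Sketch: is the lift from the model-driven group statistically significant?
from statsmodels.stats.proportion import proportions_ztest

conversions = [620, 480]        # purchases: model-driven group vs control group
group_sizes = [10_000, 10_000]  # customers in each group

z_stat, p_value = proportions_ztest(conversions, group_sizes, alternative="larger")
print(f"lift significant at 5%: {p_value < 0.05} (p = {p_value:.4f})")
```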

Having been alerted to a model that has been degraded through this kind of testing, we’ll need to refresh the model and then re-deploy as appropriate. In many cases, we may just need to re-build the model with a new set of data before deploying the model again. Given that the data and model parameters are going to remain unchanged, this type of task could readily be undertaken by a more junior role than a Data Scientist. If a more complete re-work of the model is required, the task will be put into the Data Scientist backlog funnel and prioritized appropriately depending on the criticality of the model and impact on profits.  Although there is more work involved than just a simple re-calibration, it will still likely be far quicker than the initial development given more is known about the independent variables and most, if not all, of the data preparation will have been completed previously.
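
Where only a straight re-build on fresh data is needed, the refresh itself can be largely automated. A sketch, assuming the features, data preparation and model parameters are unchanged from the original Discovery work and that the scoring service simply picks up the new model file:

```python
# Sketch: scheduled model refresh - same features and parameters, new data.
import pickle

import pandas as pd
from sklearn.ensemble import RandomForestClassifier

df = pd.read_csv("campaign_history_latest.csv")   # most recent labelled data
X, y = df.drop(columns=["purchased"]), df["purchased"]

refreshed = RandomForestClassifier(n_estimators=200, random_state=42).fit(X, y)

with open("propensity_model.pkl", "wb") as f:     # picked up by the scoring service
    pickle.dump(refreshed, f)
```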

Just like in the previous step, if you are using commercial Data Science software to deploy your models, some model management capability will come out of the box. Some may also allow you to automate and report on A:B testing across your website. However, in most instances, additional investments will be required to make the current operational and analytical reporting systems more agile and scalable to meet the challenges placed on them by a modern Digital business. If the business intelligence systems can’t keep pace, you will need to address the issue one way or another!

Industrializing the Process

Techniques and approaches used in modern manufacturing have changed immeasurably since Henry Ford’s day to a point where a typical production line will receive parts from all over the world, arriving just in time to make many different products – all managed on a production line that just doesn’t stop. Looking back at our 3 steps by comparison, it’s clear we have a lot to learn.

A well-worn phrase in the industry is that a Data Scientist will spend 80% of their time wrangling data and only 20% doing the Science. In my experience, Data Scientists spend the majority of their time waiting for the infrastructure, software environment and data they need to even start wrangling (see my related blog, Applying Parenting Skills to Big Data: Provide the Right Tools and a Safe Place to Play…and Be Quick About It!). Delays brought about while new infrastructure is provisioned, software stacks are built, network ports are managed and data is secured all add to the time and cost of each data product you’re creating. As a result, the process is often compromised, with Data Scientists forced to use a shared environment or a standardized toolset. Without careful consideration, we tend to turn what is, after all, a data discovery problem into an IT development one! There’s no ‘just in time’ in this process!

What if you could automate the process and remove barriers in the Discovery phase altogether?

The benefits could be huge!  Not only does that make best use of a skilled resource in limited supply (the Data Scientist), but it also means that downstream teams responsible for the Monetize and Optimize steps can schedule their work as the whole process becomes more predictable. In addition to the Data Science workload, what if the environment and toolchain required by the agile development team to Monetize our model (step 2) could also be automated?

Much can also be done with the data to help to accelerate the assembly process. Many types of machine learning models can benefit from data being presented in a “longitudinal” fashion. It’s typical for each Data Scientist to build and enhance this longitudinal view each time more is discovered about the data. This is another area that can benefit greatly from a more “industrialized view” of things – by standardizing data pre-processing (transformation) steps we improve quality, reduce the skills required and accelerate time to discovery. This is all about efficiency after all, but that also means we must add the necessary process so individual learning can be shared among the Data Science community and the standardized longitudinal view enhanced.
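
One way to standardize those pre-processing steps is to capture them in a shared, versioned pipeline that every Data Scientist starts from, rather than each of them rebuilding the longitudinal view by hand. A sketch using scikit-learn’s Pipeline and ColumnTransformer; the column names are illustrative.

```python
# Sketch: a shared, reusable pre-processing pipeline for the longitudinal view.
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

numeric_cols = ["tenure_months", "avg_purchase_value", "visits_last_90d"]
categorical_cols = ["segment", "acquisition_channel"]

standard_prep = ColumnTransformer([
    ("numeric", Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ]), numeric_cols),
    ("categorical", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])

# Each new Discovery project then starts from the shared transformer, e.g.:
#     model = Pipeline([("prep", standard_prep), ("clf", RandomForestClassifier())])
```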

Back to Big Data Strategy

The point we started with was that creating value from data requires broader thinking than just a Big Data strategy. By looking in detail at the 3 steps in the value creation process, organizations can begin to unlock the potential value trapped in their data lakes and industrialize the process to eliminate costs and create greater efficiency with improved time to value.

At Dell EMC, we’re working with many of our customers to assess and industrialize their data value creation process and infrastructure. We’ve also created a standardized architectural pattern, the Elastic Data Platform, which enables companies to provide ‘just in time’ data, tools and environments for Data Scientists and other users to expedite the Discovery process. To learn more, check out the video featuring my colleague Matt Maccaux.

To learn even more about Data Monetization and Elastic Data Platform from Dell EMC experts, read the InFocus blogs:

Driving Competitive Advantage through Data Monetization

Avoid the Top 5 Big Data Mistakes with an Elastic Data Platform

Elastic Data Platform: From Chaos to Olympian Gods

 

Applying Parenting Skills to Big Data: Play with Friends and Learn from Experience

This series of blogs was inspired by a discussion I had with a customer senior executive when I found myself exploring the topic of value creation, big data and data science through the lens of parenting skills – offering a different and relatable way of thinking through the challenges many organizations face.

In the first blog, Applying Parenting Skills to Big Data: Focus on Outcomes and Set Boundaries, I discussed the notion of ‘long term greedy’ and governance. The second one, Applying Parenting Skills to Big Data: Provide the Right Tools and a Safe Place to Play…and Be Quick About It!, covered tools, discovery environments and industrializing the process to realize value from your data as quickly as possible. I’ll finish out the series focusing on roles and responsibilities, working with partners and learning from the analytics journeys and missteps of others.

A Familiar Pattern of Play

Watching kids play has to be one of the greatest treasures in life, never more so than when they are playing with their best friends, as they’ve already learned how to play together and what each brings to the party. They might fall into a familiar pattern of play as a result, but it’s one that works and yields a happier, more productive experience for all involved.

Try this experiment at home – next time your kids are in full-on game mode with their best friends tell them you and your spouse want to play as well. See what happens to the game then? You’ve crashed their party and the kids are probably not too thrilled with the interruption or your lack of gaming skills!

We find the same to be true for big data solutions. To achieve full digital transformation in the organization, you need to apply data science to every aspect of the business. To do so affordably, you need to minimize the time taken to discover and operationalize insights and increase the cadence of those experiments. What we have found is that, in order to achieve that, the people who aren’t directly involved need to stand back from the process so as not to disrupt or delay it.

Knowing Your Place in the Sandbox

In the case of enterprise big data, we’re talking about the IT and Security teams. They have a vital part to play in the enablement processes, in provisioning a safe place to play and the tooling that’s needed, but they have no role in what happens during the discovery process.

To be effective and maximize the speed and cadence for discovery and monetization, you need to architect and implement a platform that allows your IT and Security teams to stand well back from the process.

CEOs measure outcomes, not intermediate steps. The business doesn’t benefit at all until the pattern – or whatever it is the data scientist is looking for in the data – is actually implemented into a system of engagement. In turn, as a CDO I’m looking to eliminate waste and delays across the Discover, Monetize and Optimize chain. There is no greater waste than having your most valued asset – the data scientist – sit on their hands while waiting for a suitably sized environment, or for the data they need to use, or the libraries they want to try, or… or…

Get Out of the Plumbing Business

We have experience working with clients at every level of maturity in their Big Data journey and across all industries. Based on this experience, we have built a solution called the Elastic Data Platform (EDP), and while we offer a full portfolio of consulting services, we increasingly find ourselves talking about EDP with customers because it fills the gaps in what they are looking to achieve and enables them to use their existing Big Data infrastructure and investments. It helps them focus on outcomes rather than plumbing.

And just like your children when they play with friends, we have built the solution with our friends, filling some key gaps around standard Hadoop distributions. For instance, we use a tool from BlueData to spin up Hadoop and other components almost instantly into Docker containers. You can choose between a variety of cluster sizes and configurations with ingress and egress edge nodes and various tools such as SAS Viya, and connect these to back-end data sources through a policy engine and enforcement points that allow you to provide full fine-grained access control and redaction. Importantly, these clusters can be spun up and torn down in seconds.
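
To be clear, the sketch below is not the BlueData or Elastic Data Platform API – it is just a hedged illustration of the underlying idea, using the Docker SDK for Python to spin up an isolated, containerized data science environment on demand and tear it down just as quickly.

```python
# Illustrative sketch only (not the BlueData/EDP API): an isolated, containerized
# environment created on demand and destroyed when the work is done.
import docker

client = docker.from_env()

sandbox = client.containers.run(
    "jupyter/datascience-notebook",       # public image with Python, R and Jupyter
    detach=True,
    ports={"8888/tcp": 8888},
    name="discovery-sandbox-project-42",  # hypothetical project identifier
)

# ... the Data Scientist does their Discovery work in the sandbox ...

sandbox.stop()
sandbox.remove()   # the environment disappears in seconds, leaving nothing behind
```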

Learn from Those Who Have Gone Before You

As well as learning from their friends and through play, it’s also important for kids to learn from their elders; people who have been there, seen it and done it all before. Importantly, kids learn both about specifics (look before crossing the road) and more general things that help to shape their views of the world. Both are important as that helps them learn while not getting hampered by things that are inevitably changing around them.

At Dell EMC, we work across a wide range of difficult and challenging environments in every industry. We see technologies on the leading edge of the wave as well as those that have already been well established. We also have a chance to stand back and understand what the fundamentals are – what works, and importantly, what doesn’t.

In many ways, the Elastic Data Platform along with a number of deployment patterns we have for Dell EMC’s underlying technologies underpins that experience. However, we also support our customers in a range of different engagement styles and specialties, whether it’s specifics around particular technologies such as modern AI platforms or current Hadoop tooling or at a much higher strategic level to shape your future direction.

Bringing It All Together

This wraps up my series on parenting skills and how they relate to big data and analytics.  I’ve hit on many of the key points I see organizations consistently challenged by. No doubt there are many other parallels we could draw, so let me know if you have any additional suggestions for the list!

Applying Parenting Skills to Big Data: Provide the Right Tools and a Safe Place to Play…and Be Quick About It!

This series of blogs was inspired by a discussion I had with a customer senior executive when I found myself exploring the topic of value creation, big data and data science through the lens of parenting skills – offering a different and relatable way of thinking through the challenges many organizations face. In the first blog, Applying Parenting Skills to Big Data: Focus on Outcomes and Set Boundaries, I discussed the notion of ‘long term greedy’ and governance. In this blog, we will discuss tools, having a safe place to play and the importance of end-to-end measurement.

Providing the Right Tools

One of the things I’ve learned as a parent is that tools need to be appropriate. When my sons were growing up, they had tools that looked similar to mine yet were age and task appropriate – think plastic toolkit with a plastic hammer, screwdriver and workbench.

The same is true of our data environments. ETL tooling makes perfect sense for enterprise ETL. Standardized BI tooling also. But neither is particularly useful for data science, which is a highly iterative and personal journey. Tools also affect the quality and speed at which you can produce things. That perhaps explains why professionals use professional tools and amateur DIY enthusiasts like me use cheaper ones. It also explains why the quality of the results is different!

The Data Scientist Toolkit

If the continued success of your business is a function of your Data Scientists’ ability to find and apply patterns in your data, then you had better make them as productive as you possibly can. Part of that is in giving them the tools that they need, and these are unlikely to be the same as the tools you currently use across your business.

For a modern Data Scientist, the toolkit might include Python and R in a notebook like Jupyter. However, if you were embarking on a facial recognition problem, it’s pretty clear that using pre-built tooling such as TensorFlow and dlib would make a lot more sense than trying to build this capability yourself from primitives, as they are more task-specific and productive.
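
As a small, hedged illustration of why pre-built tooling wins here: detecting faces with dlib’s bundled detector takes a handful of lines, whereas building the same capability from primitives would be a project in its own right. The image path is a placeholder.

```python
# Sketch: face detection using dlib's pre-built frontal face detector.
import dlib

detector = dlib.get_frontal_face_detector()
image = dlib.load_rgb_image("customers_in_store.jpg")   # placeholder image path

faces = detector(image, 1)   # upsample once to help find smaller faces
print(f"found {len(faces)} face(s)")
for face in faces:
    print(face.left(), face.top(), face.right(), face.bottom())
```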

Finding a Safe Place to Play

Where you play is also important. If my sons were going to make a mess, I strangely always preferred them to go to a friend’s house rather than play at ours. In data science, there are some real advantages to having a “clean” environment for each new problem that needs to be tackled. We’ll talk more about this later. But sometimes there may not be enough room for a particular problem so doing the work in Azure or GCP also makes sense, bringing the results back once playtime is over!

Data science is also about experimentation, in the same way that your kids learn as they play. Children not only learn things like balance and fine motor coordination, but also social skills such as collaboration and shared goals. As long as the kids are safe, you can relax and they are free to try new things. For some that might be as simple as jumping on a trampoline, for others an awesome double backflip.

Data Scientists will break things when they play. They will do things and try things that make absolutely no sense and sometimes things are just going to go “bang”. That’s fine. Or at least it’s fine as long as they have an isolated place to play and do their stuff.

Or, put another way: if you don’t give Data Scientists a safe place to play, the only other thing you can do is to either stop them experimenting – the opposite of what you really want – or put them all in the same isolated swamp and let them fight it out.

The equivalent here is having lots of kids share the same trampoline at the same time. If that happens, your enterprise might be safe, but collisions are bound to happen and that’s going to have a measurable impact on productivity. Since our goal is all about trying to make our Data Scientist more productive, that seems like the wrong way to go.

It’s Not Just About the Data Scientists

Up until now we’ve been focusing on data science, however there are other players in the organization that are equally important in our ecosystem.

Once the Data Scientist has done his or her job and discovered something in the data that your organization can create value from, the next task is to monetize that insight by operationalizing it in some form. So, along with the Data Scientists, you’ll have agile development teams that need to evolve the data science into something that is enterprise-grade. We will talk about this further in a future blog but the point to take away is that others will also need environments that offer a safe and isolated place to play and the quicker you can provide them, the better.

Speed Counts

When your kids run a race at school, you press the stopwatch at the beginning and stop it at the end. If it’s a relay involving multiple people, the time that’s recorded is the team time – not just one leg of the race.

Following that thought for a moment, the key time for us in business is from the identification of the business problem through the point at which we have found a pattern in the data to the implementation of that pattern in a business process – in other words, from the idea to operationalizing that insight. It’s not just the discovery, and not just operationalizing it. We measure the time from beginning to end and that includes the time taken to pass the baton between people in the team.

The Clock Is Ticking

So, now for a much-told myth: Data Scientists spend 80% of their time preparing data and 20% actually doing data science. Hands up if you believe that. Frankly speaking, I don’t – or at least, it’s not quite the whole truth!

In my experience, Data Scientists spend much more of their time waiting for an environment and/or data. By simply eliminating that completely non-productive time, you could push through so many more data science projects. I don’t mean to trivialize other aspects of the process as they are all important – however this issue stands out to me as by far the most critical.

If I were a Chief Data Monetization Officer, I’d be looking at how we need to work on speed in the business and measure it with metrics such as time to discover, time to operationalize and time to manage.

Then I’d look at the key blockers that cause delays in the process and architect those out if possible. Time to provision is what has to happen before the Data Scientist or agile development teams can do ANYTHING, and I’ve found that it often takes MONTHS in most organizations.

So What Does Good Look Like?

Photo of Usain Bolt courtesy of xm.com.

A friend of mine once came up with what I thought was a fantastic idea. She thought it was impossible to know just how exceptional sprinters like Usain Bolt were because everyone else in the race was also good. She suggested that they should randomly pick someone from the audience to run in lane 9. That way you’d have a reasonable comparison to mark “average” against.

If you want to know what good looks like in the world of big data and data science, it’s the ability to fully provision new analytics environments in minutes.

Months or more than a year is a more typical starting point for many and that’s a real problem. And remember, we’re measuring the time taken to give the Data Scientist everything they need, not just some of it – that includes the tools, in an infrastructure that is isolated from their peers and with the libraries and data they need.

Navigating Your Big Data Journey

At Dell EMC, we offer a comprehensive portfolio of Big Data & IoT Consulting services from strategy through to implementation and ongoing optimization to help our customers accelerate the time to value of their analytics initiatives and maximize their investments. We also help organizations bridge the people, process, and technology needed to realize transformational business outcomes. For example, one of our solutions, the Elastic Data Platform, enables Data Scientists to have tool flexibility and isolated environments to work in that can be provisioned in minutes.

In my next blog, I’ll discuss the value of trusted partners and how to benefit from the experience of others.

Stay tuned and happy parenting!

Applying Parenting Skills to Big Data: Focus on Outcomes and Set Boundaries

I feel like I’ve spent much of my adult life “coaching” people in some form or another, both personally as well as professionally.  In one such conversation with a customer senior executive, I found myself struggling to explain some of the things I thought they needed to do to make their big data and data science program successful. In that moment, it occurred to me that there are many helpful parallels between big data projects and parenting – a topic that many of us can relate to!

Yes, I know on the face of it that sounds rather mad, but bear with me on this one! Over the course of the next few blogs I plan to touch on the following topics:

  • Long term greedy – focus on outcomes and make the right long term investments to get there
  • Governance – too much and too little will stifle progress
  • Tools to thrive and learn – let the situation and the person dictate their discovery tools
  • A safe place to play – you can expect things to get broken so plan accordingly
  • Speed counts – it’s the time from start to finish that matters, not just one leg of the race
  • Valued friendships – complement your solution and fill gaps with tried and tested partners
  • Lessons learned – benefit from those who have gone before you

So, let’s get to it!

Focus on Outcomes

In my family, we have always tried to encourage our kids to adopt an approach of being “long term greedy”.  To me that means staying in it, whatever ‘it’ is, if it helps you achieve your longer term goal – even if you’re tempted to bail out sooner.  Say, for example, you stay in a job despite frequent calls from recruiters because you believe that, long term, it will provide the foundation of experience you need to open better doors later.  Long term greedy is about investing upfront in the foundational pieces that will inevitably lead to longer term outcomes and success.

So what does long term greedy look like from a business point of view in relation to big data and analytics?

Well, that depends on who you are.

If you’re the CEO of a bank, you’re focused on outcomes such as cost income ratios and return on assets employed, while the CEO in a retailer is focused on like-for-like sales and margin density rates. Every CEO knows they need to invest to grow top line measures in their business; and data has increasingly been seen as a critical area of investment with the potential to differentiate and drive competitive advantage.

The question, though, is what you should be investing in to achieve the desired outcomes.  I’m going to suggest that just investing in a data lake is not the right thing to do.  Data in and of itself creates zero value for the business.  We create value by first Discovering something about the data that is of value to the business and then applying it in an operational context to Monetize it.  We should invest in the things that help increase the rate at which we can churn through Discovery backlogs, as well as the speed at which those discoveries can be implemented into operational systems.  A data lake may of course be an important part of what’s needed to get there, but it’s not the outcome.

In fact, in many forward-looking companies the role of Chief Data Officer is continuing to evolve. Rather than just being a custodian and gatekeeper for data, they’ve become more of a Chief Data Monetization Officer – focused on building systems, processes and people that help drive value from the data. These are longer term investments, not just costs that have to be borne by the business.

Set the Right Boundaries

I’ve also learned as a parent that getting governance right is key to success in the long term. Too much governance, too tight a control and your kids won’t be able to go out and explore. They won’t have a chance to feel uncomfortable and know that’s OK. If you’re not careful, they won’t want to stray more than a foot from your side and that’s going to get old very fast.

On the other hand, we all know that kids need boundaries within which to operate. They need to understand what those boundaries are for their own safety and well-being, as well as others.

The same is true when it comes to data.  Too much governance and nobody can get access to the data let alone use it.  Too little and you find data duplicated all over the place.  The net result is that infrastructure, license and support costs spiral out of control along with the data and before you know it, you end up with a data swamp of questionable economic value and a long tail of costs for the business.

Navigating Your Big Data Journey

At Dell EMC, we offer a comprehensive portfolio of Big Data & IoT Consulting services from big data strategy through big data implementation, and ongoing optimization to help our customers accelerate the time to value of their analytics initiatives and maximize their investments. We also help organizations bridge the gap of people, process, and technology needed to realize transformational business outcomes, including defining a strategy and establishing governance.

In my next blog, I’ll discuss how the tools that help you learn and explore, along with having a safe place to play, apply to enterprise big data success – for your data scientists in particular.

Stay tuned and happy parenting!
