
Applying Parenting Skills to Big Data: Provide the Right Tools and a Safe Place to Play…and Be Quick About It!

By Doug Cackett, EMEA Big Data & IoT Solution Lead, Dell EMC Consulting, May 29, 2018

This series of blogs was inspired by a discussion with a senior executive at one of our customers, in which I found myself exploring value creation, big data and data science through the lens of parenting skills – a different and relatable way of thinking through the challenges many organizations face. In the first blog, Applying Parenting Skills to Big Data: Focus on Outcomes and Set Boundaries, I discussed the notion of ‘long term greedy’ and governance. In this blog, we will discuss tools, having a safe place to play and the importance of end-to-end measurement.

Providing the Right Tools

One of the things I’ve learned as a parent is that tools need to be appropriate. When my sons were growing up, they had tools that looked similar to mine yet were age and task appropriate – think plastic toolkit with a plastic hammer, screwdriver and workbench.

The same is true of our data environments. ETL tooling makes perfect sense for enterprise ETL, and standardized BI tooling makes sense for enterprise reporting. But neither is particularly useful for data science, which is a highly iterative and personal journey. Tools also affect the quality of what you produce and the speed at which you can produce it. That perhaps explains why professionals use professional tools and amateur DIY enthusiasts like me use cheaper ones. It also explains why the quality of the results is different!

The Data Scientist Toolkit

If the continued success of your business is a function of your Data Scientists’ ability to find and apply patterns in your data, then you had better make them as productive as you possibly can. Part of that is in giving them the tools that they need, and these are unlikely to be the same as the tools you currently use across your business.

For a modern Data Scientist, the toolkit might include Python and R in a notebook environment like Jupyter. However, if you were embarking on a facial recognition problem, it’s pretty clear that using pre-built tooling such as TensorFlow and dlib would make a lot more sense than trying to build this capability yourself from primitives, because the pre-built tools are more task-specific and productive.

Finding a Safe Place to Play

Where you play is also important. If my sons were going to make a mess, I strangely always preferred them to go to a friend’s house rather than play at ours. In data science, there are some real advantages to having a “clean” environment for each new problem that needs to be tackled. We’ll talk more about this later. But sometimes there may not be enough room for a particular problem so doing the work in Azure or GCP also makes sense, bringing the results back once playtime is over!

Data science is also about experimentation, in the same way that your kids learn as they play. Children not only learn things like balance and fine motor coordination, but also social skills such as collaboration and shared goals. As long as the kids are safe, you can relax and they are free to try new things. For some that might be as simple as jumping on a trampoline, for others an awesome double backflip.

Data Scientists will break things when they play. They will do things and try things that make absolutely no sense and sometimes things are just going to go “bang”. That’s fine. Or at least it’s fine as long as they have an isolated place to play and do their stuff.

Or, put another way: if you don’t give Data Scientists a safe place to play, your only options are to stop them experimenting – the opposite of what you really want – or to put them all in the same isolated swamp and let them fight it out.

The equivalent here is having lots of kids share the same trampoline at the same time. If that happens, your enterprise might be safe, but collisions are bound to happen and that’s going to have a measurable impact on productivity. Since our goal is to make our Data Scientists more productive, that seems like the wrong way to go.

It’s Not Just About the Data Scientists

Up until now we’ve been focusing on data science. However, there are other players in the organization that are equally important in our ecosystem.

Once the Data Scientist has done his or her job and discovered something in the data that your organization can create value from, the next task is to monetize that insight by operationalizing it in some form. So, along with the Data Scientists, you’ll have agile development teams that need to evolve the data science into something that is enterprise-grade. We will talk about this further in a future blog but the point to take away is that others will also need environments that offer a safe and isolated place to play and the quicker you can provide them, the better.

Speed Counts

When your kids run a race at school, you press the stopwatch at the beginning and stop it at the end. If it’s a relay involving multiple people, the time that’s recorded is the team time – not just one leg of the race.

Following that thought for a moment, the key time for us in business is from the identification of the business problem through the point at which we have found a pattern in the data to the implementation of that pattern in a business process – in other words, from the idea to operationalizing that insight. It’s not just the discovery, and not just operationalizing it. We measure the time from beginning to end and that includes the time taken to pass the baton between people in the team.

The Clock Is Ticking

So, now for a much-told myth: Data Scientists spend 80% of their time preparing data and 20% actually doing data science. Hands up if you believe that. Frankly, I don’t. Or at least, it’s not quite the whole truth!

In my experience, Data Scientists spend much more of their time waiting for an environment and/or data. By simply eliminating that completely non-productive time, you could push through so many more data science projects. I don’t mean to trivialize other aspects of the process as they are all important – however this issue stands out to me as by far the most critical.

If I were a Chief Data Monetization Officer, I’d be looking at how we improve speed in the business and measuring it with metrics such as time to discover, time to operationalize and time to manage.

Then I’d look at the key blockers that cause delays in the process and architect those out if possible. Time to provision covers what has to happen before the Data Scientist or agile development teams can do ANYTHING, and I’ve found that it often takes MONTHS in most organizations.
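To make those measurements concrete, here is a minimal Python sketch of how the stopwatch could be recorded for a single project. Everything in it is an assumption for illustration: the `ProjectTimeline` class, its field names, the way the stages are split and the dates are all hypothetical, and "time to manage" is left out because it depends on how your organization defines it.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ProjectTimeline:
    """Milestone timestamps for one project (hypothetical field names)."""
    problem_identified: datetime   # the business problem is agreed
    environment_ready: datetime    # tools, libraries and data are provisioned
    pattern_found: datetime        # the Data Scientist has a usable insight
    in_production: datetime        # the insight runs in a business process

    def time_to_provision(self) -> timedelta:
        # Waiting time before anyone can start work.
        return self.environment_ready - self.problem_identified

    def time_to_discover(self) -> timedelta:
        # Assumption: discovery is counted from the moment the environment is ready.
        return self.pattern_found - self.environment_ready

    def time_to_operationalize(self) -> timedelta:
        return self.in_production - self.pattern_found

    def end_to_end(self) -> timedelta:
        # The "team time" for the whole relay, baton passes included.
        return self.in_production - self.problem_identified

    def waiting_share(self) -> float:
        # Fraction of the end-to-end time spent waiting for an environment.
        return self.time_to_provision() / self.end_to_end()


# Illustrative dates only, not real project data.
timeline = ProjectTimeline(
    problem_identified=datetime(2018, 1, 1),
    environment_ready=datetime(2018, 4, 1),
    pattern_found=datetime(2018, 5, 1),
    in_production=datetime(2018, 6, 30),
)

print("Time to provision:     ", timeline.time_to_provision().days, "days")
print("Time to discover:      ", timeline.time_to_discover().days, "days")
print("Time to operationalize:", timeline.time_to_operationalize().days, "days")
print("End to end:            ", timeline.end_to_end().days, "days")
print(f"Share spent waiting:    {timeline.waiting_share():.0%}")
```

With these made-up dates, provisioning takes 90 of the 180 days end to end, so half of the elapsed time passes before any data science starts. Tracking the stages separately, but reporting the end-to-end figure, keeps the focus on the team time rather than on one leg of the race.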

So What Does Good Look Like?

Photo of Usain Bolt courtesy of xm.com.

A friend of mine once came up with what I thought was a fantastic idea. She thought it was impossible to know just how exceptional sprinters like Usain Bolt were because everyone else in the race was also good. She suggested that they should randomly pick someone from the audience to run in lane 9. That way you’d have a reasonable “average” to compare against.

If you want to know what good looks like in the world of big data and data science, it’s the ability to fully provision new analytics environments in minutes.

For many organizations, a more typical starting point is months, or even more than a year, and that’s a real problem. And remember, we’re measuring the time taken to give the Data Scientist everything they need, not just some of it: the tools, an infrastructure that is isolated from their peers, and the libraries and data they need.

Navigating Your Big Data Journey

At Dell EMC, we offer a comprehensive portfolio of Big Data & IoT Consulting services from strategy through to implementation and ongoing optimization to help our customers accelerate the time to value of their analytics initiatives and maximize their investments. We also help organizations bridge the people, process, and technology needed to realize transformational business outcomes. For example, one of our solutions, the Elastic Data Platform, enables Data Scientists to have tool flexibility and isolated environments to work in that can be provisioned in minutes.

In my next blog, I’ll discuss the value of trusted partners and how to benefit from the experience of others.

Stay tuned and happy parenting!

Doug Cackett

About Doug Cackett


EMEA Big Data & IoT Solution Lead, Dell EMC Consulting

Doug leads the Dell EMC Big Data & IoT Consulting practice in EMEA, engaged in helping our customers deliver more value from their entire information management estate, including modernizing legacy enterprise data warehouses and operational data stores and fully exploiting the power of Big Data and IoT technologies.

Doug has a background and practical experience working with data mining and machine learning tools, as well as designing and delivering large scale information systems for many of the largest companies around the world. Through his combined experience, Doug is uniquely positioned to offer insights and perspective at the intersection of data science and information management.
