If you follow popular data science sites, you’ll notice an amusing trend. What do you think was the most popular post on KDnuggets in January? It wasn’t about the latest deep learning tools, or about Google’s recently released TensorFlow. It was: “20 Questions to Detect Fake Data Scientists”! The post clearly hit a nerve – I’ve seen similar questions crop up on many other forums. If you read some of the responses, you’ll see that people get very worked up about what makes a “real data scientist”. What’s interesting is that they’ll mention very tactical factors – having specific technical skills such as R, Tableau, Python or Spark. Or having implemented a recommendation engine.
Here’s the problem. If you’re insisting on a definitive screening list that applies to everybody, then you’re entirely missing the point of data science.
Do you hear someone being called a fake entrepreneur just because he doesn’t use QuickBooks for accounting? What about an accomplished guitarist who doesn’t play classical violin? Is he a fake musician? No. That’s ridiculous.
The Elephant and The Blind Men
How do we explain the “spot the fake data scientist” trend? Multiple factors are at play, but I like to use the Indian parable of the elephant and the blind men. You’ve probably heard the story: each man touches a different part of the elephant and is completely convinced that he knows what’s in front of him – a pillar, a rope, a wall or a fan. Each one might be correct – from his narrow point of view – but they are all missing the big picture.
Data science as a standalone field is a relatively new concept. It evolved naturally as insights and methods from several mature disciplines – Computer Science, Robotics, Signal Processing, Operations Research and Industrial Engineering, to name a few – were applied to data. Ever since Harvard Business Review called data science the sexiest job of the 21st century, many want in on the action. One easy way for aspiring data scientists to do that is to define the field as narrowly as they can, while making sure they include the skills they already have. How convenient. But ultimately, a limited and self-defeating way of thinking.
And it shows. I’ve helped screen and interview candidates for companies looking to hire data scientists, and I see that many of these companies are confused about what to look for. That makes them reluctant to hire anybody. What I’ve found important is to coach them on what skills are relevant to their specific domain, and help them recruit for those.
Data Science Skills Portfolio
I like to draw an analogy from the personal finance industry. If you diversify your investment portfolio across many asset classes, you can take advantage of the market’s upside, while also hedging against risks. Similarly, I think of a data science team as having a balanced skill portfolio. By taking a long-term approach and investing in many skill dimensions, you can build a team that is well positioned to take on any opportunity, and also respond to any contingency. You don’t want to get too heavy in any one area, otherwise you’ll expose yourself to execution risk. Or miss key insights.
Here’s a framework I’ve found useful when thinking about a data scientist’s skillset:
- Machine Learning
Using computer algorithms to make decisions with data. I like to think of them as business rules engines that program themselves.
Understanding what’s inside large volumes of data. Knowing when to trust your models, and when the patterns or insights you are seeing in the data are real.
- Data Engineering
The infamous “data munging” or ETL work that is often said to take up 80% of a data scientist’s time. Several tools have recently emerged that help streamline this process.
- Software Engineering
Writing high-quality, production-ready code. Emphasizing best practices like unit tests, integration testing, monitoring and alerting systems.
- Data Visualization
Using the most effective visuals to convey key insights in the data. Designing easy-to-understand dashboards that show key indicators at a glance.
Connecting all the pieces in a strong narrative that persuades the executive team to implement your team’s recommendation. Domain expertise in your industry is key to doing this effectively.
I will dive into these skill areas in more detail in future posts, but for now, note how I don’t mention tactical implementation factors – you can easily change those once you have the right people on the team. The main point I want to get across is that recruiting based on “definitive lists” is bound to be an exercise in frustration. What you want to do instead is identify the gaps in your current team that prevent you from getting to where you need to be, and hire to fill those gaps.