Last week, I shared some lessons learned from a Domino Data Science Pop-up that I attended a month ago. There were some very important discussions surrounding the world of data science today. One thread explored the differences between data science and data engineering.
I’ll admit that I was completely unaware of the engineering behind data science when we first launched Insight Data Science back in 2012. And I don’t believe that I was alone. We data scientists were too enamoured with the idea of having the sexiest job of the 21st century.
However, quietly under our radar, data engineering (our “slightly younger sibling”) was emerging, stretching its wings, and undergoing its own evolution. (You can read about this from Maxime Beauchemin, a data engineer at Airbnb.)
So, what is the difference between a data scientist and a data engineer? Companies often overlap these positions, but understanding the distinction is essential to building your team and hiring the right people.
Since Insight added a Data Engineering Program in recent years, we can compare it to the Data Science program to shed some light on these two important roles.
The key responsibilities for a data scientist are:
- Asking the right questions on any given dataset
- Being able to answer those questions through statistical analysis, machine learning, and/or data mining
- Clearly and effectively communicating any results to interested parties (either verbally or in writing)
Data scientists often have a PhD because “it demonstrates that s/he has spent roughly 5 intense years in graduate training to either ask the right questions about data, performing data analysis, create statistical or mathematical models, and present results.”
A good data engineer:
- Gathers data, stores it, does batch/real-time processing on it, and serves it via an API to a data scientist (some companies may call this data infrastructure or data architecture)
- Has extensive knowledge of databases and engineering best practices
Data engineers should have very strong software engineering skills. They need to be able to quickly learn to use any of the big data tools on the market, as well as be able to improve the available tools if needed.
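To make the gather, store, process, and serve steps above a little more concrete, here is a minimal sketch in Python using only the standard library. Everything in it is an illustrative assumption rather than how any particular company does it: the `RAW_EVENTS` sample data, the table names, and the four function names are all hypothetical, an in-memory SQLite database stands in for real storage, and a plain function returning JSON stands in for a real API.

```python
import json
import sqlite3

# Hypothetical raw events; in practice these might come from logs,
# message queues, or upstream services.
RAW_EVENTS = [
    {"user_id": "alice", "action": "click"},
    {"user_id": "bob", "action": "click"},
    {"user_id": "alice", "action": "purchase"},
]


def gather():
    """Gather: yield raw records from a source (here, a hard-coded list)."""
    yield from RAW_EVENTS


def store(conn, events):
    """Store: persist the raw records in a database table."""
    conn.execute("CREATE TABLE IF NOT EXISTS events (user_id TEXT, action TEXT)")
    conn.executemany(
        "INSERT INTO events (user_id, action) VALUES (:user_id, :action)",
        list(events),
    )
    conn.commit()


def process(conn):
    """Process: a batch job that rebuilds a per-user event count table."""
    conn.execute("DROP TABLE IF EXISTS user_counts")
    conn.execute(
        "CREATE TABLE user_counts AS "
        "SELECT user_id, COUNT(*) AS n_events FROM events GROUP BY user_id"
    )
    conn.commit()


def serve(conn, user_id):
    """Serve: return the aggregate as JSON, standing in for an API endpoint."""
    row = conn.execute(
        "SELECT n_events FROM user_counts WHERE user_id = ?", (user_id,)
    ).fetchone()
    return json.dumps({"user_id": user_id, "n_events": row[0] if row else 0})


conn = sqlite3.connect(":memory:")
store(conn, gather())
process(conn)
print(serve(conn, "alice"))  # prints {"user_id": "alice", "n_events": 2}
```

In this sketch, the data scientist only ever calls `serve`; the collection, storage, and batch processing behind it are the data engineer's job.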
With all that said, the easy way to look at the two roles is this: data engineers enable data scientists to do their jobs more effectively.
So, for those of you looking to build out your data science team: before you hire your first data scientist, ask yourself whether he or she will have the infrastructure to be successful. It just might be that you need to hire a data engineer first.