Last week, I shared some lessons learned from a Domino Data Science Pop-up that I attended a month ago.  There were some very important discussions surrounding the world of data science today. One thread explored the differences between data science and data engineering.

I’ll admit that I was completely unaware of the engineering behind data science when we first launched Insight Data Science back in 2012. And I don’t believe that I was alone. We data scientists were too enamoured with the idea of having the sexiest job of the 21st century.

However, quietly under our radar, data engineering (our “slightly younger sibling”) was emerging, stretching its wings, and undergoing its own evolution. (You can read about this from Maxime Beauchemin, a data engineer at Airbnb.)

So, what is the difference between a data scientist and a data engineer? Companies often overlap these positions, but understanding the distinction is essential to building your team and hiring the right resources.

Since Insight added a Data Engineering Program in recent years, we can compare it to the Data Science program to shed some light on these two important roles.

Data Science

The key responsibilities for a data scientist are:

Asking the right questions on any given dataset

Being able to answer those questions through statistical analysis, machine learning, and/or data mining

Clearly and effectively communicating any results to interested parties (either verbally or in writing)

Data scientists often have a PhD because “it demonstrates that s/he has spent roughly 5 intense years in graduate training to either ask the right questions about data, performing data analysis, create statistical or mathematical models, and present results.”

Data Engineering

A good data engineer:

Gathers data, stores it, does batch/real-time processing on it, and serves it via an API to a data scientist (some companies may call this data infrastructure or data architecture)

Has extensive knowledge of databases and best engineering practices

Data engineers should have very strong software engineering skills. They need to be able to quickly learn to use any of the big data tools on the market, as well as be able to improve the available tools if needed.

With all that said, the easy way to look at the two roles is this: data engineers enable data scientists to do their jobs more effectively.

So, for those of you looking to build out your data science team: before you hire your first data scientist, ask yourself, will he or she have the infrastructure to be successful? It just might be that you need to hire a data engineer first.
