Zero to One in Data Science: The Tophatter Story

By Angela, August 03, 2017

This is the last in a series of posts summarizing our speakers’ insights from the Data-driven Marketplace Meetup in May. You can check out previous posts summarizing talks from Bala Ganapathy on pulling levers to reach supply-demand equilibrium and from Jamie Davidson on Winning with Data.  

Every company wants to be data-driven. But what exactly does this mean, or more importantly, how does a small start-up with limited resources go about creating the infrastructure and expertise needed to become data-driven? It’s not as simple as hiring a data scientist and hoping all the pieces fall magically into place.

At our Marketplace Meetup, Jessica Zuniga shared Tophatter’s data journey. If you’re not familiar with Tophatter, it’s a mobile e-commerce marketplace expecting to make $100 million in revenue this year. It netted nearly $40 million in revenue in 2016. Much of this growth is due to how data-driven the company is, especially when it comes to recommendations.

Over the years, Tophatter has been refining its marketplace by deciding which products get listed (i.e., using seller-side analytics to determine which items and sellers are good). Its value proposition to sellers is “don’t worry about finding the right customers, we’ll find them for you.” This proposition is what excites Jessica as a data scientist: Tophatter is essentially trying to build a unique marketplace for each buyer.


The Goal

It’s also important to understand Jessica’s background***. She came to Tophatter from LinkedIn. In other words, she went from a large company with great data, great tools, and a large team to a mid-size startup with no data infrastructure or data background. Fortunately for all of us, she was willing to share her process in great detail.

1- Building the data pipes

Before you can start analyzing data and getting value from it, you need to build the right plumbing or data pipes. Jamie Davidson (Product & Data Platform at Looker) discussed this as well. When Jessica arrived at Tophatter, she had a lot of questions: How would she store the data? Process the data? Kick off and monitor jobs? Push to production?

For Tophatter, AWS was the answer. As Jessica explained, clusters of variable sizes can be created, used, and torn down with a simple Python API. Spark allows for fast data processing. And the Tophatter mobile app was already running on AWS, which made integration easier. Jessica shared a few other nuts and bolts: she’s a big fan of Redshift for storage. Why? It uses a distributed warehouse and splits queries into parallel jobs, which makes processing very large datasets much faster. And she used Sqoop to get data to production.

2- Evaluating recommendations

With the data pipes in place, Jessica and Tophatter could build recommendations. Now the missing piece of the puzzle was how to evaluate the recommendations.

With their first generation of split testing, users were put into a bucket (A, B, and so on) depending on the modulus of their user ID. This created several challenges. First, in order to analyze the data, they needed to know what the moduli were. Second, they had no historical record of when a user was in which bucket. And third, only engineers could grow, shrink, or create splits.
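To make the first-generation approach concrete, here is a minimal sketch of modulus-based bucketing. The function name, the bucket labels, and the sample user IDs are assumptions for illustration, not Tophatter's actual code.

```python
def modulus_bucket(user_id: int, n_buckets: int = 2) -> str:
    """Assign a user to a bucket from the remainder of user_id / n_buckets.

    Hypothetical helper: the talk only said buckets depended on the
    modulus of the user ID, not how the mapping was written.
    """
    labels = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    if not 1 <= n_buckets <= len(labels):
        raise ValueError("n_buckets must be between 1 and 26")
    return labels[user_id % n_buckets]


# Made-up user IDs, two buckets: odd IDs land in B, even IDs in A.
assignments = {uid: modulus_bucket(uid) for uid in (101, 102, 103, 104)}
```

Note that the bucket is a pure function of the ID and the bucket count. Anyone analyzing the test has to know that mapping, and changing the number of buckets silently reshuffles users with no record of where they were before.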

Today, Tophatter takes a different approach to split testing. Jessica created logic that assigns each user to a random split, with bucket sizes determined by the data scientist, product manager, or analyst. Non-engineers can control the test, and the system stores historical assignment data.
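A rough sketch of what random assignment with configurable bucket sizes and a stored history could look like is below. The `SplitTest` class, its method names, and the in-memory storage are all assumptions; the talk did not describe the actual implementation.

```python
import random
from datetime import datetime, timezone


class SplitTest:
    """Hypothetical split test: weighted random buckets plus an assignment log."""

    def __init__(self, name, weights, seed=None):
        self.name = name
        self.weights = dict(weights)   # bucket name -> relative size
        self.assignments = {}          # user_id -> bucket (sticky)
        self.history = []              # append-only (timestamp, user_id, bucket)
        self._rng = random.Random(seed)

    def assign(self, user_id):
        # A user who already has a bucket keeps it.
        if user_id in self.assignments:
            return self.assignments[user_id]
        buckets = list(self.weights)
        sizes = [self.weights[b] for b in buckets]
        bucket = self._rng.choices(buckets, weights=sizes, k=1)[0]
        self.assignments[user_id] = bucket
        self.history.append((datetime.now(timezone.utc), user_id, bucket))
        return bucket

    def resize(self, weights):
        # Changing sizes only affects users assigned from now on.
        self.weights = dict(weights)


test = SplitTest("new_recs", {"control": 0.9, "treatment": 0.1}, seed=42)
first = test.assign(101)
again = test.assign(101)  # same bucket as `first`
```

In this sketch the bucket sizes are plain data rather than code, which is what would let a non-engineer grow or shrink a split, and the history log records when each user entered each bucket.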

How do you evaluate an A/B test? As Jessica explained, you need to choose the right metric. You may want to measure the LTV of a customer, but you don’t necessarily have a decade or two to get that measurement. Therefore, you need to find short- and medium-term proxies for LTV and understand their historical correlation with it. Jessica also stressed the importance of statistical significance calculations: you need to understand whether there has been a true shift in a metric due to an experiment.
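As one example of a significance calculation, the sketch below runs a two-sided two-proportion z-test on a click-through style metric. The choice of test, the function name, and the counts are assumptions for illustration; the talk did not say which test Tophatter used.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates.

    conv_* are counts of users who converted, n_* are bucket sizes.
    Returns (z, p_value).
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both buckets are the same.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("standard error is zero; rates are all 0 or all 1")
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Made-up counts: 100 of 1,000 users clicked in A, 150 of 1,000 in B.
z, p_value = two_proportion_z_test(100, 1000, 150, 1000)
significant = p_value < 0.05
```

A small p-value suggests the difference between buckets is unlikely to be noise. It says nothing about whether the metric is a good proxy for LTV, which is a separate question answered from historical data.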

The Results

If you’re wondering about the impact: Jessica’s work at Tophatter resulted in a 30% increase in clicks! Definitely impressive, and they’re just at the beginning of this data journey.

*** Since this meetup, Jessica has moved on from Tophatter and is now on to a new adventure, which she has yet to share publicly. We wish her the best!

  • Armen Gulesserian

    Hey Angela, one thing that I would like to know more about are the “short and medium term proxies for LTV”. Can you please give an example of what metrics correlated with LTV used to measure impact in online marketplaces like TopHatter when historical data isn’t available?
