Takeaways from our first V1 Data Team Hangout
George Psiharis (the awesome VP of Business Operations at Clio) reached out with a great suggestion: connect the data teams across our portfolio companies, with the goal of building a peer network to share best practices and together raise the bar on data science innovation.
A few weeks ago, we had our first virtual hangout with data scientists in our portfolio from San Francisco, Vancouver, Toronto, and New York. Some common problems surfaced during the discussion, and since I’m certain that our portfolio companies are not the only ones facing these issues, I want to share some of the key lessons from the discussion so we all can grow together…
Challenge 1: Data is fragmented and inconsistent across the organization
Let’s think about the typical (and simplified) data flow in a company: raw data is aggregated, normalized, or otherwise processed, and then stored in a data warehouse. That data is then consumed and interpreted in the form of dashboards.
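For those who like to see it in code, here’s a minimal sketch of that flow, with a made-up signups table and SQLite standing in for the warehouse. Every name here is invented for illustration, not a prescription:

```python
# A minimal, hypothetical sketch of the flow described above:
# raw data -> normalize -> warehouse -> dashboard query.
import sqlite3

def load_raw_signups():
    # Stand-in for pulling raw records from a production system or event stream.
    return [
        {"email": "Ada@Example.com ", "plan": "PRO",   "signed_up": "2024-01-03"},
        {"email": "bob@example.com",  "plan": "basic", "signed_up": "2024-01-05"},
    ]

def normalize(record):
    # Normalization step: trimmed whitespace, consistent casing, canonical plan names.
    return (record["email"].strip().lower(), record["plan"].lower(), record["signed_up"])

# "Warehouse" stand-in: an in-memory SQLite database.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE signups (email TEXT, plan TEXT, signed_up TEXT)")
warehouse.executemany("INSERT INTO signups VALUES (?, ?, ?)",
                      [normalize(r) for r in load_raw_signups()])

# A dashboard is, in effect, a saved query over the warehouse.
for plan, count in warehouse.execute("SELECT plan, COUNT(*) FROM signups GROUP BY plan"):
    print(plan, count)
```

Every dashboard downstream is essentially a saved query like that last SELECT, and that’s where the trouble starts.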
As a company grows, dashboards sprout up all over the organization with KPIs and metrics that mean different things to different people. In addition, different parts of the organization (sales, marketing, product, etc.) usually create their own dashboards to track team-specific metrics. And in many cases, these team-specific dashboards and benchmarks aren’t perfectly aligned with higher-level business KPIs.
What’s the problem with this? Dashboards are essentially endpoints of data, and the process of creating a dashboard usually involves transforming raw data, which can compromise data integrity, consumption, and interpretation. The farther teams get from the data warehouse, the greater the risk that they diverge on the definition of metrics. For example, one team might define an “active user” differently from another team in the same organization. In other words, each team’s dashboard becomes a silo of information, resulting in data fragmentation and inconsistency.
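To make that concrete, here’s a hypothetical sketch of how two teams might answer “how many active users do we have?” against the same warehouse. The events table, its columns, and both definitions are invented for this example:

```python
# Hypothetical illustration of metric drift: two teams, one table, two answers.

PRODUCT_TEAM_ACTIVE_USERS = """
-- Product dashboard: a user is "active" if they did anything in the last 30 days
SELECT COUNT(DISTINCT user_id)
FROM events
WHERE event_ts >= CURRENT_DATE - INTERVAL '30 days';
"""

MARKETING_TEAM_ACTIVE_USERS = """
-- Marketing dashboard: a user is "active" only with 3+ logins in the last 7 days
SELECT COUNT(*) FROM (
    SELECT user_id
    FROM events
    WHERE event_type = 'login'
      AND event_ts >= CURRENT_DATE - INTERVAL '7 days'
    GROUP BY user_id
    HAVING COUNT(*) >= 3
) AS frequent_users;
"""
```

Both queries are reasonable, both run against the same data, and the two dashboards will still report very different numbers.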
So, how do we make sure that there is consensus on the definition of KPIs as dashboards become more fragmented? Can we set a gold standard? And how do you make sure that everyone is aligned?
For starters, the company’s data science team can be the oracle of definitions. In situations where multiple dashboards are already in place across the company, the data team should review them all and learn how the KPIs are being defined and used by each team. Then, they should come up with a company-standard “glossary of terms.”
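What might that glossary look like in practice? Here’s one sketch, with hypothetical metrics, owners, and SQL; it could just as easily live in a wiki. The point is that there is exactly one canonical definition per metric:

```python
# A hypothetical "glossary of terms" kept in code. Every definition and query
# below is an invented example, not a recommended standard.

METRICS_GLOSSARY = {
    "active_user": {
        "owner": "data science team",
        "definition": "A user with at least one session in the trailing 30 days.",
        "canonical_sql": """
            SELECT COUNT(DISTINCT user_id)
            FROM sessions
            WHERE session_start >= CURRENT_DATE - INTERVAL '30 days';
        """,
    },
    "churned_account": {
        "owner": "data science team",
        "definition": "A paying account with no active subscription for 60+ days.",
        "canonical_sql": """
            SELECT COUNT(*)
            FROM accounts
            WHERE subscription_ended < CURRENT_DATE - INTERVAL '60 days'
              AND NOT EXISTS (
                  SELECT 1 FROM subscriptions s
                  WHERE s.account_id = accounts.id AND s.status = 'active'
              );
        """,
    },
}
```

Team dashboards then reference the canonical definition instead of re-deriving their own version of it.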
Clio has set up a “data literacy course.” It’s mandatory training so that everyone is on the same page about core metrics. Since Clio is a SaaS company, they have everyone take a crash course on MRR, LTV, CAC, Churn, etc.
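For readers who want the refresher themselves, here’s a deliberately simplified sketch using one common set of formulas and made-up numbers. Definitions of LTV and churn vary by company, which is exactly why standardizing them internally matters:

```python
# Core SaaS metrics with textbook formulas. All input numbers are invented.

monthly_subscription_revenue = 250_000.0   # MRR: recurring revenue per month
sales_and_marketing_spend = 120_000.0      # spend in the period
new_customers_acquired = 300               # customers won in the same period
avg_revenue_per_account = 90.0             # per month
gross_margin = 0.80
monthly_churn_rate = 0.02                  # 2% of customers lost per month

mrr = monthly_subscription_revenue
cac = sales_and_marketing_spend / new_customers_acquired
# One common LTV formula: margin-adjusted revenue over the average customer
# lifetime (1 / churn). Other companies use other variants.
ltv = (avg_revenue_per_account * gross_margin) / monthly_churn_rate

print(f"MRR: ${mrr:,.0f}")                     # $250,000
print(f"CAC: ${cac:,.0f}")                     # $400
print(f"LTV: ${ltv:,.0f}")                     # $3,600
print(f"LTV / CAC ratio: {ltv / cac:.1f}")     # 9.0
```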
In the hangout, the general consensus was that this kind of training should be part of the onboarding process and then repeated once or twice a year, as a “lunch and learn” or part of your all-hands meeting, to refresh all employees. It can be helpful to administer a short test afterwards to gauge comprehension. Each team may then need specialized training sessions with a deeper dive into its own metrics.
Challenge 2: Scaling data science powers to meet the firehose of data pull requests
So, let’s say you’ve got all your data centralized in one place and everyone sees the data science team as the “oracle.” Sounds great, but the problem is there’s a never-ending list of Trello/Asana requests for data pulls from all around the organization. You’re a small data science team, so you try to create some kind of process, like asking teams to fill out data request forms so you can assess the priority and urgency of each pull. But everyone ends up resenting the extra administrative work since they want their data now, and in the meantime, new requests just keep coming in.
The question is: how can you empower different parts of the organization with data? How can you scale your data science powers without necessarily scaling the team?
Kinnek has assigned a “data liaison” to each team: someone who’s already engaged with data (think analysts or the dashboard manager for sales, product, etc.). The data liaison serves as triage for their team and only escalates requests to the data science team when necessary. For example, marketing owns email data, so if you need email data, you go to marketing’s designated liaison rather than the central data team. Think of it as a hub-and-spoke model (a hybrid of centralized and decentralized).
At Kinnek, the data science team develops these data liaisons to be more than analysts. The goal is for each liaison to be able to query the warehouse directly. The company provides Python/SQL training sessions and refreshers, with “homework” tied to real business questions. The added bonus: employees enjoy the challenge and know that the company is invested in their development.
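To give a flavor of what that looks like, here’s a hypothetical example of the kind of query a trained marketing liaison might run on their own instead of filing a ticket. The connection, table, and column names are all assumptions, with SQLite standing in for whatever warehouse client the company actually uses:

```python
# Hypothetical liaison-level query: email open rate by campaign, last 30 days.
import sqlite3

conn = sqlite3.connect("warehouse.db")  # stand-in for the real warehouse connection

OPEN_RATE_BY_CAMPAIGN = """
-- Assumed email_sends table with campaign, sent_at, opened_at columns
SELECT campaign,
       AVG(CASE WHEN opened_at IS NOT NULL THEN 1.0 ELSE 0.0 END) AS open_rate
FROM email_sends
WHERE sent_at >= DATE('now', '-30 days')
GROUP BY campaign
ORDER BY open_rate DESC;
"""

for campaign, open_rate in conn.execute(OPEN_RATE_BY_CAMPAIGN):
    print(f"{campaign}: {open_rate:.1%}")
```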
The key takeaway from the hangout? Education and development can help you address scalability issues in data science, because the human element will always be the key to good data.