How Sin-Mei Tsai, Shippo’s VP of Engineering, Defines Code Quality

Six weeks ago, we kicked off a new series where we profile an awesome operator in our network to learn about his/her work and contributions to his/her organization and community at large.

Our inaugural guest was Bala Subramaniam and we asked him three questions on marketplaces.

This time, I am very excited to introduce Sin-Mei Tsai, VP of Engineering at Shippo, a Version One portfolio company. Over the years, I have had the pleasure of getting to know Sin-Mei over many lunches. She is a Silicon Valley veteran – having been the Senior Director of Engineering at Yodlee, then holding several CTO, Chief Architect, and other engineering leadership roles at a few startups before landing at Shippo in 2016. Without saying, we were thrilled to have her join the team because of her experience. And she has not only been instrumental to Shippo, but also an extremely valuable resource to other engineering leaders in the Version One family. In fact, every time there is an engineering-related question, the first person I go to is Sin-Mei.

With that said, I thought it would be great to share Sin-Mei’s insight and wisdom with our blog community. Instead of asking three questions as we did with Bala, we’re going to take a very deep dive into a critical topic: code quality.

What follows below are Sin-Mei’s words on how she (and Shippo) define code quality. As you’ll see, she is an incredible resource – so if you have any questions or thoughts on this topic, please leave a comment below!


How do you define code quality?

At Shippo, we give a lot of thought to our interview questions. When we interview managers, one of our standard questions is “How do you think about code quality?” or  “What does code quality mean to you?” Recently, I was pleasantly surprised to be asked this question by a candidate. Typically I answer these questions spontaneously, reflecting my stream of consciousness, and I liked what came to mind. At Shippo, we think of code quality from 3 different points of view. One of them is the orthodox definition of code quality, but we have expanded on code quality with a couple of points of views of our own.

Functional quality

The first and foremost category of quality for engineers is the mainstream definition of quality: “Did the code work as designed?” Most quality assurance practices address functional quality by identifying defects in code.

Defects are cheapest to address when found early. For that reason, unit testing is very important. At Shippo, we focus our unit testing on functional testing. Instead of testing whether the code behaves as coded, we focus on whether the code behaves as the user would expect. We try to use mock objects only when we must, such as in stubbing out an external dependency.

In our legacy codebase in Python, an interpreted language, the engineers rightly wanted to focus on code coverage. For an interpreted language, code coverage can help ensure that each line of code has been exercised and does not contain syntax errors. With a compiled language like Go, which is the language we are using in our new platform, we have the advantage that syntax errors are caught by the compiler. This allows us to focus on functional testing, which is a hybrid of unit testing in that the test focuses on a functional unit, a function, and black box testing where we exercise the function as the caller without mocking any internals. To help us perform functional testing without mock objects, we maintain a seed database that reflects the database schema and contains the data definitions that we would expect in production. This seed database provides the initial data set for any local and integration testing environments.

Besides unit testing, engineers also perform manual developer end-to-end testing. This end-to-end testing is performed to ensure that when we integrate different pieces of code, the code behaves as expected. It is in manual integration testing that we test some tricky combinations. Sometimes we find that the new code does not play well with old state, or with old code, which is a transient problem that happens during deployment.

To make doubly-sure that code behaves as expected, we also employ independent testing during the quality assurance phase. Even though developers are required to thoroughly test their code and regression test any impacted functionality, developers tend to be myopic in their testing. Developers will test from their own point of view, from having written the code. An independent tester will test cases that are unexpected. One such example that we ran into in the past was a new function to add addresses to an address book. An independent tester thought of the case to remove the last remaining address, whereas the developer assumed that it couldn’t happen because the requirements stated that it shouldn’t happen. At Shippo, the engineers take on the role of testers and test the code developed by their fellow engineers. So long as the engineer didn’t contribute to the code being tested, they are eligible to be the independent tester.

For API changes, we perform parity testing where we record API requests and responses from production, and then playback the recorded API requests against the new code and compare the responses from the system under test against actual responses recorded from production. Because the API is stateful (a POST or PUT changes the state), we have developed heuristics for performing the comparisons in a parity test. This is especially important because we support many versions of the API, including some with unexpected side effects or object representations, such as in the case of null vs. empty object.

Technical design quality

At Shippo, we also consider tech debt in our definition of code quality. Architectural and software design integrity is very important to us. The question that we ask is: “In adding or changing a piece of code, did we create any intentional tech debt?”

Not all code needs to be scalable. Sometimes, we intend to create a rapid prototype to gauge user interest in a new feature. Once the prototype has served its purpose, it can be thrown away. Code needs to be written considering its longevity in mind. Code that is intended to last longer than a prototype should be written in such a way that future developers can easily change or extend it. Therefore, readability and maintainability are important.

When the code we write is meant to last more than the next year or two, we rely on technical designs to ensure the code ages well. Technical design is one of the most rewarding parts of code development. Nothing feels as satisfying as an elegant design. While elegance may be subjective, a formal peer design review process can help steer a technical design in the right direction.

In a design review, we evaluate the proposed solution against the problem that it is meant to solve. In reviewing a design, we start with the end first. What will it be like to maintain the solution? If the existing infrastructure can already address the new problem, we want to understand whether the existing infrastructure or patterns should be reused. Creating unnecessary new infrastructure or patterns will create additional maintenance overhead for the team and reduce our ability to create value. If a new solution provides enough additional benefit over the old solution, then we might evaluate moving the old code to the new infrastructure.

In a peer design review, the designer shares with the team the benefits and drawbacks of the proposed solution, as well as any other solutions that were evaluated. Typically, the solution will be presented in graphical form, such as a system architecture with flow diagrams. And the reviewers will provide input with regards to scalability, fault tolerance, and maintainability.

Components that need to handle large volume need to be designed for scale. Post development, we make use of performance (stress) testing to ensure that the new component can withstand multiples of the expected volume. The engineer is responsible for understanding the load, for example, the mixture of API calls that the new system will receive once in production. The performance test entails comparing the performance characteristics of the old system and the new system. Sometimes, we will evaluate two designs using performance testing to verify that one design scales better than the other.

Value to customers

Above all, the purpose of product development is to create value for customers. Therefore one of the most important ways to evaluate code is whether it provides value to customers. The way we ensure that code will provide value for customers is by relying on customer-centric practices in creating our product development plans. Before even a line of code is written, our product team conducts customer research to get and vet product ideas.

The engineer’s job is first to understand the customer problem that we are trying to solve, and then to construct the solution for the problem. When building solutions, we are most interested in knowing whether the solution works for the customer. Whenever possible, we favor shipping an early iteration of the solution to get customer feedback.

Once the first iteration is in production, we observe customers using our product and collect customer feedback to gauge whether the solution is useful or desirable, whether it is easy to discover and use, and whether the user wants more of it. If the actual usage of the functionality meets the expected usage and engagement, then we know the code has met the most important quality metric of usefulness.

Our automated emails are a good example of a useful feature — we are constantly releasing variations and improvements to these emails.

As we confirm whether the solution works, we have more confidence in shipping follow-on iterations of the solution. For us, shipping follow-on iterations is a sign of a successful feature.

Conversely, sometimes a new solution can create unexpected outcomes. New functionality can get in the way of something users need to do. Even worse, sometimes new functionality can introduce unintended side effects and detract value or erode trust. We consider these to be cases of bad code quality.

To limit negative effects that arise from releasing new functionality, we roll out code in small increments. For critical functionality, we may start with 5% of the user population and slowly step up to 10%. We rely on instrumentation to provide us visibility as to how new functionality works for users, and as soon as we detect issues, we either fix the issue quickly or dial back the % to 0 if a fix cannot be released quickly.


At Shippo, we are known to be people-first. Our priorities are no different when it comes to code quality. In considering value to customers and technical design quality, we are keeping the customer and the future developer in mind when it comes to code quality.

In a recent conversation with our VP of Product, Yotam Troim, we talked about conducting post mortems on projects that run later than expected. It occurred to me that late code is a form of bad code. Perhaps this can be a topic for future conversation on code quality.

Read Next