Automated tracking of token and component adoption
Early on in Onfido’s design system journey, all our data and metrics were collected manually. We were a scrappy team without dedicated resources, and we did what we could within those constraints.
The data we collected during this stage helped us to secure a budget and rebuild our design system to be better and more flexible. We had the bones of a great system that included design tokens, and around five high-quality components that we were convinced would be widely used if our product teams adopted them.
Each stage of design system maturity will have different metrics that make the most sense to track against your goals at the time. In this article, I’ll just be focussing on growing adoption.
Minimum viable integration
We started by having conversations with the three product teams we thought would get the most benefit from the system. The only blocker to adoption for them was the up-front effort to integrate the required libraries into their product. Once those libraries were in place, tokens or components could easily be used in new projects going forward, which would add value immediately. We didn’t need any existing components to be migrated to achieve this, and it could be done in a couple of hours (including training, resolving issues, deployment, and testing). We referred to this initial step as the minimum viable integration.
Once the libraries were integrated, we could then plan a series of more granular pieces of work to migrate individual components or tokens and spread that work out over time. This also made testing a lot easier than if we tried to change everything at once.
Actionable metrics
Given that this approach would stretch migrations out over multiple quarters, we needed solid ongoing metrics to help us measure progress against our goals.
We aimed to keep our metrics as actionable as possible. In most cases, the action would be deciding where our effort was best spent.
For us, the key questions we wanted to answer were:
How do we know if what we’re building is being adopted widely?
How do we focus our efforts in order to increase adoption?
From here, we refined our thinking further with more specific questions that would help inform the two key questions above:
How far are we from our target token adoption as an overall percentage?
Which products could we focus on to give us the biggest gain overall?
Which components have low traction across all products?
Which products have low component adoption?
Which products are shining examples of good practices that we can use as internal case studies to increase visibility and further encourage adoption?
Data mockups
The effort of answering these questions with manually collected data was too high given how frequently we would need to do it, and the results would be too lossy, so we decided to take a more automated approach.
We started by mocking up the data and charts, making up all the numbers to begin with. I approached this like any early exploratory design project and used the tool that would get us something visual to discuss as quickly as possible. In this case, that was Google Sheets.
Tokens
For tracking token adoption we focussed on color tokens first, both because they were critical to our new theming capabilities and because they mapped cleanly and easily to our designs in Figma.
We decided to track Non-token colors (RGB, hex, CSS keywords), Semantic theme tokens, and Base palette tokens, and to display the data per product. This would give us a rough idea of what percentage of each product’s colors was covered by our new tokens, and let us target the products that would give us the most uplift in our global adoption.
We discussed whether we could actually get reasonable data for non-token values, and decided that with some specific regular expressions, we should be able to get something good enough.
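For a sense of what “good enough” looks like here, below is a minimal TypeScript sketch of the kind of patterns involved. The expressions and the countMatches helper are illustrative assumptions rather than our actual scraper code, and things like keyword matching produce false positives that need tuning.

```typescript
// Illustrative patterns only: real-world color matching needs more tuning,
// e.g. the keyword pattern will also hit prose and identifiers, not just CSS values.
const patterns = {
  // Non-token colors: hex values, rgb()/rgba() calls, and a few CSS keywords
  hexColor: /#[0-9a-fA-F]{3,8}\b/g,
  rgbColor: /\brgba?\([^)]*\)/g,
  cssKeyword: /\b(white|black|red|blue|gray|grey)\b/gi,
  // Token usage: values referencing our color tokens (prefix as mentioned later in the article)
  colorToken: /\bods-color-[a-z0-9-]+/g,
};

// Count how many times a single pattern appears in a file's contents
function countMatches(source: string, pattern: RegExp): number {
  return (source.match(pattern) ?? []).length;
}
```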
Based on the (fake) chart above, we would target the Dashboard product first, for instance.
Components
Components proved a lot more complex. There was no simple way for us to identify with a regular expression what a non-system component was. A button could be implemented as a <button>, a link (<a class="button">), an input (<input type="submit">), or any number of language-specific abstractions or custom-named components that are visually or functionally button-like. This may be possible if you have a consistent codebase written in a single language, but that wasn’t the case for us.
For this reason, we decided we wouldn’t bother trying to track percentages as we do with tokens. This led us to discuss whether we should track how many instances of each component a product had, which led down another rabbit hole. For example: if a button component sits in an app header, and that header appears on 30 different pages or interfaces, does that count as 30 buttons or one? We talked through a range of scenarios like this.
In the end, we concluded that none of these approaches would give us data we could trust, or data that was meaningful enough, so we decided to simply start with a product-level view:
How many products had adopted any given component at least once in their codebase?
Super easy to measure, and allowed us to ensure the components we were building were being used.
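As a rough illustration, here’s how that product-level question could be answered once you have a record of each usage the scraper finds. The ComponentUsage shape and the productsPerComponent helper are hypothetical, not our actual code.

```typescript
// Hypothetical shape for scraper output: one record per (product, component) usage found
interface ComponentUsage {
  product: string;
  component: string; // e.g. "Button", "Checkbox"
}

// Count how many distinct products use each component at least once
function productsPerComponent(usages: ComponentUsage[]): Map<string, Set<string>> {
  const byComponent = new Map<string, Set<string>>();
  for (const { product, component } of usages) {
    if (!byComponent.has(component)) byComponent.set(component, new Set());
    byComponent.get(component)!.add(product);
  }
  return byComponent;
}

// Example: turn the map into chartable rows of [component, number of adopting products]
// [...productsPerComponent(usages)].map(([name, products]) => [name, products.size]);
```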
The first fake-data chart was roughly what we were looking for, but we realized we also needed to know which products were adopting each component. We visualized this as stacked bar charts, with a segment per product. Since we were tracking this data anyway, we could also slice it to see which products were using a lot of components versus just a few (or none at all).
Building it
Given we had a variety of products written in entirely different programming languages, there wasn’t an out-of-the-box solution available at the time that fit our needs, so we built something ourselves.
We had already adopted Looker company-wide for all kinds of metrics, so we chose it as our tool for building and displaying charts from the data we collected.
Note: There are other similar data tools such as Tableau, but if you have teams at your company who already deal with analytics and metrics, reach out to them and see if you can piggyback on their infrastructure.
The engineers on our team, Julius and Andre, architected and built a “code scraper” that runs nightly across our entire GitLab server. It uses GitLab's API to find all the repositories in our company and, in each repository, filters for files that might contain tokens, colors, or components (like .css and .tsx). It then uses a series of regular expressions to find specific strings of characters in those files. For example, we might look for tokens starting with ods-color-, or import statements for components such as import {ComponentName} from '@onfido/castor'.
Then we map and group the results into a format we can easily manipulate, and push it to Looker.
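To make that flow a little more concrete, here’s a hedged TypeScript sketch of what scraping a single project could look like. The GitLab v4 endpoints are the standard repository-tree and raw-file ones, but everything else (the constants, file filters, patterns, and row shape) is an assumption for illustration; a real scraper also has to handle pagination, branch selection, and errors, which are skipped here.

```typescript
// Sketch of a nightly scrape for one project: list files, pull likely candidates,
// count token matches and component imports. Names and shapes are hypothetical.
const GITLAB = "https://gitlab.example.com/api/v4";
const HEADERS = { "PRIVATE-TOKEN": process.env.GITLAB_TOKEN ?? "" };

const FILE_EXTENSIONS = [".css", ".scss", ".ts", ".tsx", ".jsx"];
const TOKEN_PATTERN = /\bods-color-[a-z0-9-]+/g;
const IMPORT_PATTERN = /import\s*\{[^}]+\}\s*from\s*['"]@onfido\/castor['"]/g;

interface ScrapeRow {
  product: string;
  file: string;
  tokenMatches: number;
  componentImports: number;
}

async function scrapeProject(projectId: number, product: string): Promise<ScrapeRow[]> {
  // List every file in the repository (pagination omitted for brevity)
  const tree: { path: string; type: string }[] = await fetch(
    `${GITLAB}/projects/${projectId}/repository/tree?recursive=true&per_page=100`,
    { headers: HEADERS }
  ).then((r) => r.json());

  const rows: ScrapeRow[] = [];
  for (const entry of tree) {
    if (entry.type !== "blob") continue;
    if (!FILE_EXTENSIONS.some((ext) => entry.path.endsWith(ext))) continue;

    // Fetch the raw file contents; recent GitLab versions default to the project's HEAD
    const raw = await fetch(
      `${GITLAB}/projects/${projectId}/repository/files/${encodeURIComponent(entry.path)}/raw`,
      { headers: HEADERS }
    ).then((r) => r.text());

    rows.push({
      product,
      file: entry.path,
      tokenMatches: (raw.match(TOKEN_PATTERN) ?? []).length,
      componentImports: (raw.match(IMPORT_PATTERN) ?? []).length,
    });
  }
  // Next step: aggregate per product/component and push to the warehouse Looker reads from
  return rows;
}
```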
From this raw data, we build out the graphs you can see below. We set our target at 80% token adoption, and we have a minimum bar of three products adopting a component before we consider that component successful.
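Spelled out as code, those two thresholds might look something like the sketch below. The adoption formula (tokens divided by tokens plus non-token colors) is my reading of the percentage view described earlier, not a quote from our implementation.

```typescript
// Hypothetical helpers encoding the two thresholds described above.
const TOKEN_ADOPTION_TARGET = 0.8; // 80% of a product's color usage should come from tokens
const MIN_ADOPTING_PRODUCTS = 3;   // a component is "successful" at three or more adopting products

function tokenAdoption(tokenCount: number, nonTokenCount: number): number {
  const total = tokenCount + nonTokenCount;
  return total === 0 ? 0 : tokenCount / total;
}

function meetsTokenTarget(tokenCount: number, nonTokenCount: number): boolean {
  return tokenAdoption(tokenCount, nonTokenCount) >= TOKEN_ADOPTION_TARGET;
}

function componentIsSuccessful(adoptingProducts: Set<string>): boolean {
  return adoptingProducts.size >= MIN_ADOPTING_PRODUCTS;
}
```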
As you can see, our Looker dashboard resembles our initial fake-data mocks pretty closely. Having charts that track both absolute numbers of tokens and percentages helps us get a better picture of each product.
We also built an “orphan explore” view: a table listing every individual non-token color value (e.g. #ffffff), how many instances of it there were in a given product, and the files where it was detected. This let us drill down to find the biggest opportunities and helped us write tickets with the teams.
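Here’s a rough sketch of how those orphan rows could be aggregated from individual matches. The OrphanHit and OrphanRow shapes are assumptions based on the table described above, not the actual schema.

```typescript
// One hit per non-token color value the scan finds in a file
interface OrphanHit {
  product: string;
  file: string;
  value: string; // the raw non-token color, e.g. "#ffffff"
}

// One row per (product, color value) in the "orphan explore" table
interface OrphanRow {
  product: string;
  value: string;
  instances: number;
  files: string[];
}

function buildOrphanRows(hits: OrphanHit[]): OrphanRow[] {
  const rows = new Map<string, OrphanRow>();
  for (const { product, file, value } of hits) {
    const key = `${product}::${value.toLowerCase()}`;
    const row = rows.get(key) ?? { product, value: value.toLowerCase(), instances: 0, files: [] };
    row.instances += 1;
    if (!row.files.includes(file)) row.files.push(file);
    rows.set(key, row);
  }
  // Sort by instance count so the biggest opportunities surface first
  return [...rows.values()].sort((a, b) => b.instances - a.instances);
}
```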
Next steps
Like any part of a design system, this dashboard has gaps, flaws, and will always be a work in progress.
The biggest gap is that the scraper only measures color tokens right now. Nothing other than time and priority is stopping us from expanding the scope to other types of tokens; we just want to do it carefully and thoughtfully without adding too much noise. Also, because our design tooling doesn’t explicitly support sizing/spacing tokens, there’s a little less system rigor on that side, which I’m hoping to improve along with the system as a whole next year.
Given the architecture of the scraper, we can’t easily track how these numbers improve over time. The scraper stores a snapshot and overwrites the previous one so that data storage doesn’t become a problem. Aggregating some high-level metrics over time is something we may add in the future. For now, we decided the value was relatively low, since historical data isn’t as actionable and we can fill the gap easily with periodic screenshots. We’re also getting close to moving out of the adoption phase and are looking at what our next important metrics will be.
The use of different variant combinations is another big gap. We’d like to track variants in some form, but we need to better understand what’s most important and actionable here. Instinctively, you might reach for the most-used variants or combinations, but the least-used might actually be more actionable, since they’re candidates for deprecation. This needs more thought and discussion with the team.
Conclusion
The metrics most useful for your team will depend on your stage of maturity, available resources, organization, and system architecture. Work on these together as a team, and dedicate some time to discussing them.
You don’t need as many metrics as you probably think, so get comfortable with “good enough” metrics, focus on actionable ones, and get what you need to tell compelling stories to retain or grow investment in your system.
Thanks to the Castor squad at Onfido who worked on this: Andre Rabello (who helped with the technical aspects of this article), Corey Hume, Julius Osokinas & Mark Opland.
Originally posted on Medium