I want to share with you a process that my scrum team at Ginkgo Bioworks has recently gone through (and continues to go through) to organize and rationalize our backlog of tech debt tickets.
What even is tech debt? To paraphrase our Software Architect Sam, tech debt is a suboptimal part of the software that adds risk or cost. You incur tech debt when you accept a later cost in exchange for a short-term gain. Software engineering is the primary stakeholder, and tech debt does not violate user-facing requirements. The effort to resolve an item of tech debt has a clear deliverable, usually the elimination of that risk or cost.
Because tech debt is, by definition, not user-facing, we have found it important to organize our tech debt tickets around impact and business value. A well-organized backlog of tech debt has enabled us to advocate for, and make, well-informed decisions about allocating resources to paying down tech debt. We have also been able to focus our attention on particular parts of the backlog with greater precision and confidence.
We follow this process to organize our tech debt tickets:
- Assign a category to every tech debt ticket in our backlog.
- Assess the approximate impact that completing each tech debt ticket would have.
- Estimate the level of effort it would take to complete each ticket.
- Sort and group the tech debt tickets by these dimensions.
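As a rough illustration of the dimensions listed above, each ticket can be modeled as a small record. This is only a sketch, not a description of our actual tooling: the class and field names are hypothetical, the label "Ugly Code" is a name chosen here for the aesthetic category, and the numeric ranks for impact are an assumption made so that levels can be compared.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    """Tech debt categories as described in this post."""
    BUG_FACTORY = "Bug Factory"
    SCALABILITY_PERFORMANCE = "Scalability/Performance"
    DEVELOPER_PRODUCTIVITY = "Developer Productivity"
    UGLY_CODE = "Ugly Code"  # label assumed for the purely aesthetic category


class Impact(Enum):
    """Coarse impact levels; the numeric ranks are an assumption for ordering."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass(frozen=True)
class TechDebtTicket:
    key: str              # hypothetical ticket identifier
    category: Category    # step 1: assign a category
    impact: Impact        # step 2: assess approximate impact
    effort_weeks: float   # step 3: estimate effort (developer-weeks here)


# A made-up ticket, purely for illustration.
example = TechDebtTicket(
    key="DEBT-1",
    category=Category.BUG_FACTORY,
    impact=Impact.HIGH,
    effort_weeks=2.0,
)
```

With every ticket described this way, the fourth step (sorting and grouping) becomes a mechanical operation over these fields.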
Categories of Tech Debt
Assigning categories to tickets is an important first step to organizing the tech debt backlog because it surfaces the value of each ticket to the business. This is how we justify allocating resources to tech debt rather than implementing features, bug fixes, or other tasks where impact to users and other stakeholders is more immediately apparent. The categories should be chosen to make business value very clear. These are the categories that we use on my team.
A “bug factory” is an area of code that is often the source of (user-facing) bugs. It is convoluted enough (and perhaps lacks sufficient test coverage) that any attempt to make even a small change to it has a high likelihood of creating a bug. It may even be fragile enough that changes made elsewhere, whether obviously related or not, are likely to cause bugs in it. Code that is WET (short for “write everything twice”, meaning code with repeated parts) may fall into this category.
The Scalability/Performance category of tech debt covers code that is non-optimal in terms of performance (in compute, in space, or both), but was sufficient for the scale at the time the code was written. As scale continues to grow, the eventual negative business impact could be application slowness perceived by the user, job completion times that become untenable, or application instability. Scale may be measured in terms of number of users (simultaneous or not), size of dataset, etc.
This category of tech debt can be differentiated from “Bug Factory” in that “touching” such code is not necessary for negative business impact to manifest. The impact simply appears as scale grows, perhaps unnoticed until it is too late.
The Developer Productivity category of tech debt includes code that is convoluted enough that tasks which require working with it (where “working with” means altering the code or even just understanding it) take an unexpectedly or unreasonably long time to complete. The business impact is a material delay in the delivery of future features.
This is different from “Bug Factory” in that working with such code does not usually result in bugs (at least no more than usual). Perhaps the code is convoluted but test coverage is good.
Merely “ugly” code is code whose ugliness is largely aesthetic; it does not fall under the category of “Bug Factory” or “Developer Productivity”. Examples include incorrectly formatted code, incorrectly cased identifiers, and deviations from coding standards where the only material impact is on the aesthetic consistency of the code. In some cases, this category may be hard to differentiate from “Developer Productivity”. (At what point does aesthetic inconsistency of code have a material impact on developer productivity?) Ugly code should not be categorized as a “Bug Factory” unless it can be empirically shown to be one, or at least to have the potential to be one.
This is an important category to have because it keeps the software engineers honest when assessing the business value of a tech debt ticket.
Impact and Level of Effort
The next step in the process is assessing impact and level of effort. Level of effort is relatively straightforward. On my team, we used one of several online planning poker apps to estimate each ticket. We happened to use developer-weeks as our unit of effort, but we could just as well have used developer-days, story points, or any other unit.
Impact can be hard to quantify, but the category assigned to a tech debt ticket makes its impact tangible. For instance, the impact of a Bug Factory ticket could be related to the likelihood of causing a bug any time that area of code is modified, the frequency with which that area of code is modified, the severity of the bugs that are likely to result, or a combination of all of these. We didn’t overthink quantifying impact — we simply assigned each ticket a value of “high”, “medium”, or “low” impact.
The tech debt backlog is now described along several dimensions. To find the tickets that deliver the most value for the least effort, we sorted the list of tickets by impact and level of effort (in descending and ascending order, respectively). Grouping the tickets by category has allowed us to pick, at a glance, tickets to work on based on circumstance and context. For example, if we anticipated the need to scale up in the near-to-medium term, we would favor Scalability tickets. If the team was really feeling the productivity drain of working with an area of difficult code, we would work on tickets in the Developer Productivity category. If we planned to build features in a Bug Factory section of the code, we would favor paying down Bug Factory debt.
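The sorting and grouping described above can be sketched in a few lines of Python. This is a minimal illustration rather than the tooling we actually used: the ticket keys, the sample data, and the function names are all made up, and the sketch assumes impact is the primary sort key with effort as the tie-breaker.

```python
from collections import defaultdict

# Assumed numeric ranks so that "high" sorts ahead of "medium" and "low".
IMPACT_RANK = {"high": 3, "medium": 2, "low": 1}

# Made-up tickets for illustration; effort is in developer-weeks.
tickets = [
    {"key": "DEBT-1", "category": "Bug Factory", "impact": "high", "effort_weeks": 3},
    {"key": "DEBT-2", "category": "Scalability/Performance", "impact": "high", "effort_weeks": 1},
    {"key": "DEBT-3", "category": "Developer Productivity", "impact": "medium", "effort_weeks": 2},
    {"key": "DEBT-4", "category": "Bug Factory", "impact": "low", "effort_weeks": 1},
    {"key": "DEBT-5", "category": "Scalability/Performance", "impact": "medium", "effort_weeks": 1},
]


def prioritize(items):
    """Sort by impact (descending), then by effort (ascending)."""
    return sorted(items, key=lambda t: (-IMPACT_RANK[t["impact"]], t["effort_weeks"]))


def group_by_category(items):
    """Group tickets by category, keeping the prioritized order within each group."""
    groups = defaultdict(list)
    for ticket in prioritize(items):
        groups[ticket["category"]].append(ticket)
    return dict(groups)


ordered_keys = [t["key"] for t in prioritize(tickets)]
by_category = group_by_category(tickets)
```

With this sample data, `ordered_keys` places the high-impact, one-week ticket first, and `by_category["Bug Factory"]` lists only the Bug Factory tickets in priority order, which is the view to consult when planning feature work in that area of the code.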
Because our tech debt tickets are organized around business impact, they can be integrated into the team’s overall backlog as “first class citizens”. The value of each tech debt ticket should be nearly as plain as the value of a feature, bug fix, or any other ticket. The decision to allocate resources towards or away from tech debt will therefore be well-informed, and it becomes easier to build the case for spending time on tech debt.
To conclude, I’d like to observe that “tech debt” is an apt term. It accrues interest and becomes harder to pay off over time. By organizing and rationalizing our tech debt backlog, we have been able to pay it down effectively and efficiently.