
Caller-controlled Rollouts

James Socol

26 Jul 2023 • 2 min read

Slightly warm take warning: not everything needs to be feature-flagged.

For example, new APIs or API endpoints that have no callers yet don't benefit from the ability to toggle them off. If nothing is executing this code, then there's no behavior to disable.

Let's say we have an API for managing shortened links (because I worked on one a long time ago). There are a number of existing endpoints, like /v1/shortlinks and /v1/shortlinks/{id}. We've even got some metrics like /v1/shortlinks/{id}/clicks. If we want to ship a new endpoint to get data aggregated by country at /v1/shortlinks/{id}/geo, we can deploy it all the way to production without a flag, as long as we aren't publishing the docs, since no one except us will know about it. We can even test it with real data.

Instead, we probably want to put a feature flag around the UI—assuming we have one. If this new endpoint is powering a new map component or by-country report in customer dashboards, we want to put a feature flag around that new UI so we can integrate and deploy as we build it.

The UI feature flag covers the API as well. Nothing will call the API as long as the feature flag is off, so nothing will execute that code.
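As a rough sketch of what that looks like, assume a hypothetical `flag_enabled` lookup and a stand-in `fetch` function in place of a real HTTP client (neither comes from a real library, and the flag name is made up). The only flag check lives in the dashboard code; the endpoint itself has no idea the flag exists.

```python
# Hypothetical in-memory flag store; a real system would ask a flag service.
FLAGS = {"geo-report": False}

# Records which endpoints were requested, purely for illustration.
calls = []


def flag_enabled(name):
    return FLAGS.get(name, False)


def fetch(path):
    # Stand-in for an HTTP call to the shortlinks API.
    calls.append(path)
    return {"path": path}


def render_dashboard(link_id):
    sections = {"clicks": fetch(f"/v1/shortlinks/{link_id}/clicks")}
    # The flag is checked here, at the caller. With the flag off, the
    # new /geo endpoint is deployed but never requested by the UI.
    if flag_enabled("geo-report"):
        sections["geo"] = fetch(f"/v1/shortlinks/{link_id}/geo")
    return sections
```

With the flag off, `render_dashboard` only requests the existing clicks endpoint. Turning the flag on is the single switch that starts sending traffic to the new code.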

This works for new parameters, too. If we have a protocol buffer description of a "User" resource, and we are making a (significant!) change to allow users to have multiple email addresses, the change might look like this:

  message User {
    string urn = 1;
-   string email = 2;
+   string email = 2 [ deprecated = true ];
    string name = 3;
+   repeated string emails = 4;
  }

We can't expect clients to update simultaneously, so for a while we have to support both, and we should prefer the new parameter if it is set:

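# Prefer the new repeated field when the client has sent it.
if msg.emails:
    emails = msg.emails
# Otherwise fall back to the deprecated single-value field.
elif msg.email:
    emails = [msg.email]
else:
    raise APIError('either emails or email must be set')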

The API can gracefully handle the transition as clients start moving to the new parameter. (In reality we probably want to prevent both from being set, and emit a metric or annotate a span to be able to see adoption. But for the sake of brevity I'll leave those for another day.)
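For the curious, here is a minimal sketch of what the stricter version might look like. The `resolve_emails` helper and the `adoption` counter are assumptions for illustration: the counter stands in for whatever metrics client you actually use, and `SimpleNamespace` stands in for a parsed message, where an unset string is empty and an unset repeated field is an empty list.

```python
from collections import Counter
from types import SimpleNamespace


class APIError(Exception):
    pass


# Stand-in for a real metrics client (an assumption for this sketch).
adoption = Counter()


def resolve_emails(msg):
    # Reject ambiguous requests rather than silently picking one field.
    if msg.emails and msg.email:
        raise APIError('set either emails or email, not both')
    if msg.emails:
        adoption['emails'] += 1  # client has moved to the new field
        return list(msg.emails)
    if msg.email:
        adoption['email'] += 1  # deprecated field is still in use
        return [msg.email]
    raise APIError('either emails or email must be set')


old_client = SimpleNamespace(email='a@example.com', emails=[])
new_client = SimpleNamespace(email='', emails=['a@example.com', 'b@example.com'])
```

Calling `resolve_emails` on both messages returns a list either way, and the counter shows how many callers are still on the old field. Once `adoption['email']` stops growing, the deprecated field is a candidate for removal.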

This applies just as well at code boundaries as at network boundaries.

Some of my favorite commits to deploy are the ones that add a piece of well-tested code that is never called, and fundamentally doesn't change anything. (I realize this may be a hotter take than the previous one.)

Following the advice of Kent Beck to first "make the change easy" often means building a new utility or support for a new data access pattern, and then using it. When I needed to start sending emails with currency amounts in them, one of the first things I built was a currency formatter for the template engine. That was an encapsulated piece of functionality—contained, easy to unit test—and, because it was unused, a safe thing to deploy. As a bonus, I already had the formatter ready to go when I needed to display billing histories in the web app later.

You can even add new method parameters this way—as long as they are optional at first.

- def currency_format(amount):
+ def currency_format(amount, currency='USD'):
      # ...

All of these have feature flags around the callers. I'm going to send myself test emails, and put a release flag on the billing history page. But the callee does not need to know about the flag. The new behavior is opt-in, so controlling the rollout at the caller is enough.
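A small sketch of how that plays out, with a made-up formatter body, symbol table, and flag name (the real formatter is elided above, so none of this is the original implementation):

```python
# Illustrative subset of symbols; not a complete currency table.
SYMBOLS = {'USD': '$', 'EUR': '€', 'GBP': '£'}


def currency_format(amount, currency='USD'):
    # The callee has no flag check. Callers that don't pass `currency`
    # get exactly the behavior they had before the parameter existed.
    symbol = SYMBOLS.get(currency, currency + ' ')
    return f'{symbol}{amount:,.2f}'


def billing_row(amount, currency, flags):
    # The release flag lives at the caller, which opts in to the
    # new parameter only when the (hypothetical) flag is on.
    if flags.get('multi-currency-billing'):
        return currency_format(amount, currency)
    return currency_format(amount)
```

Existing call sites like `currency_format(1234.5)` keep working untouched, and `billing_row` only exercises the new parameter once its flag is enabled.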
