Souhaila_Serbout

What about Web API versioning?
Insights from a large-scale empirical study

20 October 2023

In Web API design, versioning is a critical concept that involves managing changes and updates to an API over time while maintaining compatibility with existing clients. The choice of versioning strategy can significantly impact API users and developers.

To this day, there is no agreement on the best way to version APIs. In 2014, Troy Hunt argued in his blog that API versioning is often done incorrectly, leading to unnecessary complexity and confusion. He claimed that the best way to version an API is to use the Accept header, which allows the client to specify the desired format and version of the response. This way, the API can evolve without breaking existing clients or requiring multiple endpoints. He also criticized some common alternatives, such as using query strings, path segments, or custom headers, and explained why he considers them inferior or problematic.

On the other hand, Martin Nally argued in a 2017 blog post that content negotiation is not a good practice for API versioning, because it violates the principle of least astonishment and makes the API harder to use and document. Content negotiation is the process of selecting the best representation of a resource based on the client’s preferences, such as language or format. He claims that content negotiation is suitable for media types, but not for API versions, because different versions of an API may have different behaviors and semantics that are not obvious from the media type alone. He suggests that using explicit version identifiers in the URL or the header is a better alternative, as it makes the API more predictable and transparent.
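To make the options in this debate concrete, here is a minimal Python sketch of how a client could carry the version in each style. The host `api.example.com`, the `X-API-Version` header name, and the `vnd.example` media type are all hypothetical placeholders, not taken from either blog post.

```python
from urllib.parse import urlencode

BASE = "https://api.example.com"  # hypothetical host, for illustration only


def path_versioned(resource: str, major: int):
    """Version as a path segment: /v2/users."""
    return f"{BASE}/v{major}/{resource}", {}


def query_versioned(resource: str, version: str):
    """Version as a query string parameter: /users?version=2."""
    return f"{BASE}/{resource}?{urlencode({'version': version})}", {}


def custom_header_versioned(resource: str, version: str):
    """Version in a custom header (header name is an assumption)."""
    return f"{BASE}/{resource}", {"X-API-Version": version}


def accept_header_versioned(resource: str, version: str):
    """Version negotiated through a vendor media type in the Accept header."""
    media_type = f"application/vnd.example.v{version}+json"
    return f"{BASE}/{resource}", {"Accept": media_type}
```

Each helper returns a `(url, headers)` pair, so the only thing that changes between styles is where the version identifier ends up.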

Google recommends following the “two-version in production” pattern when making client-breaking changes in an API. They also suggest using semantic versioning in the format of MAJOR.MINOR, where MAJOR and MINOR are numerical values. Users of Cloud Endpoints API management can deploy multiple versions of their API, and each version is accessible through an endpoint where the version identifier is added to the base path. In cases of backward-incompatible changes, Google advises creating a new OpenAPI specification, where the MAJOR version is incremented, and all paths associated with the version identifier (indicated by v* followed by the MAJOR version number) need to be updated accordingly.
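As a rough illustration of the MAJOR.MINOR convention described above, the following Python sketch parses such an identifier, derives the `v<MAJOR>` base path, and flags a MAJOR increment as a breaking change. The function names are ours and this is not Google's tooling.

```python
import re
from typing import NamedTuple


class Version(NamedTuple):
    major: int
    minor: int


_MAJOR_MINOR = re.compile(r"(\d+)\.(\d+)")


def parse(identifier: str) -> Version:
    """Parse a MAJOR.MINOR identifier such as '1.4'."""
    match = _MAJOR_MINOR.fullmatch(identifier)
    if match is None:
        raise ValueError(f"not a MAJOR.MINOR version: {identifier!r}")
    return Version(int(match.group(1)), int(match.group(2)))


def base_path(version: Version) -> str:
    """Only the MAJOR number appears in the base path, e.g. '/v2'."""
    return f"/v{version.major}"


def is_breaking(old: Version, new: Version) -> bool:
    """A backward-incompatible change is signalled by a higher MAJOR number."""
    return new.major > old.major
```

Under this scheme, going from 1.4 to 1.5 keeps the `/v1` base path, while going to 2.0 moves every path to `/v2`.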

But concretely, several years after the first Web API was published, what is the current state of practice? How do developers version their Web APIs? Do they follow the best practices? Or do they have their own versioning strategies?

Do web API developers adhere to semantic versioning principles? Is there a consistent versioning scheme maintained throughout the entire history of web API evolution? Or, perhaps, do some developers choose not to assign versions to their web APIs? 🤔

Looking at some well-known Web APIs, we can already see how much practices diverge.

Versioning in Vercel REST API

The Vercel REST API adopts a unique versioning strategy where individual endpoints are versioned separately. Instead of incrementing the version with every change in response shapes, they employ straightforward v* version labels. These specifics are comprehensively documented in the changelog, and the provider ensures that multiple versions of endpoints can coexist simultaneously.

Versioning in GitHub REST API

Until November 2022, GitHub lacked an efficient versioning strategy for their REST API. Since the introduction of API v3 in 2011, the version identifier remained unchanged, regardless of the modifications made over the years. There was no straightforward method to access previous API v3 versions, as versioning was managed using a custom GitHub mime type.

However, this situation was rectified after more than a decade, with the adoption of a comprehensive versioning strategy. This strategy is based on header-based versioning and a calendar-based version identifier. The version identifier will now only be incremented in cases where a breaking change is implemented. Furthermore, the previous version of the API will receive support for 24 months after the release of a new version, allowing developers ample time to transition. Additionally, all changes will be meticulously documented in the changelog.
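As a small sketch of what header-based, calendar-based versioning looks like from the client side, the snippet below builds request headers using GitHub's `X-GitHub-Api-Version` header and computes the end of a 24-month support window. The `support_ends` helper and any release date passed to it are illustrative assumptions, not GitHub's published schedule.

```python
import calendar
from datetime import date


def github_headers(api_version: str) -> dict:
    """Build request headers that pin a calendar-based API version."""
    date.fromisoformat(api_version)  # raises ValueError if not yyyy-mm-dd
    return {
        "Accept": "application/vnd.github+json",
        "X-GitHub-Api-Version": api_version,
    }


def support_ends(new_release: date, months: int = 24) -> date:
    """Date on which the previous version would leave its support window."""
    total = new_release.month - 1 + months
    year = new_release.year + total // 12
    month = total % 12 + 1
    # Clamp the day for shorter months (e.g. Feb 29 -> Feb 28).
    day = min(new_release.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)
```

For example, `github_headers("2022-11-28")` pins the version through a header, leaving the URL untouched.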

But what happens on a large scale?

Given that web APIs and their docs are not centrally deployed, we employed a systematic approach based on documentation analysis to examine their versioning practices. And since OpenAPI is the most popular API description language, we focused on APIs that are described using the OpenAPI specification and published in public GitHub repositories.

OpenAPI includes a specific required field {"version": string} in the info section pertaining to the API’s metadata. However, there are no constraints on the format used to represent the version identifier. Additionally, version identifiers can be embedded in the API endpoint addresses, which are stored in the server and path URLs. While the OpenAPI standard defines how developers describe their APIs, there is no centralized standard documentation manager service where developers can share API specifications. For example, SwaggerHub does not impose any rules on the format of version identifiers, nor does it require developers to upgrade them when publishing a new version of the API description. We aim to study the resulting variety of version identifier formats found in a large collection of OpenAPI descriptions.
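The three places mentioned above can be inspected with a few lines of Python. This is a minimal sketch, assuming an OpenAPI 3.x description already parsed into a dict; the `VERSION_SEGMENT` regular expression is our own heuristic for spotting segments such as `v1` or `v2beta1`, not the detector used in the study.

```python
import re

# Assumed heuristic: a URL segment that starts with "v" followed by digits.
VERSION_SEGMENT = re.compile(r"/(v\d+[A-Za-z0-9.\-]*)(?=/|$)")


def version_locations(spec: dict) -> dict:
    """Collect version identifiers from info.version, server URLs and paths."""
    info_version = spec.get("info", {}).get("version")
    server_hits = [
        match.group(1)
        for server in spec.get("servers", [])
        for match in VERSION_SEGMENT.finditer(server.get("url", ""))
    ]
    path_hits = [
        match.group(1)
        for path in spec.get("paths", {})
        for match in VERSION_SEGMENT.finditer(path)
    ]
    return {"info": info_version, "servers": server_hits, "paths": path_hits}


example = {
    "openapi": "3.0.3",
    "info": {"title": "Pets", "version": "1.2.0"},
    "servers": [{"url": "https://api.example.com/v1"}],
    "paths": {"/v1/pets": {}, "/health": {}},
}
print(version_locations(example))
```

The example description is made up; it simply shows one identifier in each of the three locations.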

We collected a dataset consisting of 7,114 Web APIs, obtained from 186,259 commits pushed to 3,090 open-source GitHub repositories, belonging to 2,899 GitHub repository owners. To obtain this dataset, we filtered 567,069 detected potential OpenAPI descriptions to include only those that: (a) belonged to APIs with more than 10 commits in their history (11,408 APIs); (b) were described in JSON/YAML files that were parsable in all commits (10,062); (c) had at least one path specified in one of the commits; and (d) described actual API functionality. In our previous work we found that some developers use OpenAPI to describe data models, so we excluded descriptions of JSON schemas without any API functionality. The resulting analyzed dataset contains 7,114 APIs.
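The first three filtering criteria can be expressed as a simple predicate. The sketch below is only an approximation of the pipeline: `ApiHistory` is a hypothetical container in which each commit is the parsed description, or `None` when the file could not be parsed.

```python
from dataclasses import dataclass, field


@dataclass
class ApiHistory:
    """Hypothetical container: one parsed description (or None) per commit."""
    name: str
    commits: list = field(default_factory=list)


def keep(api: ApiHistory, min_commits: int = 10) -> bool:
    # (a) more than `min_commits` commits in the history
    if len(api.commits) <= min_commits:
        return False
    # (b) the description must be parsable in every commit
    if any(commit is None for commit in api.commits):
        return False
    # (c) at least one path specified in at least one commit
    return any(commit.get("paths") for commit in api.commits)
```

Criterion (c) also removes most pure data-model descriptions, since those typically declare schemas but no paths.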

How are Web APIs versioned in practice?

In this analysis, we focused on two practices: static versioning and dynamic versioning.

Static versioning. The version information is statically included in the API description metadata or in the API endpoint URLs. We call these practices respectively Metadata-based versioning and URL-based versioning.

The majority of APIs (4,445 - 62.5%) and commits (102,986 - 55%) were found to have version identifiers located only in the info.version field of the API description metadata. The version identifiers were present in all of the server and path URLs, as well as in the info.version metadata for 453 commits belonging to 41 APIs. We did not observe any APIs that contained version identifiers in both path and server URLs but not in the info.version metadata field.

36% of the APIs adopted path-based versioning in at least one commit during their history, combined with metadata-based versioning. However, only 1% of all APIs used path-based versioning as their only versioning strategy.

Dynamic versioning. The version information of an API can also be obtained dynamically by the client via a dedicated API endpoint. Instead of specifying the version statically in the info.version field value, clients may retrieve the API version dynamically by invoking the GET /version operation, as documented in the example {"version": "see /version below"}. This approach was detected in 220 APIs in our collection, where 129 of them were dynamically versioned during their entire history, such as the ONS Address Index API.
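One way to spot this pattern in a parsed description is sketched below. The heuristic (a GET operation on a path ending in `/version`, combined with an info.version value that contains no digits) is our own assumption for illustration and is not necessarily how the study detected it.

```python
def is_dynamically_versioned(spec: dict) -> bool:
    """Heuristic check for a dedicated version endpoint replacing info.version."""
    paths = spec.get("paths", {})
    has_version_endpoint = any(
        path.rstrip("/").endswith("/version") and "get" in (operations or {})
        for path, operations in paths.items()
    )
    info_version = str(spec.get("info", {}).get("version", ""))
    # Assumption: a static identifier contains at least one digit.
    has_static_version = any(char.isdigit() for char in info_version)
    return has_version_endpoint and not has_static_version


dynamic = {
    "info": {"version": "see /version below"},
    "paths": {"/version": {"get": {}}, "/addresses": {"get": {}}},
}
static = {"info": {"version": "1.0.0"}, "paths": {"/version": {"get": {}}}}
```

Here `dynamic` is flagged while `static` is not, because the latter still states a concrete identifier in its metadata.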

Do developers stick to a consistent versioning strategy during the evolution of their APIs?

Our analysis further revealed that, in 6,354 APIs, the version identifiers stayed in the same location throughout their entire history. However, we observed 684 APIs in which the location of the version identifiers changed up to three times within their history. The most frequent change occurred in 233 APIs, where additional version identifiers were appended to the paths at some point in their history.
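Counting such location changes amounts to comparing consecutive commits. In this sketch, each commit is summarized as the set of places where a version identifier was found; the labels `"info"`, `"paths"` and `"servers"` are our own naming.

```python
def count_location_changes(history: list) -> int:
    """Number of commits whose set of version locations differs from the previous one."""
    return sum(1 for previous, current in zip(history, history[1:]) if previous != current)


# Made-up history: a path-based identifier is added in the third commit.
history = [{"info"}, {"info"}, {"info", "paths"}, {"info", "paths"}]
print(count_location_changes(history))  # prints 1
```

An API with a count of zero falls into the consistent group, while the example above would be one of those that appended identifiers to their paths.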

What are the commonly adopted version identifier formats?

To classify the version identifiers, we employed a set of regular expression rules. These detectors were iteratively defined based on our observations to ensure that most of the samples could be labeled. We also distinguished between version identifiers used to describe preview releases and stable versions of the APIs. The complete list can be found in the replication package.
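To give a flavor of such rule-based detection, here is a small ordered set of regular expressions covering a handful of formats. These patterns are illustrative guesses written for this post, not the actual rules from the replication package, and the preview-tag list is likewise an assumption.

```python
import re

# Ordered rules: the first pattern that fully matches wins.
RULES = [
    ("semver-3", re.compile(r"\d+\.\d+\.\d+")),
    ("semver-2", re.compile(r"\d+\.\d+")),
    ("date(yyyy-mm-dd)", re.compile(r"\d{4}-\d{2}-\d{2}")),
    ("vbeta", re.compile(r"v\d+beta\d*")),
    ("v*", re.compile(r"v\d+")),
    ("integer", re.compile(r"\d+")),
]

# Assumed list of tags that mark a preview release.
PREVIEW_TAG = re.compile(r"alpha|beta|rc|snapshot|preview|dev", re.IGNORECASE)


def classify(identifier: str) -> str:
    """Return the label of the first matching rule, or 'other'."""
    for label, pattern in RULES:
        if pattern.fullmatch(identifier):
            return label
    return "other"


def is_preview(identifier: str) -> bool:
    """True when the identifier carries one of the assumed preview tags."""
    return PREVIEW_TAG.search(identifier) is not None
```

Because the rules are tried in order, more specific patterns have to come before more general ones, which is why `vbeta` precedes `v*`.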

We ended up with the following classification of formats:

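| Format | Most Frequent Version Identifier | #APIs | Max Commits | Avg Commits | Median Commits | Stdev Commits | Max VC | Avg VC | Median VC | Stdev VC |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| semver-3 | 1.0.0 | 3531 | 1031 | 28 | 17 | 37 | 496 | 4 | 0 | 17 |
| semver-2 | 1.0 | 1093 | 3585 | 30 | 15 | 116 | 77 | 1 | 0 | 4 |
| v* | v1 | 489 | 692 | 42 | 20 | 74 | 4 | 0 | 0 | 0 |
| date(yyyy-mm-dd) | 2017-03-01 | 327 | 52 | 14 | 12 | 4 | 52 | 0 | 0 | 3 |
| other | v1b3 | 213 | 222 | 29 | 18 | 32 | 33 | 1 | 0 | 3 |
| integer | 1 | 48 | 143 | 27 | 17 | 24 | 113 | 5 | 0 | 20 |
| vbeta | v1beta1 | 115 | 360 | 136 | 35 | 146 | 3 | 0 | 0 | 1 |
| date-preview* | 2015-10-01-preview | 72 | 47 | 13 | 12 | 5 | 2 | 0 | 0 | 0 |
| semver-3# | 1.0.0-oas3 | 33 | 215 | 32 | 15 | 41 | 18 | 2 | 0 | 4 |
| vbeta.* | v2beta1.1 | 26 | 30 | 24 | 24 | 4 | 12 | 3 | 3 | 4 |
| latest* | latest | 25 | 137 | 27 | 15 | 28 | 2 | 0 | 0 | 0 |
| v*alpha* | v1alpha | 18 | 339 | 56 | 24 | 91 | 3 | 0 | 0 | 1 |
| semver-SNAPSHOT* | 1.0.0-SNAPSHOT | 18 | 172 | 32 | 16 | 38 | 36 | 5 | 0 | 9 |
| semver-beta* | v1.0-beta | 17 | 113 | 40 | 29 | 29 | 9 | 1 | 0 | 2 |
| vpbeta* | v1p3beta1 | 9 | 347 | 162 | 35 | 153 | 3 | 1 | 0 | 1 |
| *beta* | 1beta1 | 7 | 37 | 15 | 11 | 9 | 0 | 0 | 0 | 0 |
| beta* | beta | 7 | 47 | 26 | 26 | 12 | 0 | 0 | 0 | 0 |
| semver-alpha* | 1.0.0-alpha | 7 | 48 | 23 | 15 | 15 | 2 | 0 | 0 | 1 |
| semver-2# | 1.3-DUMMY | 6 | 24 | 16 | 15 | 5 | 3 | 2 | 2 | 1 |
| semver (beta*) | 1.0 (beta) | 6 | 58 | 39 | 46 | 13 | 46 | 18 | 26 | 18 |
| date(yyyy.mm.dd) | 2019.10.15 | 6 | 24 | 22 | 24 | 4 | 24 | 20 | 24 | 9 |
| #semver-3 | 2019.0.0 | 5 | 37 | 22 | 17 | 11 | 3 | 1 | 2 | 1 |
| semver-rc* | 1.0.0-rc1 | 4 | 190 | 60 | 20 | 75 | 8 | 4 | 5 | 3 |
| semver-4 | 6.4.3.0 | 4 | 23 | 16 | 17 | 5 | 9 | 2 | 0 | 4 |
| semver-rc*.* | 2.0.0-RC1.0 | 4 | 85 | 54 | 63 | 26 | 0 | 0 | 0 | 0 |
| valpha.* | v2alpha2.6 | 3 | 26 | 23 | 22 | 2 | 4 | 1 | 0 | 2 |
| alpha* | alpha | 2 | 35 | 26 | 35 | 9 | 0 | 0 | 0 | 0 |
| dev* | dev | 2 | 172 | 91 | 172 | 81 | 0 | 0 | 0 | 0 |
| date(yyyy-mm) | 2021-10 | 2 | 14 | 13 | 14 | 1 | 2 | 1 | | |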

We grouped the formats into categories, depending on whether they are used to represent stable or preview releases, in the table below.

The results show that the most utilized format for versioning API releases is Semantic Versioning (SemVer) major.minor.patch, followed by a straightforward approach using an integer to denote the major version of the API, often accompanied by a V prefix. All types of preview release tags are most often found in the info.version metadata, while release candidate and preview tags are never found as part of path or server URLs.

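| Format | Metadata-based | Path URL | Server URL | All |
| --- | --- | --- | --- | --- |
| Stable Release Format | | | | |
| Major version number | 29129 | 45310 | 14944 | 89383 |
| SemVer | 114663 | 788 | 18172 | 133623 |
| Tag | 845 | 1199 | 6 | 2050 |
| Date | 5447 | 299 | 27 | 5773 |
| Other | 1549 | 1354 | 0 | 2309 |
| Preview Release Format | | | | |
| **Develop | 545 | 92 | 106 | 743 |
| **Snapshot | 964 | 0 | 11 | 975 |
| **Preview | 863 | 0 | 0 | 863 |
| **Alpha | 3003 | 2339 | 10 | 5352 |
| **Beta | 19410 | 15459 | 207 | 35076 |
| Release Candidate | 548 | 0 | 0 | 548 |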

How does the adoption of Semantic Versioning change over time?

In this study, we assessed the prevalence of Semantic Versioning (SemVer) in API versioning by analyzing the use of the info.version field in stable releases of APIs committed to GitHub between 2015 and 2022. Our findings showed a relatively high adoption rate of SemVer in stable releases, with a mean of 75.84% ± 4.79%. However, the adoption rate was lower for API preview releases, where the most commonly used formats did not conform to the SemVer format. Our analysis revealed a linear decline in the adoption of SemVer in preview releases from 2018 to 2022, with a significant increase in the use of simpler versioning formats, such as vbeta or valpha, which combine the major version number with a preview release tag.
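A yearly adoption rate of this kind can be computed by grouping identifiers per year and averaging the yearly shares. The sketch below uses a tiny made-up sample and assumes that both major.minor and major.minor.patch count as SemVer; neither the data nor the pattern comes from the study.

```python
import re
from collections import defaultdict
from statistics import mean, stdev

# Assumption: major.minor and major.minor.patch both count as SemVer here.
SEMVER = re.compile(r"\d+\.\d+(\.\d+)?")


def yearly_adoption(commits):
    """Map each year to the percentage of identifiers that match SEMVER."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for year, identifier in commits:
        totals[year] += 1
        if SEMVER.fullmatch(identifier):
            hits[year] += 1
    return {year: 100 * hits[year] / totals[year] for year in sorted(totals)}


# Made-up (year, info.version) pairs, for illustration only.
sample = [
    (2020, "1.0.0"), (2020, "v1"),
    (2021, "1.2"), (2021, "2.0.0"), (2021, "1.0.1"), (2021, "v2"),
]
rates = yearly_adoption(sample)
average = mean(list(rates.values()))
spread = stdev(list(rates.values()))
```

The reported figure has the same shape: a mean over the yearly rates, plus or minus their standard deviation.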

In closing

Versioning is a fundamental practice for keeping Web APIs compatible and maintainable. In this empirical study we focused on version identifiers, observing their location, formats, and evolution. Out of 7,114 APIs, the majority (5,022) used static versioning in the API metadata, while a small fraction (220) supported dynamic discovery of the current version through a dedicated endpoint. In terms of version format, we identified 55 distinct formats used to distinguish stable and preview releases, with 535 APIs including preview versions in their GitHub histories. The number of preview releases pushed to GitHub showed an upward trend, with a yearly average of 1,858 commits. With regard to metadata versions, we found that 85% of the 6,580 APIs that consistently used the same format throughout their lifespan used Semantic Versioning. The adopted version format was unstable in 534 APIs, 30% of which switched to SemVer. Our analysis indicated a steady usage rate of SemVer for 75% (on average) of API releases, while preview releases more often adopted less detailed formats that reference only the major version of the API, typically with a tag (“beta” being the most frequent) to indicate their purpose. We also observed the usage of the “two in production” evolution pattern in 175 APIs (56 with more than 2 versions). In these cases, the most prevalent format for version identifiers attached to the path was to reference only the major version, particularly among APIs with fewer than six coexisting versions.