Souhaila_Serbout

API Rate Limit Adoption
A pattern collection

30 October 2023

Numerous Web API design patterns and best practices have been proposed to enhance API quality. We specifically focus here on the Rate Limit pattern, aiming to prevent excessive API resource usage by specific clients, thereby improving performance and security properties.

While API Rate Limit is crucial for API management, its adoption lacks uniformity in implementation, which can pose challenges for API providers and developers. Some systems may not even support it. In response, API providers may resort to manual implementation or specialized tools to ensure proper rate limits. The lack of standardization in rate-limiting configurations leads to inconsistencies and hinders developer understanding and systematic studies of real-world systems.

The identified patterns in the collection pertain to the configuration and values of Rate Limit policies, the scope and granularity level of the enforced policies, the measures employed to counteract abusive clients, and different server-side implementation patterns. We structure the API rate limit adoption pattern collection by grouping together patterns with a common purpose.

Rate Limit Configuration, Rate Limit Configuration Metrics, Rate Limit Documentation Patterns, API Rate Limit Communication, Reaction to Rate Limit Exceeding, Rate Limit Granularity, Server-side Rate Limit Implementation

The API Rate Limit Pattern

Here, we provide a shortened version of the Rate Limit pattern, which is the main background of our work. For the complete pattern text, please refer to the Patterns for API Design book [p. 411].

Context: An API contract has been established with clients. An API Description specifying message exchange patterns and protocols has been defined. The API may also be offered without any contractual relationship.

Problem: How can the API provider prevent excessive usage by clients that may harm the provider’s operations or other clients?

Forces: There are several forces to consider when implementing a Rate Limit. These include the economic aspects, such as the cost of implementing and maintaining prevention measures and the potentially negative reactions from clients. Performance is a factor, as the service provider may want or be required to maintain high-quality service for all clients. Reliability is important, as actions must be taken to prevent API abuse from harming other clients. The impact and severity of the risks of API abuse must also be analyzed and weighed against the costs of prevention measures. Additionally, client awareness is important, as responsible clients need to be aware of their usage allowances to avoid being locked out of the API. Furthermore, API Rate Limiting plays a critical role in ensuring the security of an API system. It helps to protect against various types of attacks, including denial-of-service (DoS) attacks, which occur when an attacker sends a high volume of requests to an API in a short period, causing the API system to become overwhelmed and unavailable.

Solution: Implement a Rate Limit to prevent API clients from excessive usage.

Set the limit as a certain number of requests allowed per period. If the limit is exceeded, further requests can be declined, processed later, or served with best-effort guarantees using reduced resources. Customize the scope and period of the Rate Limit. Use tracking mechanisms such as tokens or monitoring tools to enforce the Rate Limit.

The Rate Limit pattern may be included in a Service Level Agreement, and the details of the Rate Limit may be tied to the client’s subscription level as described in the Pricing Plan pattern. In this case, the Rate Limit is used to enforce different billing levels of the Pricing Plan. Clients subject to a Rate Limit may be identified by their API Key.

Pattern collection

In the map below, we summarize the Rate Limit adoption patterns we identified, grouped by purpose. In the patterns map, we employ logical operators to emphasize the cases where certain patterns can be effectively co-adopted, instances where their combination is incompatible, and situations where it is advisable to adopt all patterns as a standard practice.

Rate Limit Configuration

An API Rate Limit is set by defining the maximum number of requests an API can receive from a specific client during a defined time frame. The time window can be measured in minutes, hours, days, or months. The Rate Limit configuration values can be static: the same value independent of the API usage, or dynamic: the value adapts to the client’s behaviors and current system capacities.

Pattern 1: Static Rate Limit

Context: API providers need to set a predefined configuration of rate limits to prevent abusive consumption.

Problem: How can the API provider prevent excessive usage by clients that may harm the provider’s operations or other clients, without the need for complex differentiation based on consumer behavior or characteristics?

Solution: Static rate limits are set by the API provider and remain constant regardless of the number of requests made.

Solution details:

Predefined limits are set in advance and do not change based on the current load or traffic on the network. Defining a static Rate Limit can be challenging, as it depends on various factors such as the service capacity, the expected demand, the request size and complexity, and the service level objectives (SLOs). One possible method to define a static Rate Limit is to use Little’s Law from queuing theory, which states that the average number of requests in a system (L) is equal to the average arrival rate (λ) multiplied by the average response time (W). Therefore, L = λW. By rearranging this equation, we obtain λ = L/W, which means that the arrival rate should not exceed the ratio of the system capacity (the number of requests the system can hold at once) to the response time. This can be used as a guideline to infer a static Rate Limit value for a service. However, this method assumes that the arrival rate and the response time are known, constant, and independent, which may not be true in reality. Therefore, it is advisable to monitor the service performance and adjust the Rate Limit accordingly if needed.
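
As a worked illustration of the guideline λ = L/W, the following minimal sketch computes a candidate static limit. The function name and the capacity and response-time figures are hypothetical and only serve the example; real values would come from measuring the service.

```python
def static_rate_limit(max_in_flight: float, avg_response_time_s: float) -> float:
    """Return the arrival rate (requests/second) implied by Little's Law: lambda = L / W."""
    if max_in_flight <= 0 or avg_response_time_s <= 0:
        raise ValueError("capacity and response time must be positive")
    return max_in_flight / avg_response_time_s


# Hypothetical service: at most 50 requests in the system, 0.25 s average response time.
limit_per_second = static_rate_limit(50, 0.25)
print(limit_per_second)  # 200.0
```

The result is an upper bound on the total arrival rate; a provider would still need to decide how to split it among clients.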

Consequences:

[+] Simplicity: Static rate limits are easy to implement and understand, as the limit is a fixed value.

[+] Economic Aspects: A static Rate Limit is cheap to implement and maintain.

[+] Performance: A static Rate Limit implementation has a minimum performance overhead on the server side.

[+] Client Awareness: Clients can easily be informed and understand static rate limits.

[+] Predictability: The limit will not change, so clients relying on the Rate Limit know exactly what to expect.

[-] Economic Aspects: A static Rate Limit cannot be customized in a fine-grained manner, e.g., per customer, or adjusted to specific situations. This can have economic consequences as some clients can perceive the Rate Limit as too restrictive.

[-] Inflexibility: Static rate limits may not be able to adapt to changing traffic conditions and may cause issues if the limit is set too low or too high.

[-] Client Performance and Reliability: The client can be negatively influenced by too restrictive rate limits. As static limits cannot be adapted to specific client needs, they can have a negative impact on the performance and reliability of some clients.

[-] Unfairness: The Rate Limit may be too restrictive for some clients and too lenient for others.

Pattern 2: Dynamic Rate Limit

Context: The demand for resources, their capacity and the behavior of consumers can vary dynamically, requiring adaptive Rate Limit strategies.

Problem: How can an API provider or system dynamically adjust and configure rate limits for incoming requests to effectively manage resource allocation, prevent overload, and adapt to changing traffic patterns?

Solution: Adjust the Rate Limit dynamically based on conditions such as traffic or system load.

Solution details:

Unlike static rate limits, which are predetermined and remain constant, dynamic rate limits are designed to be adjustable in real-time. This allows the system to respond dynamically to changes in demand, ensuring that the Rate Limit prevents resource overuse or abuse.

One such approach is based on monitoring the API latency, which is the time taken to process a request. A sliding window of time is used to calculate the average latency of the service, which is then compared to a target latency representing the desired level of performance.

If the average latency exceeds the target latency, the Rate Limit is reduced by a specific factor; if the average latency is below the target latency, the Rate Limit is increased by a specific factor. This allows the Rate Limit to adapt to the changing conditions of the service and maintain a reasonable quality of service. This technique is inspired by feedback control theory and has been applied in various domains such as web servers, cloud computing, and network traffic management.
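
The latency feedback loop described above can be sketched as follows. This is a minimal illustration, not a reference implementation: the class name, the multiplicative factors, and the bounds are assumptions, and the sliding window is approximated by the last N latency samples rather than by a time interval.

```python
from collections import deque


class LatencyFeedbackLimit:
    """Adjusts a rate limit from the average latency over a window of recent samples."""

    def __init__(self, initial_limit, target_latency_s, window_size=100,
                 decrease_factor=0.8, increase_factor=1.1,
                 min_limit=1.0, max_limit=10_000.0):
        self.limit = float(initial_limit)
        self.target_latency_s = target_latency_s
        self.samples = deque(maxlen=window_size)  # oldest samples are evicted automatically
        self.decrease_factor = decrease_factor    # hypothetical tuning values
        self.increase_factor = increase_factor
        self.min_limit = min_limit
        self.max_limit = max_limit

    def record(self, latency_s):
        """Store the processing time of one completed request."""
        self.samples.append(latency_s)

    def adjust(self):
        """Compare the average latency to the target and scale the limit accordingly."""
        if not self.samples:
            return self.limit
        average = sum(self.samples) / len(self.samples)
        if average > self.target_latency_s:
            self.limit = max(self.min_limit, self.limit * self.decrease_factor)
        elif average < self.target_latency_s:
            self.limit = min(self.max_limit, self.limit * self.increase_factor)
        return self.limit
```

A provider would call record() after each request and adjust() periodically; the clamping to min_limit and max_limit is one possible way to keep the limit from drifting to extreme values.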

Consequences:

[+] Adaptability: Dynamic rate limits can adjust to changing conditions and prevent system overload.

[+] Economic Aspects: A dynamic Rate Limit can be customized in a fine-grained manner, e.g., per customer, or adjusted to specific situations. This can have economic benefits, as the rate limits can be adjusted according to economic circumstances.

[+] Fairness: Dynamic rate limits can ensure that resources are shared fairly among users and systems based on their current usage.

[+] Client Performance and Reliability: Due to adaptability, the dynamic Rate Limit has the potential to help achieve good client performance and reliability.

[-] Uncertainty: Clients relying on the Rate Limit may not know what to expect, as the limit may change based on conditions outside their control.

[-] Complexity: Implementing dynamic rate limits can be more complex and require more resources, as the system must continuously monitor its performance, adjust the limits accordingly, and manage the transition between different rate limit configurations.

[-] Economic Aspects: Due to the higher complexity, a dynamic Rate Limit can require more effort to be implemented and maintained.

[-] Performance: A dynamic Rate Limit implementation has a higher performance overhead on the server side than static variants.

Rate Limit Configuration Metrics

The configuration of the Rate Limit value is set based on a specific metric. Based on our analysis, we identified three variants related to configuration metrics: Request-based, Time-based, and Point-based.

Pattern 3: Request-based Rate Limit

Problem: How to effectively control the frequency of API requests made by a client within a specific timeframe?

Solution: Use a request-based rate-limiting strategy to measure and limit the frequency of API requests made by specific clients.

Solution details:

A Request-based Rate Limit can be implemented by using a rate limiter library or tool (such as express-rate-limit or koa-ratelimit) that divides time into smaller windows (e.g., a second, a minute, or an hour) and restricts the number of requests that can be made within those windows based on capacity and projected traffic. The rate limiter keeps track of the number of requests made by each API consumer and restricts the requests once the limit is reached. The implementation should also define what happens when a client reaches the limit.
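
To make the counting behavior concrete, here is a minimal in-memory sketch of a fixed-window request counter. It is not the API of express-rate-limit or koa-ratelimit; the class and parameter names are hypothetical, and a production limiter would typically use shared storage and evict stale entries.

```python
import time


class FixedWindowLimiter:
    """Allows at most max_requests per client in each window of window_s seconds."""

    def __init__(self, max_requests, window_s, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_s = window_s
        self.clock = clock      # injectable so the limiter can be tested without waiting
        self.counters = {}      # client_id -> (window index, request count)

    def allow(self, client_id):
        """Return True and count the request, or False if the client hit the limit."""
        window = int(self.clock() // self.window_s)
        stored_window, count = self.counters.get(client_id, (window, 0))
        if stored_window != window:
            count = 0           # a new window started: reset the counter
        if count >= self.max_requests:
            self.counters[client_id] = (window, count)
            return False
        self.counters[client_id] = (window, count + 1)
        return True
```

When allow() returns False, the caller decides how to react, for example by rejecting the request with HTTP 429.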

Consequences:

[+] Predictability: Clients can more easily predict their API usage and adjust their behavior accordingly.

[+] Implementation Simplicity: A request-based system can be more efficient than other types of rate limiting, as it does not require tracking the time taken by each request.

[-] Fairness: Requests from different clients may impact the service differently, yet each request counts the same toward the limit.

Pattern 4: Time-based Rate Limit

Context: The API or service provider is able to estimate the time it takes to process a particular type of request or operation.

Problem: How to prevent clients from sending too many highly-resource-consuming concurrent requests within a given short time window?

Solution: Estimate the processing times for different types of requests or operations, and set a time limit that a client is allowed to consume within a specific time frame. For example, in every five-minute window a client can only send requests whose total processing time does not exceed one minute.

Solution details:

Fix a time window duration and restrict the number of concurrent requests that can be made within each window based on capacity, projected traffic, and the estimated time taken by each of the requests sent during the same window.

In the extreme case, only one request from each client can be processed in a time window. Additional requests sent by the same client are rejected as long as the server is busy with the previous one. If the client request hangs or simply lasts beyond the time window duration, it can be aborted and an error is returned to the client.

The Frappe framework can be adopted to implement the time-based rate limit pattern. The framework implements fixed-window rate limiting based on the time consumed by requests. The Rate Limit can be enabled by setting, in the configuration file "site_config.json", the values of the attributes limit and window, both in seconds, where "limit" is the maximum processing time that the requests sent during a time window of length "window" may consume. For example, the following configuration limits the sum of the time taken by all the HTTP requests coming from a specific client to 600s within each 3600s time window.

{
  "rate_limit": {
    "limit": 600,
    "window": 3600
  }
}
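
The accounting behind such a configuration can be sketched as a fixed-window limiter that accumulates processing seconds instead of request counts. This is an illustrative sketch under stated assumptions, not Frappe's actual implementation; the class and method names are hypothetical.

```python
import time


class TimeBudgetLimiter:
    """Fixed-window limiter that counts processing seconds instead of requests."""

    def __init__(self, limit_s, window_s, clock=time.monotonic):
        self.limit_s = limit_s    # e.g. 600 seconds of processing time
        self.window_s = window_s  # e.g. per 3600-second window
        self.clock = clock
        self.usage = {}           # client_id -> (window index, seconds consumed)

    def _consumed(self, client_id):
        window = int(self.clock() // self.window_s)
        stored_window, consumed = self.usage.get(client_id, (window, 0.0))
        if stored_window != window:
            consumed = 0.0        # budget resets at the start of each window
        return window, consumed

    def allow(self, client_id):
        """Return True while the client still has processing time left in this window."""
        _, consumed = self._consumed(client_id)
        return consumed < self.limit_s

    def record(self, client_id, elapsed_s):
        """Charge the measured processing time of a finished request to the client."""
        window, consumed = self._consumed(client_id)
        self.usage[client_id] = (window, consumed + elapsed_s)
```

Because the cost of a request is only known after it completes, this sketch admits a request whenever any budget remains, so the last request in a window can overshoot the limit.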

Consequences:

[+] Scalability: By limiting the number of requests that can be made within a specific time window, the API can handle a larger number of requests and reduce the load on the backend servers.

[+] Predictable resource usage: Since each request takes a fixed maximum amount of time, the server can more easily predict and control the maximum resource usage.

[+] Performance: By limiting the time taken by each request, the server can ensure that requests do not monopolize resources and cause slowdowns for other clients.

[-] Usability: Certain types of requests may require more time to complete and may not be possible under a time-based rate limit.

[-] Unpredictability/User Satisfaction: Clients may be unable to predict the duration of their requests and may be disappointed if their requests are forced to stop when they take longer than the maximum amount of time allowed.

[-] Implementation complexity: Implementing time-based rate limiting can be more complex than other rate-limiting approaches since it requires tracking (on the server) and estimating (on the client) the time taken by each request.

Known uses: Among the Shopify APIs, only the Storefront API accepts requests that take up to 60 seconds per IP address.

Pattern 5: Point-based Rate Limit

Problem: How to efficiently manage resource allocation and prevent API overload when clients access multiple resources with a single request?

Solution: Assign to each request a specific point value based on its complexity and resources required, and restrict the total number of points that can be used within a certain period.

Solution details:

GraphQL APIs often employ a point-based rate limit. Since a GraphQL query can operate on several resources, the complexity of all the queries that the API can handle should be taken into account when deciding the Rate Limit value. This differs from other APIs, where every request invokes an endpoint that targets one specific resource.

Known uses:

A widely known example is the GitHub GraphQL API. The API dynamically computes a rate limit score based on query complexity. The limit score of all the queries made in an hour should not exceed 5000 points per token ($token\_limit = 5000$ points/h):

\[github\_query\_score = \min\left(token\_limit, \max\left(1, \frac{query\_cost}{complexity\_weight}\right)\right)\]

Where $token\_limit$ is the specific Rate Limit per used token (default value: 5000 points/h).

The $query\_cost$ in the case of the GitHub API is computed based on the relative computational cost of resolving each field in the schema, which is, in other words, the number of calls needed to fulfill the query. Note that an individual query cannot exceed 500k nodes. Also, the minimum cost of a query is equal to 1, in the case of queries with depth equal to 1. By default, all fields in the GitHub GraphQL API have the same complexity weight, which is 1.

Clients track their Rate Limit status by querying fields on the rateLimit object. They can also compute query scores before determining whether they have sufficient points left to run a query.
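
The score formula above and the resulting point budget can be sketched as follows. The function mirrors the formula as stated in this section; the PointBudget class, its method names, and the omission of the hourly reset are simplifying assumptions made for illustration.

```python
def query_score(query_cost, complexity_weight=1.0, token_limit=5000):
    """Score = min(token_limit, max(1, query_cost / complexity_weight))."""
    return min(token_limit, max(1, query_cost / complexity_weight))


class PointBudget:
    """Tracks the points a token has left in the current period (reset not modeled)."""

    def __init__(self, token_limit=5000):
        self.token_limit = token_limit
        self.remaining = token_limit

    def try_spend(self, query_cost, complexity_weight=1.0):
        """Deduct the query's score if enough points remain; otherwise refuse."""
        score = query_score(query_cost, complexity_weight, self.token_limit)
        if score > self.remaining:
            return False
        self.remaining -= score
        return True
```

A client could use the same computation to check whether it has enough points left before sending a query.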

Consequences:

[+] Customizability: The point values can be customized to reflect the relative importance of different API operations or queries or the availability of different server resources.

[+] Fairness: A points-based system can be more fair and flexible than other types of rate limiting, as it allows clients to make more requests for simpler operations and fewer requests for more complex ones.

[-] Implementation complexity: Implementing a points-based system can be more complex than other types of rate limiting, as it requires tracking the point values of each API operation or query.

[-] Cost Estimation: Service providers should provide their clients with a solution to accurately compute the server’s query execution cost to enable them to adapt to Rate Limit restrictions. While a dynamic cost computation can be more accurate, it can induce additional runtime costs. A static approach would not cause additional runtime overhead but may only provide clients with an estimate or bound on the expected costs.

Rate Limit Documentation Patterns

Rate Limit documentation patterns aim to provide guidelines for documenting API Rate Limit policies, making the documentation more accessible and easier to understand for developers. We identified two documentation patterns:

Pattern 6: Documentation in Natural Language

Context: API provider has decided to introduce a Rate Limit, chosen a static or dynamic configuration and selected a suitable metric to define the limit.

Problem: How to communicate to API client developers the Rate Limit configuration?

Forces:

Solution: Use natural language to describe the Rate Limit strategy.

Solution details: The information regarding rate limits can be incorporated into the natural language description of the API, such as its web page. Given the potential for misinterpretation in natural language, the presentation of this information must be clear, concise, and easily understandable. This should include a clear outline of the specific details of the rate limit, including the number of permissible requests per unit of time (such as per minute or day), the response that will be returned when the limit is exceeded (such as HTTP 429 "Too Many Requests"), which API elements are limited, and the period after which the limit will reset. It is also important to inform clients of any potential consequences that may arise if the limit is reached.

Consequences:

[+] Trust: Documenting explicitly the rate limit strategy improves the transparency of the API and increases trust among API client developers and users.

[+] Flexibility: The ability for providers to use natural language for describing their rate-limiting strategy in detail offers flexibility in tailoring the approach to the needs of their specific API.

[+/-] Developer Understandability: Natural language presents a high level of human readability and understandability, as it presents the information in a clear and accessible manner. However, it may require more time to grasp the information as the developer needs to go through the entire text unless it is structured in a manner that facilitates quick comprehension.

[-] Machine Readability: Natural language is not ideal for machine readability, as the information may not be presented in a structured or standardized format that automated systems can easily process. AI/NLP techniques might be used to extract rate limit information from textual API descriptions.

[-] Standardization: The lack of standardization in the use of natural language to describe rate limits across different API providers can lead to confusion and difficulties for developers, as each provider may use different terminology, conventions, or methods for communicating their Rate Limit strategy, making it challenging to compare and understand the restrictions imposed by different APIs.

Known uses:

The default Rate Limit for eBay APIs is presented in a well-structured table, which enhances the ease of information comprehension. The Rate Limit metric used by eBay is consistent across all APIs, measured in terms of the number of calls per hour. This provides clear and straightforward information for developers to understand and adhere to the rate limit. In contrast, GitHub APIs use purely natural language to describe their rate-limiting strategy, making it more challenging for developers to determine the Rate Limit value. The information is dispersed throughout the API documentation and requires a thorough reading of multiple paragraphs to grasp the rate-limiting approach fully.

This is also observed for Meta APIs, where rate-limit strategies are described entirely in natural language, highlighting the diversity of rate-limiting strategies used. The lack of a comprehensive metamodel for describing rate-limiting strategies across different APIs highlights the need for a unified approach to this aspect of API development.

Pattern 7: Documentation in Machine-Readable Format

Context: API provider has decided to introduce a Rate Limit, chosen a static or dynamic configuration and selected a suitable metric to define the limit.

Problem: How to provide automated access to Rate Limit information to API clients?

Solution: Use a well-structured, machine-readable language to fully detail the Rate Limit strategy.

Solution details: For static configurations, Rate Limit values can be conveniently included in machine-readable API descriptions. Providing Rate Limit information in machine-readable documentation allows developers to use tools that can read the API rate limit strategy, assuming the metadata is represented following agreed-upon conventions or standards. It also provides developers with a systematic approach to compare the limitations of various APIs that serve similar purposes and plan their integration accordingly. This enables them to make informed decisions and select the API that best meets their specific requirements. This increased level of transparency and comparability can significantly aid developers in their API integration and usage decisions, leading to a more positive experience.

Consequences:

[+] Machine readability: The ability to automatically parse the Rate Limit strategy out of an API description requires that an agreed-upon metadata representation is followed.

[-/+] Human readability: It might be difficult to read in case Rate Limit metadata is encoded with complex structured languages such as XML. Still, machine-readable Rate Limit descriptions can also serve to automatically generate human-readable descriptions in natural language.

Known uses: Analyzing 248,566 OpenAPI descriptions revealed that only 4,179 contained keywords related to rate limits. This low number indicates a weak adoption of structured expression formats for statically documenting and communicating information about rate limits. It highlights the tendency of developers to focus primarily on functional aspects of APIs, neglecting to provide detailed information about the limitations imposed on usage.

API gateway cloud providers, such as AWS and Azure, offer the ability to import API endpoint details, including resources, methods, responses, and descriptions, as well as the mapping between API operations and backend functions, through the use of OpenAPI Specification (OAS) descriptions (AWS, Azure). However, even they lack the capability to import and export Rate Limit and usage plan configurations in a machine-readable format. These settings can only be manually configured through forms found in the dedicated web-based user interface, limiting the level of automation and programmatic control that can be exercised over these critical aspects of API management.

API Rate Limit Communication

Developers integrating their client applications with APIs adopting a rate limit require accurate and up-to-date information about their API usage level to track and optimize their API usage and avoid Rate Limit violations. Without a communication mechanism to retrieve Rate Limit details, developers may struggle to obtain the necessary information, leading to inefficient integration, excessive costs and potential disruptions.

We identified two patterns related to how API providers can communicate the current, up-to-date Rate Limit state to their clients: usage of counter headers, and usage of an endpoint to report Rate Limit state.

Pattern 8: Usage of Counter Headers

Context: Clients invoke an API with static or dynamic rate limit configuration.

Problem: How to provide clients with their usage tracking information together with up-to-date information about dynamic API limits?

Forces:

Solution: Include Rate Limit information in the response header of every Web API call.

Solution details: The headers such as X-Rate-Limit-Limit and X-Rate-Limit-Remaining can be used to notify the clients about the dynamic Rate Limit value. The X-Rate-Limit-Limit header reports the total allowed number of requests in the current time window, and the X-Rate-Limit-Remaining header shows the remaining number of requests that can be made before reaching the limit. Customized headers can also be used to transmit the same or other metrics.

Known uses:

Analyzing the response headers schemas included in the OpenAPI descriptions, we detected a total of 316 APIs that dynamically convey information about the API, endpoint, or provider limits through dedicated response headers.

The GitHub REST API embeds the following usage counters in the response headers of all query operations:

 x-ratelimit-limit: 60
 x-ratelimit-remaining: 56
 x-ratelimit-used: 4
 x-ratelimit-reset: 1372700873

In the case of Shopify, all their REST APIs use a specific header field to report how many requests the client has made over the total number of allowed requests per minute. If the limit is exceeded, a Retry-After header is sent with the number of seconds to wait until retrying the query.

 X-Shopify-Shop-Api-Call-Limit: 32/40
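
On the client side, such headers can be turned into usage information with a few lines of parsing. The sketch below handles the two header styles shown above; the function names are hypothetical, and it assumes that x-ratelimit-reset carries a Unix timestamp in seconds, as the sample value suggests.

```python
import time


def parse_github_style(headers, now=None):
    """Extract limit, remaining calls, and seconds until reset from x-ratelimit-* headers."""
    now = time.time() if now is None else now
    lowered = {name.lower(): value for name, value in headers.items()}  # header names are case-insensitive
    reset_epoch = int(lowered["x-ratelimit-reset"])  # assumed: Unix time in seconds
    return {
        "limit": int(lowered["x-ratelimit-limit"]),
        "remaining": int(lowered["x-ratelimit-remaining"]),
        "seconds_until_reset": max(0, reset_epoch - now),
    }


def parse_shopify_call_limit(value):
    """Parse a 'used/limit' value such as '32/40'."""
    used, limit = (int(part) for part in value.split("/"))
    return {"limit": limit, "remaining": limit - used}
```

A client can use the remaining count to slow down before the limit is reached instead of reacting only to rejected requests.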

Consequences:

[+] Performance: Clients can receive immediate feedback on their usage of the API and can adjust their requests accordingly, reducing the number of unnecessary requests and improving overall API performance.

[+] Reliability: By receiving up-to-date information on their usage level, clients can avoid surprises and detect whether they are still compliant with the rate limit without getting completely blocked from accessing the API and can plan their usage accordingly, ensuring reliable access to API resources.

[-] Interoperability: Client developers may find it difficult to understand the meaning of the Rate Limit headers unless they are properly documented. Additionally, some clients may not anticipate how to deal with Rate Limit headers in responses, resulting in unexpected errors.

[-] Maintainability: Rate Limit headers can add complexity to API documentation and implementation, requiring additional maintenance effort to ensure accurate metering and consistent usage.

Related Patterns:

Pattern 9: Rate Limit Reporting Endpoint

Context: Clients that are not about to invoke an API with a dynamic rate-limiting configuration may nevertheless want to discover whether their previous usage lies within the limits. Service providers offering access with dynamic rate limits need to inform clients about changes.

Problem: How can API providers ensure that client developers have easy access to accurate and up-to-date Rate Limit details?

Solution: Add an endpoint that the clients can use to explicitly retrieve API Rate Limit settings and API usage counters.

Solution details: Dynamic Rate Limit values can be retrieved through one of the API’s endpoints, such as a dedicated endpoint for checking the current Rate Limit status. This endpoint can return information such as the current rate limit value, the time window for the rate limit, and the remaining number of requests. This information can be returned in the response body in a structured format, such as JSON or XML, and can be accessed by the client through a GET request. Exposing the Rate Limit value in this way gives clients a programmatic means to check the Rate Limit status and can be helpful for automation and monitoring. Client applications can thus check the Rate Limit status before making requests to the API and take appropriate actions like waiting, caching, or prioritizing requests.

Note that these endpoints for retrieving API Rate Limits can be rate-limited themselves.
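
As an illustration of what such an endpoint might return, the following sketch builds a JSON response body. The field names and the function are hypothetical and do not follow any particular provider's format; the sample figures reuse the counter values from the header example in Pattern 8.

```python
import json


def rate_limit_status(limit, used, window_s, reset_epoch):
    """Build the body a hypothetical rate limit status endpoint could return."""
    return {
        "limit": limit,                    # requests allowed per window
        "used": used,                      # requests already counted in this window
        "remaining": max(0, limit - used),
        "window_seconds": window_s,
        "reset": reset_epoch,              # Unix time at which the window resets
    }


# Example payload for a client that used 4 of 60 requests in an hourly window.
body = json.dumps(rate_limit_status(60, 4, 3600, 1372700873))
print(body)
```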

Consequences:

[+] Transparency: By providing transparent and up-to-date Rate Limit information, API providers can enhance the overall developer experience and foster a collaborative relationship with developers.

[+] Performance: Exposing usage counters through the endpoint enables developers to monitor their Rate Limit consumption in real-time, empowering them to make informed decisions about their API usage and avoid Rate Limit violations.

[-] Security: If the Rate Limit endpoint is not secured properly, it can become a target for abuse and exploitation, leading to security vulnerabilities.

Known uses: Widely known web APIs have endpoints for Rate Limit information:

Rate Limit Granularity

When establishing a Rate Limit for an API, various levels of granularity can be utilized to regulate the frequency at which requests are made. The degree of granularity selected establishes the precision of the Rate Limit application and aids in ensuring that the API is utilized efficiently and responsibly. This section categorizes the pattern variants related to granularity levels according to their scope into client and resource groups. The distinction is based on whether the Rate Limit restriction is enforced to restrict a specific client or a specific resource.

Client

Pattern 10: Rate Limit Value at the Level of User Account

Context: The source of client requests can be distinguished by authenticating the sender user account.

Problem: How to control the usage of the API by individual users, especially if some users are making significantly more requests than others?

Solution: Set customized quotas for each user or group of users.

This level of API Rate Limiting is appropriate when an application requires different rate limits for different users. It can be used when user accounts are attached to different usage plans.

Consequences:

[+] Flexibility: When a Rate Limit is defined at the user account level, the API can also provide customized usage quotas or limits based on each user’s specific needs or usage patterns.

[-] Unfairness: In the case of a static Rate Limit value, some users might not need all the resources allocated to them.

Pattern 11: Rate Limit Value per IP Address

Context: The API provider or service needs to control and restrict the rate at which incoming requests are made from individual IP addresses.

Problem: How to prevent abusive usage from single IP addresses?

Solution: Limit the number of requests that can be made from a single IP address in a given period.

Solution details:

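This pattern can be implemented, for instance, using the basic configuration of NGINX. Inside the NGINX configuration, define a limit zone, typically in the http block. A limit zone specifies the key for rate limiting (here the client IP address), the name and size of the shared memory zone that stores the counters, and the rate limit. The zone takes effect once a limit_req directive references it in a server or location block, where a maximum burst can also be set. For example: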

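    http {
        # Key: client IP; 10 MB shared zone; at most 10 requests per second.
        limit_req_zone $binary_remote_addr zone=rate_limit_zone:10m rate=10r/s;
        # Apply the zone to requests; burst=20 is an illustrative value.
        server { location / { limit_req zone=rate_limit_zone burst=20; } }
    }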

Consequences:

[+] Authentication-free: This strategy does not require any client authentication.

[-] Unfairness: Legitimate users sharing the same IP address (for example, behind the same proxy or NAT gateway) might be affected by limits intended for an abusive client.

[-] Efficiency: A rate-limiting strategy based only on IP addresses is ineffective on its own, because an abusive user can still use the API from multiple IP addresses.

Pattern 12: Rate Limit Value per API Key

Context: Every client can obtain a unique key using a token generator offered by the API provider.

Problem: How to control the usage of the API by a specific client?

Solution: Distinguish between the clients based on API Keys, and set a Rate Limit taking it into account.

Consequences:

[+] Precision: Providers can distinguish traffic originating from different client applications even if these share the same source IP address.

[-] Precision: Providers are unable to distinguish requests from different users of the same client application.

[-] Security: API keys may be leaked into access logs or code repositories and are susceptible to theft or unauthorized use if not properly secured.

Known uses:

This is the most common level of API Rate Limiting, where all requests from an application identified by its API key are subject to the same rate limit. This level is appropriate when an application does not require user-specific rate limits or when it is difficult to identify individual users (such as with anonymous or public applications). In the case of the GitHub API, this rate-limiting approach is combined with rate limiting based on the IP address.
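Combining API keys with an IP-address fallback can be sketched as a function that chooses the bucket a request is counted under. This is only an illustration: the `X-API-Key` header name, the `rate_limit_bucket` function, and both limit values are assumptions, not any provider's actual configuration.

```python
from typing import Mapping, Tuple

# Illustrative limits (requests per hour); not taken from any provider's documentation.
LIMIT_PER_API_KEY = 1000
LIMIT_PER_IP = 60


def rate_limit_bucket(headers: Mapping[str, str], remote_addr: str) -> Tuple[str, int]:
    """Pick the key a request is counted under, and the limit that applies.

    Requests carrying an API key (hypothetical X-API-Key header) are counted
    per key; all other requests fall back to the client's IP address.
    The header lookup is case-sensitive here because a plain mapping is used.
    """
    api_key = headers.get("X-API-Key", "").strip()
    if api_key:
        return f"key:{api_key}", LIMIT_PER_API_KEY
    return f"ip:{remote_addr}", LIMIT_PER_IP
```

The returned bucket name can then be used as the counter key in whichever rate limiter is in place.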

Resource Granularity

Pattern 13: Rate Limit Value for Service Providers

Context: A provider manages several services consumed by multiple clients. One client might need to combine multiple services for the same provider.

Problem: As a provider, how can all the services under my ecosystem adhere to a consistent set of usage guidelines?

Solution: All the APIs of a given provider use the same rate-limiting configuration.

Consequences:

[+] Simplicity: Simplified billing or uniform pricing models encourage users to try and use one or more services offered by the same provider.

[+] Usability: Consistent user experience across all APIs and services of the provider.

[-] Performance: Users might need to access some services more than others.

Known uses:

Pattern 14: Rate Limit at the API Level

Context: All API features have uniform costs, and there are no predictable hotspots as clients invoke them with uniform probability.

Problem: How to control the usage of an entire API, independently of which features are being used?

Solution: Track usage of API features globally and set limits on the entire API, giving the same weight to each request, no matter which API feature it invokes.

Consequences:

[+] Maintainability: Using the same Rate Limit simplifies the management and configuration of rate-limiting rules, making it easier to maintain and update the API’s rate-limiting system as a whole.

[+] API monetization strategy: Having the same Rate Limit on all the API operations makes it easier to set a pricing plan that is not confusing for clients and easier to track.

[-] Scalability: Different endpoints or functionalities within an API may have varying resource requirements. Applying a uniform Rate Limit may hinder the ability to scale certain critical endpoints independently, potentially leading to performance bottlenecks and inefficient resource allocation.

Known uses:

Both the Search API and Files API of Stripe allow up to 20 operations per second. The same limit value applies to reading and writing operations; however, reads and writes are counted by separate rate limiters, so one kind of operation does not consume the allowance of the other.

Pattern 15: Rate Limit at Endpoint or Operation Level

Context: Some API endpoints may be more susceptible to abuse than others. For example, certain endpoints are particularly resource-intensive, such as those that involve complex calculations or database queries.

Problem: How to control the usage of specific features of the API, especially if some features are being used significantly more than others?

Solution: Track usage of specific API endpoints or operations and set limits according to their specific costs.
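A minimal sketch of this solution, assuming a hypothetical table of per-operation limits (the paths, limit values, and the `EndpointRateLimiter` class are illustrative), counts requests per client and operation:

```python
from collections import defaultdict

# Hypothetical per-operation limits (requests per window): cheap reads get a
# larger allowance than expensive writes or report generation.
OPERATION_LIMITS = {
    ("GET", "/items"): 100,
    ("POST", "/items"): 10,
    ("GET", "/reports"): 2,
}
DEFAULT_LIMIT = 20


class EndpointRateLimiter:
    """Counts requests per (client, method, path) within one window."""

    def __init__(self):
        self.counts = defaultdict(int)

    def allow(self, client_id: str, method: str, path: str) -> bool:
        limit = OPERATION_LIMITS.get((method, path), DEFAULT_LIMIT)
        key = (client_id, method, path)
        if self.counts[key] >= limit:
            return False
        self.counts[key] += 1
        return True

    def reset_window(self) -> None:
        """Call at the start of each window (e.g. from a timer)."""
        self.counts.clear()
```

Exhausting the allowance of one operation leaves the client's allowance for other operations untouched, which is the point of this granularity level.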

Consequences:

[+] Precision: Endpoint-level rate limiting can be used to ensure that critical endpoints are not overloaded with requests, which can negatively impact the performance and availability of the entire API.

[-] Understandability: Clients need to be clearly informed about which limits are applied to which endpoints.

Known uses:

Google Analytics Reporting and Configuration web APIs have different default limits depending on whether the endpoint is a writing or reading endpoint. Google also allows users to request additional quota per project, separately for read and write requests [4].

Pattern 16: Rate Limit at the Entity Level

Context: Some specific resources may be more susceptible to abuse than others.

Problem: How to control the usage of specific operations accessing a specific resource?

Solution: Track the endpoints accessing the same resource and set limits according to their specific costs.

Consequences:

[+] Efficient resource utilization: Resource-based rate limiting ensures that a specific resource is not overwhelmed by requests and helps to optimize its usage.

[-] Scalability: If the API is experiencing high traffic, it may be challenging to scale the Rate Limit effectively, as different resources may have varying usage patterns and requirements.

Known uses:

Provider Reaction to Rate Limit Exceeding

In some cases, even after being temporarily blocked due to exceeding the API Rate Limit, certain clients may persist in attempting to make requests above the set limit. This can cause strain on the API and negatively impact its performance. It may be necessary to implement additional measures to prevent such clients from bypassing the rate limit, such as IP address blacklisting or more sophisticated anti-bot or denial of service prevention mechanisms. We identified a set of Rate Limit adoption patterns related to the providers’ reaction to some clients’ abusive behaviors. We classified them into two categories, depending on whether the goal of the provider is to prevent the abusive clients from consuming the API or to mitigate their behavior.

Abusive behavior Termination

Pattern 17: IP Address Blocking

Context: Clients are identified by their IP Address.

Problem: How to effectively terminate abusive behaviors from clients identified by their IP address?

Solution: If an identified client exceeds a predefined limit on the number of requests it can make, further requests from that client are blocked.

Consequences:

[+] Simplicity: simple to implement.

[-] Unfairness: Legitimate users may be mistakenly blocked if the IP address is flagged for abusive behavior.

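Known uses: When the limit is exceeded, GitHub blocks the IP addresses of non-authenticated clients. The response has a 403 status code, and its x-ratelimit-* headers inform clients that their quota is exhausted (x-ratelimit-remaining is 0) and when it resets (x-ratelimit-reset, in epoch seconds). The x-xss-protection header that also appears in the response is a generic browser security header and is unrelated to rate limiting.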

'x-ratelimit-limit': '60',
'x-ratelimit-remaining': '0',
'x-ratelimit-reset': '1689004335',
'x-ratelimit-resource': 'core',
'x-ratelimit-used': '60',
'x-xss-protection': '1; mode=block'

The response also includes a message to inform clients that the Rate Limit value is higher for authenticated requests:

RequestError [HttpError]: API Rate Limit exceeded for <IP-ADDRESS>. (But here’s the good news: Authenticated requests get a higher rate limit. Check out the documentation for more details.)
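From the client's side, the headers shown above are enough to decide how long to back off. The following sketch assumes the headers are available as a dictionary with lowercase names; the `seconds_until_reset` function is hypothetical and not part of any GitHub client library.

```python
def seconds_until_reset(headers: dict, now_epoch: float) -> float:
    """Return how long a client should wait before retrying.

    Returns 0.0 if requests remain in the current window.
    """
    remaining = int(headers.get("x-ratelimit-remaining", "1"))
    if remaining > 0:
        return 0.0
    reset_epoch = int(headers["x-ratelimit-reset"])  # UTC epoch seconds
    return max(0.0, reset_epoch - now_epoch)


# Headers of the blocked response shown above.
blocked = {
    "x-ratelimit-limit": "60",
    "x-ratelimit-remaining": "0",
    "x-ratelimit-reset": "1689004335",
}
wait = seconds_until_reset(blocked, now_epoch=1689004300)  # 35 seconds before the reset
```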

Pattern 18: User Account Blocking

Context: Systems where clients are primarily identified by their user accounts. In such systems, each client typically has a unique user account, which serves as the means of identification.

Problem: How can an application or service precisely terminate behaviors of specific clients associated with user accounts?

Solution: Block requests from any client belonging to a particular user or account if they exceed the rate limit.

Consequences:

[+] Precision: Only requests from specific users are blocked.

[-] Account impersonation: Abusive users can still overload the system by using multiple accounts, effectively impersonating different users to evade the rate limiting measures.

Pattern 19: API Key Revocation

Context: Systems where API keys are used for authentication and authorization of clients accessing the API. This pattern addresses the need to revoke API keys under certain circumstances.

Problem: How to effectively and fairly terminate the abusive behavior of clients authenticated using an API key?

Solution: Block requests from a particular API Key if they exceed the rate limit.

Consequences:

[+] Precision: Only requests from malicious client developers are blocked.

[-] Unfairness: Legitimate users may be mistakenly blocked if the API Key of their client application is flagged for abusive behavior.

Additional considerations. When a client application driven by human users exceeds the Rate Limit, this could imply that the application is being scraped, i.e., accessed automatically with request traffic growing beyond what human users can be expected to generate. In this case, before blocking the client based on its IP address, user account, or API key, a CAPTCHA challenge could be displayed to users who exceed the rate limit. The user, the API key, or the IP address is then blocked only if the client fails the CAPTCHA challenge.
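The escalation described above can be sketched as a small decision function. The `Action` enum and `react_to_request` function are hypothetical names, and letting a client continue after a passed challenge is an assumption of this sketch rather than something the pattern prescribes.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    ALLOW = "allow"
    CHALLENGE = "challenge"
    BLOCK = "block"


def react_to_request(over_limit: bool, captcha_passed: Optional[bool]) -> Action:
    """Decide how to treat a request from a human-driven client.

    captcha_passed is None while no challenge has been answered yet.
    """
    if not over_limit:
        return Action.ALLOW
    if captcha_passed is None:
        return Action.CHALLENGE  # over the limit, no answer yet: show a CAPTCHA
    # Assumption: a passed challenge lets the client continue; a failed one blocks it.
    return Action.ALLOW if captcha_passed else Action.BLOCK
```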

In addition to applying the Rate Limit Communication patterns, a system that permanently blocks abusive clients also needs to notify clients explicitly that they will be blocked if they exceed the limit. This information can also be stated statically in the Rate Limit Documentation.

Abusive behavior mitigation

Pattern 20: Request Throttling

Context: An API is exposed to various clients, and there is a need to regulate the rate of incoming requests to maintain quality of service, prevent abuse, and allocate resources efficiently.

Problem: How to control the rate of incoming requests to a web API to prevent abuse?

Solution: Instead of completely blocking requests from abusive clients, the solution is to throttle the bandwidth allocated to transmit requests or responses to specific clients. Throttling limits the rate at which data can flow between the client and the server, effectively slowing down the client’s access.
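As a simplified illustration, the sketch below throttles by pacing requests (delaying each one so that a client is served at most at a given rate) rather than by shaping bandwidth at the byte level. The `Throttle` class and its parameters are assumptions made for this example.

```python
class Throttle:
    """Spaces a client's requests at least 1/rate seconds apart by delaying them."""

    def __init__(self, rate_per_second: float):
        self.interval = 1.0 / rate_per_second
        self.next_free = 0.0  # earliest time the next request may be served

    def delay_for(self, now: float) -> float:
        """Return how long the request arriving at `now` must wait."""
        start = max(now, self.next_free)
        self.next_free = start + self.interval
        return start - now
```

A burst of requests is never rejected; each request simply waits longer than the previous one, which slows the client down without cutting it off.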

Consequences:

[+] Fairness: Prevent abusive behavior while still allowing legitimate clients to access the API.

[-] Efficiency: Not effective against abusive users who might distribute requests across multiple clients to bypass the throttling rate.

Pattern 21: Request Queuing

Context: A system where losing client requests would prevent the system from delivering its intended services.

Problem: How to handle requests when the Rate Limit is exceeded, ensuring that clients do not lose their requests and maintaining fairness in processing?

Solution: When the Rate Limit is exceeded, queue the requests and process them in order when the Rate Limit is no longer exceeded.
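A minimal sketch of this solution, assuming a fixed window and a bounded first-in-first-out queue (the `QueueingLimiter` class and its parameters are hypothetical), could look as follows:

```python
from collections import deque


class QueueingLimiter:
    """Serves up to `limit` requests per window and queues the overflow."""

    def __init__(self, limit: int, max_queue: int):
        self.limit = limit
        self.max_queue = max_queue
        self.served = 0
        self.queue = deque()

    def submit(self, request) -> str:
        # Serve directly only if nothing is waiting, so arrival order is kept.
        if self.served < self.limit and not self.queue:
            self.served += 1
            return "served"
        if len(self.queue) >= self.max_queue:
            return "rejected"  # the bounded queue protects server capacity
        self.queue.append(request)
        return "queued"

    def start_new_window(self) -> list:
        """Reset the counter and drain queued requests in arrival order."""
        self.served = 0
        drained = []
        while self.queue and self.served < self.limit:
            drained.append(self.queue.popleft())
            self.served += 1
        return drained
```

Bounding the queue is a design choice of this sketch: it trades a small number of rejected requests for protection against the capacity problem that unbounded queuing would cause.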

Consequences:

[+] Reliability: Ensures that clients do not lose their requests when they are rate-limited, and they do not need to resend them when they are allowed to.

[-] Timeliness: The queued requests’ responses might differ from those that would have been sent when the requests were made.

[-] Capacity: The queued requests take up space on the server, which may run out of storage capacity in case of abusive clients.

Known uses:

Pattern 22: Rate Limit with a Reset Time

Context: An API is exposed to various clients, and there is a need to regulate the rate of incoming requests to maintain quality of service, prevent abuse, and allocate resources efficiently.

Problem: How to prevent excessive requests from specific clients without permanently terminating their access to the service?

Solution: Block requests from a specific client until a predefined time has passed since the Rate Limit was exceeded, ensuring fair usage and providing clients with information about when they can resume requesting.
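The following sketch illustrates one way to do this with a fixed window per client. The `ResetTimeLimiter` class is hypothetical; the number of seconds it returns for a blocked request could, for instance, be sent to the client in a Retry-After header alongside a 429 response.

```python
import math


class ResetTimeLimiter:
    """Fixed-window limiter that reports when a blocked client may retry."""

    def __init__(self, limit: int, window_seconds: int):
        self.limit = limit
        self.window_seconds = window_seconds
        self.windows = {}  # client_id -> (reset_time, used)

    def check(self, client_id: str, now: float):
        """Return (allowed, retry_after_seconds) for a request at time `now`."""
        reset_time, used = self.windows.get(client_id, (now + self.window_seconds, 0))
        if now >= reset_time:
            # The previous window has expired: start a fresh one.
            reset_time, used = now + self.window_seconds, 0
        if used >= self.limit:
            self.windows[client_id] = (reset_time, used)
            return False, math.ceil(reset_time - now)
        self.windows[client_id] = (reset_time, used + 1)
        return True, 0
```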

Consequences:

[+] Transparency and usability: Clients are aware of when they can start sending their requests again.

[-] Bursting: Abusive users can still cause a spike in traffic if they schedule all their requests to occur immediately after the Rate Limit reset time.

Server-side Rate Limit Implementation

Rate Limiter Positioning

We have identified distinct variants of the Server-Side implementation of the Rate Limit pattern, which we classified depending on the rate limiter component, its positioning, and scope. The solutions depend on the type of system architecture in which the API is situated. By identifying the relevant runtime variant, developers can effectively enforce rate limiting at the level of the system’s interfaces.

Pattern 23: Internal Rate Limiter

Context: A service provider has access to their own infrastructure and systems, allowing them to exert control over the rate of incoming requests.

Problem: How to implement a controllable rate limiter that does not rely on external services?

Solution: Rate Limit is completely implemented as part of the server-side code that intercepts clients’ requests.

Solution details: In the case of monolithic service architectures, a rate limiter can be implemented as part of the backend service.

In the case of microservices architectures, the Rate Limit policy can also be implemented internally within the backend architecture. The exact placement of the rate limiter also depends on the chosen infrastructure to implement the microservices architecture:

In a service mesh architecture, the Rate Limit policy is placed in the control plane to ensure centralized and consistent enforcement of the Rate Limit. It also provides dynamic configuration capabilities, enabling easy management, updates, and adjustments without requiring changes to individual services. The goal is to employ the control plane to also gather Rate Limit and traffic-related data and have a centralized view of the Rate Limit impact on the architecture.

In this kind of architecture, different decisions can be adopted to implement rate limits internally within the backend:

Consequences:

[+] Customization: Since it is part of the provider’s code, it can be tailored to match the specific rate limits required by the application. It can also be customized to handle different types of requests differently.

[+] Integration: This can be integrated more seamlessly with the application codebase, making it easier to update as the application evolves.

[-] Increased complexity: It requires additional development effort and maintenance overhead for the application and can add complexity to the application codebase.

Known uses:

Pattern 24: External Rate Limiter

Context: The solution to rate limiting is readily available from an external system.

Problem: How to accelerate and ease the implementation of a rate-limiting solution?

Solution: Rate Limit is implemented using a third-party service or library.

Solution details:

Select a suitable rate-limiting service or tool, configure the rate-limiting rules, integrate with the API, test, monitor, and fine-tune as needed.

Consequences:

[+] Speed of implementation: It can save the time of implementing the rate limiter component from scratch.

[+] Compliance: It can help ensure compliance with regulations and industry standards related to rate limiting, such as the Payment Card Industry Data Security Standard.

[-] Security: It may introduce new security risks, such as the exposure of sensitive data to external service providers or the risk of service providers being compromised.

[-] Reliability: It makes the system dependent on external factors such as network connectivity and service availability. In addition, it may introduce additional latency and overhead in the request processing.

Known uses:

A hybrid approach can increase the system’s reliability by combining an External rate limiter and an Internal rate limiter. The goal is to extend the internal rate limiter with additional rate-limiting functionality provided by a suitable external service or tool, which is integrated with the internal rate limiter codebase. This approach can provide a backup mechanism in case the external rate limiter fails or experiences performance issues, and it reduces single points of failure in the rate-limiting process. However, developers and maintainers can face challenging integration problems between the Internal rate limiter and the External rate limiter.

Rate Limiter Scope

Pattern 25: Global Rate Limiter

Context: A multi-service architecture where ensuring uniform rate control across all interconnected services and components is paramount for preventing system overload.

Problem: How to enforce a shared Rate Limit value for all service instances of a system?

Solution: Rate Limit is implemented using a Front Proxy to handle the overall amount of incoming traffic to a system (can be a set of composed systems).

Solution details: This approach entails identifying the operational thresholds of the system that necessitate the implementation of a global rate limiter. The Rate Limit Front Proxy can be enforced at either the application layer by restricting the number of requests or at the network layer by configuring the network equipment to regulate the flow of traffic passing through them.

Consequences:

[-] Inflexibility: It can be difficult to adjust or fine-tune to specific resource consumption rates.

Known uses:

Pattern 26: Local Rate Limiter

Context: A microservices architecture where the use of a local rate limiter is crucial for ensuring each microservice can independently manage its incoming requests.

Problem: How to define and enforce different Rate Limit values for each service instance?

Solution: Rate Limit is implemented as a part of the service mesh using Edge Proxies.

Consequences:

[+] Increased control: Local rate limiting gives more control over the rate-limiting logic, allowing for more fine-grained control and customization of the rate-limiting rules for each service instance.

[-] System complexity: Implementing local rate limiters requires additional code and configuration, which can increase the system’s complexity and potentially introduce bugs or performance issues.

Combining both Global and Local rate limiters can provide better control and flexibility over resource utilization in a system. Global rate limiters can ensure that the overall load on the system is kept within acceptable levels, preventing system overload and potential downtime. Meanwhile, local rate limiters can provide more fine-grained control over specific service instances, allowing for more efficient use of resources and improved performance.
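The combination can be sketched as two layers of counters that a request must both pass. This is an in-process illustration only: the `LayeredLimiter` class, the service names, and the limits are assumptions, whereas in practice the global check would typically sit in a front proxy and the local checks next to each service.

```python
class Counter:
    """Minimal per-window counter; window reset handling is omitted for brevity."""

    def __init__(self, limit: int):
        self.limit = limit
        self.used = 0

    def has_capacity(self) -> bool:
        return self.used < self.limit

    def consume(self) -> None:
        self.used += 1


class LayeredLimiter:
    """Admits a request only if the global and the local limiter both allow it."""

    def __init__(self, global_limit: int, local_limits: dict):
        self.global_counter = Counter(global_limit)
        self.local = {name: Counter(limit) for name, limit in local_limits.items()}

    def allow(self, service: str) -> bool:
        local = self.local[service]
        # Check both layers before consuming, so a rejected request uses no quota.
        if not (self.global_counter.has_capacity() and local.has_capacity()):
            return False
        self.global_counter.consume()
        local.consume()
        return True
```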

Known uses:

Pattern Definition Approach

This section describes our approach to extracting Rate Limit pattern variants. First, we provide details about the static pattern variants. Secondly, we give an overview of the experimentation used to demonstrate these variants’ impact on performance and reliability.

Static Perspective

Representing Rate Limit pattern in OpenAPI

The current version of the OpenAPI language specification does not include predefined constructs to describe API Rate Limit values. OpenAPI descriptions can nevertheless include details such as the maximum number of requests an API can handle per unit of time and the time interval in which these requests can be made. There are various ways to include Rate Limit information in OpenAPI documentation; the easiest to detect rely on the x-* extension mechanism, with extensions such as x-rate-limit, as shown in the following example.

paths:
  /items:
    get:
      description: Returns a list of items
      responses:
        200:
          description: Successful response
        x-rate-limit:
          limit: 1000
          interval: hour

Note that the keys attached to the x- prefix are not known in advance. We therefore defined various detectors based on our observations of samples of API descriptions containing responses featuring the 429 (Too Many Requests) HTTP status code and keywords matching the regular expression:

/rate limit|rateLimit|rate-limit|ratelimiting|throttling/gi
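As a minimal sketch, the expression above can be applied to the text of an API description as follows. The function name is hypothetical; the `g` and `i` flags of the original expression correspond here to `finditer` and `re.IGNORECASE`.

```python
import re

RATE_LIMIT_KEYWORDS = re.compile(
    r"rate limit|rateLimit|rate-limit|ratelimiting|throttling", re.IGNORECASE
)


def find_rate_limit_keywords(text: str) -> list:
    """Return every keyword match found in an API description's text."""
    return [match.group(0) for match in RATE_LIMIT_KEYWORDS.finditer(text)]
```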

In OpenAPI, the response header section conveys various information to the client after each request, including Rate Limit information. Although the Rate Limit values are not typically described statically in the headers schema, the presence of specific headers can indicate the existence of Rate Limit constraints for specific endpoints. For example, the header may include fields such as X-Rate-Limit-Limit to describe the maximum number of requests allowed per unit of time and the time interval in which these requests can be made, and X-Rate-Limit-Remaining to indicate the number of remaining possible requests to make. Thus, analyzing the header section of the OpenAPI description can provide valuable insights into the API’s Rate Limit practices and help identify which endpoints are subject to rate-limiting constraints, e.g.:

responses:
  "200":
    description: OK
    headers:
      X-Rate-Limit-Limit:
        description: The maximum number of requests per minute
        schema:
          type: integer
      X-Rate-Limit-Remaining:
        description: The number of remaining requests in the current minute
        schema:
          type: integer
  "429":
    description: Too Many Requests
    content:
      application/json:
        schema:
          type: object
          properties:
            message:
              type: string
            retryAfter:
              type: integer

The OpenAPI specification allows developers to attach descriptive information to each component of their API through the use of description fields. These fields are intended to be written in natural language and provide valuable information about the API, including its rate-limiting strategy. However, the lack of formatting conventions for writing these descriptions poses a challenge for systematic analysis. The information within these description fields can be difficult to extract and analyze, as it often lacks structure and consistency.

Rate Limit can also be used as a security scheme in OpenAPI by defining a security definition in the securityDefinitions section (Swagger 2.x) or under components/securitySchemes (OpenAPI 3.x), and then referencing that definition in the security section of an operation.

The following example shows Rate Limit used as a security scheme in Swagger 2.x:

paths:
  /items:
    get:
      description: Returns a list of items
      responses:
        200:
          description: Successful response

securityDefinitions:
  rateLimit:
    type: apiKey
    in: header
    name: X-Rate-Limit
    description: Maximum number of requests allowed in a given time frame

The following example shows Rate Limit used as a security scheme in OpenAPI 3.x:

paths:
  /items:
    get:
      description: Returns a list of items
      responses:
        200:
          description: Successful response

components:
  securitySchemes:
    RateLimit:
      type: apiKey
      name: X-Rate-Limit
      in: header

Based on these locations of Rate Limit information in OpenAPI descriptions, we designed detectors to analyze the use of this pattern from various perspectives. These detectors were used to search a large collection of APIs to identify instances of the Rate Limit pattern.
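To illustrate what such a detector could look like, the sketch below walks an OpenAPI description that has already been parsed into a dictionary and reports three kinds of signals: x-* extensions, rate-limit headers, and 429 responses. It is a simplified illustration under these assumptions, not the detectors used in the study; the function names are hypothetical.

```python
def _mentions_rate_limit(name: str) -> bool:
    """True for names such as x-rate-limit or X-RateLimit-Remaining."""
    return "ratelimit" in name.lower().replace("-", "").replace("_", "")


def detect_rate_limit_signals(spec: dict) -> list:
    """Return (path, method, signal) tuples hinting at rate limiting."""
    signals = []
    for path, path_item in spec.get("paths", {}).items():
        for method, operation in path_item.items():
            if not isinstance(operation, dict):
                continue  # skip non-operation entries such as "parameters"
            responses = operation.get("responses", {})
            # x-* extensions on the operation or on its responses object
            for key in list(operation) + list(responses):
                if str(key).startswith("x-") and _mentions_rate_limit(str(key)):
                    signals.append((path, method, f"extension {key}"))
            for status, response in responses.items():
                if str(status) == "429":
                    signals.append((path, method, "429 response"))
                if isinstance(response, dict):
                    for header in response.get("headers", {}):
                        if _mentions_rate_limit(header):
                            signals.append((path, method, f"header {header}"))
    return signals
```

Status codes are compared as strings because YAML parsers may load an unquoted 429 as an integer.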

Once the APIs that employed rate limiting were identified, we conducted a comprehensive analysis of their specifications to understand the strategies used by developers. This included evaluating the specific parameters and configurations for rate limiting, examining the response codes and messages returned, and analyzing the methods and paths subject to rate limiting.

API Case Studies

In addition to the machine-readable documentation analysis in this study, we manually analyzed APIs from well-known API providers: eBay (36), Shopify (4), New York Times (10), GitHub (2), LinkedIn (1), Twilio (1), Stripe (1), Trello (1), Flickr (1).

The goal behind analyzing both APIs from the same provider and distinct providers is to see how the rate-limiting strategy is defined across different providers or within the same provider’s APIs. The APIs we selected belong to different domains.

Runtime perspective: Experimentation Overview

In addition to these API studies based on static analysis, we have studied metrics and indicators based on runtime monitoring of APIs. We mainly investigated API patterns that impact properties observable at runtime. One study focuses on the impact of the API Rate Limit pattern on the reliability properties of API clients through an analytical model that considers specific workload configurations and rate limits and predicts success and failure rates. To calculate those success and failure rates, we used the observability and monitoring tools Grafana [12] and Prometheus [13], which are already integrated with Istio. In another study, we examined the performance impact of the API Request Bundling pattern by using a regression model and multivariate regression analysis on a microservice-based open-source business application with realistic workload scenarios. The regression model predicts the total round-trip time of a request based on server-side parameters such as the type of the method and the number of calls, with and without Request Bundle. In those studies and others, we have experimented with different perspectives when implementing Rate Limit and related patterns. As a result, we derived different variations of the use of those patterns with regard to their positioning and scope.

In the Rate Limit empirical study, we wanted to evaluate its impact on the reliability of microservice-based applications from an API Client perspective. For that purpose, we developed an analytical model based on client workload parameters to predict the success and failure rates. We developed workload benchmark scenarios based on the typical interactions extracted in a previous study. We set up the experiment simulating 20 different configurations in two environments: private cloud and Google cloud. We repeated the experiment more than 50 times to validate a proposed analytical model that measures the impact of Rate Limit on the reliability of APIs. Many of the pattern variants extracted in the previous section were used to evaluate the impact of Rate Limit on the performance and reliability of such an infrastructure by building up a robust prediction model.

In conclusion

The patterns we have identified are related to the documentation and communication of API Rate Limits, and the metrics used to statically or dynamically define their values. Other variants are related to the level of granularity at which a Rate Limit strategy can be applied (i.e., how clients are identified and resources scoped), and implementation-related variants about the placement of a rate limiter (internal vs. external, local vs. global). We finally distinguish how to mitigate or stop abusive behavior as a reaction to Rate Limit violations: e.g., by blacklisting or throttling clients, temporarily or permanently.

Our study enhances understanding of the current state of web API Rate Limit pattern as it provides valuable insights for web API designers, developers, and researchers.

  1. https://github.com/express-rate-limit/express-rate-limit 

  2. https://github.com/koajs/ratelimit 

  3. https://docs.github.com/en/graphql/overview/resource-limitations 

  4. https://developers.google.com/analytics/ 

  5. https://tyk.io/deployment-api-gateway/ 

  6. https://www.envoyproxy.io 

  7. https://kuma.io 

  8. https://redis.com/ 

  9. https://www.krakend.io/ 

  10. https://cloud.spring.io/spring-cloud-gateway 

  11. https://konghq.com/ 

  12. https://grafana.com 

  13. https://prometheus.io