Rate Limiting for a Single Process

Rate limiting for a single process means there is one process running some application code, and it needs to prevent something from happening too often. A concrete example is a single binary serving HTTP requests: requests can be grouped by the sender's IP address, and it may be advantageous to apply some back pressure when requests arrive too quickly from a particular sender. Large-scale systems are rarely this simple and commonly have multiple stateless servers handling requests, which complicates rate limiting, but for now we'll stick to the single-server example.

We'll end up with something satisfying this trait, and there are several ways to achieve it:

use std::time::{self, Instant};

trait RateLimiter {
    // Create a new RateLimiter that will allow `limit` requests per `window` time period.
    fn new(window: time::Duration, limit: usize) -> Self;
    // Whether a request arriving now is allowed; an allowed request is counted against the limit.
    fn allowed(&mut self) -> bool;
}
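
For the single-server HTTP scenario above, a limiter would typically be kept per sender. Here is a minimal sketch of that, assuming the trait and imports above; the PerSender name and its shape are made up for illustration:

use std::collections::HashMap;
use std::net::IpAddr;

// Hypothetical wrapper: one limiter per sender IP, created lazily on first use.
struct PerSender<L: RateLimiter> {
    limiters: HashMap<IpAddr, L>,
    window: time::Duration,
    limit: usize,
}

impl<L: RateLimiter> PerSender<L> {
    fn new(window: time::Duration, limit: usize) -> Self {
        PerSender {
            limiters: HashMap::new(),
            window,
            limit,
        }
    }

    fn allowed(&mut self, sender: IpAddr) -> bool {
        let (window, limit) = (self.window, self.limit);
        self.limiters
            .entry(sender)
            .or_insert_with(|| L::new(window, limit))
            .allowed()
    }
}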

Fixed Window

A fixed-window rate limiter keeps a count of requests received within the last window period. If the count gets too high, it starts denying requests. The start time of a window begins when the first request arrives, and it remains "fixed" there until enough time has passed for a new window to be initialized and the count reset. This simple strategy has a major drawback: right when the window ticks over and a new window is created, the entire allowed count of requests could arrive and be allowed nearly simultaneously before requests start being denied again. For a sustained request rate significantly greater than the allowed rate, this results in bursts of requests being allowed every "window" period of time. Another drawback is the scenario where most of the requests for the previous window arrive right at its end and a similarly large number arrive right at the beginning of the next window, essentially allowing through twice as many requests in a short time as should be allowed.

struct FixedWindow {
    // When the current window began.
    window_start: Instant,
    // Requests allowed so far in the current window.
    count: usize,
    window: time::Duration,
    limit: usize,
}

impl RateLimiter for FixedWindow {
    fn new(window: time::Duration, limit: usize) -> Self {
        FixedWindow {
            window_start: Instant::now(),
            count: 0,
            window,
            limit,
        }
    }

    fn allowed(&mut self) -> bool {
        let now = Instant::now();

        // If the window has expired, start a new one anchored at the current time.
        if now.duration_since(self.window_start) > self.window {
            self.window_start = now;
            self.count = 0;
        }

        if self.count >= self.limit {
            return false;
        }

        self.count += 1;
        true
    }
}
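
A quick usage sketch, with arbitrary numbers chosen only for illustration: the first limit calls within a window return true, and the rest return false until the window rolls over.

fn main() {
    // Allow up to 3 requests per second.
    let mut limiter = FixedWindow::new(time::Duration::from_secs(1), 3);

    for i in 0..5 {
        // Prints true for the first 3 requests, then false for the last 2.
        println!("request {}: allowed = {}", i, limiter.allowed());
    }
}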

Moving Window

The next logical step is to allow the window to move. We can remember the rate of requests from the previous window and use it during the next window to estimate the moving rate of requests. If that estimated rate is too high, the request is denied. This is a fairly rough approximation, but it already has an obvious advantage over the fixed-window strategy: if the previous window hit its request limit, new requests during the next window will only be allowed in at roughly the allowable rate. This is much closer to ideal behavior. There can still be bursts of requests right after the limiter is initialized, or if so much time has passed between requests that the previous window has to be "forgotten".
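
To make the estimate concrete with some made-up numbers: with a limit of 100 requests per 60-second window, suppose the previous window counted 90 requests and we are 15 seconds into the current window, which has counted 20 so far. The previous window still covers the trailing 45 seconds of the moving window, so its contribution is 90 * 45 / 60 = 67 (integer division), giving an estimate of 20 + 67 = 87, which is under the limit, so the request is allowed.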

struct MovingWindow {
    // Start time and count for the previous window.
    prev_start: Instant,
    prev_count: usize,
    // Start time and count for the current window.
    this_start: Instant,
    this_count: usize,
    window: time::Duration,
    limit: usize,
}

impl RateLimiter for MovingWindow {
    fn new(window: time::Duration, limit: usize) -> Self {
        let now = Instant::now();

        MovingWindow {
            prev_start: now,
            prev_count: 0,
            this_start: now,
            this_count: 0,
            window,
            limit,
        }
    }

    fn allowed(&mut self) -> bool {
        let now = Instant::now();

        // Cycle the current window values into the previous window repeatedly until we "catch up"
        // to the present time. If more than two window durations have passed since the start of
        // the current window, both counts end up at zero, essentially resetting the limiter.
        while now.duration_since(self.this_start) > self.window {
            self.prev_start = self.this_start;
            self.prev_count = self.this_count;
            self.this_start = self.prev_start + self.window;
            self.this_count = 0;
        }

        // Estimate the number of requests over the moving window: all of the current window's
        // count plus the previous window's count weighted by how much of the moving window it
        // still covers.
        let this_period = now.duration_since(self.this_start);
        let last_period = self.window - this_period;

        let hits_from_last_period =
            (self.prev_count * last_period.as_micros() as usize) / self.window.as_micros() as usize;

        if self.this_count + hits_from_last_period >= self.limit {
            return false;
        }

        self.this_count += 1;

        true
    }
}

Token Bucket

A token bucket rate limiter provides consistent rate limiting while being a little more precise than the moving-window limiter discussed above. Tokens accumulate at a rate equal to the allowable request rate (limit per window in this and the previous examples), and each allowed request deducts a token. If no tokens are available, the request is denied. Tokens are not allowed to accumulate indefinitely; they are capped at limit to prevent excessively large bursts of requests from being allowed.
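
To put some made-up numbers on the accrual rate: with a limit of 60 requests per 60-second window, tokens accrue at one per second. The implementation below works in tokens per microsecond (60 / 60,000,000 = 0.000001), so after 2,500,000 microseconds of elapsed time it computes 2.5 tokens and truncates that to 2 whole tokens.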

struct TokenBucket {
    // Whole tokens currently available.
    tokens: usize,
    // The last time the token count was advanced.
    last_hit: Instant,
    window: time::Duration,
    limit: usize,
}

impl TokenBucket {
    fn new_tokens(&self, now: Instant) -> usize {
        // Calculate the number of new tokens that should be accumulated based on the provided
        // time: the time elapsed since the last token calculation times the rate of token
        // accumulation.
        let elapsed = now.duration_since(self.last_hit);

        // Rate as tokens per microsecond, which will probably be a very small number.
        let rate = self.limit as f64 / self.window.as_micros() as f64;

        // Microseconds elapsed * tokens per microsecond yields whole tokens; any fractional
        // remainder is dropped by the conversion.
        (elapsed.as_micros() as f64 * rate) as usize
    }
}

impl RateLimiter for TokenBucket {
    fn new(window: time::Duration, limit: usize) -> Self {
        TokenBucket {
            // The bucket starts empty, so requests are denied until tokens accrue.
            tokens: 0,
            last_hit: Instant::now(),
            window,
            limit,
        }
    }

    fn allowed(&mut self) -> bool {
        let now = Instant::now();

        // Accumulate tokens at the rate of limit / window (tokens per time)
        let new_tokens = self.new_tokens(now);

        // Only adjust the last hit time if at least one whole token was accumulated, so that
        // short gaps between requests aren't lost to truncation before a full token accrues.
        if new_tokens > 0 {
            // Cap the bucket at self.limit tokens.
            self.tokens = std::cmp::min(self.tokens + new_tokens, self.limit);
            self.last_hit = now;
        }

        if self.tokens == 0 {
            return false;
        }

        self.tokens -= 1;

        true
    }
}
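
A small usage sketch of the behavior as written, with arbitrary numbers and a sleep purely for illustration (timing-based assertions like these are too brittle for a real test):

fn main() {
    // Roughly 2 tokens per second: 10 tokens per 5-second window.
    let mut limiter = TokenBucket::new(time::Duration::from_secs(5), 10);

    // The bucket starts empty, so the very first request is denied.
    assert!(!limiter.allowed());

    // After about a second, roughly 2 tokens have accrued.
    std::thread::sleep(time::Duration::from_secs(1));
    assert!(limiter.allowed());
    assert!(limiter.allowed());
    assert!(!limiter.allowed());
}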

Tiered Rate Limiters

Even the moving-window and token bucket limiters provide only coarse control over request rates. It is often desirable to allow a modest rate of requests over a long time period while also preventing a large burst of requests from using up that entire allotment in a much shorter amount of time. For example, consider a limit of 60 requests per hour being hit either by one request arriving every minute or by all 60 requests arriving within a single second each hour. The limiters we've put together so far cannot distinguish between these scenarios.

A way to solve this is to use multiple rate limiters: one limiter enforces the longer-term limit, while a second provides a safeguard over a shorter time period. A simple composition of the limiters we've already built would work for this, perhaps by initializing the long-term limiter as new(time::Duration::from_secs(3600), 60) (60 requests per hour) and a short-term limiter as new(time::Duration::from_secs(5), 10) (10 requests per 5 seconds, or about 2 requests per second).
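
Here is a minimal sketch of that composition, assuming the types above; the Tiered name and the choice to consult the long-term limiter first are made up here. A request is allowed only if both limiters allow it. It can't implement the RateLimiter trait directly because the trait's new takes only a single window/limit pair, and note that with this naive short-circuiting check, a request denied by the short-term limiter has already consumed a slot in the long-term limiter, which may or may not be what you want.

// Hypothetical composition of two limiters built from the trait above.
struct Tiered<L: RateLimiter, S: RateLimiter> {
    long_term: L,
    short_term: S,
}

impl<L: RateLimiter, S: RateLimiter> Tiered<L, S> {
    fn allowed(&mut self) -> bool {
        // Both limiters must allow the request. Because && short-circuits, a request
        // rejected by the long-term limiter never reaches the short-term limiter.
        self.long_term.allowed() && self.short_term.allowed()
    }
}

fn tiered_example() -> Tiered<MovingWindow, TokenBucket> {
    Tiered {
        // 60 requests per hour overall...
        long_term: MovingWindow::new(time::Duration::from_secs(3600), 60),
        // ...but never much more than 10 requests in any 5-second span.
        short_term: TokenBucket::new(time::Duration::from_secs(5), 10),
    }
}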

