Idempotency keys and exactly-once semantics
"Exactly once is a lie. Idempotency is the truth everyone actually ships."
The previous chapters built the front half of the request lifecycle: gateway, auth, identity, rate limits, backpressure. This chapter deals with what happens when something goes wrong in flight and the client — or the gateway, or the middleware — retries. A retry is supposed to look identical to the original from the server’s point of view; when it does not, duplicate charges happen, double-sent emails happen, and two copies of the same record land in the database. The defense against this is idempotency, and it is one of the most asked-about topics in senior API design interviews.
The chapter starts with the theorem that makes the problem hard (two generals), explains why “exactly once” is a marketing term, and then gets into the practical answer: the idempotency key pattern, storage models, retention windows, replay semantics, and the failure modes that bite in production.
Outline:
- The two generals problem.
- Why exactly-once does not exist.
- Idempotency as the practical answer.
- The Idempotency-Key HTTP pattern.
- Storage models: Redis, Postgres, both.
- The lifecycle of an idempotency record.
- Request fingerprinting and mismatch detection.
- Replay safety of side effects.
- Retention windows and cleanup.
- Failure modes and corner cases.
- Exactly-once in stream processing.
- The mental model.
78.1 The two generals problem
Two generals need to coordinate an attack. They are on opposite hills, with the enemy in the valley between them. The only way to communicate is messengers who must cross the valley — and any messenger might be captured. General A sends “attack at dawn.” The messenger may or may not arrive. If A doesn’t get a confirmation, A doesn’t know whether to attack. So B sends a confirmation. But B doesn’t know if the confirmation arrived, so B doesn’t know whether A will attack. So A must confirm the confirmation. And so on.
The formal result (Akkoyunlu et al., 1975): it is impossible for two parties to reach agreement over an unreliable channel with any finite number of messages. Every additional round of confirmation just pushes the uncertainty down one level; it never eliminates it. The last message in any finite exchange is always unconfirmed, and the sender can’t be sure the receiver got it.
This is not a theoretical curiosity. It is the reason you cannot build “exactly once” on any network that drops messages. A client sends a payment request. The server processes it. The server sends a response. The response is lost. The client has no way to tell whether the server processed the request or dropped it — from the client’s perspective, the two are indistinguishable. If the client retries, the payment might be processed twice. If the client does not retry, the payment might never be processed. There is no protocol that fixes this with just pairs of messages over a lossy channel.
The two generals problem is the bedrock under everything in this chapter. Every retry, every at-least-once delivery, every distributed transaction lives in the shadow of this impossibility. The question is not “how do we eliminate the uncertainty” — that is proven impossible — but “how do we make it safe.”
78.2 Why exactly-once does not exist
“Exactly once” is sold by many messaging systems (Kafka, Pulsar, some SQS modes) as a feature. Be skeptical. The phrase means different things in different systems and never what it sounds like.
At-most-once delivery is easy: send the message, don’t retry. If it drops, it drops. No duplicates, no guarantees.
At-least-once delivery is also easy: retry until you get an ack. Guarantees the message arrives, but may deliver duplicates if the ack is lost.
Exactly-once delivery — in the strict sense of “the message is processed by the receiver one and only one time, no duplicates, no losses” — would require the two generals problem to have a solution. It does not. What “exactly once” systems actually provide is one of:
- Idempotent receivers. The sender retries (at-least-once), and the receiver deduplicates. The effect is once-only at the receiver, even though multiple copies may arrive on the wire. This is what Kafka calls “exactly-once semantics” with the transactional producer and idempotent consumer combined. It is really “at-least-once plus deduplication.”
- Transactional output. The receiver’s processing and the output (write to a database, send to another topic) happen atomically. If the processing crashes mid-way, the output is rolled back. The next attempt reprocesses from the beginning. The observable effect is that each input produces exactly one output, even if the processing ran partially multiple times. This is exactly-once for the effect, not for the message.
- Deduplication on both sides. Sender attaches a sequence number; receiver tracks seen sequence numbers and ignores duplicates. This gives once-at-receiver semantics even with at-least-once delivery.
None of these satisfy the strict definition. All of them provide “effectively once” at the cost of some state (dedup tables, sequence numbers, transactional logs) and some complexity. The right mental model is: the wire is at-least-once; the application layer makes effects idempotent.
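The third option — sender sequence numbers plus receiver-side deduplication — can be sketched in a few lines. This is a minimal illustration, not any particular broker’s implementation; it assumes in-order delivery per sender, which is roughly what Kafka’s idempotent producer gets per partition:

```python
class DedupReceiver:
    """Receiver-side dedup: at-least-once delivery in, once-only effects out."""

    def __init__(self):
        self.last_seq = {}   # sender_id -> highest sequence number applied
        self.effects = []    # stand-in for the real side effect

    def receive(self, sender_id, seq, payload):
        # A duplicate (or stale redelivery) carries a seq we have already applied.
        if seq <= self.last_seq.get(sender_id, -1):
            return "duplicate-ignored"
        self.effects.append(payload)
        self.last_seq[sender_id] = seq
        return "applied"

rx = DedupReceiver()
rx.receive("producer-1", 0, "charge $10")
rx.receive("producer-1", 0, "charge $10")  # retry of the same message
assert rx.effects == ["charge $10"]        # the effect happened exactly once
```

The state cost is visible here: the receiver must remember something (here, one integer per sender) forever, or at least for as long as duplicates can arrive.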
When an interviewer asks about exactly-once, the right answer is: “Strict exactly-once doesn’t exist because of the two generals problem. What we actually do is at-least-once delivery plus idempotent processing. The idempotency key pattern is how HTTP APIs do this; the transactional producer + consumer is how Kafka does this.” That is the answer they are listening for.
78.3 Idempotency as the practical answer
An operation is idempotent if applying it twice produces the same result as applying it once. Mathematically: f(f(x)) = f(x). For an HTTP endpoint, it means: calling POST /charges with the same payload twice charges the user once, not twice.
Some HTTP methods are idempotent by specification: GET, PUT, DELETE, HEAD, OPTIONS. GET reads; PUT writes an absolute state (not “increment by 1”); DELETE removes (deleting an already-deleted resource is a no-op, not an error). POST is not idempotent by default — it creates new resources or triggers side effects, so retrying a POST creates duplicates.
But most API calls that matter (creating a payment, sending a message, triggering a job) are logically POSTs because they create something. The idempotency problem is specifically the “POST retried under uncertainty” problem.
The idiomatic fix is the idempotency key. The client generates a unique identifier (usually a UUIDv4) for each logical operation, attaches it to the request, and retries with the same key if the request fails or times out. The server treats the key as a “did I already do this?” lookup:
- First time the server sees the key: do the work, record the result keyed by the idempotency key, return.
- Subsequent times the server sees the key: return the stored result without re-doing the work.
The retry may happen minutes later or from a different client instance. As long as the key is the same, the server returns the same answer. No duplicate side effects. The client can safely retry as many times as needed under at-least-once semantics, and the effect at the server is once.
This pattern was popularized by Stripe and is now the de facto standard for any API that mutates billable or irreversible state. GitHub, Shopify, Square, most fintech APIs — all implement it. It is not in any HTTP RFC (there is a draft, draft-ietf-httpapi-idempotency-key-header, but it is not finalized), yet the convention is well established: Idempotency-Key: <client-generated-uuid> in the request header.
78.4 The Idempotency-Key HTTP pattern
The client generates a UUID for each logical operation:
POST /v1/charges
Authorization: Bearer <token>
Idempotency-Key: 6f2c8b0a-3d4f-4d0a-9b6f-1234567890ab
Content-Type: application/json
{"amount": 1000, "currency": "usd", "customer": "cus_42"}
If the client’s first attempt fails (timeout, 500, network drop), it retries with the exact same key:
POST /v1/charges
Idempotency-Key: 6f2c8b0a-3d4f-4d0a-9b6f-1234567890ab ← same key
...same body...
The server’s handling logic:
def handle_post(request):
    key = request.headers.get("Idempotency-Key")
    if not key:
        return process(request)  # no key, no dedup

    record = idempotency_store.get(key)
    if record is not None:
        if record.status == "completed":
            # Return the stored response
            return record.response
        elif record.status == "in_progress":
            # Another request with this key is being handled.
            # Wait a bit, or return 409 Conflict.
            return wait_for_completion(key, timeout=5)
        # "failed" — fall through and let it retry

    # First time, or previous attempt failed
    idempotency_store.put(key, status="in_progress", ttl=86400)  # 24 hours
    try:
        response = process(request)
        idempotency_store.put(key, status="completed", response=response, ttl=86400)
        return response
    except Exception as e:
        idempotency_store.put(key, status="failed", error=str(e), ttl=86400)
        raise
The critical bits:
- The key is stored before processing begins. If the handler crashes mid-way, a retry sees status=in_progress or status=failed and can decide what to do. Without the upfront record, a crashed handler leaves no trace, and a retry creates a duplicate.
- The response is stored on success. Subsequent requests return the exact same response body and status. This matters: a client expecting “here is your charge ID” should get the same charge ID on retry, not a fresh one.
- A TTL on the record. The record does not live forever. 24 hours is a common default — long enough to cover any realistic retry window, short enough to keep the store from growing unbounded.
- In-progress handling. Two copies of a request with the same key can arrive nearly simultaneously (the client retried quickly). The second one sees in_progress and must either wait or return a concurrency error. Waiting is nicer but more complex; 409 is simpler but uglier. Stripe’s pattern is to return a special “concurrent requests with the same key” error.
- Error storage. If the first attempt returned a 500, the second attempt should probably retry (maybe the server was having a bad minute). If the first attempt returned a 400 (bad request), the second attempt should return the same 400 without re-processing. The rule: store and replay anything client-caused; only allow retry for server errors.
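The error-storage rule reduces to a small decision function. This is a sketch; the record shape and status values are illustrative, not a specific library’s API:

```python
def on_duplicate(record):
    """Decide what a retry with a known idempotency key should get back."""
    if record["status"] == "in_progress":
        return ("wait-or-409", None)           # concurrent attempt in flight
    if record["status"] == "completed":
        return ("replay", record["response"])  # success: replay stored response
    # Stored failure: replay client errors, allow retry on server errors.
    if 400 <= record["error_status"] < 500:
        return ("replay", record["error_status"])  # client bug, same answer every time
    return ("reprocess", None)                     # 5xx: the server may do better now

assert on_duplicate({"status": "completed", "response": "r"}) == ("replay", "r")
assert on_duplicate({"status": "failed", "error_status": 400}) == ("replay", 400)
assert on_duplicate({"status": "failed", "error_status": 503}) == ("reprocess", None)
```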
The pattern is simple. The tricky parts are storage and edge cases, which the next sections unpack.
78.5 Storage models: Redis, Postgres, both
Where do idempotency records live? The choice determines latency, durability, and operational complexity.
Redis. Fast, in-memory, atomic via SET NX (set if not exists). Typical implementation:
# Atomically claim the key or detect that it's taken.
claimed = redis.set(f"idem:{key}", json.dumps({"status": "in_progress"}),
                    nx=True, ex=86400)
if not claimed:
    # Key was already claimed; fetch the existing record.
    record = json.loads(redis.get(f"idem:{key}"))
    return replay(record)
# We own the key now; proceed.
Pros: sub-millisecond latency, cheap. SET NX is atomic — no race between two concurrent requests with the same key, only one will succeed.
Cons: Redis is typically not as durable as a real database. A Redis failover can lose recent writes. For a payment API, losing an idempotency record means a duplicate charge — bad. Mitigation: use Redis with AOF + replication, or use Redis only as a cache with the durable copy in a database.
Postgres (or any relational DB). A table with the key as primary key and the response as a column. Insertion uses ON CONFLICT DO NOTHING or SELECT ... FOR UPDATE for atomic claim.
INSERT INTO idempotency_records (key, status, created_at, ttl)
VALUES ($1, 'in_progress', now(), now() + interval '24 hours')
ON CONFLICT (key) DO NOTHING
RETURNING id;
If RETURNING id produces no row, the key already exists; the request is a duplicate. Fetch the existing row and replay.
Pros: durable. Survives restarts, failovers, partial failures. The idempotency record is in the same database as the effects it guards (typically), so a single transaction can insert the record and perform the side effect atomically. This is the strongest form of the pattern.
Cons: slower than Redis (a few milliseconds per check). Higher load on the database; watch out for the table becoming a hotspot under high throughput.
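The single-transaction property is the payoff. A minimal sketch, using SQLite as a stand-in for Postgres (table and column names are illustrative; Postgres would use the ON CONFLICT insert shown above):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE idempotency_records (key TEXT PRIMARY KEY, response TEXT);
    CREATE TABLE charges (id INTEGER PRIMARY KEY, amount INTEGER);
""")

def create_charge(key, amount):
    try:
        with db:  # one transaction: record + side effect commit or roll back together
            db.execute("INSERT INTO idempotency_records (key, response) VALUES (?, ?)",
                       (key, f"charge:{amount}"))
            db.execute("INSERT INTO charges (amount) VALUES (?)", (amount,))
            return f"charge:{amount}", "created"
    except sqlite3.IntegrityError:
        # Key already claimed: replay the stored response, no new charge row.
        row = db.execute("SELECT response FROM idempotency_records WHERE key = ?",
                         (key,)).fetchone()
        return row[0], "replayed"

assert create_charge("abc123", 1000) == ("charge:1000", "created")
assert create_charge("abc123", 1000) == ("charge:1000", "replayed")
assert db.execute("SELECT count(*) FROM charges").fetchone()[0] == 1
```

A crash between the two inserts rolls back both, so a retry sees no record and starts clean — exactly the guarantee the separate-store designs have to work hard to approximate.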
Both. The production-grade pattern: Redis as a fast cache, Postgres as the durable record. Check Redis first (sub-ms). If Redis says the record exists, return the cached result. If Redis says no, check Postgres (a few ms). If Postgres has the record, copy it back to Redis and return. If neither has it, claim in Postgres first, then Redis, then process.
The both-pattern adds complexity but gives you the best of both worlds: cheap happy path, durable guarantee. Stripe-style APIs use this. For most teams, starting with Postgres-only is fine and simpler; upgrade to Redis+Postgres only when the Postgres becomes a bottleneck.
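The both-pattern lookup order can be sketched with plain dicts standing in for Redis and Postgres (real code would go through redis-py and a DB driver; this only shows the control flow):

```python
cache = {}    # stand-in for Redis: fast, but may lose recent writes on failover
durable = {}  # stand-in for Postgres: the source of truth

def lookup(key):
    # 1. Cheap path: cache hit.
    if key in cache:
        return cache[key]
    # 2. Cache miss: consult the durable store.
    record = durable.get(key)
    if record is not None:
        cache[key] = record  # repopulate the cache for the next retry
        return record
    return None  # genuinely new request

def claim(key, record):
    # Durable first: if we crash after this line, the record still exists.
    durable[key] = record
    cache[key] = record

claim("k1", {"status": "completed", "response": "r1"})
cache.clear()  # simulate a Redis failover losing recent writes
assert lookup("k1")["response"] == "r1"  # the durable copy saves the day
assert "k1" in cache                     # and the cache is warm again
```

The ordering in claim matters: durable store first, cache second, so a crash in between errs on the side of "record exists" rather than "record lost."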
One nuance: if your idempotency store is separate from your application database, you cannot atomically update both. A crash between “record stored” and “effect applied” leaves inconsistent state. Either:
- Put the idempotency record in the same database as the effects, inside the same transaction. (Best.)
- Use a saga or outbox pattern to reconcile.
- Accept the small window and alert on anomalies.
Production systems usually pick option 1 for APIs where atomicity matters (payments, messages) and option 3 for less critical ones (non-refundable analytics events).
78.6 The lifecycle of an idempotency record
A complete record has four states:
- In progress. The handler is currently processing. No response yet. Other requests with the same key must wait or error.
- Completed. The handler finished successfully. The response is stored. Subsequent requests replay.
- Failed. The handler finished with an error the client should see (validation error, 4xx). Subsequent requests replay the error.
- Server-failed (optional). The handler crashed or returned 5xx. The client should be allowed to retry. Either delete the record, leave it in a special state, or let the TTL expire.
The state machine:
new → in_progress → completed
↘ → failed (returned to client)
↘ → server_failed (allow retry or delete)
Transitions must be atomic. The whole rest of the handler’s logic runs after the in_progress state is established and must eventually transition to completed or failed. If the handler’s process dies between those states, the record sits in in_progress until the TTL expires — which is why retries need to decide whether to wait or return a conflict error.
Some implementations use a timeout on in_progress: if a record has been in progress longer than N seconds, treat it as stale and allow a retry to claim it. This handles the crashed-handler case without leaving records wedged. The timeout must be longer than the longest legitimate request duration (e.g., 5× the p99 latency) to avoid stealing requests from slow handlers.
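The stale-claim takeover looks like this in an in-memory sketch (field names and the 30-second threshold are illustrative; in a real store the check-then-set must itself be atomic, e.g. a Lua script in Redis or SELECT ... FOR UPDATE in Postgres):

```python
import time

store = {}       # key -> {"status": ..., "claimed_at": ...}
STALE_AFTER = 30 # seconds; must exceed the longest legitimate request duration

def try_claim(key, now=None):
    """Claim the key, or take over a claim that has gone stale."""
    now = time.monotonic() if now is None else now
    record = store.get(key)
    if record is None:
        store[key] = {"status": "in_progress", "claimed_at": now}
        return "claimed"
    if record["status"] == "in_progress":
        if now - record["claimed_at"] > STALE_AFTER:
            # Previous handler presumably crashed; steal the claim.
            store[key] = {"status": "in_progress", "claimed_at": now}
            return "claimed-stale"
        return "busy"  # fresh claim: wait or return 409
    return "done"      # completed or failed: replay, don't reprocess

assert try_claim("k", now=0.0) == "claimed"
assert try_claim("k", now=5.0) == "busy"           # fresh claim in flight
assert try_claim("k", now=40.0) == "claimed-stale" # wedged record, take over
```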
78.7 Request fingerprinting and mismatch detection
A subtle failure mode: the client reuses an idempotency key but sends a different body. This is usually a client bug, but if the server blindly replays the first result, the client never finds out about the bug.
The fix is a request fingerprint: hash the relevant parts of the request body and store the hash with the idempotency record. On a replay, verify that the new fingerprint matches the stored one. If it does not, return 422 Unprocessable Entity with an error like “Idempotency key reuse with different parameters.” The client is forced to fix the bug.
import hashlib, json

def fingerprint(request):
    # Hash the stable parts of the request: body, route, important headers.
    h = hashlib.sha256()
    h.update(request.method.encode())
    h.update(request.path.encode())
    h.update(json.dumps(request.body, sort_keys=True).encode())
    # Optionally include specific headers like Tenant-Id
    h.update(request.headers.get("tenant-id", "").encode())
    return h.hexdigest()
Stripe formalizes this: the first request with a given key is accepted and its fingerprint stored. Subsequent requests with the same key must have the same fingerprint or are rejected. This catches client bugs at the protocol layer and is a best practice.
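Wiring the fingerprint check into the key lookup takes only a few lines. A sketch, with a plain dict as the store and illustrative return values:

```python
store = {}  # key -> {"fingerprint": ..., "response": ...}

def check_key(key, fp):
    """Return what the handler should do for this (key, fingerprint) pair."""
    record = store.get(key)
    if record is None:
        store[key] = {"fingerprint": fp, "response": None}
        return "process"   # first sighting: do the work
    if record["fingerprint"] != fp:
        return "422"       # key reuse with different parameters: client bug
    return "replay"        # genuine retry: return the same answer

assert check_key("k1", "aaa") == "process"
assert check_key("k1", "aaa") == "replay"
assert check_key("k1", "bbb") == "422"
```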
What counts as the fingerprint is a design choice. Include:
- HTTP method and path.
- The request body (JSON, form, whatever).
- Any header that changes the meaning (tenant, actor, audience).
Exclude:
- Timing-sensitive headers (Date, Authorization, User-Agent).
- Trace IDs and other per-request noise.
Get this wrong one way (too inclusive), and legitimate retries are rejected. Get it wrong the other way (too exclusive), and the feature silently fails. Test with paired positive and negative cases: same body → replay; different body → reject.
78.8 Replay safety of side effects
Idempotency keys protect against duplicate handler invocations. They do not automatically protect against duplicate side effects if the side effects are not atomic with the idempotency record write.
Concretely: the handler creates a payment (external API call), then stores the idempotency record. Between those two steps, the process crashes. The payment was created; the record was not. A retry sees no record, creates a second payment. Two payments for the same request.
The fixes:
Transactional boundary. Run the idempotency record write and the side effect inside the same database transaction. If the DB is Postgres and the side effect is a row insert, this works cleanly — commit the transaction with both the record and the row, and either both happen or neither does.
sequenceDiagram
    participant Client
    participant Server
    participant DB
    Client->>Server: POST /charges with Idempotency-Key abc123
    Server->>DB: BEGIN then INSERT idempotency_records then INSERT charges then COMMIT
    DB-->>Server: ok
    Server-->>Client: 200 charge_id
    Note over Client,Server: client retries after network drop
    Client->>Server: POST /charges with Idempotency-Key abc123
    Server->>DB: SELECT idempotency_records WHERE key=abc123
    DB-->>Server: status=completed response=200
    Server-->>Client: 200 charge_id replayed no new charge
The idempotency record and the side effect commit in the same transaction — a crash between them rolls both back, leaving a clean state for the next retry.
Outbox pattern. For side effects that cannot be transactional (external API calls, publishing to Kafka), use an outbox table: inside the request transaction, write the intended side effect to an outbox table along with the idempotency record. After commit, a separate worker reads the outbox and performs the side effect. If the worker crashes, it retries from the outbox — but since the outbox entry is keyed by idempotency key, duplicate worker runs do not produce duplicate effects (assuming the external API is itself idempotent).
Nested idempotency. When calling an external API, pass a deterministic idempotency key derived from the caller’s idempotency key. That way, retries cascade all the way down: the client’s retry causes the handler’s retry, which causes the external call’s retry, and the external service’s idempotency handling prevents duplication. This is the most robust pattern and is how production fintech systems work.
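Deriving the downstream key deterministically — so every retry of the same step produces the same child key — can be as simple as a UUIDv5 over the parent key plus a step name. A sketch; the namespace UUID and step names are illustrative:

```python
import uuid

# Any fixed namespace works, as long as every instance of the service agrees on it.
NAMESPACE = uuid.UUID("7f0c6c9e-8a41-4f3b-9a57-000000000000")

def child_key(parent_key: str, step: str) -> str:
    """Deterministic per-step key: a retry of the parent yields the same child key."""
    return str(uuid.uuid5(NAMESPACE, f"{parent_key}:{step}"))

k1 = child_key("6f2c8b0a-3d4f-4d0a-9b6f-1234567890ab", "stripe-charge")
k2 = child_key("6f2c8b0a-3d4f-4d0a-9b6f-1234567890ab", "stripe-charge")
assert k1 == k2  # retries present the same downstream key
assert k1 != child_key("6f2c8b0a-3d4f-4d0a-9b6f-1234567890ab", "send-receipt")
```

Never generate a fresh UUID inside the handler for the downstream call — that is exactly the mistake that breaks the cascade, because each retry would then look like a new operation to the external service.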
Compensating transactions. If a side effect happens and cannot be undone, build a reconciliation process that detects duplicates after the fact and refunds or cancels them. This is what you do when idempotency was not built in from the start; it is expensive and you want to avoid it.
The replay-safety story is where idempotency gets genuinely hard. The single-layer idempotency key pattern solves the “accidentally run the handler twice” problem. Propagating that guarantee down to external services requires every layer to support idempotency. For internal services you control, make this a design standard. For external services that do not, the outbox + reconciliation pattern is the fallback.
78.9 Retention windows and cleanup
How long do idempotency records live?
Too short: a client retrying after 20 minutes sees no record, creates a duplicate. The retention window must cover all realistic retry scenarios. Typical client retry loops: up to ~1 hour. Typical workflow retries (cron jobs, DLQ replays): up to 24 hours. Manual replays by engineers during incident recovery: sometimes days.
Too long: the store grows without bound. Millions of records, eventually tens of millions, slowing down queries and consuming memory.
The standard compromise: 24 hours. Long enough to cover most automatic retries, short enough to keep the store manageable. Stripe uses 24 hours. For particularly critical operations (financial settlements), 7 days is safer. For non-critical operations (analytics events), an hour may be enough.
Cleanup strategies:
- Redis TTL. Set the TTL at write time; Redis handles expiration automatically. Free.
- Postgres TTL. No automatic TTL; run a periodic job (DELETE FROM idempotency_records WHERE ttl < now()). Schedule it for off-peak hours. Use a partitioned table (partition by date) so old partitions can be dropped atomically instead of row-by-row deletes. This matters at high volumes.
- Soft delete then hard delete. Mark records expired first, hard-delete later. Useful if you want an “idempotency hint” window where a late retry sees the record as expired and is told “this key was processed but is now too old — do not retry” (410 Gone).
The retention window is a parameter of your system; document it clearly in the API docs. Clients need to know how long they can safely retry before needing to generate a new key. Stripe documents theirs explicitly.
78.10 Failure modes and corner cases
A tour of the edges.
Client generates non-unique keys. A buggy client uses Date.now() or an auto-increment that resets on restart. Duplicate keys from different logical operations get conflated; later requests return the earlier one’s response. Fix: fingerprint mismatch detection catches this.
Key from a compromised account. An attacker steals a user’s credentials and replays a legitimate request with the same idempotency key and body. The server sees a valid replay and returns the cached response. Not a vulnerability — the attacker gets the same response the legitimate user got — but the rate-limiting story gets complicated. A replay still consumes a rate-limit slot; some implementations skip it. Decide deliberately.
Partial responses for streaming endpoints. The first request returns a stream; the connection drops mid-stream. The client retries. What should the server do — replay the full stream, or start fresh? There is no clean answer. Most streaming APIs do not support idempotency keys for this reason; clients handle stream resumption separately.
Side-effect bleed on retries across TTL expiration. The client retries after 25 hours; the record has expired; the server processes the request fresh, creating a duplicate. Fix: return 410 Gone for expired keys and instruct the client to use a new key. Or extend the TTL.
Idempotency key in the URL. POST /charges?idem=xyz. Logged, cached, reflected in browser history. Always use a header, not a URL parameter.
Large request bodies. The fingerprint includes the body; hashing a 10 MB body is slow. For very large payloads, hash a summary (content hash of the file in object storage) rather than the full body.
Fingerprint skew. The client includes a timestamp field in the body. Every retry has a different timestamp. Fingerprint mismatch rejects the retry. Fix: exclude known-volatile fields from the fingerprint, or normalize the body before hashing.
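Normalizing the body before hashing — dropping known-volatile fields — keeps legitimate retries matching. A sketch; the volatile field names are examples, and which fields qualify is an API-specific decision:

```python
import hashlib, json

VOLATILE_FIELDS = {"timestamp", "trace_id", "attempt"}  # illustrative

def normalized_fingerprint(body: dict) -> str:
    stable = {k: v for k, v in body.items() if k not in VOLATILE_FIELDS}
    # sort_keys makes the hash independent of dict insertion order
    return hashlib.sha256(json.dumps(stable, sort_keys=True).encode()).hexdigest()

a = normalized_fingerprint({"amount": 1000, "currency": "usd", "timestamp": 1700000001})
b = normalized_fingerprint({"amount": 1000, "currency": "usd", "timestamp": 1700000099})
assert a == b  # retries with fresh timestamps still match
assert a != normalized_fingerprint({"amount": 2000, "currency": "usd"})
```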
Two services with the same idempotency store. Two different APIs both hash their keys into a shared Redis. Key collisions across services. Fix: namespace keys: svc_payments:key, svc_email:key.
In-progress deadlock. Two clients send requests with the same key at nearly the same time. The first one marks it in progress. The second one waits. The first one crashes without updating the record. The second one waits forever. Fix: in-progress state has its own short timeout; if it hasn’t transitioned in 30 s, allow a new attempt to take over.
Replay after data model changes. The response for an operation was cached 24 hours ago. Since then, the response schema has changed. The replay returns stale data. Fix: version the stored responses; if the stored version does not match the current handler, treat it as expired.
Race between cache and DB. Redis says no record, DB says yes. Fix: on cache miss, check DB; on any mismatch, trust the DB. Cache is advisory.
78.11 Exactly-once in stream processing
A quick detour: stream processing systems (Flink, Kafka Streams, Spark Streaming) also market “exactly once.” The mechanism is different from the HTTP pattern but uses the same insight.
Kafka’s exactly-once semantics (EOS). The producer is “idempotent” — it attaches a producer ID and sequence number, and the broker deduplicates based on them. The producer also participates in transactions: writing to multiple partitions and committing offsets in a consumed topic all happen atomically. If a consumer reads from topic A and writes to topic B, it can do so transactionally: read, process, write, commit becomes one atomic operation. Either everything happens, or nothing happens and the next poll re-reads from A.
Combined with idempotent consumers (application-level dedup), the end-to-end effect is “each input message produces exactly one output message.” Kafka calls this “exactly-once semantics,” but notice: it still relies on at-least-once underneath, plus deduplication, plus transactional commits. It is the same trick as the HTTP idempotency pattern, applied to streams.
Flink’s exactly-once. State checkpoints are consistent across the cluster via Chandy-Lamport snapshots. On failure, the job rewinds to the last checkpoint and replays from there. The outputs go through a two-phase commit to external sinks (Kafka, DBs), so rolled-back work is never observed by downstream consumers. Again: at-least-once processing plus transactional output.
The lesson is the same: exactly-once requires either idempotency or transactional boundaries. There is no protocol trick that makes messages deliver exactly once on an unreliable channel. What production systems do is make the effects idempotent or transactional, and then tolerate duplicate deliveries on the wire. The HTTP idempotency key pattern is the same idea; it is just applied per-request instead of per-stream.
78.12 The mental model
Eight points to take into the next chapter:
- The two generals problem rules out strict exactly-once over any unreliable channel. Any system advertising it is really at-least-once plus deduplication.
- Idempotency is the practical substitute. Making operations idempotent lets retries be safe, which is the property clients actually need.
- The idempotency key pattern: client sends Idempotency-Key: <uuid>; server stores key, fingerprint, and response; replays on retry.
- Storage is Redis, Postgres, or both. Postgres is durable; Redis is fast; production often uses both. The record must be atomic with the side effect.
- Fingerprint the request. Catch bugs where clients reuse keys with different bodies; return 422 on mismatch.
- Replay safety requires care with side effects. Transactional boundaries, outbox patterns, or nested idempotency keys propagate the guarantee down the stack.
- Retention windows are 24 h by default. Document them; expired keys return 410.
- Stream processing systems solve the same problem differently — transactional producers, checkpointed state, two-phase commits — but the underlying insight is identical: at-least-once plus idempotent effects.
The next chapter continues the request lifecycle: retries, circuit breakers, timeouts, and the rest of the reliability-in-the-small toolkit.
Read it yourself
- Akkoyunlu, Ekanadham, and Huber, “Some constraints and tradeoffs in the design of network communications” (1975). The original two generals proof.
- Stripe’s API documentation on idempotent requests — the canonical public description of the pattern.
- Brandur Leach, “Designing robust and predictable APIs with idempotency” — a deep write-up by a Stripe engineer.
- Confluent’s blog on Kafka exactly-once semantics. Explains the transactional producer and why “exactly-once” is really idempotency under the hood.
- Google’s “Exactly-once Is NOT Exactly the Same” post on the Dataflow semantics.
- IETF draft-ietf-httpapi-idempotency-key-header — the in-progress standard for the header.
Practice
- Explain the two generals problem in your own words. Why does adding more messages not help?
- Why is POST not idempotent by default while PUT is? Give an example of a POST that would become a PUT under a different API design.
- Sketch pseudocode for an idempotency check against Redis. Include the atomic claim and the in-progress handling.
- A client reuses the same idempotency key with a different body. What should the server return, and why?
- Explain why a payment API should store the idempotency record in the same transaction as the payment row. What breaks if they are in different databases?
- How does Kafka’s “exactly-once” work? Which part is at-least-once and which part is the idempotency?
- Stretch: Implement an idempotency middleware for a small HTTP service. Support fingerprinting, in-progress detection with timeout, and 24-hour TTL via Redis. Write a test that retries the same request 100 times and verifies the handler ran once.