All these intricacies can be avoided when it is possible to use a monotonically increasing sequence value as the idempotency key. In that case, the consumer does not need to store all the keys it has ever processed (or a reasonably sized subset thereof). It only needs to store a single value: the key of the latest message it has processed. If it receives a message with the same or a lower idempotency key, that message must be a duplicate and can be ignored. When receiving messages from a partitioned source, such as a Kafka topic with multiple partitions, or from multiple independent producers (e.g., different clients of a REST API, each using their own separate sequence), the latest key value must be stored per partition, or per producer, respectively.
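As a minimal sketch of the consumer-side bookkeeping (the class and method names are made up for illustration), the following Java snippet keeps one watermark per partition and rejects any message whose key does not advance it. In a real consumer, that map would be persisted together with the processing results rather than held in memory only.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Tracks the highest idempotency key seen per partition and treats any
 * key at or below that watermark as a duplicate.
 */
public class MonotonicDeduplicator {

    private final Map<Integer, Long> latestKeyPerPartition = new HashMap<>();

    /**
     * Returns true if the message is new and should be processed, false if
     * its key is not greater than the latest key seen for that partition.
     */
    public synchronized boolean accept(int partition, long idempotencyKey) {
        Long latest = latestKeyPerPartition.get(partition);
        if (latest != null && idempotencyKey <= latest) {
            return false; // same or lower key: duplicate, ignore
        }
        latestKeyPerPartition.put(partition, idempotencyKey);
        return true;
    }
}
```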
Monotonically increasing idempotency keys are a great improvement from the perspective of the message consumer. On the flip side, they may make things more complicated for producers: creating monotonically increasing sequence values isn’t without its own challenges. It is trivial if producers are single-threaded, producing one message at a time. In that case, a database sequence, or even a simple in-memory counter, can be used for creating the idempotency keys. Gaps in the sequence are fine, so the persistent state of the sequence or counter can be incremented in larger steps, with the actual values dispensed from an in-memory copy. That way, disk I/O is reduced. From a consumer perspective, Kafka partition offsets fall into that bucket: they can be considered a monotonically increasing idempotency key for the messages consumed from a given partition.
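A sketch of that block-allocation idea, assuming the high-water mark is persisted through some callback (the `persistHighWaterMark` consumer below is a hypothetical stand-in, e.g. for an UPDATE against a sequence table): the sequence state hits durable storage once per block, while individual keys are handed out from memory.

```java
import java.util.function.LongConsumer;

/**
 * Dispenses monotonically increasing idempotency keys from an in-memory
 * counter, persisting the sequence state only once per block of values.
 */
public class BlockAllocatingSequence {

    private final long blockSize;
    private final LongConsumer persistHighWaterMark;

    private long next;
    private long limit; // exclusive upper bound of the currently reserved block

    public BlockAllocatingSequence(long start, long blockSize, LongConsumer persistHighWaterMark) {
        this.blockSize = blockSize;
        this.persistHighWaterMark = persistHighWaterMark;
        this.next = start;
        this.limit = start; // forces a block reservation on first use
    }

    /**
     * Returns the next key, reserving a new block of values (and persisting
     * the new high-water mark) only when the current block is exhausted.
     */
    public synchronized long nextKey() {
        if (next == limit) {
            limit = next + blockSize;
            // one write to durable storage per block instead of one per key
            persistHighWaterMark.accept(limit);
        }
        return next++;
    }
}
```

After a restart, the producer would be constructed with `start` set to the persisted high-water mark, leaving a gap of at most `blockSize` unused values, which consumers tolerate by design.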
Things get more complicated when the producer handles multiple concurrent requests, for instance a REST service with multiple request worker threads, perhaps even scaled out to multiple compute nodes in a cluster. To ensure monotonicity, retrieving the idempotency key and emitting a message with that key must happen atomically, uninterrupted by other worker threads. Otherwise, you may end up in a situation where thread A fetches sequence value 100, thread B fetches sequence value 101, B emits a message with idempotency key 101, and then A emits a message with idempotency key 100. A consumer would then incorrectly discard A’s message as a duplicate.
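Within a single process, that atomicity boils down to fetching the key and handing the message to the transport inside one critical section, as in this sketch (the `send()` method is just a stand-in for the actual transport). Note that this puts the entire send on the lock’s critical path, and it does nothing for producers scaled out across several nodes.

```java
/**
 * Sketch of a producer that fetches the next key and emits the message
 * as one uninterrupted step, so keys leave the producer in order.
 */
public class OrderedProducer {

    private long sequence = 0;

    public synchronized void emit(String payload) {
        // no other thread can emit between fetching the key and sending
        long idempotencyKey = ++sequence;
        send(idempotencyKey, payload);
    }

    private void send(long idempotencyKey, String payload) {
        // stand-in for the actual transport, e.g. a Kafka producer or an HTTP call
        System.out.printf("key=%d, payload=%s%n", idempotencyKey, payload);
    }
}
```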
In most cases, ensuring this level of atomicity will impose a severe bottleneck, essentially serializing all requests of the producer system, regardless of how many worker threads or service instances you deploy. Note that if you really wanted to go down that route, solely using a database sequence for producing the idempotency key would not be enough: the sequence hands out values in ascending order, but nothing prevents a producer which obtained a lower value from emitting its message after one with a higher value. Instead, you’d have to use a mechanism such as Postgres advisory locks in order to guarantee monotonicity of the idempotency keys in the outgoing messages.
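One possible shape of that approach, assuming messages are emitted by inserting them into an outbox table and published from there (the sequence, table, and lock id names are purely illustrative): a transaction-scoped advisory lock is taken before fetching the sequence value and only released at commit, so keys become visible in strictly increasing order, at the price of serializing all producers behind that single lock.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

/**
 * Sketch: serializing idempotency key assignment and message emission
 * across all producer instances with a transaction-scoped Postgres
 * advisory lock. Sequence, table, and lock id are illustrative only.
 */
public class AdvisoryLockedOutbox {

    // arbitrary application-defined lock id, shared by every producer instance
    private static final long LOCK_ID = 4711L;

    public long emit(Connection connection, String payload) throws SQLException {
        connection.setAutoCommit(false);
        try {
            // acquired before fetching the key, released only at commit/rollback,
            // so no other producer can obtain a key until this message is visible
            try (PreparedStatement lock =
                    connection.prepareStatement("SELECT pg_advisory_xact_lock(?)")) {
                lock.setLong(1, LOCK_ID);
                lock.execute();
            }

            long key;
            try (PreparedStatement nextVal =
                    connection.prepareStatement("SELECT nextval('message_key_seq')");
                 ResultSet rs = nextVal.executeQuery()) {
                rs.next();
                key = rs.getLong(1);
            }

            // "emitting" here means inserting into an outbox table within the same
            // transaction; a relay then publishes the rows in idempotency key order
            try (PreparedStatement insert = connection.prepareStatement(
                    "INSERT INTO outbox (idempotency_key, payload) VALUES (?, ?)")) {
                insert.setLong(1, key);
                insert.setString(2, payload);
                insert.executeUpdate();
            }

            connection.commit();
            return key;
        }
        catch (SQLException e) {
            connection.rollback();
            throw e;
        }
    }
}
```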