Idempotency Keys for Exactly-Once Processing

Original link: https://www.morling.dev/blog/on-idempotency-keys/

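Using a monotonically increasing sequence value as the idempotency key simplifies message processing for consumers. Instead of storing every key it has processed, a consumer only needs to track the *latest* key and can discard any message with a lower or equal key as a duplicate. This is particularly efficient with partitioned sources such as Kafka, where one latest key is kept per partition. Generating such keys can be challenging for producers, however. A single-threaded producer can simply use a database sequence or a counter. Concurrent producers add complexity: to ensure monotonicity, fetching the key and emitting the message must happen atomically, otherwise a message that is emitted out of order is wrongly discarded as a duplicate. This atomicity typically creates a performance bottleneck. A database sequence alone is not sufficient; true atomicity requires a mechanism such as Postgres advisory locks. While monotonically increasing keys benefit consumers, implementing them requires careful consideration of the producer architecture and the potential performance trade-offs. Kafka partition offsets naturally serve as such keys for the messages within a partition.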

Original text

All these intricacies can be avoided when it is possible to use a monotonically increasing sequence value as the idempotency key. In that case, the consumer does not need to store all the keys it ever has processed (or a reasonably sized subset thereof). It only needs to store a single value, the one of the latest message which it has processed. If it receives a message with the same or a lower idempotency key, that message must be a duplicate and can be ignored. When receiving messages from a partitioned source, such as a Kafka topic with multiple partitions, or from multiple independent producers (e.g., different clients of a REST API, each using their own separate sequence), then the latest key value per partition must be stored.
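The consumer-side check described above can be sketched in a few lines of Python. This is a minimal illustration, not code from the article: the class name `MonotonicDeduplicator`, the method `should_process`, and the partition labels are hypothetical, and the state is kept in memory only. A real consumer would presumably need to persist the latest key together with the effects of processing the message.

```python
from typing import Dict, Hashable, Optional


class MonotonicDeduplicator:
    """Keeps only the latest idempotency key per partition (hypothetical helper)."""

    def __init__(self) -> None:
        # One entry per partition or per independent producer sequence.
        self._latest: Dict[Hashable, int] = {}

    def should_process(self, partition: Hashable, key: int) -> bool:
        latest: Optional[int] = self._latest.get(partition)
        if latest is not None and key <= latest:
            # Same or lower key than the latest one seen: treat as duplicate.
            return False
        self._latest[partition] = key
        return True


dedup = MonotonicDeduplicator()
results = [
    dedup.should_process("p0", 1),  # first message on partition p0
    dedup.should_process("p0", 2),  # next message on p0
    dedup.should_process("p0", 2),  # redelivery of key 2: duplicate
    dedup.should_process("p1", 1),  # other partition, tracked separately
    dedup.should_process("p0", 1),  # lower key on p0: duplicate
]
print(results)
```

The state held by the consumer grows with the number of partitions, not with the number of messages processed.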

Monotonically increasing idempotency keys are a great improvement from the perspective of the message consumer. On the flipside, they may make things more complicated for producers: creating monotonically increasing sequence values isn’t without its own challenges. It is trivial if producers are single-threaded, producing one message at a time. In that case, a database sequence, or even a simple in-memory counter, can be used for creating the idempotency keys. Gaps in the sequence are fine, hence it is possible to increment the persistent state of the sequence or counter in larger steps, and dispense the actual values from an in-memory copy. That way, disk IO can be reduced. From a consumer perspective, Kafka partition offsets fall into that bucket, as they can be considered a monotonically increasing idempotency key for the messages consumed from a given partition.
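The idea of advancing the persistent state in larger steps while dispensing values from memory could look roughly like the following single-threaded sketch. The class `BlockSequence`, its `block_size` parameter, and the `simulate_restart` method are assumptions made for illustration; the attribute `persisted_high` stands in for durable storage such as a database row.

```python
class BlockSequence:
    """Single-threaded sequence that persists its state once per block (sketch)."""

    def __init__(self, block_size: int = 100) -> None:
        self.block_size = block_size
        self.persisted_high = 0  # stand-in for the durably stored upper bound
        self.persist_count = 0   # number of simulated durable writes
        self._next = 1           # next value to hand out from memory
        self._limit = 0          # highest value of the currently reserved block

    def _reserve_block(self) -> None:
        # A real implementation would perform one durable write here.
        self._next = self.persisted_high + 1
        self.persisted_high += self.block_size
        self._limit = self.persisted_high
        self.persist_count += 1

    def next_value(self) -> int:
        if self._next > self._limit:
            self._reserve_block()
        value = self._next
        self._next += 1
        return value

    def simulate_restart(self) -> None:
        # In-memory state is lost; only persisted_high survives.
        self._next = 1
        self._limit = 0


seq = BlockSequence(block_size=100)
before_restart = [seq.next_value() for _ in range(3)]
seq.simulate_restart()
after_restart = seq.next_value()
print(before_restart, after_restart, seq.persist_count)
```

After the simulated restart the unused remainder of the first block is skipped. That leaves a gap, which the consumer-side check tolerates, while the values handed out stay strictly increasing.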

Things get more complicated when the producer is subject to multiple concurrent requests at once, for instance a REST service with multiple request workers, perhaps even scaled out to multiple compute nodes in a cluster. To ensure monotonicity, retrieval of the idempotency key and emitting a message with that key must happen atomically, uninterrupted by other worker threads. Otherwise, you may end up in a situation where thread A fetches sequence value 100, thread B fetches sequence value 101, B emits a message with idempotency key 101, and then A emits a message with idempotency key 100. A consumer would then, incorrectly, discard A’s message as a duplicate.
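The interleaving of threads A and B can be replayed deterministically to see its effect on a consumer that tracks only the latest key. The function `consume` and the payload strings below are hypothetical; the emission order is hard-coded to match the scenario rather than produced by real threads.

```python
from typing import List, Optional, Tuple


def consume(messages: List[Tuple[int, str]]) -> Tuple[List[str], List[str]]:
    """Apply the latest-key check to messages in arrival order (sketch)."""
    latest: Optional[int] = None
    accepted: List[str] = []
    discarded: List[str] = []
    for key, payload in messages:
        if latest is not None and key <= latest:
            discarded.append(payload)  # looks like a duplicate to the consumer
        else:
            latest = key
            accepted.append(payload)
    return accepted, discarded


# A fetched 100 first, but B (key 101) emitted before A did.
emitted_in_order = [(101, "message from B"), (100, "message from A")]
accepted, discarded = consume(emitted_in_order)
print(accepted, discarded)
```

A's message is not a duplicate, yet it ends up in `discarded` because its key is lower than the one already seen.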

For most cases, ensuring this level of atomicity will impose a severe bottleneck, essentially serializing all requests of the producer system, regardless of how many worker threads or service instances you deploy. Note that if you really wanted to go down that route, solely using a database sequence for producing the idempotency key will not work. Instead, you’d have to use a mechanism such as Postgres advisory locks in order to guarantee monotonicity of idempotency keys in the outgoing messages.
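The following sketch shows what fetching the key and emitting the message under one lock looks like within a single process. It is an illustration under stated assumptions: `threading.Lock` stands in for a cross-process mechanism such as the Postgres advisory locks named above, `itertools.count` stands in for a database sequence, and the list `outbox` stands in for the outgoing message channel. The names `produce` and `worker` are hypothetical.

```python
import itertools
import threading
from typing import List, Tuple

sequence = itertools.count(1)  # stand-in for a database sequence
emit_lock = threading.Lock()   # stand-in for a cross-process lock
outbox: List[Tuple[int, str]] = []  # stand-in for the outgoing channel


def produce(payload: str) -> None:
    # Fetching the key and emitting the message form one critical section,
    # so no other worker can emit in between.
    with emit_lock:
        key = next(sequence)
        outbox.append((key, payload))


def worker(worker_id: int, count: int) -> None:
    for i in range(count):
        produce(f"worker-{worker_id}-msg-{i}")


threads = [threading.Thread(target=worker, args=(w, 200)) for w in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

emitted_keys = [key for key, _ in outbox]
print(emitted_keys == sorted(emitted_keys), len(emitted_keys))
```

Because every worker has to wait for the same lock, the producers are effectively serialized, which is the bottleneck described above. An in-process lock like this one would not be enough once the service runs on several nodes.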
