In praise of memcached

Original link: https://jchri.st/blog/in-praise-of-memcached/

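Sysadmins usually default to Redis for caching, but a common problem follows: developers often start treating the volatile cache as a persistent database. Because Redis is usually brought into a stack on the assumption that it is purely ephemeral storage, this misuse creates significant operational risk when an upgrade or migration is needed. The author argues that **Memcached** is the better choice for a pure caching layer. Unlike Redis, Memcached enforces a "stateless" architecture by design. Its client libraries are meant to handle server downtime gracefully, typically ignoring connection errors and returning a default value instead of crashing the application. In addition, Memcached's client-side hashing makes clustering simple and removes the complexity of managing persistent state. Redis is feature-rich, but its flexibility can lead to the operational burden of "caring for it like a pet". Memcached's simplicity and its lack of disk persistence make it an ideal low-overhead solution for developers who need a high-performance cache and want to avoid the risk of accidental data dependencies. Finally, the author reminds us that many so-called "slow database" problems are really query optimization problems: before adding a cache, make sure your database indexes are properly managed.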

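The Hacker News discussion of "In praise of memcached" highlights the ongoing debate between Memcached's operational simplicity and Redis's rich feature set. Critics of the article argue that the author relies on a straw man, and point out that many people prefer Redis precisely because it handles clustering and consensus automatically, which Memcached leaves out. Skeptics also push back on the article's claim that Memcached's "silent failure" (ignoring connection errors) is a feature, arguing that this behavior is dangerous when dealing with complex application state. Memcached's supporters, in contrast, consider it the better out-of-the-box option for simple caching needs. They note that running Redis reliably as a cache often takes considerable administrative effort, such as handling persistence carefully, configuring memory policies, and avoiding complex data structures to prevent inconsistent data. Ultimately, commenters stress that the "best" tool depends on the project's requirements: Memcached offers a minimal, easy-to-maintain option, while Redis brings powerful features along with higher operational complexity. The discussion also touches on frustration with common problems in modern tech blogging; some users questioned whether the article's unconventional style means it was AI-generated.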

Original article

If you happen to find yourself in a sysadmin position, or a position where you just so happen to maintain someone’s infrastructure, chances are that at some point in time the topic “we need a cache” comes up.

You think for a moment and reach for Redis, because you’re used to it, it’s fully featured, and it works! You remember it being a good, solid cache, and you wonder which new features recent releases have brought, and head to its homepage:

Your agents aren’t failing. Their context is.

Inquiring agents want to know:

How can I use Redis Iris as a real-time context engine for AI apps?

Right, so probably something with AI. (This is sort of understandable, because Redis is a company that wants to make money.)

Anyways, Redis homepage aside, you deploy it, and there it is: your trusty cache. You hand the connection string to the people who asked for it, and off you go.

A few months later

After a while, it turns out that cache.set("key", "value") is a really simple abstraction, and definitely easier than INSERT INTO table VALUES ('key', 'value'). People start treating the REmote DIctionary Server as something that’s always there, something that persists data, something that is a database.

You don’t know this. Your ops colleagues don’t know this. Therefore, your alerting doesn’t know this either, because you assume that people treat the cache as something volatile.

You find out that this has been going on when you happen to do something to Redis. Maybe you upgrade it, maybe you move it to another node, maybe your cat hits the eject button on your RAID0 server’s HDD tray.

The issue isn’t that Redis doesn’t have persistence; the issue is that Redis is usually brought into a stack as a cache, and it is run with the assumption that people treat it that way.

Usually, by the time you realize this, it is already too late, and Redis is too intertwined in the app to really leave its place. Instead, you have the eternal pleasure of maintaining it and monitoring it like a pet.

Enter memcached

First off - what’s a memcached? Easy, ask its website:

What is Memcached?

Free & open source, high-performance, distributed memory object caching system, generic in nature, but intended for use in speeding up dynamic web applications by alleviating database load.

Wow, first sentence on the page, even with code examples. And look at those cute little mascots at the top!

Memcached is also a cache, similar to Redis. Chances are that you’re using a framework like Django, which supports pluggable caching and allows you to switch between different caching backends.
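As a rough illustration of that pluggability, here is what a Django settings fragment pointing the default cache at memcached might look like. It assumes the `pymemcache` package is installed and uses Django's built-in `PyMemcacheCache` backend; the two host names are placeholders, not anything from a real deployment.

```python
# settings.py (fragment). The host names below are hypothetical placeholders.
CACHES = {
    "default": {
        # Django's built-in memcached backend that talks through pymemcache.
        "BACKEND": "django.core.cache.backends.memcached.PyMemcacheCache",
        # Listing more than one server lets the client spread keys across them.
        "LOCATION": [
            "cache-1.example.internal:11211",
            "cache-2.example.internal:11211",
        ],
    }
}
```

Application code keeps calling `django.core.cache.cache.get(...)` and `cache.set(...)` as before; only this setting changes when the backend is swapped.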

But why should you use memcached when it has far fewer features than Redis? Here are the reasons why, these days, I will always prefer memcached over Redis:

Dealing with memcached downtime is incredibly easy, because client libraries generally ignore connection exceptions. For instance, a simple get will just return the default value (or none) if the server is down.
Clustering memcached is wonderful, because memcached actually has no clustering built-in. To “cluster” it, you configure the client library with multiple URLs, and the client will select the target instance based on hashing the key. If a client-side call detects an instance as down, it removes the node from the hasher. After a certain time, the client will automatically attempt to reconnect and use the dead node again.
memcached “solves” the whole persistence issue, because it does not persist to disk. It is therefore a perfect fit for being scheduled as a stateless workload wherever you desire it.
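To make the first two points more concrete, here is a toy, standard-library-only sketch of the client-side behavior described above: keys are hashed onto a list of nodes, a connection failure is swallowed and reported as a cache miss, and the failing node is taken out of rotation until a timeout expires. `FakeNode`, `ForgivingHashClient`, and the `dead_timeout` parameter are all hypothetical names made up for this illustration. This is not how any particular memcached client is implemented, and the choice of rendezvous hashing here is an assumption.

```python
import hashlib
import time


class FakeNode:
    """Hypothetical in-memory stand-in for one memcached instance."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}

    def get(self, key):
        if not self.up:
            raise ConnectionError(f"{self.name} is unreachable")
        return self.data.get(key)

    def set(self, key, value):
        if not self.up:
            raise ConnectionError(f"{self.name} is unreachable")
        self.data[key] = value


class ForgivingHashClient:
    """Toy client: hashes keys onto nodes and treats failures as misses."""

    def __init__(self, nodes, dead_timeout=30.0, clock=time.monotonic):
        self.nodes = list(nodes)
        self.dead_timeout = dead_timeout
        self.clock = clock
        self.dead_until = {}  # node name -> time at which it may be retried

    def _pick(self, key):
        now = self.clock()
        alive = [
            n for n in self.nodes
            if self.dead_until.get(n.name, float("-inf")) <= now
        ]
        if not alive:
            return None
        # Rendezvous hashing: the live node with the highest hash(node, key) wins.
        return max(
            alive,
            key=lambda n: hashlib.md5(f"{n.name}:{key}".encode()).digest(),
        )

    def _mark_dead(self, node):
        self.dead_until[node.name] = self.clock() + self.dead_timeout

    def get(self, key, default=None):
        node = self._pick(key)
        if node is None:
            return default
        try:
            value = node.get(key)
        except ConnectionError:
            self._mark_dead(node)  # drop the node from the hasher for a while
            return default         # the caller just sees a cache miss
        return default if value is None else value

    def set(self, key, value):
        node = self._pick(key)
        if node is None:
            return False
        try:
            node.set(key, value)
        except ConnectionError:
            self._mark_dead(node)
            return False
        return True
```

With two nodes, taking down the one that owns a key makes the next `get` return the default; later calls for that key go to the surviving node, and once `dead_timeout` has passed the original node is considered again. The application never sees an exception, which is exactly the property that makes a cache outage a slowdown rather than an incident.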

None of these things are impossible with Redis; it’s just that memcached’s architecture generally leans more in these directions, which makes it much, much more straightforward from an operations point of view.

But because memcached is such a relatively simple application (plus the fact that you can run dozens of instances of it with ~64 MB cache size and close to no overhead), if I need a cache these days, I usually reach for memcached.

That said, a lot of “database too slow” problems actually begin their life as “query too slow” or “missing indices”, so be a kind person and help your developers with optimizing their queries.
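As a small illustration of the "missing indices" case, the following sketch uses Python's built-in `sqlite3` module to compare the query plan for the same lookup before and after an index is added. The `users` table, its columns, and the index name are invented for the example, and a production database will have its own `EXPLAIN` tooling and output format.

```python
import sqlite3

# Hypothetical schema and data, held in memory just for this demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO users (email, name) VALUES (?, ?)",
    [(f"user{i}@example.com", f"User {i}") for i in range(1000)],
)

QUERY = "SELECT * FROM users WHERE email = ?"


def plan(connection):
    """Return SQLite's human-readable plan for QUERY as one string."""
    rows = connection.execute(
        "EXPLAIN QUERY PLAN " + QUERY, ("user42@example.com",)
    ).fetchall()
    # The last column of each row holds the textual description of the step.
    return " | ".join(row[-1] for row in rows)


plan_before = plan(conn)  # without an index: a full scan of the table
conn.execute("CREATE INDEX idx_users_email ON users (email)")
plan_after = plan(conn)   # with the index: a search using idx_users_email

print("before:", plan_before)
print("after: ", plan_after)
```

The exact wording of the plan differs between SQLite versions, but the shift from a scan to an index search is the kind of fix that can remove the need for a cache in the first place.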

Also, if you’re curious about some of the decisions behind memcached, the blog contains interesting posts, with one published just in May: “How Long Does That Response Take… For Real?”.