Why are anime catgirls blocking my access to the Linux kernel?

Original link: https://lock.cmpxchg8b.com/anubis.html

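## Anubis: Anime Catgirls and a Flawed Anti-Bot Defense

Anubis is a project that aims to protect websites from AI crawlers. It presents visitors with a challenge that is computationally trivial but impossible for a human to solve by hand: find a nonce that, combined with a challenge string, produces a SHA-256 hash with leading zeros, much like Bitcoin mining. Visitors see this as an anime catgirl avatar asking to "weigh their soul" before access is granted.

The intent is good, and the author, who hosts the blog on a small VPS, understands the impact of aggressive crawlers. Still, the author argues that Anubis is ineffective against a determined AI vendor with ample compute. By the author's calculation, mining tokens for every site currently protected by Anubis would take only a few minutes and cost next to nothing on a basic cloud VM.

The system falls disproportionately on users with limited resources. The author bypassed it with a simple C program that "mines" a valid token, which shows how easily it can be automated. After the author reported it, the maintainer quickly fixed a bug that allowed a token to be exchanged repeatedly. In the end, the author questions whether Anubis achieves its goal and concludes that it mainly adds friction to accessing online resources while providing very little real protection.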

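## Anubis and the Fight Against AI Crawlers

A new project called Anubis has drawn attention for using anime catgirl imagery to challenge and block AI web crawlers. It may look whimsical, but the discussion reveals deeper concerns about the effect these crawlers have on the internet.

The core problem is not necessarily *stopping* sophisticated crawlers, but reducing the enormous request volume from badly behaved bots scraping data for LLM training. These bots often ignore site rules (such as `robots.txt`) and overwhelm servers. Anubis tries to slow them down with a lightweight proof-of-work challenge, but its effectiveness is heavily disputed, since a determined operator can easily bypass it.

Critics point to security vulnerabilities in Anubis (including a recent XSS bug) and question its dependence on JavaScript. Some suggest alternatives such as link mazes or simply blocking traffic by IP or User-Agent, but those methods can usually be bypassed with residential proxies. Ultimately, the debate highlights a growing tension: the need to protect site resources from abusive scraping without imposing overly intrusive measures on legitimate users. The current situation looks unsustainable, and many worry that the internet's economic model will be undermined.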

Original text
Anubis.

Hey… quick question, why are anime catgirls blocking my access to the Linux kernel?

Intro

I’ve started running into more sites recently that deploy Anubis, a sort of hybrid art project slash network countermeasure. The project “weighs the souls” of HTTP requests to help protect the web from AI crawlers.

If you’ve seen anime catgirl avatars when visiting a new website, that’s Anubis.

A website blocked with Anubis

I’m sympathetic to the cause – I host this blog on a single core 128MB VPS, I can tell you some stories about aggressive crawlers!

Anubis recently started blocking how I access git.kernel.org and lore.kernel.org. Those sites host the Linux Kernel Mailing List archive and the kernel git repositories. As far as I know I do have a soul, I just wasn’t using a desktop browser… so how exactly is my soul being weighed?

Note: Linux has Tux 🐧, OpenBSD has Puffy 🐡, SuSE has Geeko 🦎 and Microsoft has Bob 🤓… nothing wrong with mascots! 😸

Problem

The traditional solution to blocking nuisance crawlers is to use a combination of rate limiting and CAPTCHAs. The CAPTCHA forces visitors to solve a problem designed to be very difficult for computers but trivial for humans. This isn’t perfect of course, we can debate the accessibility tradeoffs and weaknesses, but conceptually the idea makes some sense.

Anubis – confusingly – inverts this idea. It insists visitors solve a problem trivial for computers, but impossible for humans. Visitors are asked to brute force a value that, when appended to a challenge string, causes its SHA-256 to begin with a few zero nibbles.

lib/challenge/proofofwork/proofofwork.go#L66
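As a rough illustration of the check described above, here is a minimal Python sketch, not the project's Go code. The function name `meets_difficulty` is hypothetical, and the sketch assumes the nonce is appended to the challenge as a decimal string before hashing.

```python
import hashlib


def meets_difficulty(challenge: str, nonce: int, difficulty: int) -> bool:
    """Return True if SHA-256(challenge + nonce) starts with `difficulty` zero hex digits.

    Assumption: the nonce is appended as a decimal string. The real encoding
    is whatever the Anubis source linked above uses.
    """
    digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)
```

Checking a submitted nonce costs a single hash, while finding one takes many attempts. That asymmetry is the whole mechanism.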

If that sounds familiar, it’s because it’s similar to how bitcoin mining works. Anubis is not literally mining cryptocurrency, but it is similar in concept to other projects that do exactly that, perhaps most famously Coinhive and JSECoin.

So how do some useless SHA-256 operations prove you’re not a bot? The argument goes that this simply makes it too expensive to crawl your website.

The typical datacenter used by an AI crawler

This… makes no sense to me. Almost by definition, an AI vendor will have a datacenter full of compute capacity. It feels like this solution has the problem backwards, effectively only limiting access to those without resources or trying to conserve them.

Numbers

Let’s assume the argument has some merit and math out the claims.

We can see that with the default Anubis configuration, a typical website visitor will have to solve a challenge with a difficulty of 4.

anubis.go

This means that a visitor must make the first 4 hex digits of the challenge hash be zero, so 16 bits (4 digits, one nibble each). Therefore, you can expect to mine a suitable nonce within 2^16 SHA-256 operations.
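The 2^16 figure is an average, which a short calculation makes concrete. This sketch assumes each attempt is an independent trial with success probability 16^-difficulty (treating SHA-256 output as uniformly random); the function names are illustrative only.

```python
def success_probability(difficulty: int) -> float:
    """Chance that one random hash starts with `difficulty` zero hex digits."""
    return 16.0 ** -difficulty


def expected_attempts(difficulty: int) -> float:
    """Mean number of attempts until the first success (geometric distribution)."""
    return 1 / success_probability(difficulty)


def chance_within(attempts: int, difficulty: int) -> float:
    """Probability of at least one success within `attempts` tries."""
    return 1 - (1 - success_probability(difficulty)) ** attempts


print(expected_attempts(4))          # mean attempts at the default difficulty
print(chance_within(2 ** 16, 4))     # share of visitors done within 2^16 tries
```

At difficulty 4 the mean is 65,536 attempts, but only about 63% of visitors finish within that many. An unlucky visitor on a slow device can need several times more.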

If every single github star on the anubis project represents a website that has deployed Anubis, how much would the cloud services bill be to mine enough tokens to crawl every single website?

Anubis Project Stars

https://github.com/TecharoHQ/anubis/blob/main/anubis.go#L32

At the time of writing, Anubis has 11,508 github stars.

The default configuration means mining one token gets you access for 7 days (although I think this expiration check is broken, see below), so we need 11,508 * 2^16 SHA-256 operations per week. How expensive is that?

To get some numbers, I started an e2-micro VM on Google Compute Engine and ran openssl speed. This is what you get in the free tier.

$ openssl speed sha256
Doing sha256 for 3s on 16 size blocks: 6915549 sha256's in 3.00s
Doing sha256 for 3s on 64 size blocks: 4631718 sha256's in 3.00s
Doing sha256 for 3s on 256 size blocks: 393694 sha256's in 3.21s
Doing sha256 for 3s on 1024 size blocks: 100123 sha256's in 3.00s
Doing sha256 for 3s on 8192 size blocks: 13300 sha256's in 2.98s
Doing sha256 for 3s on 16384 size blocks: 7137 sha256's in 2.99s
version: 3.0.17
built on: Tue Aug  5 07:09:41 2025 UTC
options: bn(64,64)
compiler: gcc -fPIC -pthread -m64 ...
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
sha256           36882.93k    98809.98k    31397.40k    34175.32k    36561.61k    39107.90k

It looks like we can test about 2^21 hashes every second (the 16-byte row works out to roughly 2.3 million per second), perhaps a bit more if we used both SMT sibling cores. This amount of compute is simply too cheap to even be worth billing for.

So (11508 websites * 2^16 sha256 operations) / 2^21, that’s about 6 minutes to mine enough tokens for every single Anubis deployment in the world. That means the cost of unrestricted crawler access to the internet for a week is approximately $0.
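The arithmetic can be reproduced in a few lines. This sketch uses only numbers quoted above: the star count, the expected 2^16 hashes per token, and the 16-byte row of the `openssl speed` output. Using the 16-byte row as the hash rate assumes each attempt hashes a short input.

```python
SITES = 11_508               # GitHub stars, used as a stand-in for deployments
HASHES_PER_TOKEN = 2 ** 16   # expected attempts at difficulty 4

MEASURED_RATE = 6_915_549 / 3.00   # hashes per second, 16-byte row of openssl speed
ROUNDED_RATE = 2 ** 21             # the rounded figure used in the text


def seconds_to_mine_all(rate: float) -> float:
    """Seconds needed to mine one token for every site at `rate` hashes per second."""
    return SITES * HASHES_PER_TOKEN / rate


for label, rate in (("rounded", ROUNDED_RATE), ("measured", MEASURED_RATE)):
    secs = seconds_to_mine_all(rate)
    print(f"{label}: {secs:.0f} s ({secs / 60:.1f} min)")
```

Both rates land at roughly five to six minutes of single-core time for the whole week.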

In fact, I don’t think we reach a single cent per month in compute costs until several million sites have deployed Anubis.

I’m just not convinced this math works… this is literally nothing for a soulless AI vendor with a monthly cloud services budget in the 8 figures. However, the cost for real soul-owning humans with limited access to compute is high – the Anubis forums are full of complaints like these:

The discussion forums are full of users on limited devices complaining

Alternatives

Anubis cites hashcash as the primary inspiration for their design, an anti-spam solution from the 90s that was never widely adopted.

The idea of “weighing souls” reminded me of another anti-spam solution from the 90s… believe it or not, there was once a company that used poetry to block spam!

Habeas would license short haikus to companies to embed in email headers. They would then aggressively sue anyone who reproduced their poetry without a license. The idea was you can safely deliver any email with their header, because it was too legally risky to use it in spam.

Here’s a sample haiku:

winter into spring
brightly anticipated
like Habeas SWE (tm)

Was this a good idea? I don’t know, but they really did sue a few spammers!

Workarounds

So you’re trying to read LKML, but catgirl says no… is there a solution?

My issue is I don’t want to use a desktop browser to mine the required value, so how can I get the auth cookie?

If we look at the response with curl, we can see the challenge in the HTTP headers:

$ curl -I https://lore.kernel.org/
HTTP/2 200
server: nginx
set-cookie: techaro.lol-anubis-auth=; Path=/
set-cookie: techaro.lol-anubis-cookie-test-if-you-block-this-anubis-wont-work=5d737f0600ff2dd; Path=/

That techaro.lol-anubis-cookie is the challenge, here is a quick C program to mine an acceptable token:
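The C listing is not included in this copy. As a stand-in, here is a minimal Python sketch of the same brute-force idea; it is not the author's program. It assumes the nonce is appended to the challenge as a decimal string and that the difficulty is the default of 4. The function name `mine` is hypothetical, and how the result is submitted back to the server is not shown.

```python
import hashlib
from itertools import count


def mine(challenge: str, difficulty: int = 4) -> tuple:
    """Try nonces 0, 1, 2, ... until the digest has `difficulty` leading zero hex digits.

    Assumption: the digest is SHA-256 over challenge + decimal nonce.
    Returns (nonce, hex_digest).
    """
    prefix = "0" * difficulty
    for nonce in count():
        digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
        if digest.startswith(prefix):
            return nonce, digest


if __name__ == "__main__":
    # Challenge value taken from the curl output above.
    nonce, digest = mine("5d737f0600ff2dd")
    print(nonce, digest)
```

At difficulty 4 this needs about 65,536 hashes on average, which finishes in well under a second on most machines even in Python.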

lib/http.go

We can examine this auth token and see what Anubis gave us…

$ base64 -d <<< eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9 | jq
{
  "alg": "EdDSA",
  "typ": "JWT"
}
$ base64 -d <<< eyJhY3Rpb24iO...OTYifQ== | jq
{
  "action": "CHALLENGE",
  "challenge": "5d737f0600ff2dd",
  "exp": 1756185722,
  "iat": 1755580922,
  "method": "fast",
  "nbf": 1755580862,
  "policyRule": "dbf942088788cc96"
}

It looks like exp is the expiry date, so 1756185722, which is…

$ date --date @1756185722
Mon Aug 25 22:22:02 PDT 2025

Yep, about 7 days from the date I requested it. You can now place that into a cookie file for curl, lynx, etc.
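The same inspection can be done without `base64` and `jq`. This is a small sketch that decodes a JWT segment and checks the token lifetime from the `iat` and `exp` claims shown above. JWT segments are base64url without padding, so the helper restores the padding first. It does not verify the EdDSA signature, and the helper name is illustrative.

```python
import base64
import json
from datetime import datetime, timezone


def decode_jwt_segment(segment: str) -> dict:
    """Decode one base64url JWT segment (header or payload) into a dict."""
    padded = segment + "=" * (-len(segment) % 4)   # restore stripped padding
    return json.loads(base64.urlsafe_b64decode(padded))


header = decode_jwt_segment("eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9")
print(header)

# Claims copied from the decoded payload above.
claims = {"iat": 1755580922, "exp": 1756185722}
lifetime_days = (claims["exp"] - claims["iat"]) / 86400
expires_utc = datetime.fromtimestamp(claims["exp"], tz=timezone.utc)
print(lifetime_days, expires_utc.isoformat())
```

The difference between `exp` and `iat` is exactly 604,800 seconds, which matches the 7-day default.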

Interestingly, sending the same request the next day got me a new signed cookie!?

This seems like a bug – exchanging a mined token for an auth cookie should immediately remove the challenge from the store, or there is a double spend vulnerability.
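To make the double-spend point concrete, here is a hypothetical single-use challenge store. It is not Anubis's actual store or the maintainer's fix; the class and method names are invented for illustration. The only point is that redeeming a challenge removes it, so a second redemption fails.

```python
class ChallengeStore:
    """Hypothetical in-memory store in which each challenge can be redeemed once."""

    def __init__(self) -> None:
        self._pending = set()

    def issue(self, challenge: str) -> None:
        """Record a challenge that has been handed to a visitor."""
        self._pending.add(challenge)

    def redeem(self, challenge: str) -> bool:
        """Consume the challenge; return False if it is unknown or already spent."""
        try:
            self._pending.remove(challenge)
        except KeyError:
            return False
        return True


store = ChallengeStore()
store.issue("5d737f0600ff2dd")
print(store.redeem("5d737f0600ff2dd"))  # first exchange succeeds
print(store.redeem("5d737f0600ff2dd"))  # replaying the same solution is rejected
```

A real deployment would also need expiry and a store shared across server instances, which this sketch leaves out.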

This error benefits me, I have to mine fewer tokens, but I’ll open an issue 😇

Update: wow, fixed just a few minutes after opening an issue by the maintainer!

This dance to get access is just a minor annoyance for me, but I question how it proves I’m not a bot. These steps can be trivially and cheaply automated.

I think the end result is just that an internet resource I need is a little harder to access, and we have to waste a small amount of energy.

Notes

  • I wrote this article with my own puny brain, I didn’t use any AI. I know there are endashes (they’re not emdashes!) – I have a habit of writing two consecutive dashes, which pandoc converts into U+2013.
  • This post is a bit critical of a small well-intentioned project, so I felt obliged to email the maintainer to discuss it before posting it online. I didn’t hear back.