CAPTCHAs have failed for 20 years

Original link: https://www.browserbase.com/blog/why-captchas-are-getting-harder

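For more than two decades, the internet has been stuck in an escalating cat-and-mouse game between CAPTCHA designers and bot developers. From distorted text to image-based object recognition, each generation of challenge has been broken by advances in OCR and machine learning. As machines get better at imitating human perception, the premise of “testing what a machine can do” has become obsolete. Modern anti-bot security has shifted to behavioral analysis and browser fingerprinting, which aim to assess whether a session is legitimate. That, however, creates obstacles for the growing number of AI agents performing productive tasks. Browserbase argues that the solution is to stop forcing automation to imitate human behavior. Rather than trying to get around tests, agents should use “Verified” identity and Web Bot Auth to prove who they are cryptographically. By moving from anonymous, suspicious bot traffic to authenticated agent sessions, legitimate automation can bypass security checkpoints entirely. Ultimately, the most effective CAPTCHA “solver” is a mechanism that builds enough trust to never be challenged in the first place.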

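Recently, an article titled “CAPTCHAs have failed for 20 years” sparked a discussion on Hacker News that highlighted the ongoing debate around bot detection. Users argued that CAPTCHAs are a cat-and-mouse game destined to be lost, because attackers can always get around the defenses and their behavior has become hard to distinguish from that of legitimate users. While some commenters noted that these tools serve purposes beyond security (such as training vision models), the prevailing view was that they are becoming increasingly intrusive. As CAPTCHAs have moved from protecting forms to gating ordinary access to websites, frustration has grown, leading many users to simply abandon the sites in question. The discussion examined why traditional CAPTCHAs fail and noted that automated abuse has plagued the internet since the 1990s. As attackers keep refining their methods, the effectiveness of visual tests continues to decline. Looking ahead, participants suggested that the CAPTCHA era may be ending, with the industry moving toward more restrictive alternatives such as mandatory login walls, paywalls, or emerging cryptographic authentication protocols (such as PACT). Ultimately, the community viewed the continued use of CAPTCHAs as a failed defensive strategy that has reached the end of its usefulness.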

Original text

TL;DR: Every CAPTCHA generation (distorted text, harder text, image grids) was eventually beaten by machines. Now with everyone using agents to run real workflows, the game has changed from testing what a browser can do to verifying who it is. That's why Browserbase is building agent identity with Verified and Web Bot Auth, because the best CAPTCHA “solver” never sees a CAPTCHA at all.

The CAPTCHA arms race: from distorted text to browser identity

If you've clicked every traffic light, bus, and crosswalk in a blurry image grid, you've taken part in one of the internet's longest-running security experiments. Those clicks were solving CAPTCHAs: tests of whether you're human.

As websites grew popular in the late 1990s, so did the incentives to abuse them. Spammers created thousands of fake accounts, bots scraped search engines, and scripts flooded forums with ads. Every popular site faced the same question: how do you tell a human from a machine?

CAPTCHA is a backronym for Completely Automated Public Turing test to tell Computers and Humans Apart, coined in a 2003 paper by Luis von Ahn, Manuel Blum, Nicholas Hopper, and John Langford at Carnegie Mellon.

It's a reverse Turing test. The original asks a human to spot a machine through conversation; a CAPTCHA flips it, so the machine asks the question and the test passes if the responder behaves like a human. The goal isn't to prove intelligence. It's to make automation cost more than the attack is worth.

For over 20 years, every CAPTCHA has eventually been beaten, and each generation follows the same cycle:

defenders build a new challenge → it works for a while → attackers learn to solve it → defenders build something new → repeat.

It’s an endless cat and mouse chase.

Let's hop into the arena: cat (computer) vs. mouse (CAPTCHA).

Level 1: can you read this?

The first answer was surprisingly simple: make computers read.

Early CAPTCHAs showed distorted text (warped letters, uneven spacing, random lines, noisy backgrounds). To a human it was usually trivial. Our brains are remarkably good at recognizing patterns through missing pixels and distortion.

Computers, however, were not.

Optical character recognition (OCR) at the time did well on clean printed text but struggled when characters were rotated, stretched, overlapped, or obscured. The key assumption was that perception was the hard part. If a computer couldn't tell where one character ended and the next began, it couldn't read the word.

For a while, it worked. A distorted word stopped automated scripts while barely slowing a human. AltaVista and Yahoo adopted early systems, and the approach worked well enough that the Carnegie Mellon researchers formally coined the term.

Then OCR got better. 🐈

Attackers realized they didn't need to solve the whole CAPTCHA at once. Most text CAPTCHAs were generated in stages: render text, apply distortion, add noise, draw obfuscating lines, output an image.

Well, if the CAPTCHA was created in stages, then it could be defeated in stages. Attackers built computer vision pipelines that removed background noise, thresholded images to black-and-white, segmented characters into individual regions, and fed those regions to OCR.
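The staged attack can be sketched in a few lines. This is a toy illustration, not any real solver: the pixel grid, the cutoff of 128, and the function names are all assumptions made for the example, and the final OCR step is left out.

```python
# Toy grayscale "CAPTCHA": 0 is black ink, 255 is white paper.
# Values such as 190-210 are light background noise (hypothetical data).
IMAGE = [
    [255, 200,  30, 255, 255,  20, 210, 255],
    [255,  40,  35, 255, 190,  25, 255, 255],
    [255, 255,  20, 255, 255,  30,  15, 255],
]


def threshold(image, cutoff=128):
    """Binarize the image: 1 where a pixel is dark enough to be ink, else 0."""
    return [[1 if px < cutoff else 0 for px in row] for row in image]


def segment_columns(binary):
    """Split the image into character regions at columns that contain no ink."""
    width = len(binary[0])
    ink_per_column = [sum(row[x] for row in binary) for x in range(width)]
    regions, start = [], None
    for x, ink in enumerate(ink_per_column):
        if ink and start is None:
            start = x                      # a character begins
        elif not ink and start is not None:
            regions.append((start, x))     # a character ends (exclusive bound)
            start = None
    if start is not None:
        regions.append((start, width))
    return regions


binary = threshold(IMAGE)           # the light noise pixels drop out here
regions = segment_columns(binary)   # [(1, 3), (5, 7)]: two character regions
```

Each region would then be cropped and handed to a character recognizer. Overlapping letters defeat this kind of column-gap segmentation, which is exactly what the next generation of CAPTCHAs leaned on.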

What appeared to be an AI problem became an image processing problem.

Once segmentation was reliable, recognition accuracy jumped. The same advances that digitized books and read street signs made computers capable CAPTCHA solvers.

The mouse had made its move, and the cat adapted. Time to repeat.

Level 2: make the text harder

Defenders reasoned that if attackers could segment characters, the answer was to make segmentation impossible. CAPTCHAs grew aggressive (overlapping letters, unnatural shapes, noisy backgrounds), some so distorted they looked more like abstract art than text.

Around this time, von Ahn noticed something. Millions of people were spending seconds a day solving CAPTCHAs, an enormous amount of visual recognition work that immediately disappeared. What if that effort could be useful?

This idea became reCAPTCHA.

Instead of random text, it showed scanned words from books and archives that OCR couldn't confidently read. Every solved challenge helped digitize printed material. For a while, everyone won: websites got protection, and libraries got digitized.

Then machine learning arrived. 🐈

Traditional OCR relied on hand-engineered rules (edge detectors, character templates, segmentation heuristics) that worked until designers changed the distortion. Machine learning removed the hard-coding. Instead of teaching a computer how to recognize a character, researchers trained models on millions of examples and let them learn the patterns. Neural networks recognized heavily distorted characters without perfect segmentation, because the noise that confused traditional OCR still carried enough signal to recover the answer.
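To make "learn from examples instead of rules" concrete, here is a deliberately tiny sketch. It uses a nearest-centroid classifier, which is a far simpler stand-in for the neural networks described above, and the 3x3 glyphs and labels are made up for the example.

```python
# Each glyph is a flattened 3x3 grid of pixels (1 = ink). Every training
# example is a slightly corrupted "I" or "L" (hypothetical data).
TRAINING = {
    "I": [
        [0, 1, 0, 0, 1, 0, 0, 1, 1],
        [0, 1, 0, 1, 1, 0, 0, 1, 0],
    ],
    "L": [
        [1, 0, 0, 1, 0, 0, 1, 1, 0],
        [1, 0, 1, 1, 0, 0, 1, 1, 1],
    ],
}


def train(examples):
    """Average the labeled examples into one centroid per label."""
    return {
        label: [sum(pixels) / len(samples) for pixels in zip(*samples)]
        for label, samples in examples.items()
    }


def classify(centroids, sample):
    """Return the label whose centroid is closest to the sample."""
    def squared_distance(centroid):
        return sum((c - s) ** 2 for c, s in zip(centroid, sample))
    return min(centroids, key=lambda label: squared_distance(centroids[label]))


centroids = train(TRAINING)
# Neither input matches a clean template, yet both are classified correctly.
noisy_i = classify(centroids, [0, 1, 1, 0, 1, 0, 0, 1, 0])   # "I"
noisy_l = classify(centroids, [1, 0, 0, 1, 1, 0, 1, 1, 1])   # "L"
```

No rule in the code says what an "I" looks like. The decision comes entirely from the examples, so changing the distortion means collecting new examples rather than rewriting heuristics.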

The CAPTCHAs designed to stop machines eventually became harder for humans than for the models.

The mouse raised the stakes, but the cat learned faster.

Level 3: find the traffic lights

By the early 2010s, the founding assumption (computers can't read) was hard to defend. So designers abandoned text and asked users to identify objects instead.

Where text CAPTCHAs tested character recognition, image CAPTCHAs tested semantic understanding. Humans do this effortlessly: we recognize a bicycle from the side, half-hidden behind a car, at night, or cropped to a corner of the frame.

For computers, this was the same hand-engineered problem as before, now in two dimensions. Traditional vision systems detected edges, corners, gradients, and textures, then tried to assemble them into objects:


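def classify(image):
    # Pseudocode: the helper functions stand in for hand-engineered detectors.
    features = combine(detect_edges(image), detect_corners(image), compute_gradients(image))
    if matches_bicycle_template(features):
        return "bicycle"
    return None  # nothing matched the fixed template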

The real world doesn't follow templates. A bicycle appears from thousands of angles, partly obscured, under shifting light. The edge cases are endless.

Then ImageNet happened. 🐈

The 2009 dataset gave researchers millions of labeled images across thousands of categories, enough to tackle object recognition at scale. In 2012, a deep neural network called AlexNet dramatically outperformed traditional vision systems on the ImageNet benchmark. Once again the question shifted, from “can a computer recognize a bicycle?” to “how much labeled data can we give the model?” Convolutional neural networks learned visual features directly from data: edges and textures in early layers, shapes and parts in the middle, and whole objects in the deepest layers. No template required.
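The building block of those early layers is a small filter slid across the image. The sketch below applies one such filter by hand. The kernel values are chosen manually here purely for illustration, whereas a convolutional network would learn them from data, and the image is a made-up 3x4 grid.

```python
def correlate2d(image, kernel):
    """Slide the kernel over the image and sum the elementwise products.

    This is the "valid" sliding-window operation that convolutional layers
    compute (without padding, stride, or multiple channels).
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(kernel[i][j] * image[y + i][x + j]
                for i in range(kh) for j in range(kw))
            for x in range(out_w)
        ]
        for y in range(out_h)
    ]


# Dark region on the left, bright region on the right (hypothetical data).
IMAGE = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Responds where brightness increases from left to right: a vertical edge.
VERTICAL_EDGE = [[-1, 1]]

edges = correlate2d(IMAGE, VERTICAL_EDGE)
# edges == [[0, 1, 0], [0, 1, 0], [0, 1, 0]]: the filter fires only at the boundary.
```

Stacking many learned filters like this, layer on layer, is what lets a network build up from edges to parts to whole objects.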

The timing was pretty unfortunate for designers. Traffic lights, buses, crosswalks, and storefronts were common CAPTCHA categories, and also among the most common categories in large vision datasets. The challenges meant to prove computers couldn't see appeared exactly as computers learned to.

The mouse found a new hiding place, but the cat learned how to see.

Level 4: the browser becomes the CAPTCHA

The pattern was now impossible to ignore.

Every generation assumed some capability humans had and computers lacked. Defenders built a challenge around it. Attackers automated it. Repeat.

Any challenge with a correct answer became a target for optimization. You can't build a test of human intelligence while developers are actively building to replicate it.

So modern anti-bot systems stopped asking whether a browser could solve a challenge and started asking whether it should be challenged at all.

They became probabilistic. Rather than issuing one challenge, they collect signals across a session and combine them into a risk score. These signals range across browser fingerprints, installed fonts, canvas and WebGL rendering, TLS fingerprints, cookie history, network reputation, request timing, and interaction patterns. Individually they are weak, but together they form a detailed picture. A real Chrome browser on a real laptop behaves differently than a freshly spawned browser in a datacenter.
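One simple way to combine weak signals into a single score is a weighted sum passed through a logistic function. The sketch below is only an assumption about the general shape of such a system: the signal names, weights, and bias are invented for the example and do not describe any vendor's model.

```python
import math

# Hypothetical weights: a larger value pushes the score toward "likely automated".
WEIGHTS = {
    "datacenter_ip": 1.5,
    "no_cookie_history": 0.8,
    "headless_rendering_hints": 1.2,
    "uniform_request_timing": 1.0,
}
BIAS = -2.0  # hypothetical baseline: with no signals present, risk is low


def risk_score(signals):
    """Map a dict of boolean signals to a risk value between 0 and 1."""
    z = BIAS + sum(weight for name, weight in WEIGHTS.items() if signals.get(name))
    return 1 / (1 + math.exp(-z))


laptop = risk_score({})                                   # about 0.12
one_weak_signal = risk_score({"no_cookie_history": True})  # about 0.23
fresh_datacenter = risk_score({name: True for name in WEIGHTS})  # about 0.92
```

No single signal moves the score past the midpoint on its own, but all four together do, which is the "individually weak, collectively strong" property in miniature.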

This is the philosophy behind reCAPTCHA v3 and Cloudflare Turnstile. When the system is confident, no CAPTCHA appears. When uncertain, it asks for more.
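The "confident means no challenge, uncertain means ask for more" policy can be pictured as two thresholds on a risk value. This is a hedged sketch: the thresholds and the three outcome names are assumptions for illustration, not how reCAPTCHA v3 or Turnstile are actually configured.

```python
def decide(risk, allow_below=0.3, block_above=0.9):
    """Turn a risk value in [0, 1] into an action (thresholds are hypothetical)."""
    if risk < allow_below:
        return "allow"        # confident the session is legitimate: no CAPTCHA
    if risk > block_above:
        return "block"        # confident the session is abusive
    return "challenge"        # uncertain: ask for more evidence


actions = [decide(r) for r in (0.12, 0.5, 0.95)]
# actions == ["allow", "challenge", "block"]
```

Under this framing most legitimate visitors never see a challenge at all, and the challenge becomes a fallback for the uncertain middle band.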

Then attackers realized solving CAPTCHAs was no longer the objective. If challenges only appear when a browser looks suspicious, the goal is to not look suspicious. The work shifted from solving challenges to managing browser fingerprints. Instead of better OCR or classifiers, attackers studied fingerprints, reputation, and network signals to pass as legitimate.

The mouse stopped asking questions, and the cat started learning how to blend in.

It’s a tie: proving who you are

Historically, websites treated every browser as an anonymous stranger. To gain confidence, they issued a challenge. Modern detection works in reverse. Instead of asking browsers to prove themselves repeatedly, sites try to determine whether they already recognize the browser.

Through that lens, the trends make sense. A browser with a consistent history is more trustworthy than one that appeared thirty seconds ago. One whose fingerprints, network, and behavior all align is more trustworthy than one whose signals contradict each other. One tied to a known identity is more trustworthy than one that's anonymous.
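A consistency check of this kind might look roughly like the sketch below. The `Session` fields, the flag names, and the 60-second cutoff are hypothetical, chosen only to mirror the examples in the paragraph above.

```python
from dataclasses import dataclass


@dataclass
class Session:
    user_agent_os: str      # OS claimed in the User-Agent header
    js_platform_os: str     # OS reported by client-side JavaScript
    browser_timezone: str   # timezone reported by the browser
    ip_timezone: str        # timezone inferred from the IP address
    age_seconds: int        # how long this browser has been known to the site


def contradictions(session, min_age_seconds=60):
    """List the ways a session's signals fail to line up."""
    found = []
    if session.user_agent_os != session.js_platform_os:
        found.append("os_mismatch")
    if session.browser_timezone != session.ip_timezone:
        found.append("timezone_mismatch")
    if session.age_seconds < min_age_seconds:
        found.append("new_session")
    return found


established = Session("macOS", "macOS", "Europe/Berlin", "Europe/Berlin", 86_400)
spoofed = Session("Windows", "Linux", "America/New_York", "Europe/Berlin", 30)

clean_flags = contradictions(established)   # []
spoofed_flags = contradictions(spoofed)     # all three flags
```

None of these checks asks the browser to solve anything. They only ask whether the story the browser tells about itself holds together.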

The web that produced CAPTCHAs was dominated by anonymous traffic, where most automation existed to scrape, spam, or abuse. Treating every bot as suspicious was usually correct.

Today's web is different. Browser agents are booking travel, filing compliance reports, monitoring infrastructure, and completing workflows for real users. Websites still struggle to tell an agent acting for a user from a bot exploiting a system, and historically the safest option was to treat them the same.

Can this browser solve a CAPTCHA? → Should this browser see a CAPTCHA at all?

If the browser is the CAPTCHA, the open question is no longer what a browser can do but whether it can establish trust. That has produced a new model, where browsers and agents explicitly prove who they are instead of repeatedly proving what they can do.

One emerging standard is Web Bot Auth, which lets browser agents cryptographically identify themselves as they navigate the web. Sites can then distinguish anonymous automation from agents operating through trusted providers, and make the call based on identity rather than inference. This is the direction we're building toward at Browserbase, in partnership with Cloudflare.
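The article does not spell out how Web Bot Auth works on the wire, so the following is only a sketch of the general sign-then-verify idea behind cryptographic agent identity. It makes loud simplifications: HMAC with a shared secret stands in for the public-key signatures a real scheme would use (Python's standard library has no public-key signing primitive), and the agent name, key registry, and signed fields are all hypothetical.

```python
import hashlib
import hmac

# ASSUMPTION: a hypothetical registry of agents the site already trusts.
# A real deployment would hold public keys here, not shared secrets.
AGENT_KEYS = {"agent.example": b"demo-secret"}


def signature_base(method, authority, path, agent_id, created):
    """Serialize the request fields covered by the signature (illustrative format)."""
    return "\n".join([method, authority, path, agent_id, str(created)]).encode()


def sign_request(method, authority, path, agent_id, created, key):
    """Agent side: sign the covered fields with the agent's key."""
    base = signature_base(method, authority, path, agent_id, created)
    return hmac.new(key, base, hashlib.sha256).hexdigest()


def verify_request(method, authority, path, agent_id, created, signature,
                   now, max_age=300):
    """Site side: accept only fresh requests signed by a known agent."""
    key = AGENT_KEYS.get(agent_id)
    if key is None:
        return False                      # unknown agent: stays anonymous
    if not 0 <= now - created <= max_age:
        return False                      # stale or future-dated: possible replay
    expected = sign_request(method, authority, path, agent_id, created, key)
    return hmac.compare_digest(expected, signature)


sig = sign_request("GET", "shop.example", "/checkout", "agent.example",
                   1_700_000_000, AGENT_KEYS["agent.example"])
accepted = verify_request("GET", "shop.example", "/checkout", "agent.example",
                          1_700_000_000, sig, now=1_700_000_010)
tampered = verify_request("GET", "shop.example", "/admin", "agent.example",
                          1_700_000_000, sig, now=1_700_000_010)
```

Here `accepted` is `True` and `tampered` is `False`, because the signature covers the path. The point of the design is that the site decides based on a verifiable claim of identity rather than on inference from fingerprints.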

It's a different premise than the previous two decades: legitimate automation shouldn't have to pretend to be human. If the last twenty years taught computers to pass human tests, the next decade may be about giving them a way to introduce themselves instead. The most successful CAPTCHA “solver” is the one that never sees a CAPTCHA.

Give your agents identity and evolve from the cat vs. mouse chase here: browserbase.com/contact-web-bot-auth