ChatGPT won't let you type until Cloudflare reads your React state

Original link: https://www.buchodi.com/chatgpt-wont-let-you-type-until-cloudflare-reads-your-react-state-i-decrypted-the-program-that-does-it/

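## ChatGPT's bot detection: a closer look at Cloudflare Turnstile

A recent analysis of Cloudflare Turnstile, the bot-detection system used by ChatGPT, found that its encryption scheme is surprisingly weak and that it runs an extensive fingerprinting process. The researcher decrypted 377 Turnstile programs embedded in ChatGPT's network traffic and found that they go beyond simple browser fingerprinting.

Turnstile evaluates 55 properties across three layers: browser characteristics (WebGL, hardware, fonts), Cloudflare network data (IP location), and the state of the fully rendered ChatGPT React application. Crucially, it verifies that the React application has booted *completely*; a headless browser or an incomplete render will fail.

The "encryption" is a simple XOR cipher whose key is embedded in plain sight in the program's bytecode, which makes decryption straightforward and exposes the exact fingerprinting checklist.

Alongside Turnstile, ChatGPT's Sentinel SDK also runs a "Signal Orchestrator" that tracks user behavior (keystroke timing, mouse movement) and a proof-of-work challenge. The core defense, however, is verifying a real, fully loaded application environment.

Although the obfuscation hinders casual inspection, the analysis shows that the system is not cryptographically secure and instead relies on obfuscation and dynamic checks for operational security. The study also highlights a privacy concern: the server that generates the key inherently has access to the fingerprint data.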

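## ChatGPT and Cloudflare: bot detection in depth

A recent Hacker News thread discussed how ChatGPT uses Cloudflare to detect bots by requiring the React application to render fully *before* it accepts user input. In effect, Cloudflare checks that the JavaScript has executed, which verifies a real browser environment.

The discussion suggested that OpenAI likely does this to prevent abuse of free ChatGPT access, so that nobody can use it as a free API. Users reported problems, including slow performance and getting stuck at the prompt, that may be related to this detection method.

Commenters debated the feasibility of bypassing the detection with virtual machines, the intrusiveness of collecting data *before* any interaction, and the poor overall user experience that aggressive bot protection such as Cloudflare's creates. Some suggested that OpenAI should prioritize front-end optimization and server-side model efficiency over client-side checks. Others questioned the article's clarity and whether it was written by AI.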

Original text

Every ChatGPT message triggers a Cloudflare Turnstile program that runs silently in your browser. I decrypted 377 of these programs from network traffic and found something that goes beyond standard browser fingerprinting.

The program checks 55 properties spanning three layers: your browser (GPU, screen, fonts), the Cloudflare network (your city, your IP, your region from edge headers), and the ChatGPT React application itself (__reactRouterContext, loaderData, clientBootstrap). Turnstile doesn't just verify that you're running a real browser. It verifies that you're running a real browser that has fully booted a specific React application.

A bot that spoofs browser fingerprints but doesn't render the actual ChatGPT SPA will fail.

The Encryption Was Supposed to Hide This

The Turnstile bytecode arrives encrypted. The server sends a field called turnstile.dx in the prepare response: 28,000 characters of base64 that change on every request.

The outer layer is XOR'd with the p token from the prepare request. Both travel in the same HTTP exchange, so decrypting it is straightforward:

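import base64
import json

raw = base64.b64decode(dx)    # dx: the turnstile.dx string from the prepare response
key = p_token.encode()        # p_token: the p string from the prepare request (assumed str)
outer = json.loads(bytes(
    byte ^ key[i % len(key)]  # XOR each byte with the repeating key
    for i, byte in enumerate(raw)
))
# → 89 VM instructions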

Inside those 89 instructions, there is a 19KB encrypted blob containing the actual fingerprinting program. This inner blob uses a different XOR key that is not the p token.

Initially I assumed this key was derived from performance.now() and was truly ephemeral. Then I looked at the bytecode more carefully and found the key sitting in the instructions:

[41.02, 0.3, 22.58, 12.96, 97.35]

The last argument, 97.35, is the XOR key. A float literal, generated by the server, embedded in the bytecode it sent to the browser. I verified this across 50 requests. Every time, the float from the instruction decrypts the inner blob to valid JSON. 50 out of 50.

The full decryption chain requires nothing beyond the HTTP request and response:

1. Read p from prepare request
2. Read turnstile.dx from prepare response
3. XOR(base64decode(dx), p) → outer bytecode
4. Find the 5-arg instruction after the 19KB blob → last arg is the key
5. XOR(base64decode(blob), str(key)) → inner program (417-580 VM instructions)

The key is in the payload.
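
The five steps can be sketched end to end in Python. This is a minimal sketch on synthetic data, not the code used for the analysis. `decrypt_layer`, `encrypt_layer`, and `find_blob_and_key` are hypothetical helpers, and the instruction layout (the blob as the longest string argument, the key as the last element of the next five-element instruction) is an assumption read off the description above. The `p` token is assumed to be a plain string.

```python
import base64
import json
from itertools import cycle


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR data with a repeating key; the same call encrypts and decrypts."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


def decrypt_layer(b64_text: str, key: str) -> list:
    """Base64-decode, XOR with the key's UTF-8 bytes, parse the result as JSON."""
    return json.loads(xor_bytes(base64.b64decode(b64_text), key.encode()))


def encrypt_layer(program: list, key: str) -> str:
    """Inverse of decrypt_layer, used here only to build synthetic test data."""
    raw = json.dumps(program).encode()
    return base64.b64encode(xor_bytes(raw, key.encode())).decode("ascii")


def find_blob_and_key(outer: list):
    """Assumed layout: longest string argument is the blob; the key is the
    last element of the first five-element instruction that follows it."""
    strings = [
        (index, arg)
        for index, instruction in enumerate(outer)
        for arg in instruction
        if isinstance(arg, str)
    ]
    blob_index, blob = max(strings, key=lambda pair: len(pair[1]))
    for instruction in outer[blob_index + 1:]:
        if len(instruction) == 5:
            return blob, instruction[-1]
    raise ValueError("no five-element instruction after the blob")


# Synthetic stand-ins for a captured prepare request and response.
p_token = "example-p-token"
inner = [[1.51, "navigator", "hardwareConcurrency"]]
blob = encrypt_layer(inner, str(97.35))
dx = encrypt_layer([[12.0, blob], [41.02, 0.3, 22.58, 12.96, 97.35]], p_token)

# Steps 1-3: XOR the payload with p to recover the outer bytecode.
recovered_outer = decrypt_layer(dx, p_token)
# Step 4: locate the blob and the float key that follows it.
blob_found, key_found = find_blob_and_key(recovered_outer)
# Step 5: XOR the blob with str(key) to recover the inner program.
recovered_inner = decrypt_layer(blob_found, str(key_found))
```

Nothing in the sketch needs state from the browser: every input comes from the request and response bodies.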

What the Decrypted Program Checks

Each inner program uses a custom VM with 28 opcodes (ADD, XOR, CALL, BTOA, RESOLVE, BIND_METHOD, JSON_STRINGIFY, etc.) and randomized float register addresses that change per request. I mapped the opcodes from the SDK source (sdk.js, 1,411 lines, deobfuscated).
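
To make the register scheme concrete, here is a toy interpreter, offered as a sketch under assumptions and not as a reconstruction of the SDK. The opcode names come from the list above, but the opcode numbers, the `[opcode, dst, *srcs]` instruction shape, and the `run` and `remap` helpers are all hypothetical. It shows why renaming registers per request changes the bytecode without changing what the program computes.

```python
import base64
import json
import random

# Hypothetical opcode numbers; only the names appear in the article.
ADD, BTOA, JSON_STRINGIFY = 1.0, 2.0, 3.0

HANDLERS = {
    ADD: lambda a, b: a + b,
    BTOA: lambda s: base64.b64encode(s.encode("latin-1")).decode("ascii"),
    JSON_STRINGIFY: lambda v: json.dumps(v, separators=(",", ":")),
}


def run(program, registers):
    """Execute [opcode, dst, *srcs] instructions on a float-keyed register file."""
    regs = dict(registers)
    for opcode, dst, *srcs in program:
        regs[dst] = HANDLERS[opcode](*(regs[s] for s in srcs))
    return regs


def remap(program, registers, seed):
    """Rename every register to a fresh, distinct float (a per-request shuffle)."""
    rng = random.Random(seed)
    used = sorted({r for _, *args in program for r in args} | set(registers))
    fresh = [n / 100 for n in rng.sample(range(1, 10000), len(used))]
    names = dict(zip(used, fresh))
    new_program = [[op, *(names[a] for a in args)] for op, *args in program]
    new_registers = {names[r]: v for r, v in registers.items()}
    return new_program, new_registers, names


program = [
    [ADD, 7.25, 0.5, 0.75],        # r[7.25] = r[0.5] + r[0.75]
    [JSON_STRINGIFY, 7.25, 7.25],  # r[7.25] = JSON.stringify(r[7.25])
    [BTOA, 9.1, 7.25],             # r[9.1]  = btoa(r[7.25])
]
registers = {0.5: 8, 0.75: 4}
result = run(program, registers)[9.1]

# Same program under different register names yields the same value.
shuffled_program, shuffled_registers, names = remap(program, registers, seed=1)
shuffled_result = run(shuffled_program, shuffled_registers)[names[9.1]]
```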

The program collects 55 properties. No variation across 377 samples. All 55, every time, organized into three layers:

Layer 1: Browser Fingerprint

WebGL (8 properties): UNMASKED_VENDOR_WEBGL, UNMASKED_RENDERER_WEBGL, WEBGL_debug_renderer_info, getExtension, getParameter, getContext, canvas, webgl

Screen (8): colorDepth, pixelDepth, width, height, availWidth, availHeight, availLeft, availTop

Hardware (5): hardwareConcurrency, deviceMemory, maxTouchPoints, platform, vendor

Font measurement (4): fontFamily, fontSize, getBoundingClientRect, innerText. Creates a hidden div, sets a font, measures rendered text dimensions, removes the element.

DOM probing (8): createElement, appendChild, removeChild, div, style, position, visibility, ariaHidden

Storage (5): storage, quota, estimate, setItem, usage. Also writes the fingerprint to localStorage under key 6f376b6560133c2c for persistence across page loads.

Layer 2: Cloudflare Network

Edge headers (5): cfIpCity, cfIpLatitude, cfIpLongitude, cfConnectingIp, userRegion

These are injected server-side by Cloudflare's edge. They exist only if the request passed through Cloudflare's network. A bot making direct requests to the origin server or running behind a non-Cloudflare proxy will produce missing or inconsistent values.

Layer 3: Application State

React internals (3): __reactRouterContext, loaderData, clientBootstrap

This is the part that matters. __reactRouterContext is an internal data structure that React Router v6+ attaches to the DOM. loaderData contains the route loader results. clientBootstrap is specific to ChatGPT's SSR hydration.

These properties only exist if the ChatGPT React application has fully rendered and hydrated. A headless browser that loads the HTML but doesn't execute the JavaScript bundle won't have them. A bot framework that stubs out browser APIs but doesn't actually run React won't have them.

This is bot detection at the application layer, not the browser layer.
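
One way to picture an application-layer check: a verifier that receives the decoded fingerprint can reject any submission that lacks the Layer 2 or Layer 3 fields. The sketch below assumes the fingerprint decodes to a flat JSON object keyed by the property names listed above. That shape and the `missing_by_layer` helper are illustrative guesses; the decrypted programs show what is collected, not how the server evaluates it.

```python
# Property names as listed in the article for Layers 2 and 3.
EDGE_HEADERS = ("cfIpCity", "cfIpLatitude", "cfIpLongitude", "cfConnectingIp", "userRegion")
REACT_STATE = ("__reactRouterContext", "loaderData", "clientBootstrap")


def missing_by_layer(fingerprint: dict) -> dict:
    """Return, per layer, the expected properties that are absent or empty."""
    layers = {"network": EDGE_HEADERS, "application": REACT_STATE}
    return {
        layer: [name for name in names if fingerprint.get(name) in (None, "")]
        for layer, names in layers.items()
    }


# A client that spoofs browser and network values but never hydrates the SPA.
spoofed = {
    "hardwareConcurrency": 8,
    "cfIpCity": "Lisbon",
    "cfIpLatitude": "38.72",
    "cfIpLongitude": "-9.14",
    "cfConnectingIp": "203.0.113.7",
    "userRegion": "PT",
}
report = missing_by_layer(spoofed)
```

Here `report["network"]` is empty while `report["application"]` names all three React properties, which is the failure mode described above.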

The Exit: How the Token Is Built

After collecting all 55 properties, the program hits a 116-byte encrypted blob that decrypts to 4 final instructions:

[
  [96.05, 3.99, 3.99],     // JSON.stringify(fingerprint)
  [22.58, 46.15, 57.34],   // store
  [33.34, 3.99, 74.43],    // XOR(json, key)
  [1.51, 56.88, 3.99]      // RESOLVE → becomes the token
]

The fingerprint is JSON.stringify'd, XOR'd, and resolved back to the parent. The result is the OpenAI-Sentinel-Turnstile-Token header sent with every conversation request.
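
Read as straight-line code, the exit sequence amounts to serialize, XOR, return. The sketch below assumes the key is a string XOR'd bytewise against the UTF-8 JSON and that the result is base64-encoded so it can travel in a header. Neither detail is stated in the decrypted program, the intermediate "store" step is omitted, and `build_token` and `read_token` are hypothetical names.

```python
import base64
import json
from itertools import cycle


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR data with a repeating key; the same call encrypts and decrypts."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


def build_token(fingerprint: dict, key: str) -> str:
    """JSON-serialize the fingerprint, XOR it with the key, base64 the result."""
    serialized = json.dumps(fingerprint, separators=(",", ":")).encode()
    return base64.b64encode(xor_bytes(serialized, key.encode())).decode("ascii")


def read_token(token: str, key: str) -> dict:
    """Reverse build_token; possible for anyone who holds the key."""
    return json.loads(xor_bytes(base64.b64decode(token), key.encode()))


fingerprint = {"hardwareConcurrency": 8, "colorDepth": 24}
token = build_token(fingerprint, "74.43")
roundtrip = read_token(token, "74.43")
```

Because XOR is its own inverse, whoever holds the key recovers the raw fingerprint with the same operation that produced the token.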

What Else Sentinel Runs

Turnstile is one of three challenges. The other two:

Signal Orchestrator (271 instructions): Installs event listeners for keydown, pointermove, click, scroll, paste, and wheel. Monitors 36 window.__oai_so_* properties tracking keystroke timing, mouse velocity, scroll patterns, idle time, and paste events. A behavioral biometric layer running underneath the fingerprint.
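
The kind of features such a layer could derive from those listeners can be sketched as follows. Everything here is illustrative: the event tuples, the function names, and the choice of statistics are assumptions, since the decrypted program shows which signals are tracked but not how they are scored.

```python
from math import hypot
from statistics import mean, pstdev


def keystroke_intervals(key_times_ms):
    """Gaps in milliseconds between consecutive keydown timestamps."""
    return [later - earlier for earlier, later in zip(key_times_ms, key_times_ms[1:])]


def pointer_speeds(samples):
    """Speed in pixels per millisecond between consecutive (t_ms, x, y) samples."""
    return [
        hypot(x2 - x1, y2 - y1) / (t2 - t1)
        for (t1, x1, y1), (t2, x2, y2) in zip(samples, samples[1:])
        if t2 > t1
    ]


def summarize(values):
    """Mean and population standard deviation, or zeros for an empty series."""
    if not values:
        return {"mean": 0.0, "stdev": 0.0}
    return {"mean": mean(values), "stdev": pstdev(values)}


# A script that types on a fixed timer shows no variance between keystrokes.
scripted = summarize(keystroke_intervals([0, 50, 100, 150]))
# Hand-typed input is irregular.
typed = summarize(keystroke_intervals([0, 132, 210, 405]))
# Two pointer moves of 5 px over 10 ms each.
speeds = pointer_speeds([(0, 0, 0), (10, 3, 4), (20, 6, 8)])
```

Zero variance in inter-key timing is one plausible signal of scripted input; the thresholds a real system applies are not visible from the client side.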

Proof of Work (25-field fingerprint + SHA-256 hashcash): Difficulty is drawn uniformly at random from the 400K-500K range, and 72% of challenges solve in under 5ms. It includes 7 binary detection flags (ai, createPRNG, cache, solana, dump, InstallTrigger, data), all of which were zero in every one of 100 samples. The PoW adds compute cost but is not the real defense.
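
For readers unfamiliar with hashcash, here is the textbook form of the idea: search for a nonce whose SHA-256 digest starts with a required number of zero bits. This is a generic sketch, not Sentinel's scheme. How the 400K-500K difficulty value maps onto a target is not covered here, so the `bits` parameter, the `seed:nonce` input format, and the function names are all assumptions.

```python
import hashlib
from itertools import count


def leading_zero_bits(digest: bytes) -> int:
    """Number of zero bits at the start of the digest."""
    return len(digest) * 8 - int.from_bytes(digest, "big").bit_length()


def digest_for(seed: str, nonce: int) -> bytes:
    """SHA-256 over an assumed 'seed:nonce' input format."""
    return hashlib.sha256(f"{seed}:{nonce}".encode()).digest()


def solve(seed: str, bits: int) -> int:
    """Return the first nonce whose digest has at least `bits` leading zero bits."""
    for nonce in count():
        if leading_zero_bits(digest_for(seed, nonce)) >= bits:
            return nonce


def verify(seed: str, nonce: int, bits: int) -> bool:
    """Checking a solution costs one hash, however long the search took."""
    return leading_zero_bits(digest_for(seed, nonce)) >= bits


nonce = solve("example-seed", bits=12)  # about 4,096 hashes on average
valid = verify("example-seed", nonce, bits=12)
```

The asymmetry is the point: the client pays for the search and the server pays for one hash, which adds cost per request without proving anything about the browser.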

Who Can Decrypt the Token

The XOR key for the inner program is a server-generated float embedded in the bytecode. Whoever generated the turnstile.dx knows the key. The privacy boundary between the user and the system operator is a policy decision, not a cryptographic one.

The obfuscation serves real operational purposes: it hides the fingerprint checklist from static analysis, prevents the website operator (OpenAI) from reading raw fingerprint values without reverse-engineering the bytecode, makes each token unique to prevent replay, and allows Cloudflare to change what the program checks without anyone noticing.

But the "encryption" is XOR with a key that's in the same data stream. It prevents casual inspection. It does not prevent analysis.

The Numbers

| Metric | Value |
| --- | --- |
| Programs decrypted | 377/377 (100%) |
| Unique users observed | 32 |
| Properties per program | 55 (identical across all samples) |
| Instructions per program | 417-580 (mean 480) |
| Unique XOR keys (50 samples) | 41 |
| SO behavioral properties | 36 |
| PoW fingerprint fields | 25 |
| PoW solve time | 72% under 5ms |

Methodology

No systems were accessed without authorization. No individual user data is disclosed. All traffic was observed from consented participants. The Sentinel SDK was beautified and manually deobfuscated. All decryption was performed offline using Python.
