Codex, Opus, Gemini try to build Counter Strike

Original link: https://www.instantdb.com/essays/agents_building_counterstrike

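## AI Builds Counter Strike: A Model Showdown

Gemini 3 Pro, Codex Max 5.1, and Claude Opus 4.5 were recently challenged to build a basic multiplayer Counter Strike game using only text prompts. The goal: a 3D browser FPS with a working frontend and backend.

**Frontend (design and mechanics):** Claude Opus 4.5 stood out on visual design, producing appealing maps, characters, and guns. Gemini 3 Pro followed closely, while Codex Max 5.1 lagged on visuals.

**Backend (multiplayer):** The ranking flipped for multiplayer. Gemini 3 Pro proved the most reliable, adding persistence with the fewest errors. Claude won the shooting prompt but got stuck on the multi-room refactor, and Codex landed in between.

**Key observations:** Claude excels at initial design and leans heavily on documentation. Gemini is strong at logical problem solving and iterative debugging, helped by constantly running the build. Codex sat in the middle, picking up many second-place finishes. The experiment highlights how far AI has come in handling complex tasks and iterating on code through a CLI. Challenges remain, especially in subtle debugging (for example React's `useEffect` hook, which tripped up Claude) and in bridging the gap between "vibe coding" and robust programming practice.

In the end, all three models built a playable multiplayer FPS with *no* hand-written code, a significant result that shows how quickly AI coding ability is advancing. You can try the games here: [Codex](https://cscodex.vercel.app/), [Claude](https://csclaude.vercel.app/), [Gemini](https://csgemini.vercel.app/).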

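## AI Attempts Counter Strike: Hacker News Summary

A Hacker News thread discussed the instantdb.com project, which used the Codex, Opus, and Gemini models to build Counter Strike-like games. Commenters were impressed by the progress but also pointed out limitations and potential problems. One user found a bug in Claude's version that allowed easy, unintended kills. Another found that the shader code in Gemini's version was identical to code built into three.js and might infringe A.J. Preetham's copyright. The developers shared links to the GitHub repositories for each attempt (Codex, Gemini, Claude).

The discussion also touched on the history of Counter Strike's transitions (1.6 to Source to GO/CS2) and on the appetite for a skill-focused, low-graphics version reminiscent of older shooters such as Quake. Some suggested improvements to the website, such as adding a lightbox for images. Overall, the project sparked interest in both the potential and the current limits of AI in game development.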

Original text

In the last week we’ve had three major model updates: Gemini 3 Pro, Codex Max 5.1, Claude Opus 4.5. We thought we’d give them a challenge:

Build a basic version of Counter Strike. The game had to have a 3D UI and it had to be multiplayer.

If you're curious, pop these open (ideally on a large computer screen) and you can try out each model's handiwork yourself:

Codex Max 5.1: https://cscodex.vercel.app/
Claude Opus 4.5: https://csclaude.vercel.app/
Gemini 3 Pro: https://csgemini.vercel.app/

We have a full video of us going through the build here, but for those who prefer text, you get this post.

We'll go over some of our high-level impressions on each model, then dive deeper into the performance of specific prompts.

The Setup

We signed up for the highest-tier plan on each model provider and used the defaults set for their CLI. For Codex, that’s 5.1 codex-max on the medium setting. For Claude it’s Opus 4.5. And for Gemini it's 3 Pro.

We then gave each model about 7 consecutive prompts. Prompts were divided into two categories:

Frontend: At first, agents only had to worry about the game mechanics. Design the scene, the enemies, the logic for shooting, and some sound effects.

Backend: Once that was done, agents would then make the game multiplayer. They would need to build a selection of rooms. Users could join them and start shooting.

A High-Level Overview

So, how'd each model do?

In keeping with a familiar pattern for Anthropic models, Opus 4.5 won out on the frontend. It made nicer maps, nicer characters, nicer guns, and generally had the right scene from the get-go.

Once the design was done, Gemini 3 Pro started to win in the backend. It got fewer errors adding multiplayer and persistence. In general Gemini did the best with making logical rather than visual changes.

Codex Max felt like an “in-between” model on both frontend and backend. It got a lot of “2nd place” points in our book. It did reasonably well on the frontend and reasonably well on the backend, but felt less spiky than the other models.

Here’s the scorecard in detail:

	Codex	Claude	Gemini
Frontend
Boxes + Physics	🥉	🥇	🥈
Characters + guns	🥉	🥇	🥈
POV gun	🥈	🥇	🥉
Sounds	🥈	🥇	🥈
Backend
Moving	🥈	🥉	🥇
Shooting	🥉	🥇	🥉
Saving rooms	🥈	🥉	🥇
Bonus	🥈	🥉	🥇

Okay, now let’s get deeper into each prompt.

Goal number 1 was to set up the physics for the game. Models needed to design a map with a first-person viewpoint, and the ability to shoot enemies.

Prompt

I want you to create a browser-based version of counter strike, using three js.

For now, just make this local: don't worry about backends, Instant, or anything like that.

For the first version, just make the main character a first-person view with a cross hair. Put enemies at random places. Enemies have HP. You can shoot them, and kill them. When an enemy is killed, they respawn.

Make everything simple polygons -- rectangles.

Here’s a side-by-side comparison of the visuals each model came up with:

Visually Claude came up with the most interesting map. There were obstacles, a nice floor, and you could see everything well.

Gemini got something nice working too.

Codex had an error on its first run (it called a function without importing it), but it fixed it real quick. Once the bug was fixed, its map was the least visually pleasing. Things were darker, there were no obstacles, and it was hard to make out the floor.
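To make the first prompt concrete, here is a minimal sketch of the core loop it describes: a ray cast from the crosshair, rectangular enemies with HP, and a respawn on death. It is plain TypeScript rather than three.js, and every name in it (`Enemy`, `rayHitsBox`, `shoot`, `MAX_HP`) is hypothetical; it is not taken from any of the models' output.

```typescript
// All names here are hypothetical; none come from the models' generated code.
type Vec3 = { x: number; y: number; z: number };

interface Enemy {
  hp: number;
  min: Vec3; // one corner of the enemy's bounding box
  max: Vec3; // the opposite corner
}

const MAX_HP = 100;
const AXES = ["x", "y", "z"] as const;

// Slab test: distance along the ray to the box, or null on a miss.
function rayHitsBox(origin: Vec3, dir: Vec3, min: Vec3, max: Vec3): number | null {
  let tNear = -Infinity;
  let tFar = Infinity;
  for (const axis of AXES) {
    if (dir[axis] === 0) {
      // Parallel to this pair of faces: a hit needs the origin between them.
      if (origin[axis] < min[axis] || origin[axis] > max[axis]) return null;
      continue;
    }
    const t1 = (min[axis] - origin[axis]) / dir[axis];
    const t2 = (max[axis] - origin[axis]) / dir[axis];
    tNear = Math.max(tNear, Math.min(t1, t2));
    tFar = Math.min(tFar, Math.max(t1, t2));
  }
  if (tNear > tFar || tFar < 0) return null;
  return Math.max(tNear, 0);
}

// Damage the closest enemy on the ray; respawn it at `spawn` when HP runs out.
function shoot(origin: Vec3, dir: Vec3, enemies: Enemy[], damage: number, spawn: Vec3): Enemy | null {
  let closest: Enemy | null = null;
  let closestT = Infinity;
  for (const enemy of enemies) {
    const t = rayHitsBox(origin, dir, enemy.min, enemy.max);
    if (t !== null && t < closestT) {
      closest = enemy;
      closestT = t;
    }
  }
  if (closest === null) return null;
  closest.hp -= damage;
  if (closest.hp <= 0) {
    // Respawn: keep the box size, move its min corner to `spawn`, refill HP.
    const { min, max } = closest;
    closest.max = { x: spawn.x + max.x - min.x, y: spawn.y + max.y - min.y, z: spawn.z + max.z - min.z };
    closest.min = { ...spawn };
    closest.hp = MAX_HP;
  }
  return closest;
}
```

In a three.js scene the same job would more likely go through the library's own ray casting against meshes; the hand-rolled box test is only here to keep the sketch free of dependencies.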

Now that we had a map and some polygons, we asked the models to style up the characters. This was our prompt:

I want you to make the enemies look more like people. Use a bunch of square polygons to represent a person, and maybe a little gun

Here’s the result of their work:

Again it feels like Claude did the best job here. The characters look quite human, almost at the level of design in Minecraft. Gemini did well too. Codex made its characters better, but everything was a single color, which really diminished them compared to the others.

We then asked each model to add a gun to our first-person view. When we shoot, we wanted a recoil animation.

I want you to make it so I also have a gun in my field of view. When I shoot, the gun moves a bit.

Here’s the side-by-side of how the recoil felt for each model:

Here both Claude and Codex got the gun working in one shot. Claude’s gun looks like a real darn pistol though.

Gemini had an issue trying to stick the gun to the camera. This got us in quite a back and forth, until we realized that the gun was transparent.
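For readers wondering what "the gun moves a bit" amounts to in code, one common approach is a kick on each shot followed by a smooth return to rest. The sketch below assumes that approach; the names and constants (`Recoil`, `KICK`, `RECOVERY`, `MAX_OFFSET`) are hypothetical and are not what any of the models wrote.

```typescript
// Hypothetical recoil state: `offset` is how far the gun is pushed back toward the camera.
interface Recoil {
  offset: number;
}

const KICK = 0.15; // assumed kick per shot, in scene units
const RECOVERY = 10; // assumed recovery rate, per second
const MAX_OFFSET = 0.3; // cap so rapid fire cannot push the gun off screen

// Called once per shot.
function fire(recoil: Recoil): void {
  recoil.offset = Math.min(recoil.offset + KICK, MAX_OFFSET);
}

// Called once per frame with the elapsed time in seconds.
function update(recoil: Recoil, dtSeconds: number): void {
  // Exponential decay back to rest, so the feel does not depend on frame rate.
  recoil.offset *= Math.exp(-RECOVERY * dtSeconds);
  if (recoil.offset < 1e-4) recoil.offset = 0;
}
```

Each frame, the render loop would add `offset` to the gun's resting position along the view axis.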

We were almost done with the frontend: the final step was sound. Here’s what we asked:

I want you to use chiptunes to animate the sound of shots. I also want to animate deaths.

All models added sounds pretty easily. The ending part of our prompt, “I also want to animate deaths.”, was added on the spur of the moment in the video. Our intention was to add sound to deaths. But that’s not what happened.

All 3 models misunderstood the sentence in the same way: they thought we wanted to animate how the characters died. Fair enough: re-reading the sentence, we would understand it that way too.

Here are the results they came up with:

All the models got the sound done easily. They all got animations, but we thought Claude’s animation felt the most fun.

Now that all models had a real frontend, we asked them to make it multiplayer.

We didn’t want the models to worry about shots just yet: goal 1 was to share the movement positions. Here’s what we asked it to do:

I want you to use Instant presence.

Don't save anything in the database, just use presence and topics. You can look up the docs.

There should should just be one single room.

You no longer the need to have the enemies that are randomly placed. All the players are what get placed.

For now, don't worry about shots. Let's just make it so the positions of the players are what get set in presence.

Gemini got this right in one shot. Both Codex and Claude needed some more prodding.

	Codex	Claude	Gemini
Moving	🥈	🥉	🥇

It was interesting to see how each model tried to solve problems:

Codex used lots of introspection. It would constantly look at the TypeScript library and at the functions that were available. It didn’t seem to look at the docs as much.

Claude looked at the docs a bunch. It read and re-read our docs on presence, but rarely introspected the library like Codex did.

Gemini seemed to do both. It looked at the docs, but then, I think because it constantly ran the build step, it found any TypeScript errors it had and fixed them.

Gemini made the fastest progress here, though all of them got through, as long as we pasted the errors back.
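The idea behind the presence prompt is that each player publishes its own position to a shared room, everyone reads everyone else's, and nothing is persisted. The sketch below models that with an in-memory stand-in. `PresenceRoom` and its methods are hypothetical and are deliberately not Instant's API; for the real calls, see the presence docs the models consulted.

```typescript
type Position = { x: number; y: number; z: number };
type PeerMap = Record<string, Position>;
type Listener = (peers: PeerMap) => void;

// In-memory stand-in for a presence room. Hypothetical: this is not Instant's API.
class PresenceRoom {
  private peers: PeerMap = {};
  private listeners: Listener[] = [];

  join(peerId: string, position: Position): void {
    this.peers[peerId] = position;
    this.notify();
  }

  setPosition(peerId: string, position: Position): void {
    if (!(peerId in this.peers)) return; // ignore updates from peers that never joined
    this.peers[peerId] = position;
    this.notify();
  }

  // Presence is ephemeral: a peer that leaves simply disappears.
  leave(peerId: string): void {
    delete this.peers[peerId];
    this.notify();
  }

  // The ids a client should render as other players.
  others(selfId: string): string[] {
    return Object.keys(this.peers).filter((id) => id !== selfId);
  }

  subscribe(listener: Listener): () => void {
    this.listeners.push(listener);
    return () => {
      this.listeners = this.listeners.filter((l) => l !== listener);
    };
  }

  private notify(): void {
    for (const listener of this.listeners) listener(this.peers);
  }
}
```

A client would call `setPosition` whenever its player moves and redraw the other players from the map handed to its listener.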

Then we moved to getting shots to work. Here was the prompt:

Now let's make shots work. When I shoot, send the shot as a topic, and make it affect the target's HP. When the target HP goes to zero, they should die and respawn.

	Codex	Claude	Gemini
Shooting	🥉	🥇	🥈

Claude got this right in one shot. Gemini and Codex had a few issues to fix, but just pasting the errors got them through.
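The shooting prompt describes a small publish/subscribe flow: the shooter broadcasts a shot, and the target reduces its own HP and respawns at zero. Here is a minimal sketch of that flow, assuming each client is trusted to apply damage to itself. `ShotTopic`, `Player`, and `FULL_HP` are hypothetical names, and the topic is an in-memory stand-in rather than Instant's topics API.

```typescript
interface ShotEvent {
  shooterId: string;
  targetId: string;
  damage: number;
}
type ShotHandler = (shot: ShotEvent) => void;

// In-memory stand-in for a broadcast topic. Hypothetical: not Instant's API.
class ShotTopic {
  private handlers: ShotHandler[] = [];

  subscribe(handler: ShotHandler): () => void {
    this.handlers.push(handler);
    return () => {
      this.handlers = this.handlers.filter((h) => h !== handler);
    };
  }

  publish(shot: ShotEvent): void {
    for (const handler of this.handlers) handler(shot);
  }
}

const FULL_HP = 100;

class Player {
  readonly id: string;
  hp = FULL_HP;
  deaths = 0;
  private topic: ShotTopic;

  constructor(id: string, topic: ShotTopic) {
    this.id = id;
    this.topic = topic;
    // Every client hears every shot and only reacts to the ones aimed at it.
    topic.subscribe((shot) => {
      if (shot.targetId === this.id) this.takeDamage(shot.damage);
    });
  }

  shootAt(targetId: string, damage: number): void {
    this.topic.publish({ shooterId: this.id, targetId, damage });
  }

  private takeDamage(damage: number): void {
    this.hp -= damage;
    if (this.hp <= 0) {
      // Die and respawn with full HP.
      this.deaths += 1;
      this.hp = FULL_HP;
    }
  }
}
```

Letting the target apply its own damage keeps the sketch short, but it trusts every client; a game that cared about cheating would validate shots somewhere authoritative.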

Now that all models had a single room working, it was time to get them supporting multiple rooms.

We added this challenge to see (a) how they would deal with a new API (persistence), and (b) how they would deal with the refactor necessary for multiple rooms.

So, now I want you to make it so the front page is actually a list of maps. Since our UI is using lots of polygons, make the style kind of polygonish

Make the UI look like the old counter strike map selection screen. I want you to save these maps in the database. Each map has a name. Use a script to generate 5 random maps with cool names.

Then, push up some permissions so that anyone can view maps, but they cannot create or edit them.

When you join a map, you can just use the map id as the room id for presence.

The maps UI

All models did great with the UI. Here’s how each looked:

We kind of like Gemini’s UI the most, but they were all pretty cool.

The Persistence

And the persistence worked well too. They all dutifully created schema for maps, pushed a migration, and seeded 5 maps.
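As a rough picture of what the seeding and permissions steps involve, the sketch below generates five uniquely named maps and pairs them with a rules object. The word lists, the `map-N` ids, and the helper names are hypothetical. The rules object follows the shape of Instant's permission rules as we understand them (an `allow` block with `view`, `create`, `update`, and `delete` expressions); treat the exact fields as an assumption to check against the docs, and note that nothing here is written to a database.

```typescript
// Hypothetical word lists for "cool" map names.
const ADJECTIVES = ["Dusty", "Frozen", "Neon", "Sunken", "Rusted"];
const PLACES = ["Bazaar", "Depot", "Citadel", "Harbor", "Outpost"];

interface MapRecord {
  id: string; // placeholder id; a real seed script would use the database's own ids
  name: string;
}

// Small linear congruential generator so the seed script is repeatable.
function lcg(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state * 1664525 + 1013904223) % 4294967296;
    return state / 4294967296;
  };
}

function generateMaps(count: number, seed: number): MapRecord[] {
  const names: string[] = [];
  for (const adjective of ADJECTIVES) {
    for (const place of PLACES) names.push(`${adjective} ${place}`);
  }
  // Fisher-Yates shuffle, then take the first `count`, so names never repeat.
  const rand = lcg(seed);
  for (let i = names.length - 1; i > 0; i--) {
    const j = Math.floor(rand() * (i + 1));
    const tmp = names[i];
    names[i] = names[j];
    names[j] = tmp;
  }
  return names.slice(0, count).map((name, i) => ({ id: `map-${i + 1}`, name }));
}

// Assumed rule shape: anyone can view maps, nobody can create, edit, or delete them.
const rules = {
  maps: {
    allow: {
      view: "true",
      create: "false",
      update: "false",
      delete: "false",
    },
  },
};

const seededMaps = generateMaps(5, 42);
```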

The Refactor

But things got complicated in the refactor.

	gpt 5.1 codex max (medium)	Claude 4.5 Opus	Gemini 3 Pro
Saving rooms	🥈	🥉	🥇

Gemini got things done in one shot. It also chose to keep the map id in the URL, which made it much handier to use. Codex took one back and forth with a query error.

But Claude really got stuck. The culprit was hooks. Because useEffect can run multiple times, it ended up having a few very subtle bugs. For example, it made 2 canvas objects instead of 1. It also had multiple animation refs running at once.

It was hard to get it to fix things by itself. We had to put our engineer hats on and actually look at the code to unblock Claude here.
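The double-canvas bug is easy to reproduce without React. In development, React's StrictMode runs an effect's setup, then its cleanup, then its setup again, so an effect that creates something without tearing it down leaves two copies behind. The sketch below simulates that sequence with a hypothetical `FakeStage` standing in for the DOM and the animation loop; it is an illustration of the failure mode, not Claude's actual code.

```typescript
// Hypothetical stand-in for the DOM and the animation loop, so this runs without React.
class FakeStage {
  canvases = 0;
  activeLoops = 0;
}

type Cleanup = () => void;
type Effect = () => Cleanup | undefined;

// Mimics how StrictMode exercises an effect in development: setup, cleanup, setup.
function runLikeStrictMode(effect: Effect): void {
  const cleanup = effect();
  if (cleanup !== undefined) cleanup();
  effect();
}

// Buggy version: creates a canvas and starts a loop, but never tears them down.
function leakyEffect(stage: FakeStage): Effect {
  return () => {
    stage.canvases += 1;
    stage.activeLoops += 1;
    return undefined;
  };
}

// Fixed version: the returned cleanup undoes exactly what the setup did.
function tidyEffect(stage: FakeStage): Effect {
  return () => {
    stage.canvases += 1;
    stage.activeLoops += 1;
    return () => {
      stage.canvases -= 1;
      stage.activeLoops -= 1;
    };
  };
}

const leakyStage = new FakeStage();
runLikeStrictMode(leakyEffect(leakyStage)); // ends with 2 canvases and 2 loops

const tidyStage = new FakeStage();
runLikeStrictMode(tidyEffect(tidyStage)); // ends with 1 canvas and 1 loop
```

In a real component the cleanup would remove the canvas element and cancel the pending animation frame.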

This did give us a few ideas though:

Claude’s issues were human-like. How many of us get tripped up by useEffect running twice, or by getting dependency arrays wrong? I think improving the React DX on these two issues could really push humans and agents further.
And what would have happened if a non-programmer had been building this? They would have gotten really stuck. We think there need to be more tools for going from “strictly vibe coding” to “real programming”. Right now the jump feels too steep.

In the end, all models built a real multiplayer FPS, with zero code written by hand! That’s pretty darn cool.

Parting thoughts

Well, models have definitely improved. They can take much higher-level feedback, and much higher-level documentation. What really strikes us though is how much they can iterate on their own work thanks to the CLI.

There’s still lots to go though. The promise that you never have to look at the code doesn’t quite feel real yet.