Claude Code Emergent Behavior: When Skills Combine

Original link: https://vibeandscribe.xyz/posts/2025-01-07-emergent-behavior.html

## Claude's Skill Synergy: Summary

A recent experiment with Claude revealed a powerful capability: skill composition. While building tools to make AI more effective (starting with a tiling window manager for macOS), the author found that Claude can merge distinct skills to solve complex problems. A new "optimize-critical-path" skill was created first, but it struggled to determine *what* should be optimized. Combining it with a pre-existing "oberdebug" skill (built for systematic bug diagnosis through logging) proved transformative. The synergy let Claude pinpoint a performance bottleneck (JSON encoding in the server response) with striking accuracy, ultimately cutting the time of a critical operation significantly. This success highlights the potential of layering skills: the author went on to pair "brainstorming" with "frontend-design" to create a design-focused ideation flow for a new blog, and even integrated "code-review" with other skills for a more holistic development approach. The key insight is that Claude's strength lies not only in individual skills but in their ability to work together, unlocking new levels of problem-solving and creativity.

## Claude Code Skills: Not So Emergent?

A recent article on Hacker News (now hosted at vibeandscribe.xyz) discussing "emergent behavior" in Claude Code's new "skills" feature sparked debate. Though billed as a new capability, many commenters argued that skills are simply structured prompt compositions, akin to tool-using agents or well-written dynamic workflows (MCPs). The author, ryanthedev, treats skills as reusable tools but concedes the concept is not new to Claude. Others noted that Codex now offers a similar feature ("Talents"). One standout challenge is reliably activating skills: even when one is clearly applicable, Claude sometimes completes the task *without* using the intended skill. Discussion centered on the possibility of skills invoking other skills to build complex workflows, and on AI-driven skill creation and editing, though commenters raised concerns about getting "too meta" and about avoiding excess complexity. Ultimately, consensus leaned toward skills being a useful *abstraction*, the first such feature from a vendor, rather than a genuinely emergent phenomenon.

## Original Post

I ran across something interesting today: Claude can combine skills! You might be thinking, "huh?" Here's what I mean.

I've been vibe coding like a fiend lately, experimenting and learning the flow. Working on ways to make AI more successful. First thing I built? A tiling window manager for macOS. Because that's obviously the first thing to build when your goal is world domination.

For something I didn't write, I was happy with the performance. Not winning any awards, but feels responsive to me. Still, I knew there had to be plenty of room for improvement in that hellscape of a repo.

With this in mind, Claude and I colluded on a new optimize-critical-path skill. You can guess what it does. First path? The `thegrid focus next` command. I kicked off the agent and told it to look at this action and do its thing.

Claude was all too happy to use this new skill. It started kicking off subagents and tokens were flying across my screen like a scene from a worse, more dystopian version of the Matrix. I noticed it was thrashing and making assumptions about performance. I heard a voice in my head that sounded like Obi-Wan, telling me that we should always measure first.

Then I got an idea. Previously I hacked together a debug skill that adds logs and helps trace issues. I thought to myself, can I combine skills?

> what is another way we can approach this. Could we combine optimize critical path skill with my oberdebug?

● Good idea - let's use oberdebug's hypothesis-driven approach to find the actual bottleneck:

/oberdebug

● Combining the two approaches. Following oberdebug's structure:

Step 1: Issue Statement

Symptom: Focus operation takes ~130ms, target is <80ms
Component: IPC path (CLI -> Server -> Response)
Evidence so far:
- Border pooling works (100% hit rate) ✓
- dump alone takes ~82ms (~63ms overhead beyond baseline IPC)
- JSON serialization is only ~1ms - NOT the bottleneck
- Something else in the IPC round-trip is slow

Step 2: Reproduction

YES - Easily reproducible with `thegrid focus left`

Step 3: Dispatch Discovery Agents

● 3 Explore agents launched (ctrl+o to expand)
├─ Trace IPC timing path · Running in background
│  └─ Launched
├─ Check git history for IPC · Running in background
│  └─ Launched
└─ Find timing instrumentation · Running in background
   └─ Launched

● Bash(rm -f /Users/r/.config/bin/claude-debug-bp7x.py 2>/dev/null; echo "cleaned up")
└─ cleaned up

● Let me get the agent results:

Task Output af35ff

Ok, now we're talking. Burn those sweet sweet tokens optimizing that path! I know this is only the diagnosis part. It was evident my original skill was really good at coming up with solutions to optimize, but I'd messed up: it wasn't good at finding what to optimize.

Thankfully oberdebug was built to do this. It uses the lord's method of solving complex bugs. Add a log statement.

142        state.serverCommit = GridServerCommit

143
144 +      let t4 = CFAbsoluteTimeGetCurrent()
145        let response = Response(
146            id: request.id,
147            result: AnyCodable(state)
148        )
149 +      let t5 = CFAbsoluteTimeGetCurrent()
150 +
151 +      // [OBERDEBUG-001] Timing breakdown for dump operation
152 +      JSONLogger.shared.log("dbg.dump.timing", data: [
153 +          "task_spawn_ms": Int((t1 - t0) * 1000),
154 +          "state_fetch_ms": Int((t3 - t2) * 1000),
155 +          "anycod_wrap_ms": Int((t5 - t4) * 1000),
156 +          "total_ms": Int((t5 - t0) * 1000)
157 +      ])
158 +
159        completion(response)
160    } catch {
161        JSONLogger.shared.log("err.state", data: ["op": "dump", "error": "\(error)"])

Now rebuild and test:

Bash(make run 2>&1 | tail -20)
└─ Building grid-cli...
   Building thegrid v0.2.7...
   go build -v -ldflags "-X main.Version=0.2.7 -X main.Commit=2f0eff915d96b4f66411da07a8b8939543a5b693"
   … +17 lines (ctrl+o to expand)

Since this app has a CLI, I take my hands off the wheel and smile as Claude runs a mini performance test using the CLI. I just see some AI randomly changing focus on my monitor. I'm trying to think, how does it come up with the order? Anyway, topic for another day.

YEAHHHH BUDDY

● FOUND IT!

● Step 7: Evidence Analysis

| Hypothesis                    | Evidence                                 | Verdict    |
|------------------------------|-------------------------------------------|------------|
| Task spawning (~30–40ms)     | task_spawn_us: 17–46 (0.05ms)              | REFUTED    |
| AnyCodable wrap (~20–30ms)   | anycod_wrap_us: 2–5 (0.005ms)              | REFUTED    |
| Response JSON encoding       | encode_us: 54,720–60,198 (55–60ms)         | CONFIRMED  |

---

Step 8: ROOT CAUSE CONFIRMED

ROOT CAUSE CONFIRMED:
────────────────────────────────────────────────────────

Issue:      dump IPC takes ~82ms, target is <20ms
Location:   grid-server/Sources/GridServer/SocketServer.swift:238
Evidence:   encode_us: 54,720–60,198µs (55–60ms)
            bytes: 122,699 (120KB payload)

Cause:      Swift's JSONEncoder.encode() serializing the full
            Response struct (containing WindowManagerState with
            ~248 windows) takes 55–60ms per call.

────────────────────────────────────────────────────────

Time Breakdown (verified)

| Phase                             | Time    | Notes          |
|-----------------------------------|---------|----------------|
| Task spawn                        | 0.05ms  | Negligible     |
| StateManager.getStateDictionary() | 0.005ms | Negligible     |
| Response JSON encode              | 55–60ms | The bottleneck |
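For intuition on why a ~120KB response costs tens of milliseconds, here is a small standalone benchmark. The `Window` struct is a hypothetical stand-in for the real WindowManagerState (absolute timings will vary by machine, but the full payload should reliably encode to far more bytes than a slimmed one):

```swift
import Foundation

// Hypothetical stand-in for the server's per-window state; the transcript
// reports ~248 windows encoding to a ~120KB response.
struct Window: Codable {
    let id: Int
    let app: String
    let title: String
    let frame: [Double]   // x, y, width, height
}

let windows = (0..<248).map { i in
    Window(id: i,
           app: "App\(i % 12)",
           title: "Window \(i) with a reasonably long title",
           frame: [Double(i), 0, 1440, 900])
}

// Measure one JSONEncoder.encode() call and report size + duration.
func encodeStats<T: Encodable>(_ value: T) -> (bytes: Int, ms: Double) {
    let start = Date()
    let data = try! JSONEncoder().encode(value)
    return (data.count, Date().timeIntervalSince(start) * 1000)
}

let full = encodeStats(windows)            // whole state
let slim = encodeStats(windows.map(\.id))  // ids only
print("full: \(full.bytes) bytes, \(full.ms) ms")
print("slim: \(slim.bytes) bytes, \(slim.ms) ms")
```

Once the numbers confirm the encode itself is the cost, the usual options are shrinking the payload to only the fields the CLI needs, or caching the encoded bytes between identical requests; the post doesn't say which fix the author went with.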

So that was nice! The optimize skill then kicked in and it proceeded to recommend different things, and I had some of my own ideas as well. But the moral of the story is combining skills. Like any human with a shiny new hammer, I needed to go find some new nails!

With that in mind, I thought about brainstorming, which I use a lot, and I'd recently been playing around with frontend-design. Those two seemed to pair very nicely. This was the original prompt that created this blog.

● That's a creative approach! You want the design expertise baked into the brainstorming itself rather than sequential steps.

We can try it — invoke frontend-design and prompt it to run the brainstorming workflow with design sensibilities guiding the exploration. That way design thinking shapes the questions and options from the start, rather than being applied after decisions are made.

Let's give it a shot:

● Skill(frontend-design:frontend-design)

● Now let me layer in the brainstorming skill to combine both:

● Skill(superpowers:brainstorming)

● Now we have both active — design sensibility guiding the brainstorm.

Since this is a fresh project (no existing codebase to explore), let's dive into understanding your vision. I'll ask one question at a time, design-informed.

---

First question: What's the blog's core purpose?

This shapes everything — the aesthetic, the features, the tone.

| Option                 | Description                                                                                                  |
|------------------------|--------------------------------------------------------------------------------------------------------------|
| A. Personal/Creative   | Your thoughts, essays, creative writing. The blog is the experience — personality-forward, expressive design matters. |
| B. Technical/Dev       | Tutorials, code snippets, engineering posts. Readability and code presentation are paramount.              |
| C. Portfolio/Professional | Showcasing work, building reputation. Polished, memorable, makes an impression.                        |
| D. Publication/Magazine| Multi-author or editorial style. Content density, navigation, discoverability.                             |
| E. Something else      | Tell me more.                                                                                                |

Which resonates most?

> A

● Personal/Creative — where the blog is the experience. This opens up the full design spectrum.

As I was working on the blog, I came across another opportunity to pair skills. I combine code-review skill with everything now. I'll even kick off parallel subagents using two different code-review skills combined with other skills. Doing something like code-review through a frontend-design lens has been extremely helpful as I built this blog.

I'm curious what other new combinations are out there to discover.
