Neural Boids

Original link: https://campedersen.com/noid

## Neural Flocking: From Boids to "Noids"

Inspired by starling murmurations (flocks of birds that show coordinated motion with no central control), "Noids" are neural networks designed to reproduce that behavior. Research shows that starlings track roughly seven nearby neighbors by *topological* distance (rank, not physical distance), which keeps the flock cohesive even as it stretches thin.

Traditional "Boids" simulate flocking with hand-written rules (separation, alignment, cohesion). Noids instead *learn* flocking behavior. Each noid network has only 1,922 parameters; its input encodes the agent's own velocity plus the positions and velocities of 5 neighbors, and its output is a steering force.

This compact design enables efficient computation, exploiting GPU parallelism for fast simulation. Training imitates the established boid rules, but the resulting network embodies the behavior as a single learned function, mirroring the biological plausibility of a starling's brain. The emergent flocking shows how complex behavior can arise from simple, locally acting agents and a relatively small number of learned parameters. The code is open source, highlighting the potential for accessible and extensible flocking simulation.

## Neural Boids: Summary

A recent Hacker News post covers "Neural Boids," an implementation of the classic "boids" flocking simulation using a neural network. The author, ecto, set out to explore emergent behavior inspired by starling murmurations, citing research showing that starlings follow surprisingly simple rules to achieve coordinated motion.

The post drew sharp criticism of its writing style; many commenters felt it read like generic LLM-generated text, and some raised concerns about factual inaccuracies and the lack of a personal voice. Ecto acknowledged the feedback and made corrections.

Beyond the writing, discussion centered on the novelty of the approach. Some questioned the value of training a neural network on the original boid rules, arguing that reinforcement learning would be more interesting. Others debated performance and how the implementation differs from existing GPU-accelerated flocking simulations.

Despite the criticism, the project inspired ideas for applying similar techniques to other simulations, such as fluid dynamics, and for creating emergent behavior in game development. The core concept (using a neural network to model a complex system from simple local interactions) remains compelling.

Original article

Those aren't boids.

I'm calling them noids - neural boids. No hand-written rules. A small neural network takes in what each agent can see and outputs a steering force. 1,922 learned parameters. That's the entire program!

But before I explain how they work, I want to talk about real birds.

 

How real birds flock

Starling murmuration

A starling murmuration can have 300,000 birds. No leader. No choreography. No bird knows the shape of the flock. And yet the whole mass moves as one, rippling and folding, evading a falcon in coordinated waves that propagate faster than any single bird can fly.

For decades, people assumed this required some kind of global signal. A lead bird. A pheromone. Something.

In 2010, a team led by Andrea Cavagna at the University of Rome tracked individual starlings in 3D using multiple synchronized cameras. They reconstructed the position and velocity of every bird in flocks of over a thousand. And they found something surprising!

Each bird tracks about 6-7 neighbors. Not the closest by distance, but the closest by rank. Bird number 1 through 7, sorted by proximity. This is called topological distance rather than metric distance, and the distinction matters: it means the interaction doesn't break down when the flock stretches thin. Whether your 7th neighbor is 2 meters away or 20, you still track exactly 7!
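The selection rule is easy to state in code. Here is a minimal sketch (`topological_neighbors` is an illustrative name, not from the noid crate): take the k closest agents by rank, however far away they happen to be.

```rust
// Topological neighbor selection: each agent tracks its k nearest
// neighbors by *rank*, not by a fixed radius. Illustrative sketch.
fn topological_neighbors(positions: &[(f64, f64)], me: usize, k: usize) -> Vec<usize> {
    let (mx, my) = positions[me];
    let mut others: Vec<(usize, f64)> = positions
        .iter()
        .enumerate()
        .filter(|(i, _)| *i != me)
        .map(|(i, &(x, y))| (i, (x - mx).powi(2) + (y - my).powi(2)))
        .collect();
    // Sort by squared distance; the first k are neighbors regardless of
    // how far away they are, so the rule survives a stretched flock.
    others.sort_by(|a, b| a.1.partial_cmp(&b.1).unwrap());
    others.into_iter().take(k).map(|(i, _)| i).collect()
}

fn main() {
    // Agent 0 at the origin; the others at increasing distances.
    let positions = vec![(0.0, 0.0), (1.0, 0.0), (5.0, 0.0), (2.0, 0.0), (100.0, 0.0)];
    let nbrs = topological_neighbors(&positions, 0, 3);
    // The 3 nearest by rank: indices 1 (d=1), 3 (d=2), 2 (d=5).
    assert_eq!(nbrs, vec![1, 3, 2]);
}
```

Agent 4 at distance 100 is simply rank 4, so it is ignored; a metric-radius rule would instead change the neighbor count as the flock expands or contracts.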

That's the whole mechanism. Each bird adjusts its velocity to stay close to its nearest handful of neighbors, match their heading, and avoid collision. Local perception. Local action. The murmuration is what you get when 300,000 agents do this simultaneously.

No one tells the flock to turn. A turn starts when a few birds on the edge react to a predator. Their neighbors notice the change and adjust. Their neighbors' neighbors adjust. The correction propagates through the flock like a compression wave, at speeds 3-4x faster than any individual bird. It's not communication. It's physics. The same way sound moves through air: each molecule only bumps the ones next to it, but the wave crosses the room.

Click anywhere near the flock to startle it. Watch how the reaction spreads. The noids you hit scatter first, and the rest follow a beat later as the disturbance propagates through their neighbor connections.


From birds to boids to noids

In 1986 Craig Reynolds encoded this insight as three rules:

  1. Separation: steer away from neighbors that are too close
  2. Alignment: match the average heading of nearby neighbors
  3. Cohesion: steer toward the average position of nearby neighbors

He called the agents boids. It worked! With three rules and some weight tuning, you get convincing flocking on a screen. Every boid simulation since, in games, films, screensavers, is some variation of Reynolds' three rules.

But Reynolds had to invent the rules. He had to decide that there should be exactly three. He had to pick the mathematical form of each one: inverse distance for separation, averaging for alignment. He had to tune the weights by hand.
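To make the hand-tuning concrete, the three rules and their blend might look roughly like this (an illustrative sketch, not Reynolds' original code; `boid_steer` and the weight values are assumptions):

```rust
// A minimal sketch of Reynolds' three rules. Each rule yields a steering
// vector; hand-tuned weights blend them into one acceleration.
#[derive(Clone, Copy)]
struct Agent {
    pos: (f64, f64),
    vel: (f64, f64),
}

fn boid_steer(me: &Agent, neighbors: &[Agent]) -> (f64, f64) {
    let n = neighbors.len() as f64;
    let mut sep = (0.0, 0.0); // steer away from close neighbors
    let mut ali = (0.0, 0.0); // match the average heading
    let mut coh = (0.0, 0.0); // steer toward the average position
    for nb in neighbors {
        let dx = nb.pos.0 - me.pos.0;
        let dy = nb.pos.1 - me.pos.1;
        let d2 = (dx * dx + dy * dy).max(1e-6);
        // Separation: inverse-distance push away from each neighbor.
        sep.0 -= dx / d2;
        sep.1 -= dy / d2;
        ali.0 += nb.vel.0;
        ali.1 += nb.vel.1;
        coh.0 += dx;
        coh.1 += dy;
    }
    // Alignment: steer toward the mean neighbor velocity.
    ali = (ali.0 / n - me.vel.0, ali.1 / n - me.vel.1);
    // Cohesion: steer toward the mean neighbor position.
    coh = (coh.0 / n, coh.1 / n);
    // Hand-picked weights: exactly the tuning a noid learns instead.
    let (ws, wa, wc) = (1.5, 1.0, 0.5);
    (
        ws * sep.0 + wa * ali.0 + wc * coh.0,
        ws * sep.1 + wa * ali.1 + wc * coh.1,
    )
}

fn main() {
    let me = Agent { pos: (0.0, 0.0), vel: (0.0, 0.0) };
    let nbrs = [
        Agent { pos: (10.0, 0.0), vel: (1.0, 0.0) },
        Agent { pos: (10.0, 0.0), vel: (1.0, 0.0) },
    ];
    let (ax, ay) = boid_steer(&me, &nbrs);
    // At this distance cohesion dominates: 1.5·(−0.2) + 1.0·1 + 0.5·10 = 5.7.
    assert!((ax - 5.7).abs() < 1e-9 && ay.abs() < 1e-9);
}
```

Every constant here (the inverse-distance form, the averaging, the 1.5/1.0/0.5 blend) is a design decision someone had to make by hand.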

What if you didn't have to do any of that? What if you just gave each agent a perception of its neighbors, the same local view a real starling has, and let a neural network learn the steering function from data?

That's a noid! Delete the rules. Keep the perception. Learn the rest.

What a neural network actually does

Every neural network is a function. Input goes in, output comes out. The weights determine which function. Training finds the right weights for your task.

Most examples of this are too complex to see clearly. Language models have billions of parameters. Image classifiers operate in thousand-dimensional space. The mechanics are the same, but the scale hides the intuition.

A noid is a neural network you can actually watch. 24 numbers in, 2 numbers out. The input is what the noid perceives. The output is what it does. The function between them, shaped by 1,922 weights, is its entire behavior.

hover any node to inspect. blue negative, gold positive

What a noid sees

Real starlings track ~7 neighbors by topological rank. A noid tracks 5. Same idea, slightly smaller neighborhood. Each noid encodes:

  • Its own velocity: [vx, vy]
  • Its heading: [sin θ, cos θ]
  • Its 5 nearest neighbors, relative position and velocity: [Δx, Δy, Δvx, Δvy] × 5

24 numbers. That's the entire universe from a noid's perspective!
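Packing that perception into the input vector might look like this (an illustrative layout; the crate's actual ordering may differ):

```rust
// Encode one noid's local view as the 24-float observation:
// [vx, vy, sin θ, cos θ] followed by (Δx, Δy, Δvx, Δvy) for 5 neighbors.
fn encode_observation(
    vel: (f32, f32),
    heading: f32, // θ in radians
    neighbors: &[((f32, f32), (f32, f32)); 5], // (Δpos, Δvel), nearest first
) -> [f32; 24] {
    let mut obs = [0.0f32; 24];
    obs[0] = vel.0;
    obs[1] = vel.1;
    obs[2] = heading.sin();
    obs[3] = heading.cos();
    for (i, &((dx, dy), (dvx, dvy))) in neighbors.iter().enumerate() {
        let base = 4 + i * 4; // 4 floats per neighbor slot
        obs[base] = dx;
        obs[base + 1] = dy;
        obs[base + 2] = dvx;
        obs[base + 3] = dvy;
    }
    obs
}

fn main() {
    let nbrs = [((1.0, 0.0), (0.0, 0.0)); 5];
    let obs = encode_observation((0.5, -0.5), 0.0, &nbrs);
    assert_eq!(obs.len(), 24);
    assert_eq!(obs[3], 1.0); // cos(0) = 1
    assert_eq!(obs[4], 1.0); // first neighbor's Δx
}
```

Note that everything is relative: neighbor positions are offsets from the noid itself, which is exactly why no noid can know where it sits in absolute space.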

gold = focus noid · blue = 5 nearest neighbors · arrow = network output

The gold noid is the one we're watching. The blue connections show its 5 nearest neighbors, the only agents it knows about. The red arrow is the network's output: a 2D acceleration that steers it through the flock.

No noid knows where it is in absolute space. No noid can count the flock. No noid knows the flock's shape, or that there even is a shape. Just local perception feeding into a function that outputs a push. Same as a starling.

This is the design choice that matters most. Not the network architecture. Not the training algorithm. What you choose to perceive determines what you can learn. Cavagna's team proved that real birds use topological neighbors. That's the foundation everything else builds on.

The network

obs (24) → Linear (32) → SiLU → Linear (32) → SiLU → Linear (2) → tanh → ×60
Parameters per layer: 800, 1,056, 66.

1,922 parameters total. hover or click a layer.

24 numbers. That's everything a noid knows. Its own velocity (2 floats), its heading (2 floats), and its 5 nearest neighbors as relative position and velocity (4 floats × 5). No absolute position. No flock size. Just local perception.

[vx, vy, sin θ, cos θ, n1: Δx Δy Δvx Δvy, n2: Δx Δy Δvx Δvy, n3: Δx Δy Δvx Δvy, n4: Δx Δy Δvx Δvy, n5: Δx Δy Δvx Δvy]

A 24×32 weight matrix plus 32 biases. 800 parameters. Each of the 32 outputs is a different weighted combination of all 24 inputs. The network learns which combinations matter: the angle to the nearest neighbor, the relative speed of the group, the closing distance to a collision. This is where raw perception becomes features.

Without activations, stacking linear layers would be pointless: two matrix multiplies collapse into a single matrix. SiLU (x · sigmoid(x)) breaks that linearity. It suppresses small signals, amplifies large ones, and lets the network carve curved decision boundaries instead of straight lines. This is where a stack of arithmetic becomes a function approximator.

SiLU(x) = x·σ(x)

32×32 weight matrix plus 32 biases. 1,056 parameters, over half the network. The first layer asks "what do I see?" This layer asks "what does it mean?" It combines first-layer features into higher-order patterns: not just "neighbor is close" but "neighbor is close and approaching and I'm heading toward it." Another SiLU follows, gating this deeper representation.

The final layer collapses 32 dimensions to 2: an x-acceleration and a y-acceleration. 66 parameters. Then tanh squashes the result to [−1, 1] and ×60 scales it to match the physics. No matter what the network computes internally, steering force is bounded. A noid has limited thrust, same as a real bird.


1,922 parameters total. You could fit a flock's entire social contract in a QR code!
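The whole network fits in a few dozen lines of plain Rust. This is a sketch of the architecture described above, not the crate's internals; the zero-initialized weights stand in for the 1,922 trained values:

```rust
// SiLU activation: x · sigmoid(x).
fn silu(x: f32) -> f32 {
    x * (1.0 / (1.0 + (-x).exp()))
}

// Dense layer: out[o] = b[o] + Σ_i w[o·in_dim + i] · input[i] (row-major weights).
fn linear(input: &[f32], weights: &[f32], biases: &[f32], out_dim: usize) -> Vec<f32> {
    let in_dim = input.len();
    (0..out_dim)
        .map(|o| {
            biases[o]
                + (0..in_dim)
                    .map(|i| weights[o * in_dim + i] * input[i])
                    .sum::<f32>()
        })
        .collect()
}

// Forward pass: 24 → 32 → 32 → 2, SiLU between layers, tanh·60 at the end.
fn forward(
    obs: &[f32; 24],
    w1: &[f32], b1: &[f32],
    w2: &[f32], b2: &[f32],
    w3: &[f32], b3: &[f32],
) -> [f32; 2] {
    let h1: Vec<f32> = linear(obs, w1, b1, 32).into_iter().map(silu).collect();
    let h2: Vec<f32> = linear(&h1, w2, b2, 32).into_iter().map(silu).collect();
    let out = linear(&h2, w3, b3, 2);
    // tanh bounds the steering force; ×60 scales it to the physics.
    [60.0 * out[0].tanh(), 60.0 * out[1].tanh()]
}

fn main() {
    // Parameter count check: 24·32+32 + 32·32+32 + 32·2+2 = 1,922.
    assert_eq!(24 * 32 + 32 + 32 * 32 + 32 + 32 * 2 + 2, 1922);
    let (w1, b1) = (vec![0.0; 24 * 32], vec![0.0; 32]);
    let (w2, b2) = (vec![0.0; 32 * 32], vec![0.0; 32]);
    let (w3, b3) = (vec![0.0; 32 * 2], vec![0.0; 2]);
    let a = forward(&[0.1; 24], &w1, &b1, &w2, &b2, &w3, &b3);
    // All-zero weights produce zero acceleration.
    assert_eq!(a, [0.0, 0.0]);
}
```

The three per-layer counts (800, 1,056, 66) fall straight out of the shapes: each layer contributes in×out weights plus out biases.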

How training works

The network learns by imitating Reynolds' three rules. Generate random flock configurations. For each noid, compute what the classic rules would do. Then adjust the network's weights to match.

This is called imitation learning, also known as behavioral cloning. You have an expert (the hand-tuned rules). You have a student (the neural network). The student watches the expert act and learns to copy it. The same technique is used to train self-driving cars from human demonstrations and to teach robots from teleoperation.

The loss function is simple: how far off was the network's acceleration from the target?

L = \frac{1}{2}\sum (a_{\text{predicted}} - a_{\text{target}})^2

Backpropagation computes how each of the 1,922 weights contributed to the error. Gradient descent nudges them in the right direction. Repeat a few thousand times.
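The mechanics of that loop are easiest to see in miniature. Here a one-weight student imitates a linear expert by gradient descent on the same squared-error loss (a toy sketch, not the noid trainer; `expert` and `train_student` are illustrative names):

```rust
// The "expert": stands in for the Reynolds-rule target acceleration.
fn expert(x: f32) -> f32 {
    3.0 * x
}

// Behavioral cloning in miniature: a one-weight student copies the expert
// by descending the loss L = ½(pred − target)², so dL/dw = (pred − target)·x.
fn train_student(epochs: usize, lr: f32) -> f32 {
    let mut w = 0.0f32; // the student's single weight
    let inputs = [-2.0f32, -1.0, 0.5, 1.0, 2.0];
    for _ in 0..epochs {
        for &x in &inputs {
            let grad = (w * x - expert(x)) * x;
            w -= lr * grad; // nudge the weight against the gradient
        }
    }
    w
}

fn main() {
    let w = train_student(200, 0.1);
    // The student converges to the expert's function, w ≈ 3.
    assert!((w - 3.0).abs() < 1e-3);
}
```

The real trainer does exactly this over all 1,922 weights at once, with backpropagation supplying the per-weight gradients.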

The noids in this post were trained live in your browser when the page loaded. 3,000 gradient steps, done before the first frame rendered. The network is so small that training it takes less time than loading a web font!

From chaos to order

This is the part I can't stop thinking about!

Random weights produce chaos. Every noid accelerating nonsensically. Trained weights produce flocking. But weight space is continuous. You can linearly interpolate between the two sets of weights and watch the transition happen in real time.

At zero, noise. Random accelerations, no coherence. Around 0.3, the first hints of local alignment, pairs of noids briefly synchronizing before the noise pulls them apart. By 0.6, clusters form. By 0.8, it's unmistakably a flock!

This is a phase transition in weight space. 1,922 floats, smoothly varied, and at some critical threshold collective behavior clicks on. The network doesn't have a "flocking" switch. There's no single weight that enables it. The behavior is distributed across all 1,922 parameters, and it emerges from their interaction.
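The interpolation itself is nothing exotic: an element-wise lerp over the weight vectors, a sketch of what an operation like the library's Brain::lerp has to do:

```rust
// Linear interpolation in weight space: blend two weight vectors
// element-wise. t = 0 gives pure chaos, t = 1 gives the trained flock.
fn lerp_weights(chaos: &[f32], order: &[f32], t: f32) -> Vec<f32> {
    chaos
        .iter()
        .zip(order)
        .map(|(c, o)| c + t * (o - c))
        .collect()
}

fn main() {
    let chaos = vec![1.0, -2.0, 0.0];
    let order = vec![3.0, 2.0, 1.0];
    assert_eq!(lerp_weights(&chaos, &order, 0.0), chaos);
    assert_eq!(lerp_weights(&chaos, &order, 1.0), order);
    assert_eq!(lerp_weights(&chaos, &order, 0.5), vec![2.0, 0.0, 0.5]);
}
```

That such a naive blend passes through a behavioral phase transition, rather than through 1,922 independent dials, is the surprising part.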

This is how all neural networks work. There's no neuron in GPT that stores the concept of "irony." There's no pixel detector in a vision model. The behavior is distributed across all the weights, emerging from their interaction. Noids make this visible because the system is small enough to watch.

The starling lesson

A starling doesn't know Reynolds' three rules. It doesn't separate, then align, then cohere in three discrete steps. It sees its neighbors and it moves. Whatever algorithm runs in a starling's brain, it wasn't hand-decomposed into named subroutines by a programmer.

That's what a noid captures that a boid doesn't. A boid is three rules bolted together. A noid is a single learned function, perception in, action out, with no explicit decomposition into separation, alignment, and cohesion. Those behaviors emerge from the weights, the same way they emerge from whatever neural circuitry a starling uses.

The architecture mirrors the biology: topological neighbors, local perception, a nonlinear mapping to motor output, no global state. We don't know the exact function a starling computes. But we know the form of it. And a noid is a 1,922-parameter hypothesis about what that function might look like.

Performance

A neural network forward pass is a matrix multiplication. Matrix multiplication is what GPUs do. The entire flock's inference, all 1,024 forward passes, collapses into a single GPU dispatch: [1024 × 24] @ [24 × 32]. One call. Every agent in parallel.

Classic boid rules can't do this. Separation needs per-agent distance comparisons. Alignment averages conditionally. Cohesion divides by neighbor count. The logic branches per agent, which GPUs hate.
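To make the contrast concrete, here is the flock's first layer as a single batched product, [1024 × 24] @ [24 × 32] → [1024 × 32], written as a naive CPU matmul (illustrative code, not the crate's GPU kernel):

```rust
// One matrix multiply for the whole flock: a is n×k row-major (all
// observations stacked), b is k×m row-major (one layer's weights),
// the result is n×m. No per-agent branching anywhere.
fn matmul(a: &[f32], b: &[f32], n: usize, k: usize, m: usize) -> Vec<f32> {
    let mut out = vec![0.0f32; n * m];
    for i in 0..n {
        for j in 0..m {
            let mut acc = 0.0;
            for p in 0..k {
                acc += a[i * k + p] * b[p * m + j];
            }
            out[i * m + j] = acc;
        }
    }
    out
}

fn main() {
    let n = 1024; // every agent's 24-float observation, stacked
    let obs = vec![1.0f32; n * 24];
    let w1 = vec![0.5f32; 24 * 32];
    let h = matmul(&obs, &w1, n, 24, 32);
    assert_eq!(h.len(), n * 32);
    // Each output is a dot product of 24 ones with 24 halves = 12.
    assert_eq!(h[0], 12.0);
}
```

The triple loop is exactly the shape a GPU dispatch parallelizes: every (i, j) cell is independent, so all 1,024 agents' features compute at once.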

On CPU, the cost breakdown looks like this:

|                      | noid   | boid  |
| -------------------- | ------ | ----- |
| steering (per agent) | 527 ns | 13 ns |
| full tick, n=1024    | 1.9 ms | 1.1 ms |
| full tick, n=4096    | 17 ms  | 14 ms |

The steering itself is 40x more expensive. But neighbor search (O(n) with spatial hashing) dominates the total, so the gap shrinks to 1.2x at scale. The neural network adds a 20% tax on CPU.

On GPU, that 527 ns per agent becomes a single dispatch for all agents simultaneously. The 1,856 multiply-adds per agent map directly to shader cores. The inference in the demo above runs on your GPU via WebGPU if your browser supports it (check the badge in the corner). On CPU fallback, it still hits 60fps at 1,024 agents thanks to spatial hashing.

This is the real argument for neural flocking. Not that the rules are better (Reynolds' three rules work great). But that the computational form, matrix multiplications on fixed-size tensors, maps to hardware that already exists in every phone, laptop, and GPU. The same hardware that runs language models and image generators can run a flock. The noid forward pass is the same operation as a single attention head, just smaller.

The library

noid is open source. Self-contained Rust crate. No framework, no GPU requirement.

```rust
use noid::{Config, Flock};

let mut flock = Flock::new(Config::default(), 42);
loop {
    flock.tick(1.0 / 60.0);
    for pos in &flock.positions {
        // render at pos[0], pos[1]
    }
}
```

Training is built in:

```rust
use noid::train::Trainer;

let mut trainer = Trainer::new(Config::default(), 42);
for _ in 0..5000 {
    trainer.step(0.001);
}
trainer.brain().save_json();
```

Weight interpolation too:

```rust
use noid::Brain;

let chaos = Brain::random(42);
let order = Brain::load_json(include_str!("weights.json"));
let halfway = Brain::lerp(&chaos, &order, 0.5);
```

300,000 starlings. 7 neighbors each. No leader.

1,024 noids. 5 neighbors each. 1,922 weights. One GPU dispatch.

Same idea.
