Those aren't boids.
I'm calling them noids - neural boids. No hand-written rules. A small neural network takes in what each agent can see and outputs a steering force. 1,922 learned parameters. That's the entire program!
But before I explain how they work, I want to talk about real birds.
How real birds flock

A starling murmuration can have 300,000 birds. No leader. No choreography. No bird knows the shape of the flock. And yet the whole mass moves as one, rippling and folding, evading a falcon in coordinated waves that propagate faster than any single bird can fly.
For decades, people assumed this required some kind of global signal. A lead bird. A pheromone. Something.
In 2010, a team led by Andrea Cavagna at the University of Rome tracked individual starlings in 3D using multiple synchronized cameras. They reconstructed the position and velocity of every bird in flocks of over a thousand. And they found something surprising!
Each bird tracks about 6-7 neighbors. Not the closest by distance, but the closest by rank: birds 1 through 7, sorted by proximity. This is called topological distance rather than metric distance, and the distinction matters: it means the interaction doesn't break down when the flock stretches thin. Whether your 7th neighbor is 2 meters away or 20, you still track exactly 7!
That's the whole mechanism. Each bird adjusts its velocity to stay close to its nearest handful of neighbors, match their heading, and avoid collision. Local perception. Local action. The murmuration is what you get when 300,000 agents do this simultaneously.
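The topological-versus-metric distinction is easy to make concrete. A minimal sketch in Rust (function names are mine, not from any bird-tracking codebase): a topological rule always returns exactly `k` neighbors however sparse the flock, while a metric rule returns whoever happens to be inside a radius.

```rust
// Topological: rank all other birds by distance and keep the first k,
// no matter how far away they are.
fn topological_neighbors(me: (f32, f32), others: &[(f32, f32)], k: usize) -> Vec<usize> {
    let mut idx: Vec<usize> = (0..others.len()).collect();
    idx.sort_by(|&a, &b| {
        dist2(me, others[a]).partial_cmp(&dist2(me, others[b])).unwrap()
    });
    idx.truncate(k);
    idx
}

// Metric: keep everyone within a fixed radius; the count varies with density
// and drops to zero when the flock stretches thin.
fn metric_neighbors(me: (f32, f32), others: &[(f32, f32)], radius: f32) -> Vec<usize> {
    (0..others.len())
        .filter(|&i| dist2(me, others[i]) <= radius * radius)
        .collect()
}

fn dist2(a: (f32, f32), b: (f32, f32)) -> f32 {
    let (dx, dy) = (a.0 - b.0, a.1 - b.1);
    dx * dx + dy * dy
}
```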
No one tells the flock to turn. A turn starts when a few birds on the edge react to a predator. Their neighbors notice the change and adjust. Their neighbors' neighbors adjust. The correction propagates through the flock like a compression wave, at speeds 3-4x faster than any individual bird. It's not communication. It's physics. The same way sound moves through air: each molecule only bumps the ones next to it, but the wave crosses the room.
Click anywhere near the flock to startle it. Watch how the reaction spreads. The noids you hit scatter first, and the rest follow a beat later as the disturbance propagates through their neighbor connections.
click anywhere to startle the flock
From birds to boids to noids
In 1986 Craig Reynolds encoded this insight as three rules:
- Separation: steer away from neighbors that are too close
- Alignment: match the average heading of nearby neighbors
- Cohesion: steer toward the average position of nearby neighbors
He called the agents boids. It worked! With three rules and some weight tuning, you get convincing flocking on a screen. Every boid simulation since (in games, films, screensavers) is some variation of Reynolds' three rules.
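The three rules fit in a short function. This is a sketch in the spirit of Reynolds' model, not his original code; the weights (1.5, 1.0, 0.8) are illustrative stand-ins for the hand-tuning he had to do.

```rust
struct Boid { pos: (f32, f32), vel: (f32, f32) }

fn steer(me: &Boid, neighbors: &[Boid]) -> (f32, f32) {
    let n = neighbors.len() as f32;
    if n == 0.0 { return (0.0, 0.0); }
    let (mut sep, mut ali, mut coh) = ((0.0, 0.0), (0.0, 0.0), (0.0, 0.0));
    for b in neighbors {
        // Separation: push away from each neighbor, harder when closer.
        let (dx, dy) = (me.pos.0 - b.pos.0, me.pos.1 - b.pos.1);
        let d2 = (dx * dx + dy * dy).max(1e-6);
        sep = (sep.0 + dx / d2, sep.1 + dy / d2);
        // Alignment: accumulate neighbor velocities.
        ali = (ali.0 + b.vel.0, ali.1 + b.vel.1);
        // Cohesion: accumulate neighbor positions.
        coh = (coh.0 + b.pos.0, coh.1 + b.pos.1);
    }
    // Match the average heading; steer toward the average position.
    let ali = (ali.0 / n - me.vel.0, ali.1 / n - me.vel.1);
    let coh = (coh.0 / n - me.pos.0, coh.1 / n - me.pos.1);
    // Hand-tuned weights: exactly the part a noid learns instead.
    (1.5 * sep.0 + 1.0 * ali.0 + 0.8 * coh.0,
     1.5 * sep.1 + 1.0 * ali.1 + 0.8 * coh.1)
}
```

Note how the mathematical forms are baked in: inverse-square distance for separation, plain averaging for the other two. Those were choices, not discoveries.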
But Reynolds had to invent the rules. He had to decide that there should be exactly three. He had to pick the mathematical form of each one: inverse distance for separation, averaging for alignment. He had to tune the weights by hand.
What if you didn't have to do any of that? What if you just gave each agent a perception of its neighbors, the same local view a real starling has, and let a neural network learn the steering function from data?
That's a noid! Delete the rules. Keep the perception. Learn the rest.
What a neural network actually does
Every neural network is a function. Input goes in, output comes out. The weights determine which function. Training finds the right weights for your task.
Most examples of this are too complex to see clearly. Language models have billions of parameters. Image classifiers operate in thousand-dimensional space. The mechanics are the same, but the scale hides the intuition.
A noid is a neural network you can actually watch. 24 numbers in, 2 numbers out. The input is what the noid perceives. The output is what it does. The function between them, shaped by 1,922 weights, is its entire behavior.
hover any node to inspect. blue negative, gold positive
What a noid sees
Real starlings track ~7 neighbors by topological rank. A noid tracks 5. Same idea, slightly smaller neighborhood. Each noid encodes:
- Its own velocity: [vx, vy]
- Its heading: [sin θ, cos θ]
- Its 5 nearest neighbors, relative position and velocity: [Δx, Δy, Δvx, Δvy] × 5
24 numbers. That's the entire universe from a noid's perspective!
gold = focus noid · blue = 5 nearest neighbors · arrow = network output
The gold noid is the one we're watching. The blue connections show its 5 nearest neighbors, the only agents it knows about. The red arrow is the network's output: a 2D acceleration that steers it through the flock.
No noid knows where it is in absolute space. No noid can count the flock. No noid knows the flock's shape, or that there even is a shape. Just local perception feeding into a function that outputs a push. Same as a starling.
This is the design choice that matters most. Not the network architecture. Not the training algorithm. What you choose to perceive determines what you can learn. Cavagna's team proved that real birds use topological neighbors. That's the foundation everything else builds on.
The network
1,922 parameters total. hover or click a layer.
24 numbers. That's everything a noid knows. Its own velocity (2 floats), its heading (2 floats), and its 5 nearest neighbors as relative position and velocity (4 floats × 5). No absolute position. No flock size. Just local perception.
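Packing that observation is just bookkeeping. A sketch (the function name and tuple layout are mine, not the crate's API):

```rust
// Build the 24-float observation: [vx, vy, sin θ, cos θ] followed by
// [Δx, Δy, Δvx, Δvy] for each of the 5 nearest neighbors.
fn observe(
    vel: (f32, f32),
    heading: f32,
    neighbors: &[((f32, f32), (f32, f32)); 5], // (relative position, relative velocity)
) -> [f32; 24] {
    let mut obs = [0.0f32; 24];
    obs[0] = vel.0;          // vx
    obs[1] = vel.1;          // vy
    obs[2] = heading.sin();  // sin θ
    obs[3] = heading.cos();  // cos θ
    for (i, ((dx, dy), (dvx, dvy))) in neighbors.iter().enumerate() {
        let base = 4 + i * 4; // 4 floats per neighbor
        obs[base] = *dx;      // Δx: neighbor position relative to me
        obs[base + 1] = *dy;  // Δy
        obs[base + 2] = *dvx; // Δvx: neighbor velocity relative to mine
        obs[base + 3] = *dvy; // Δvy
    }
    obs
}
```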
A 24×32 weight matrix plus 32 biases. 800 parameters. Each of the 32 outputs is a different weighted combination of all 24 inputs. The network learns which combinations matter: the angle to the nearest neighbor, the relative speed of the group, the closing distance to a collision. This is where raw perception becomes features.
Without activations, stacking linear layers would be pointless: two matrix multiplies collapse into a single matrix. SiLU (x · sigmoid(x)) breaks that linearity. It suppresses small signals, amplifies large ones, and lets the network carve curved decision boundaries instead of straight lines. This is where a stack of arithmetic becomes a function approximator.
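SiLU is one line:

```rust
// SiLU (a.k.a. swish): x · sigmoid(x). Near-zero inputs are suppressed,
// large positive inputs pass through almost unchanged, and the curve is
// smooth everywhere, unlike ReLU's hard corner at zero.
fn silu(x: f32) -> f32 {
    x * (1.0 / (1.0 + (-x).exp()))
}
```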
A 32×32 weight matrix plus 32 biases. 1,056 parameters, over half the network. The first layer asks "what do I see?" This layer asks "what does it mean?" It combines first-layer features into higher-order patterns: not just "neighbor is close" but "neighbor is close and approaching and I'm heading toward it." Another SiLU follows, gating this deeper representation.
The final layer collapses 32 dimensions to 2: an x-acceleration and a y-acceleration. 66 parameters. Then tanh squashes the result to [−1, 1] and ×60 scales it to match the physics. No matter what the network computes internally, steering force is bounded. A noid has limited thrust, same as a real bird.
1,922 parameters total. You could fit a flock's entire social contract in a QR code!
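The whole 24 → 32 → 32 → 2 pipeline described above fits in a few dozen lines. Only the shapes, SiLU, and the tanh-then-×60 output come from the post; the weight layout and names here are my assumptions, not the crate's internals.

```rust
struct Net {
    w1: Vec<f32>, b1: Vec<f32>, // 24×32 weights + 32 biases = 800 params
    w2: Vec<f32>, b2: Vec<f32>, // 32×32 weights + 32 biases = 1,056 params
    w3: Vec<f32>, b3: Vec<f32>, // 32×2 weights + 2 biases   = 66 params
}

fn silu(x: f32) -> f32 {
    x / (1.0 + (-x).exp())
}

// One dense layer: y[j] = b[j] + Σ_i x[i] · w[i·out + j]
fn layer(x: &[f32], w: &[f32], b: &[f32], out: usize) -> Vec<f32> {
    (0..out)
        .map(|j| {
            b[j] + x.iter().enumerate().map(|(i, xi)| xi * w[i * out + j]).sum::<f32>()
        })
        .collect()
}

fn forward(net: &Net, obs: &[f32; 24]) -> (f32, f32) {
    let h1: Vec<f32> = layer(obs, &net.w1, &net.b1, 32).into_iter().map(silu).collect();
    let h2: Vec<f32> = layer(&h1, &net.w2, &net.b2, 32).into_iter().map(silu).collect();
    let out = layer(&h2, &net.w3, &net.b3, 2);
    // tanh bounds steering to [−1, 1]; ×60 scales it to the physics.
    (out[0].tanh() * 60.0, out[1].tanh() * 60.0)
}
```

Summing the buffer lengths recovers the headline number: 800 + 1,056 + 66 = 1,922.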
How training works
The network learns by imitating Reynolds' three rules. Generate random flock configurations. For each noid, compute what the classic rules would do. Then adjust the network's weights to match.
This is called imitation learning, also known as behavioral cloning. You have an expert (the hand-tuned rules). You have a student (the neural network). The student watches the expert act and learns to copy it. The same technique is used to train self-driving cars from human demonstrations and to teach robots from teleoperation.
The loss function is simple: how far off was the network's acceleration from the target?
Backpropagation computes how each of the 1,922 weights contributed to the error. Gradient descent nudges them in the right direction. Repeat a few thousand times.
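The loss itself is nothing exotic. A sketch of it as mean squared error between the network's steering and the expert's (the function name is mine):

```rust
// Mean squared error between predicted and target 2D accelerations,
// averaged over a batch of noids.
fn mse_loss(predicted: &[(f32, f32)], target: &[(f32, f32)]) -> f32 {
    let n = predicted.len() as f32;
    predicted
        .iter()
        .zip(target)
        .map(|(p, t)| {
            let (ex, ey) = (p.0 - t.0, p.1 - t.1);
            ex * ex + ey * ey
        })
        .sum::<f32>()
        / n
}
```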
The noids in this post were trained live in your browser when the page loaded. 3,000 gradient steps, done before the first frame rendered. The network is so small that training it takes less time than loading a web font!
From chaos to order
This is the part I can't stop thinking about!
Random weights produce chaos. Every noid accelerating nonsensically. Trained weights produce flocking. But weight space is continuous. You can linearly interpolate between the two sets of weights and watch the transition happen in real time.
At zero, noise. Random accelerations, no coherence. Around 0.3, the first hints of local alignment, pairs of noids briefly synchronizing before the noise pulls them apart. By 0.6, clusters form. By 0.8, it's unmistakably a flock!
This is a phase transition in weight space. 1,922 floats, smoothly varied, and at some critical threshold collective behavior clicks on. The network doesn't have a "flocking" switch. There's no single weight that enables it. The behavior is distributed across all 1,922 parameters, and it emerges from their interaction.
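The knob behind the demo is ordinary linear interpolation over the flattened weight vector. A sketch (the real crate exposes this as `Brain::lerp`; this standalone version is mine):

```rust
// Linearly interpolate between two weight vectors of equal length.
// t = 0 gives the chaos weights, t = 1 the trained weights.
fn lerp_weights(chaos: &[f32], order: &[f32], t: f32) -> Vec<f32> {
    chaos
        .iter()
        .zip(order)
        .map(|(c, o)| c + t * (o - c))
        .collect()
}
```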
This is how all neural networks work. There's no neuron in GPT that stores the concept of "irony." There's no pixel detector in a vision model. The behavior is distributed across all the weights, emerging from their interaction. Noids make this visible because the system is small enough to watch.
The starling lesson
A starling doesn't know Reynolds' three rules. It doesn't separate, then align, then cohere in three discrete steps. It sees its neighbors and it moves. Whatever algorithm runs in a starling's brain, it wasn't hand-decomposed into named subroutines by a programmer.
That's what a noid captures that a boid doesn't. A boid is three rules bolted together. A noid is a single learned function, perception in, action out, with no explicit decomposition into separation, alignment, and cohesion. Those behaviors emerge from the weights, the same way they emerge from whatever neural circuitry a starling uses.
The architecture mirrors the biology: topological neighbors, local perception, a nonlinear mapping to motor output, no global state. We don't know the exact function a starling computes. But we know the form of it. And a noid is a 1,922-parameter hypothesis about what that function might look like.
Performance
A neural network forward pass is a matrix multiplication. Matrix multiplication is what GPUs do. The entire flock's inference, all 1,024 forward passes, collapses into a single GPU dispatch: [1024 × 24] @ [24 × 32]. One call. Every agent in parallel.
Classic boid rules can't do this. Separation needs per-agent distance comparisons. Alignment averages conditionally. Cohesion divides by neighbor count. The logic branches per agent, which GPUs hate.
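The branch-free shape is easy to see in a CPU sketch of that batched first layer. The inner loops are pure multiply-adds over flat row-major buffers, exactly what a GPU dispatch parallelizes (the function is illustrative, not the demo's actual shader):

```rust
// One batched matmul covers the whole flock's first layer:
// [n × 24] @ [24 × 32], every agent's dot products independent and branchless.
fn batched_layer(obs: &[f32], w: &[f32], n: usize) -> Vec<f32> {
    let (inp, out) = (24, 32);
    let mut y = vec![0.0f32; n * out];
    for a in 0..n {               // every agent…
        for j in 0..out {         // …every hidden unit…
            let mut acc = 0.0;
            for i in 0..inp {     // …one dot product. No conditionals anywhere.
                acc += obs[a * inp + i] * w[i * out + j];
            }
            y[a * out + j] = acc;
        }
    }
    y
}
```

On a GPU, the two outer loops become the dispatch grid and only the dot product runs per thread.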
On CPU, the cost breakdown looks like this:
| | noid | boid |
|---|---|---|
| steering (per agent) | 527 ns | 13 ns |
| full tick, n=1024 | 1.9 ms | 1.1 ms |
| full tick, n=4096 | 17 ms | 14 ms |
The steering itself is 40x more expensive. But neighbor search (O(n) with spatial hashing) dominates the total, so the gap shrinks to 1.2x at scale. The neural network adds a 20% tax on CPU.
On GPU, that 527 ns per agent becomes a single dispatch for all agents simultaneously. The 1,856 multiply-adds per agent map directly to shader cores. The inference in the demo above runs on your GPU via WebGPU if your browser supports it (check the badge in the corner). On CPU fallback, it still hits 60fps at 1,024 agents thanks to spatial hashing.
This is the real argument for neural flocking. Not that the rules are better (Reynolds' three rules work great). But that the computational form, matrix multiplications on fixed-size tensors, maps to hardware that already exists in every phone, laptop, and GPU. The same hardware that runs language models and image generators can run a flock. The noid forward pass is the same operation as a single attention head, just smaller.
The library
noid is open source. Self-contained Rust crate. No framework, no GPU requirement.
```rust
use noid::{Config, Flock};

let mut flock = Flock::new(Config::default(), 42);
loop {
    flock.tick(1.0 / 60.0);
    for pos in &flock.positions {
        // render at pos[0], pos[1]
    }
}
```

Training is built in:
```rust
use noid::train::Trainer;

let mut trainer = Trainer::new(Config::default(), 42);
for _ in 0..5000 {
    trainer.step(0.001);
}
trainer.brain().save_json();
```

Weight interpolation too:
```rust
use noid::Brain;

let chaos = Brain::random(42);
let order = Brain::load_json(include_str!("weights.json"));
let halfway = Brain::lerp(&chaos, &order, 0.5);
```

300,000 starlings. 7 neighbors each. No leader.
1,024 noids. 5 neighbors each. 1,922 weights. One GPU dispatch.
Same idea.