你能构建的最小大脑：Python 中的感知器

你能构建的最小大脑：Python 中的感知器
The Smallest Brain You Can Build: A Perceptron in Python

原始链接: https://ranpara.net/posts/perceptron-explained-from-scratch/

感知机是现代神经网络的基本构建单元。它受到生物神经元的启发，通过一个简单的数学公式处理输入，从而做出二元决策：`若 (权重 * 输入 + 偏置) > 0，则输出 1，否则输出 0`。 “权重”决定了输入的重要性，而“偏置”则起到阈值的作用，允许机器调整决策边界以适配数据。在训练过程中，感知机通过反复遍历数据（轮次）进行学习，并在出错时调整权重和偏置。“学习率”控制着每次修正时这些数值改变的幅度。一个关键的最佳实践是“归一化”，即将输入缩放到统一的范围。这可以防止较大的数值主导其他输入，并确保学习过程平稳且高效。虽然单个感知机只能画出一条直线来分割数据，但它却是复杂人工智能的重要基石。当这些简单的单元堆叠成层时，它们就会演变成能够解决复杂非线性问题的精细神经网络。本质上，每一个先进的 AI 模型都建立在这一相同的基本机制之上：权重、偏置和学习循环。

Hacker News | 新闻 | 过往 | 评论 | 提问 | 展示 | 招聘 | 提交 | 登录你能构建的最小大脑：Python 感知机 (ranpara.net) 10 分 | DevarshRanpara 发布于 31 分钟前 | 隐藏 | 过往 | 收藏 | 1 条评论 | 帮助 esafak 6 分钟前 | 下一条 [–] 如果你想学习机器学习的基础知识，我推荐你读一本书，例如 Chris Bishop 写的《深度学习：基础与概念》(Deep Learning: Foundations and Concepts)。如果你坚持在线学习，一个选择是 https://course.fast.ai/。如果你不懂机器学习，我不认为你能通过即兴演示学到多少东西。回复准则 | 常见问题 | 列表 | API | 安全 | 法律 | 申请 YC | 联系方式搜索：

原文

A perceptron is the smallest brain you can build. One number goes in. One yes-or-no answer comes out. That is the whole thing.

It sounds too simple to matter. But this tiny idea is the seed of every neural network running today. In this post we build a perceptron from scratch in Python, and we watch it learn, live, in your browser. No heavy math. No big libraries. Just a weight, a bias, and a loop.

I am not a native English speaker, and I am still learning this field myself. So I will explain it the way I needed someone to explain it to me. Slowly, and from the ground up.

What is a perceptron?

In 1958, a researcher named Frank Rosenblatt built a machine he called the perceptron.

It was inspired by a single brain cell, a neuron. A neuron takes in signals, and if those signals are strong enough, it fires. Rosenblatt copied that idea in math:

output = 1   if (w · x + b) > 0
         0   otherwise

Here x is the input, w is the weight, and b is the bias. Do not worry about those words yet. We will meet each of them by building something real.

Think like a human first

Before a machine decides anything, let us watch a human decide. Meet John Doe. He has a job offer, and he must answer one question: should he take it?

John does not flip a coin. He weighs things. Some factors matter to him more than others.

Factor (input)	Value	How much John cares (weight)
Extra pay	high	a lot
Stays in the same city	no, he must move	a lot

John multiplies each factor by how much he cares about it, then adds everything up. If the total is high enough, he says yes. If not, he says no.

That is a perceptron. The factors are the inputs. How much he cares is the weight. And “high enough” is a threshold he carries in his head. Hold on to that threshold. Later we will give it a name: the bias.

How John Doe decides: each input is multiplied by a weight, the results are summed with a bias, and the total becomes one yes-or-no answer.

The simplest possible decision: is this number positive?

Let us shrink the problem until almost nothing is left. One input. One question.

Is this number positive?

That is it. Feed the machine a number. It should answer True for positive and False for negative.

The machine makes its guess like this:

prediction = (weight * value + bias) > 0

Multiply the input by the weight, add the bias, and check if the result is above zero. If yes, it predicts True. If no, it predicts False. This little formula is the classifier, also called the decision function.

At the start, the weight and bias are just random numbers. So the machine guesses badly. Now comes the only clever part: it learns from its mistakes.

if prediction != result:
    error = result - prediction      # True - False = 1, False - True = -1
    weight += learning_rate * error * value
    bias   += learning_rate * error

When the guess is wrong, we nudge the weight and bias in the right direction. The error tells us which way to nudge. The learning rate decides how big each nudge is. We do this for every example, then repeat the whole pass again. One full pass over the data is called an epoch. Repeating epochs is training.

Here is that exact machine. Press Train and watch it learn. Each green dot is a positive number (True), each red dot is negative (False), and the blue dashed line is where it has decided to split them.

epoch 0 weight 0 bias 0 boundary – accuracy 0%

It snaps into place almost immediately. Look at the readout: the boundary lands right around 0, and the bias settles near 0 too.

That is not an accident. For this problem, we never needed the bias at all. Which is strange, because bias is supposed to be important. To see why it matters, we need a harder question.

What is a decision boundary?

That blue line has a name: the decision boundary. It is the exact point where the machine flips from saying False to saying True.

We can compute it. The boundary sits where w · x + b = 0. Solve for x:

decision_boundary = -bias / weight

For “is this number positive,” the boundary should be at 0. And it is. Now watch what happens when the right answer is not at zero.

Why do we need bias? The student-pass example

New problem. Same machine. We give it exam scores from 0 to 100, and we ask:

Did the student pass?

The rule is simple: a score of 50 or higher passes. So the decision boundary should sit at 50, not at 0.

Let us try to solve it the way we solved the last one, using the weight only. In the demo below, turn off “Use bias” and press Train.

epoch 0 weight 0 bias 0 boundary – accuracy 0%

Use bias

Watch the accuracy. It climbs to around 50 percent and then gets stuck. It cannot do better, no matter how long you train it.

Here is why. With no bias, the formula is just weight * score. Every exam score is a positive number. So if the weight is positive, the machine calls every student a pass. If the weight is negative, it fails everyone. The boundary is glued to 0, and it cannot move. A line forced through zero simply cannot separate “below 50” from “50 and up.”

Now turn “Use bias” back on and press Train again. The accuracy climbs all the way to 100 percent, and the boundary slides over and parks near 50.

That is the whole job of the bias. The weight sets the steepness. The bias moves the boundary left or right so it can sit wherever the answer actually is. Remember decision_boundary = -bias / weight. With a bias, the boundary can be anything. Without one, it is stuck at zero forever.

The one sentence to remember: when your inputs sit far from zero, you need a bias to move the line to them.

How does a perceptron learn? Epochs and learning rate

You saw two dials while training: epochs and learning rate.

An epoch is one full pass over all the data. The machine rarely gets everything right in a single pass, so we go again, and again. More epochs means more chances to fix mistakes. That is why accuracy climbs as you keep training.

The learning rate is the size of each correction. In the code it is the learning_rate multiplier:

weight += learning_rate * error * value

Small steps are careful but slow. Big steps are fast but can overshoot and bounce around. Choosing it well is part of the craft. Here we used 0.1, which is gentle enough to stay stable.

Why do we normalize data?

There is a quiet problem hiding in the pass example. Look at that update line again:

weight += learning_rate * error * value

The correction is multiplied by value. For exam scores, value can be as large as 100. So a single wrong guess can throw the weight by a huge amount. The machine still learns, but it lurches around instead of settling smoothly.

The fix is normalization: shrink the inputs to a small, tidy range before training. The simplest version is to divide every score by the largest possible score, so 0 to 100 becomes 0 to 1.

In the demo below, first press Train with normalization off and watch the accuracy line jump around on its way up. Then turn “Normalize data” on, reset, and train again. Same machine, same answer, but it gets there in a fraction of the epochs, and the climb is smooth.

epoch 0 weight 0 bias 0 boundary – accuracy 0%

Normalize data

One honest note. With a single input like this, normalization mostly buys you speed and calm. It becomes essential when your inputs live on very different scales. Think back to John Doe: his pay was measured in thousands of dollars, but “same city” was just a 0 or a 1. Without normalization, the dollars would drown out everything else, and the machine would basically ignore the city. Putting both on the same scale lets each factor get a fair say. (Dividing by the max is the easy version; a common general method is to subtract the mean and divide by the spread, called standardization.)

The full perceptron in Python

Here is the complete program for “is this number positive,” with nothing hidden. It is short enough to read in one sitting.

import random

learning_rate = 0.1
EPOCHS = 100

weight = random.uniform(-1, 1)
bias   = random.uniform(-1, 1)

# positive numbers are True, negative numbers are False
data  = [(i * 0.1, True)  for i in range(1, 501)]
data += [(i * 0.1, False) for i in range(-500, 0)]
random.shuffle(data)

for epoch in range(EPOCHS):
    for value, result in data:
        prediction = (weight * value + bias) > 0
        if prediction != result:
            error = result - prediction          # +1 or -1
            weight += learning_rate * error * value
            bias   += learning_rate * error

decision_boundary = -bias / weight
print(f"weight = {weight:.3f}")
print(f"bias   = {bias:.3f}")
print(f"decision boundary = {decision_boundary:.3f}")

To turn this into the student-pass machine, you change two things: make the data exam scores with result = score >= 50, and, if you want to feel the pain of a missing bias, freeze the bias at 0. Everything else stays the same.

Acknowledgments

The core inspiration for this post came from the fantastic video ChatGPT is made from 100 million of these [The Perceptron] by Welch Labs. If you are a visual learner and want to see the rich history and hardware behind these concepts, I highly recommend watching it!

What’s next?

You just built a working perceptron. It takes an input, weighs it, adds a bias, and decides. It learns from its own mistakes, one epoch at a time.

A single neuron can only draw one straight line. The magic starts when you stack them: the output of one neuron becomes the input of the next. Layer enough of them together and you get a neural network that can learn shapes far more tangled than a single line. But every one of those neurons is doing exactly what you just watched. A weight, a bias, a decision.

If you want the non-technical story of how I ended up writing code in Canada at all, I wrote about it here: The Outsider Who Shipped Anyway.

Thanks for building this with me. Now go change the numbers and break it. That is the fastest way to learn.