新型神经网络更具可解释性
A new type of neural network is more interpretable

原始链接: https://spectrum.ieee.org/kan-neural-network

一组研究人员开发了一种新型神经网络,称为“柯尔莫哥洛夫-阿诺德网络”(KAN)。 这种新架构以两位数学家的名字命名,旨在通过在数据表示过程中提供更大的灵活性、需要更少的参数来改进现有方法。 测试结果表明,KAN 在准确性和效率方面显着优于传统的多层感知器(MLP)。 KAN 与称为 MLP 的传统神经网络的不同之处在于其突触方法。 突触或神经网络中各个节点之间的连接在 KAN 中具有多种用途:它们了解每种关系的全部性质 - 将输入映射到输出的函数,而不仅仅是确定连接的权重。 此外,KAN 中的神经元得到了简化; 他们将所有先前突触的输出值相加,而无需应用额外的计算过程。 与 MLP 相比,测试 KAN 功能的科学应用程序展示了卓越的性能。 例如,在处理涉及碰撞粒子速度的相对论问题时,在参数数量相同的情况下,KAN 的准确度比 MLP 高 100 倍。 在拓扑结行为的模拟中也观察到了类似的改进。 KAN 需要更少的参数来实现可比较的结果,同时仍然通过可视化激活函数提供对变量之间关系的洞察。 尽管 KAN 当前的局限性包括由于图形处理单元 (GPU) 未优化而导致计算要求增加和效率低下,但研究人员认为 KAN 可能特别有利于解决处理小规模数据集的科学家所面临的各种挑战。 随着进一步的进步,专家们预见到从材料科学到量子力学等各个领域的重要应用,可能会在高温超导和受控核聚变等领域取得突破性发现。

相关文章

原文

Artificial neural networks—algorithms inspired by biological brains—are at the center of modern artificial intelligence, behind both chatbots and image generators. But with their many neurons, they can be black boxes, their inner workings uninterpretable to users.

Researchers have now created a fundamentally new way to make neural networks that in some ways surpasses traditional systems. These new networks are more interpretable and also more accurate, proponents say, even when they’re smaller. Their developers say the way they learn to represent physics data concisely could help scientists uncover new laws of nature.

“It’s great to see that there is a new architecture on the table.” —Brice Ménard, Johns Hopkins University

For the past decade or more, engineers have mostly tweaked neural-network designs through trial and error, says Brice Ménard, a physicist at Johns Hopkins University who studies how neural networks operate but was not involved in the new work, which was posted on arXiv in April. “It’s great to see that there is a new architecture on the table,” he says, especially one designed from first principles.

One way to think of neural networks is by analogy with neurons, or nodes, and synapses, or connections between those nodes. In traditional neural networks, called multi-layer perceptrons (MLPs), each synapse learns a weight—a number that determines how strong the connection is between those two neurons. The neurons are arranged in layers, such that a neuron from one layer takes input signals from the neurons in the previous layer, weighted by the strength of their synaptic connection. Each neuron then applies a simple function to the sum total of its inputs, called an activation function.

black text on a white background with red and blue lines connecting on the left and black lines connecting on the right In traditional neural networks, sometimes called multi-layer perceptrons [left], each synapse learns a number called a weight, and each neuron applies a simple function to the sum of its inputs. In the new Kolmogorov-Arnold architecture [right], each synapse learns a function, and the neurons sum the outputs of those functions.The NSF Institute for Artificial Intelligence and Fundamental Interactions

In the new architecture, the synapses play a more complex role. Instead of simply learning how strong the connection between two neurons is, they learn the full nature of that connection—the function that maps input to output. Unlike the activation function used by neurons in the traditional architecture, this function could be more complex—in fact a “spline” or combination of several functions—and is different in each instance. Neurons, on the other hand, become simpler—they just sum the outputs of all their preceding synapses. The new networks are called Kolmogorov-Arnold Networks (KANs), after two mathematicians who studied how functions could be combined. The idea is that KANs would provide greater flexibility when learning to represent data, while using fewer learned parameters.

“It’s like an alien life that looks at things from a different perspective but is also kind of understandable to humans.” —Ziming Liu, Massachusetts Institute of Technology

The researchers tested their KANs on relatively simple scientific tasks. In some experiments, they took simple physical laws, such as the velocity with which two relativistic-speed objects pass each other. They used these equations to generate input-output data points, then, for each physics function, trained a network on some of the data and tested it on the rest. They found that increasing the size of KANs improves their performance at a faster rate than increasing the size of MLPs did. When solving partial differential equations, a KAN was 100 times as accurate as an MLP that had 100 times as many parameters.

In another experiment, they trained networks to predict one attribute of topological knots, called their signature, based on other attributes of the knots. An MLP achieved 78 percent test accuracy using about 300,000 parameters, while a KAN achieved 81.6 percent test accuracy using only about 200 parameters.

What’s more, the researchers could visually map out the KANs and look at the shapes of the activation functions, as well as the importance of each connection. Either manually or automatically they could prune weak connections and replace some activation functions with simpler ones, like sine or exponential functions. Then they could summarize the entire KAN in an intuitive one-line function (including all the component activation functions), in some cases perfectly reconstructing the physics function that created the dataset.

“In the future, we hope that it can be a useful tool for everyday scientific research,” says Ziming Liu, a computer scientist at the Massachusetts Institute of Technology and the paper’s first author. “Given a dataset we don’t know how to interpret, we just throw it to a KAN, and it can generate some hypothesis for you. You just stare at the brain [the KAN diagram] and you can even perform surgery on that if you want.” You might get a tidy function. “It’s like an alien life that looks at things from a different perspective but is also kind of understandable to humans.”

Dozens of papers have already cited the KAN preprint. “It seemed very exciting the moment that I saw it,” says Alexander Bodner, an undergraduate student of computer science at the University of San Andrés, in Argentina. Within a week, he and three classmates had combined KANs with convolutional neural networks, or CNNs, a popular architecture for processing images. They tested their Convolutional KANs on their ability to categorize handwritten digits or pieces of clothing. The best one approximately matched the performance of a traditional CNN (99 percent accuracy for both networks on digits, 90 percent for both on clothing) but using about 60 percent fewer parameters. The datasets were simple, but Bodner says other teams with more computing power have begun scaling up the networks. Other people are combining KANs with transformers, an architecture popular in large language models.

One downside of KANs is that they take longer per parameter to train—in part because they can’t take advantage of GPUs. But they need fewer parameters. Liu notes that even if KANs don’t replace giant CNNs and transformers for processing images and language, training time won’t be an issue at the smaller scale of many physics problems. He’s looking at ways for experts to insert their prior knowledge into KANs—by manually choosing activation functions, say—and to easily extract knowledge from them using a simple interface. Someday, he says, KANs could help physicists discover high-temperature superconductors or ways to control nuclear fusion.

From Your Site Articles

Related Articles Around the Web

联系我们 contact @ memedata.com