显示 HN:一种小型编程语言,其中一切都是值。
Show HN: A small programming language where everything is pass-by-value

原始链接: https://github.com/Jcparkyn/herd

## Herd:一种面向值的编程语言 Herd 是一种动态类型、解释型编程语言,旨在实现简洁性和探索值语义。Herd 中的一切都是值——包括列表和字典——并且按值传递,保证了将数据传递给函数时不会产生意外的副作用。修改会创建副本,确保数据不可变,除非显式声明为可变,使用 `var`。 Herd 利用引用计数进行内存管理,通过消除引用循环(由于其基于值的特性,引用循环是不可能的)来避免垃圾回收的复杂性。它与 Swift 的“可变值语义”和 Matlab/R 中的写时复制方法共享概念,但旨在实现更高的效率和简洁性。 主要特性包括:使用 `\` 的简洁函数语法,使用 `!` 的解构,用于链式的管道操作符 `|`,以及由于写时复制机制而具有线程安全数据的内置多线程支持。Herd 采用单遍 JIT 编译器与 Cranelift,提供令人惊讶的良好性能,并利用 NaN 包装来高效地表示原始类型。虽然目前缺乏用户定义的类型,但未来的开发可以结合类似 Julia 的结构体和多重分派。 Herd 默认优先考虑不可变性,提供 `set` 用于修改,`=` 用于定义不可变变量,反映了定义更多不可变变量而非可变变量的常见做法。

## Herd:一种按值传递的编程语言 jcparkyn 最近分享了“Herd”,一种用 Rust 构建、使用 Cranelift (JIT 编译) 和 LALRPOP (解析器) 的业余编程语言,灵感来自 *Crafting Interpreters*,但已显著发散。其核心原则是:**一切都是按值传递**。这并非传统的复制;它利用引用计数和写时复制来实现数据竞争免疫,无需锁或通道——这对于并发编程来说可能是一个强大的特性。 讨论强调了这种方法的优点,提供了一种介于不变性和可变命令式语法之间的折衷方案。虽然写时复制可能会引入性能开销,但创建者已经对合成工作负载进行了基准测试,并承认潜在的优化,包括探索持久数据结构。 该语言有意缺乏显式指针,专注于值语义。虽然一些评论者建议添加可变引用或追踪垃圾回收器等特性,但创建者旨在探索纯粹基于值的系统中的可能性,优先考虑简单性和函数式透明性。Herd 旨在提供可变性的好处,而无需共享状态的复杂性。
相关文章

原文

Herd is a simple interpreted programming language where everything is a value.

Disclaimer: This is a hobby language, and you probably shouldn't use it for anything important.

In Herd, everything is pass-by-value, including lists and dicts. This means that when you pass a list or dict to a function, you can guarantee that the function won't modify your copy.

You can modify variables locally just like you would in an imperative language, but there will never be any side-effects on other copies of the value.

var foo = { a: 1 };
var bar = foo; // make a copy of foo
set bar.a = 2; // modify bar (makes a copy)
println foo.a bar.a; // prints 1 2

All reference types in Herd (e.g. strings, lists, dicts) use reference counting. Whenever you make a copy of a reference type, the reference count is incremented. Whenever you modify a reference type, one of two things happen:

  • If there's only one reference to the value, it can be modified in-place without making any copies. This is because the code doing the modification must be the only reference, so no other code will be able to observe the modification.
  • If there's more than one reference to the value, the language makes a shallow copy of it with the modification applied. The copy now has a reference count of one, so subsequent modifications usually don't need to allocate.

There's one very convenient consequence of everything being a value: Reference cycles are impossible! This means the reference counting system doesn't need cycle detection, and can also be used as a garbage collector.

Comparison to other languages

  • Swift: Uses a lot of similar ideas (under the name "Mutable Value Semantics"), but is statically typed and overall a much more complex language, including many types that aren't values.
  • Matlab / R: These languages also use copy-on-write semantics for arrays, but as far as I'm aware they can't reliably track when reference counts decrease back to one, so they have to make copies more often than herd does. (I also dislike these languages for other reasons).
  • PHP: Uses value semantics for arrays but not objects.
  • Perl: Uses value semantics for array and hashes, but exposes many ways to break the guarantees of value semantics.

Use the println function to print to the console. Strings are defined using single quotes.

x = 1; // define an immutable variable
var y = 2; // define a mutable variable
set y = 3; // modify a mutable variable

Values in herd have the following types:

  • () the unit type, representing no value.
  • bool boolean values: true and false.
  • number 64-bit floating point number, e.g. 42, -7, 2.6.
  • string a text string, e.g. 'hello'.
  • list an ordered collection of values, e.g. [1, 2, 3].
  • dict a collection of key-value pairs, e.g. { x: 1, y: 2 }.
  • function a function value, e.g. \x y\ x + y.
var list = ['a', 'b', 'c'];
set list.[1] = 'z';
println list; // prints ['a', 'z', 'c']
var dict = { x: 1, y: 2 };
set dict.x = 3;
println dict; // prints { x: 3, y: 2 }

You can use shorthand when creating dicts if the key name matches the variable name:

x = 10;
y = 20;
dict = { x, y }; // equivalent to { x: x, y: y }

In herd, blocks are defined using parentheses. If the last expression in the block is not terminated with a semicolon, its value becomes the value of the block.

x = (
  a = 2;
  b = 3;
  a + b
);
println x; // prints 5

All functions in herd are defined using the anonymous function syntax:

// define function taking two parameters
multiply1 = \a b\ (
  return a * b;
);
// call the function
x = multiply1 3 4;
println x; // prints 12

The body of the function is just a block expression, so you can omit the return statement if you want:

multiply2 = \a b\ (a * b);

You can also omit the parentheses for simple expressions:

And there's an even shorter syntax for single-parameter functions:

Functions with no parameters can be defined using \\, and called by passing a unit value ():

getRandomNumber = \\ Math.randomInt 0 100;
println (getRandomNumber ()); // prints a random number

Functions with a variable number of arguments can be defined using .. for the last parameter:

min = \first ..rest\ (
  var minVal = first;
  for x in rest do (
    if x < minVal then (
      set minVal = x;
    )
  )
  minVal
);

println (min 5 3 8); // prints 3

In herd, everything is immutable unless declared with var. This includes lists and complex objects that would be mutable in other languages.

list = ['a', 'b']; // list is immutable, because it wasn't defined with var.
set list.[0] = 'A'; // Not allowed

Each copy of a value is a distinct copy, and modifications to one variable won't modify other variables - even for lists and dicts!

var list1 = ['a', 'b'];
var list2 = list1; // make a copy
set list2.[0] = 'c'; // modify list2
println list1; // prints ['a', 'b']
println list2; // prints ['c', 'b']

This also applies when passing values to functions:

doubleEveryItem = \var list\ (
  for [i, x] in List.enumerate list do (
    set list.[i] = x * 2;
  )
  list
);
list1 = [3, 4];
list2 = doubleEveryItem list1;
println list1; // prints [3, 4]
println list2; // prints [6, 8]

Use ! to destructure lists and dicts:

list = [1, 2];
![a, b] = list; // destructure list
println a; // prints 1
println b; // prints 2

dict = { x: 10, y: 20 };
!{ x, y } = dict; // destructure dict
println x; // prints 10
println y; // prints 20

You can also use pattern matching in switch expressions to handle multiple cases:

x = [1, 2, 3];
y = switch x on {
  [] => 'Empty list',
  [1, ...rest] => 'Begins with 1, then ' ++ (toString rest),
  [a] => 'Single item: ' ++ (toString a),
  _ => 'Something else',
};
println y; // prints 'Begins with 1, then [2, 3]'

You can use var to destructure to a mutable variable, or set to modify an existing variable:

list = [1, 2];
[a, var b] = list; // a is immutable, b is mutable
set b = b + 10;
println [list, a, b]; // prints [[1, 2], 1, 12]

The pipe operator | can be used to chain function calls in a more readable way:

x = [1, 2, 3]
  | List.map \(_ * 2) // double each item
  | List.filter \(_ > 3); // keep only items greater than 3
println x; // prints [4, 6]

You can combine the pipe operator with set to modify variables in-place using |=:

var x = [1, 2];
set x |= List.push 3; // equivalent to set x = x | List.push 3;
println x; // prints [1, 2, 3]

Export code from a file by returning it at the end of the file:

// in file1.herd
x = 42;
double = \n\ n * 2;
return { x, double };

Import code from other files using the import function:

// in file2.herd
!{ x, double } = import 'file1.herd';
println (double x); // prints 84

You can also import all code from a file:

// in file2.herd
File1 = import 'file1.herd';
println (File1.double File1.x); // prints 84

Standard library modules are already imported for you, and can be accessed from the imported modules:

  • System - system functions for getting current time, program args, etc.
  • IO - file input/output functions.
  • List - list utility functions.
  • Dict - dict utility functions.
  • Math - mathematical functions and constants.
  • Bitwise - bitwise operations on integers.
  • Random - random number generation.
  • String - string utility functions.
  • File - file system utility functions.
  • Parallel - multithreading utilities.

Some very commonly used functions are also available globally:

  • not
  • range
  • assert
  • toString
  • print
  • println
  • len

Herd has built-in support for multithreading using the Parallel standard library module.

Use Parallel.parallelMap to map a function over a list in parallel:

list = [1, 2, 3];
// square each item in parallel
squared = Parallel.parallelMap list \(_ * _);
println squared; // prints [1, 4, 9]

Use Parallel.parallelRun to run multiple functions in parallel and wait for all of them to complete:

getOrder = \id\ (...); // TODO
getCatalog = \\ (...); // TODO
orderId = 123;
// Run both functions in parallel
![order, catalog] = Parallel.parallelRun [
  \\ getOrder orderId,
  \\ getCatalog (),
];

In herd, it is impossible to create a data race, because any mutations only affect the current thread. You can safely pass complex data structures between threads without worrying about synchronization. However (as with the rest of herd), each function will have its own copy of the data, so the only way to communicate between threads is via the return value.

Dynamic typing I'm not generally a fan of dynamic typing, but:

  • I wanted to learn how to build an interpreter.
  • I wanted the language to be as simple as possible.
  • I don't think this concept has ever been explored in a dynamic language before (excluding Matlab and R, which have large caveats). Swift is the closest equivalent, but it's statically typed and much more complex.

User-defined types Herd currently has a very simple type system, with no user-defined types. If I wanted to turn this into a production-ready language, I'd probably add Julia-esque structs and multiple dispatch (following the same immutability guarantees as the rest of the language).

Semicolons This is mostly just because I'm too lazy to write a whitespace-sensitive parser.

var, set, and = I took a different approach with the syntax here than most other languages I've seen, in particular that the default for = is to define immutable variables, and mutating them requires a dedicated set keyword.

  1. In most programs, the majority of variables can be immutable.
  2. Programs define immutable variables more often than they mutate variables.

So I made immutable variable definition the "default", and gave keywords to the other two operations.

Function syntax Just trying out something different which is a bit more concise than most other languages. This is sort of a hybrid between Rust's closure syntax (but with \ instead of |) and ML-style function calls (f x y) which require a bit less punctuation.

Currying (or lack thereof) Currying has a lot of nice properties for language implementation and reasoning, but personally I don't think it's a good fit for herd and would make the language less approachable.

  • With currying, we can't produce a good error message when the user calls a function with the wrong number of arguments.
  • Currying is incompatible with varargs functions (which I admittedly haven't implemented yet).
  • It's still possible to explicitly partially apply a function using the \(func x _) syntax.

Herd uses a very naive single-pass JIT compiler to convert code to machine code at runtime, with cranelift for generating the final optimized machine code. This results in surprisingly good performance - not in the same league as modern JS runtimes, but competitive or faster than many interpreted languages (e.g. CPython).

Values in Herd are represented using NaN-boxing, so primitive types (number, bool, unit) can be stored without any heap allocation. I chose this over tagged pointers, because it makes it much easier to get good numerical performance (at least for 64-bit floats) without complex inlining logic.

The current biggest performance gaps in herd are:

  1. Atomic reference counting overhead, particurly for array/list mutations in hot loops. A lot of this could be removed by lifting the atomic operations out of loops, so the hot code can safely mutate its owned copy.
  2. The single-pass JIT compiler adds some startup overhead by compiling all imported code, especially as the standard library and user codebase get larger. Cranelift is still pretty fast though, so this isn't a major problem.
  3. The single-pass JIT also limits the optimizations that we can do, particularly around type specialization. A more advanced tracing JIT could get a bit of extra performance, but with a big complexity cost.

Here are some benchmark numbers on an i5-13600KF, comparing herd to CPython 3.11 and JavaScript (Node.js 18.6) on a selection of scripts (see ./benchmarks for the full code):

Benchmark herd js python
binarytrees 1285.1ms 789.1ms (-38.6%) 1992.4ms (55.0%)
binarytrees-m 211.1ms
helloworld 25.9ms 53.4ms (106.1%) 26.8ms (3.6%)
iterables 630.6ms 1360.9ms (115.8%)
mandelbrot 485.6ms 146.1ms (-69.9%) 3109.8ms (540.3%)
nbody 2998.1ms 86.0ms (-97.1%) 1898.2ms (-36.7%)
联系我们 contact @ memedata.com