你需要子类型化

你需要子类型化
You Need Subtyping

原始链接: https://blog.polybdenum.com/2025/03/26/why-you-need-subtyping.html

子类型化，即类型 X 可在预期类型 Y 的任何地方使用，比面向对象继承更基础，对现代编程语言至关重要。即使是低级语言也能从子类型化中受益，尽管存在内存布局优化方面的顾虑。随着类型系统不断发展以捕获更多错误，它们会融入超越内存布局的特性，自然而然地产生子类型关系。 Kotlin 中的空值检查就是一个例子：`String` 是 `String?`（可空字符串）的子类型，尽管它们具有相同的内存布局。类似地，像别名分析的借用检查这样的高级静态分析技术严重依赖于子类型化，在编译时将指针与权限关联起来。这些权限对运行时没有影响，但对错误检测至关重要。在语言设计中忽略子类型化会导致粗糙的边缘，并阻碍复杂类型系统的有效性。未来的语言必须优先考虑子类型化，以充分发挥静态分析和编译时错误检测的潜力。

Hacker News上的一篇文章链接到一篇题为“你需要子类型”（polybdenum.com）的文章。这篇文章已经有一些评论了。一个用户不同意这篇文章的观点，并提到了TypeScript子类型中协变性的问题。另一个用户要求解释文章中“boxed”（装箱）一词的含义。第三个用户回应说，“boxed”通常指堆分配，但在文章中的相关性尚不明确，因为文章似乎将装箱记录等同于动态/结构化类型，但这并不一定正确。最后，还有一个AI创业学校的广告。

解析，不验证 (2019) 2024-07-23

（评论） 2025-03-22

（评论） 2024-07-23

（评论） 2024-05-10

（评论） 2024-07-22

原文

Ever since Stephen Dolan’s 2016 thesis Algebraic Subtyping showed how to combine type inference and subtyping, I’ve been developing increasingly sophisticated programming languages based on those ideas, first IntercalScript in 2019, then CubiML in 2020, PolySubML in 2025, and with my next language already in the planning stages.

I’ve long thought that subtyping is the next great evolution in programming language design and that it is a critical feature for new programming languages. However, existing programming languages have little or no subtyping, and there is a general lack of awareness about subtyping in the programming community. Therefore, in this post, I will explain what subtyping is and why it is so important.

What is subtyping?

If you remember the fad for Object Oriented Programming, “subtyping” might call to mind classes and inheritance hierarchies. However, subtyping is a far more basic and general notion than that.

We say that type X is a subtype of type Y in a language if a value of type X can be used wherever type Y is expected. Notice that this has nothing to do with classes or inheritance. In a trivial sense, subtyping is present in every language because types are always subtypes of themselves under this definition. However, the important question in language design is to what extent does the compiler allow for non-trivial subtyping relationships.

What might subtyping look like? First, let’s consider a high level language with boxed records, like Javascript or Python:

// r: {x: int, y: int}
function foo(r) {
    console.log(r.x + r.y);
}

foo({x: 42, y: 21, z:18});

foo only reads the x and y properties of its argument, but you can pass in any value that has those properties, even if it also has other properties that foo does not look at. Javascript does not actually have type annotations, but if it did, it might look something like “foo requires a value of type {x: int, y: int}, while the argument passed has type {x: int, y: int, z: int}”. In this case, {x: int, y: int, z: int} is a subtype of {x: int, y: int}

Yes, you need subtyping even for low level languages

The usual reaction to an example like this is something like “Well, that might be ok for high level languages where all values are boxed anyway, but if you care about performance, you need to compile things with fixed type-dependent memory representations, so you can’t have subtyping relationships between records of different widths like that. Therefore, there’s no need for subtyping for most languages.”

However, this reaction is misguided. While the specific example in the previous section is something you’d only see in unoptimized languages, even optimized languages still need subtyping. They’ll have fewer subtyping relationships, but not zero subtyping.

Memory layout optimization means you can’t have any subtyping relationships between types with different memory layouts. If your type system is sparse enough that there is only one type for each memory layout, then it is true that you won’t have any nontrivial subtyping relationships. However, such a sparse type system is not very useful.

As computers have grown faster, there’s been a strong trend towards leveraging them to help programmers develop software faster by catching bugs automatically at compile time. That means that type systems naturally evolve to capture more and more properties of the code worth checking, including properties that don’t map 1:1 to memory layouts. And that in turn means you have subtyping!

Null checking

Consider the problem of null checking, Tony Hoare’s infamous Billion Dollar Mistake. Null pointer errors are such a common problem in programming that most modern languages have some way of checking nullability in the type system to statically prevent mistakes.

For example, in Kotlin, String? is the type of nullable strings, which could be a normal String value or null. Meanwhile, String is the type of non-nullable strings, which are statically guaranteed to not be null:

val not_null: String = "Hello";
val maybe_null: String? = "World";

fun takes_nonnull(x: String) {}
fun takes_nullable(x: String?) {}

// Can pass String to String
takes_nonnull(not_null);
// Can pass String? to String?
takes_nullable(maybe_null);
// Can *also* pass String to String?
takes_nullable(not_null);

Key to the usability of this system is the fact that you can freely pass non-nullable values even where nullable values are expected. If a function accepts a value that can be String or null, it is ok to pass it a value that is guaranteed to be a String and not null. Or in other words, String is a subtype of String?!

String and String? are two different types that have the same memory layout, so even with type-specific memory layouts, we still have a non-trivial subtyping relationship. In order to statically check properties like nullability that don’t affect memory layout, the compiler needs to support subtyping.

Alias checking

In the case of null checking, your lattice of types is fairly trivial - you just have String and String?. Every type is either nullable or not. However, that’s merely scratching the surface of the kinds of analysis that type checkers are and will be called on to perform.

For example, the next great frontier in static type checking is alias analysis. Alias related issues are a common source of bugs and in order to statically prevent them, you need a borrow checker. This associates each pointer with permissions, each of which can be valid for a specific lifetime.

Every pointer has the same runtime representation (they’re all just pointers in memory). However, the static types are completely different. The permissions of your pointers only exist at compile time and have no effect on the runtime behavior of the code, but they are still critical to catching aliasing bugs during compilation.

As with null checking, the “extra” information naturally gives rise to subtyping relationships. If you have a pointer with more permissions, it should still be usable in a place that only requires a pointer with fewer permissions. If you have a conditional value derived from two possible values (e.g. if foo {p} else {q}), then the new pointer should have the permissions that both of the possible source pointers share. And so on.

Unlike with null checking, where you just have one bit of “extra” information associated with each type (can the values be null or not?), with borrow checking, you have an infinite amount of possible information that can be associated with each type. This means that it is more important than ever to have a type system and compiler that are designed from the ground up to work well with subtyping.

Conclusion

As the null checking example shows, nearly every language has some sort of subtyping. However, type systems and compilers are often not designed with subtyping in mind, leading to lots of preventable rough edges. The more sophisticated the type system, the bigger the cost of not taking subtyping seriously in the design. In order to design the programming languages of tomorrow, Algebraic Subtyping-inspired designs will become even more important.

See here for a basic tutorial on how to implement subtyping with type inference.