Strings Just Got Faster

Original link: https://inside.java/2025/05/01/strings-just-got-faster/

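JDK 25 significantly improves `String` performance by applying the `@Stable` annotation to the internal `String.hash` field. This makes `String::hashCode` constant-foldable in most cases: the JVM can treat a String's cached hash code as a constant, which especially benefits lookups in immutable `Map`s. For example, retrieving a `MethodHandle` from a static map with a constant string key is now much faster (more than 8x in the article's benchmark). The JVM can now constant-fold the entire lookup chain: hash code computation, map probing, `MethodHandle` retrieval, and native call resolution. In effect, this optimization short-circuits the whole process and calls the native method directly. There is one corner case: strings whose hash code is zero (such as the empty string) do not benefit from constant folding, but this is being worked on. While `@Stable` is a JDK-internal annotation, JEP 502 aims to provide user-level constructs that enable similar optimizations in application code. You can download JDK 25 to test these improvements.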

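This Hacker News thread discusses the recent performance improvements to Java's String class, with a focus on the new Stable Values JEP. Users appreciated that Java developers benefit from these enhancements without changing any code, simply by updating the JRE. The discussion explored what these optimizations mean for large-scale platforms. Some users dug into the differences between StableValue, Records, and Value Objects, emphasizing that StableValue allows lazy initialization of constants while still letting the JVM fully optimize them. A key point was the impact on immutable maps that use strings as keys: constant-foldable string hash codes, combined with immutable maps, let the JVM essentially "inline" map lookups, which improves performance significantly. Concerns were also raised about hash collisions and denial-of-service attacks, since Java's String hash is not randomized. The discussion also touched on how Java's HashMap uses trees to limit the performance degradation caused by collisions, and on potential benefits for Kotlin and Scala.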

Original article

In JDK 25, we improved the performance of the String class in such a way that String::hashCode is, in most cases, constant-foldable. For example, if you use Strings as keys in a static unmodifiable Map, you will likely see significant performance improvements.

Here is a relatively advanced example in which we maintain an immutable Map of native calls: its keys are the names of the calls, and its values are MethodHandles that can be used to invoke the associated system calls:

// Set up an immutable Map of system calls
static final Map<String, MethodHandle> SYSTEM_CALLS = Map.of(
        "malloc", linker.downcallHandle(mallocSymbol, ...),
        "free", linker.downcallHandle(freeSymbol, ...),
        ...);

// Allocate a memory region of 16 bytes
// (invokeExact is signature-polymorphic, so the (long) cast fixes the call site's return type)
long address = (long) SYSTEM_CALLS.get("malloc").invokeExact(16L);

// Free the memory region
SYSTEM_CALLS.get("free").invokeExact(address);
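The snippet above elides the symbol lookups and function descriptors. A minimal self-contained sketch that fills them in might look like the following. It assumes JDK 22 or later (where the FFM API is final), a 64-bit platform such as Linux or macOS whose default lookup exposes the C library's malloc and free, and that pointers are passed as a Java long (JAVA_LONG) rather than ADDRESS so the call sites match the article's `long address`. The class name SystemCalls and the chosen descriptors are illustrative assumptions, not taken from the article. Without --enable-native-access=ALL-UNNAMED the JVM prints a warning about restricted methods.

```java
import java.lang.foreign.FunctionDescriptor;
import java.lang.foreign.Linker;
import java.lang.foreign.SymbolLookup;
import java.lang.foreign.ValueLayout;
import java.lang.invoke.MethodHandle;
import java.util.Map;

// Hypothetical class name; a sketch assuming JDK 22+ on a 64-bit platform.
class SystemCalls {
    private static final Linker LINKER = Linker.nativeLinker();
    private static final SymbolLookup STDLIB = LINKER.defaultLookup();

    // Assumption: a C pointer fits in a Java long, so malloc is bound as
    // (long)long and free as (long)void instead of using ValueLayout.ADDRESS.
    static final Map<String, MethodHandle> SYSTEM_CALLS = Map.of(
            "malloc", LINKER.downcallHandle(
                    STDLIB.find("malloc").orElseThrow(),
                    FunctionDescriptor.of(ValueLayout.JAVA_LONG, ValueLayout.JAVA_LONG)),
            "free", LINKER.downcallHandle(
                    STDLIB.find("free").orElseThrow(),
                    FunctionDescriptor.ofVoid(ValueLayout.JAVA_LONG)));

    public static void main(String[] args) throws Throwable {
        // Allocate a memory region of 16 bytes; the cast makes the call site (long)long
        long address = (long) SYSTEM_CALLS.get("malloc").invokeExact(16L);
        System.out.println("malloc returned non-null: " + (address != 0L));

        // Free the memory region; used as a statement, the call site is (long)void
        SYSTEM_CALLS.get("free").invokeExact(address);
    }
}
```

Because both the Map and its keys are constants, this is exactly the shape of code the JDK 25 change targets.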

The method linker.downcallHandle(…) takes a symbol and additional parameters to bind a native call to a Java MethodHandle via the Foreign Function & Memory API, which was made final in JDK 22. This is a relatively slow process and involves spinning bytecode. However, once the handles are in the Map, the new performance improvements in the String class alone allow constant folding of both the key lookups and the values, improving performance by more than a factor of 8:

--- JDK 24 ---

Benchmark                     Mode  Cnt  Score   Error  Units
StringHashCodeStatic.nonZero  avgt   15  4.632 ± 0.042  ns/op

--- JDK 25 ---

Benchmark                     Mode  Cnt  Score   Error  Units
StringHashCodeStatic.nonZero  avgt   15  0.571 ± 0.012  ns/op

Note: the benchmarks above do not use a malloc() MethodHandle but an int identity function. After all, we are not testing the performance of malloc() but the String lookup and the MethodHandle invocation.
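The benchmark source is not shown in the article. As a rough, portable sketch of the lookup shape being measured, the following stores an int identity MethodHandle in an unmodifiable Map under a constant String key. The class name IdentityLookup and the key "identity" are hypothetical, and this is not a benchmark harness; it only shows the code path that JDK 25 may fold.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.util.Map;

// Hypothetical stand-in for the benchmark's setup (requires JDK 9+ for Map.of).
class IdentityLookup {
    // static final + unmodifiable Map: fields the JVM already trusts as constant
    static final Map<String, MethodHandle> HANDLES =
            Map.of("identity", MethodHandles.identity(int.class));

    // The key is a constant String, so on JDK 25 the JIT may fold the hash,
    // the Map probe and the MethodHandle target into constants.
    static int lookupAndInvoke(int value) throws Throwable {
        // The (int) cast gives invokeExact the exact type (int)int of the handle
        return (int) HANDLES.get("identity").invokeExact(value);
    }

    public static void main(String[] args) throws Throwable {
        System.out.println(lookupAndInvoke(42)); // prints 42
    }
}
```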

This improvement benefits any immutable Map<String, V> whose values (of arbitrary type V) are looked up with constant String keys.

When a String is first created, its hash code is unknown. On the first call to String::hashCode, the actual hash code is computed and stored in the private field String.hash. This mutation might sound odd: if String is immutable, how can it change its state? The answer is that the mutation cannot be observed from the outside; String behaves functionally the same whether or not the internal String.hash cache field is used. The only difference is that subsequent calls are faster.
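The caching idea can be modelled in plain Java. The class below is a hypothetical, simplified sketch, not the JDK's actual String source: it computes the hash lazily with the same 31 * h + c polynomial that String::hashCode is specified to use and stores it in a private field whose default value 0 means "not computed yet".

```java
import java.util.Arrays;

// Hypothetical class modelling String's lazy hash cache (not the JDK source).
final class CachedHashText {
    private final char[] value;
    private int hash; // 0 = not computed yet (or the hash genuinely is 0)

    CachedHashText(String s) {
        this.value = s.toCharArray(); // defensive copy keeps the contents immutable
    }

    @Override
    public int hashCode() {
        int h = hash;               // read the cache field once
        if (h == 0) {
            for (char c : value) {
                h = 31 * h + c;     // same polynomial as String::hashCode
            }
            hash = h;               // benign race: every thread computes the same value
        }
        // If the real hash is 0, it is recomputed on every call: a zero value
        // is indistinguishable from "not computed yet".
        return h;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof CachedHashText)) return false;
        return Arrays.equals(value, ((CachedHashText) o).value);
    }

    public static void main(String[] args) {
        CachedHashText t = new CachedHashText("malloc");
        System.out.println(t.hashCode() == "malloc".hashCode()); // true
    }
}
```

The write to hash is visible to no caller except through faster subsequent calls, which is why the class still behaves as immutable.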

Now that we know how String::hashCode works, we can unveil the performance change, which consists of a single line of code: the internal field String.hash is marked with the JDK-internal @Stable annotation. That's it!

@Stable tells the virtual machine that it can read the field once and, if the value is no longer the default (zero), trust that the field will never change again. Hence, it can constant-fold the String::hashCode operation and replace the call with the known hash. As it turns out, the fields of the immutable Map and the internals of the MethodHandle are trusted in the same way. This means the virtual machine can constant-fold the entire chain of operations:

  • Computing the hash code of the String “malloc” (which is always -1081483544)
  • Probing the immutable Map (i.e., computing the internal array index, which is always the same for the malloc hash code)
  • Retrieving the associated MethodHandle (which always resides at that computed index)
  • Resolving the actual native call (which is always the native malloc() call)

In effect, this means the native malloc() call can be invoked directly, which explains the tremendous performance improvement. In other words, the chain of operations is completely short-circuited.
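The constant in the first step of the chain follows directly from the definition of String::hashCode: h = 31 · h + c for each character in turn, using 32-bit two's-complement (wrap-around) arithmetic. Worked through for "malloc" (m = 109, a = 97, l = 108, o = 111, c = 99):

```latex
\begin{aligned}
h_1 &= 109 \\
h_2 &= 31 \cdot 109 + 97 = 3476 \\
h_3 &= 31 \cdot 3476 + 108 = 107864 \\
h_4 &= 31 \cdot 107864 + 108 = 3343892 \\
h_5 &= 31 \cdot 3343892 + 111 = 103660763 \\
h_6 &= 31 \cdot 103660763 + 99 = 3213483752 \equiv 3213483752 - 2^{32} = -1081483544 \pmod{2^{32}}
\end{aligned}
```

Only the last step exceeds the int range and wraps to a negative value. Since the result depends only on the string's contents, it is identical every time it is computed.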

There is an unfortunate corner case that the new improvement does not cover: if the hash code of the String happens to be zero, constant folding will not work. As we learned above, constant folding can only take place for non-default values (i.e., non-zero values for int fields). However, we expect to fix this small impediment in the near future. You might think that only about one in 4 billion distinct Strings has a hash code of zero, and that may be right in the average case. However, one of the most common strings, the empty string “”, has a hash code of zero. On the other hand, no string of 1 to 6 characters (inclusive) whose characters all lie in the range from ` ` (space) to Z has a hash code of zero.
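The 1-to-6-character claim can be checked without enumerating strings. Every character value is positive, so the exact (unwrapped) value of the hash polynomial lies between 32·W and 90·W, where W = 1 + 31 + … + 31^(n-1). If that interval lies strictly inside (0, 2^32), reducing it modulo 2^32 cannot give zero. The sketch below (hypothetical class ZeroHashBounds) performs this bound check; for 7 characters the upper bound exceeds 2^32, so the argument no longer applies, which by itself does not mean a zero-hash 7-character string exists.

```java
// Hypothetical helper: bound argument for "no zero hash" over ' ' (32) .. 'Z' (90).
class ZeroHashBounds {
    static final long MIN_CHAR = ' ';        // 32
    static final long MAX_CHAR = 'Z';        // 90
    static final long TWO_POW_32 = 1L << 32;

    // W(n) = 31^0 + 31^1 + ... + 31^(n-1): the sum of the polynomial's weights
    static long weightSum(int n) {
        long sum = 0, p = 1;
        for (int k = 0; k < n; k++) {
            sum += p;
            p *= 31;
        }
        return sum;
    }

    // True if the exact hash of every length-n string over the range lies in
    // (0, 2^32), so the wrapped int hash cannot be 0. False only means the
    // bound no longer rules a zero hash out.
    static boolean boundRulesOutZero(int n) {
        long w = weightSum(n);
        long lo = MIN_CHAR * w;   // all spaces
        long hi = MAX_CHAR * w;   // all 'Z'
        return lo > 0 && hi < TWO_POW_32;
    }

    public static void main(String[] args) {
        System.out.println("\"\".hashCode() = " + "".hashCode()); // 0
        for (int n = 1; n <= 7; n++) {
            System.out.println(n + " chars: zero ruled out = " + boundRulesOutZero(n));
        }
    }
}
```

For n = 6 the bounds are 946670592 and 2662511040, comfortably inside (0, 2^32), which is why the claim stops at six characters.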

As the @Stable annotation is usable only in internal JDK code, you cannot use it directly in your Java applications. However, we are working on a new JEP, JEP 502: Stable Values (Preview), that will provide constructs allowing user code to benefit indirectly from @Stable fields in a similar way.

You can download JDK 25 today and see how much this performance improvement benefits your current applications.
