185微秒类型提示

Without Type Hint	19,959	200.4
With Type Hint	264,316	15.1

185微秒类型提示
The 185-Microsecond Type Hint

原始链接: https://blog.sturdystatistics.com/posts/type_hint/

## Roughtime 服务器性能提升：类型提示的故事最近一个 Roughtime 协议的开源实现，该协议旨在实现安全的时间同步，通过一个看似微不足道的代码更改，获得了令人惊讶的 13 倍吞吐量提升。该服务器处理涉及排队、16 个版本之间的协议兼容性、Merkle 树构建和 Ed25519 签名等请求——所有这些都是计算密集型任务。然而，初步分析显示，90% 的请求时间都花在一个简单的计算字节数组长度的函数中。尽管通过了测试且没有反射警告，但 `mapv` 函数的动态分派和运行时类型检查引入了显著的开销。修复方法？添加一个类型提示 (`fn [^bytes v] (alength v)`)，告知编译器字节数组的类型。这使得编译器能够发出单个高效的字节码指令，而不是复杂的函数调用链。虽然隔离测试显示性能提升了 8 倍，但端到端基准测试表明性能提升了 13 倍，这可能是由于减少了反射调用路径中的竞争，并改进了 JIT 优化。这表明在 Clojure 中，没有反射警告并不能保证最佳性能，并且分析对于识别意外的瓶颈至关重要——即使是在“简单的”代码中。

## Clojure 类型提示带来的 185 微秒性能提升 sturdystatistics.com 上的一篇最新博文详细介绍了一种通过添加类型提示在 Clojure 代码中实现的令人惊讶的性能改进。作者观察到显著的加速——将执行时间减少到 185 微秒——通过指定字节数组的类型来实现。最初的解释认为编译器将代码优化为单个 CPU 指令。然而，Hacker News 评论区的讨论表明这可能过于简单化。专家指出，字节码可能仍然包含带有潜在异常处理的类型转换，而加速很可能归功于 JIT（即时编译）编译器优化。具体来说，添加类型提示允许 JIT 编译器假设类型，从而实现类型保护以及在没有提示的情况下不可能进行的内联或循环展开等优化。作者承认需要检查发出的字节码以获得明确的解释，并表达了对像 Clojure 这样高级语言通过 JIT 优化实现令人印象深刻的性能的潜力感到兴奋。该帖子还提到了“Roughtime”协议，这是一种用于密码学可验证时间的系统，用于加固许可证服务器。

How a “trivial” change yielded a 13× throughput increase.

We recently released an open-source Clojure implementation of Roughtime, a protocol for secure time synchronization with cryptographic proof.

When a client asks for the time, it sends a random nonce. The server replies with a signed certificate containing both the nonce and a timestamp, proving the response happened after the request. Responses can be chained together with provable ordering; if any server’s timestamps are inconsistent with that ordering, that server is cryptographically “outed” as unreliable.

The Heavy Lifting

A single request to our server triggers a surprising amount of work:

1. Queueing

An incoming request goes through basic validation and enters a “received queue.” This queue is processed by a batcher, which sends batches to one of four worker queues. When a worker queue picks up a batch, it decodes each request, groups them into sub-batches by version number, and responds to each sub-batch. These go into a sender queue which un-batches and sends the responses back to the requesting server.

2. Protocol Compatibility

We support the entire evolution of the protocol: from Google’s original specification, through all fifteen IETF drafts – that’s sixteen versions. That means we have conditional logic littered throughout the codebase: version tags, padding schemes, tag labels, hash sizes, and packet layouts all vary with the protocol version. In several places, compatibility won over elegance or optimization.

3. Recursive Merkle Trees

Each batch is rolled into a Merkle tree using SHA-512. That means recursive hashing all the way to the root; this is pure CPU-bound work.

4. Ed25519 Signatures

Finally, each response is signed with Ed25519. Public-key signatures are notoriously expensive and are usually the dominant cost in systems like this.

The “Sluggish” Server

Given all that complexity, along with the fact that I’m using a high-level dynamic programming language, I wasn’t surprised when my initial benchmarks showed the server responding in 200 microseconds (µs).

I ran a profiler expecting to see SHA-512 or Ed25519 dominating.

Instead, nearly 90% of the runtime was attributed to the most mundane line in the entire library:

(defn encode-rt-message [msg-map]
  (let [sorted-entries (sort-tags msg-map)
        tag-bytes      (mapv #(tag/tag->bytes (key %)) sorted-entries)
        val-bytes      (mapv #(tag/pad4 (val %)) sorted-entries)

        ;; THE BOTTLENECK:
        val-lens       (mapv alength val-bytes)

        ...]

Test conditions:

Results: