为Java风格的单分派弯曲CLOS Mop
Bending the CLOS Mop for Java-Style Single Dispatch

原始链接: https://atgreen.github.io/repl-yell/posts/clos-mop-dispatch/

## Clojure 在 OpenLDK 上:优化 Java 方法分派 在 OpenLDK (一种 Common Lisp JVM) 上运行 Clojure 曾面临极慢的启动时间——达到 REPL 几乎需要三个小时——这是由于 CLOS (Common Lisp 对象系统) 和 Java 的分派机制之间存在根本性不匹配所致。CLOS 使用多分派(考虑所有参数类型),而 Java 使用单分派(仅考虑接收者的类型)。OpenLDK 将 Java 映射到 CLOS,导致每次 Java 调用都进行代价高昂的方法查找,而 Clojure 在启动期间加载的约 3000 个类进一步加剧了这一问题。 解决方案是利用 CLOS 的元对象协议 (MOP) 将 Java 方法的默认分派机制替换为单分派实现。这是通过创建具有分派和 `invokespecial` 调用(Java 构造函数/超类调用)哈希表缓存的自定义 `java-generic-function` 元类来实现的。 关键改进包括拦截 SBCL 的 `update-dfun`(它在类定义上重建分派网络)并将其替换为缓存清除,以及为常用的 Java 方法(如 ``)预先创建泛型函数。这些更改将 REPL 启动时间从近三个小时缩短到三分钟以内,展示了 MOP 在不修改核心 Lisp 实现的情况下进行有针对性的运行时定制的强大功能。

黑客新闻 新 | 过去 | 评论 | 提问 | 展示 | 招聘 | 提交 登录 为 Java 风格的单分派弯曲 CLOS Mop (atgreen.github.io) 3 分,atgreen 1 小时前 | 隐藏 | 过去 | 收藏 | 1 条评论 帮助 atgreen 1 小时前 [–] 我通过 OpenLDK(我的 Java JIT 编译器和 Common Lisp 上的运行时)在 Common Lisp (SBCL) 上运行了一个 Clojure REPL。 这篇博文概述了我如何操作 CLOS 的方法分派以提高单分派方法调用的性能。回复 指南 | 常见问题 | 列表 | API | 安全 | 法律 | 申请 YC | 联系 搜索:
相关文章

原文

I hit a wall while getting Clojure running on OpenLDK, my Common Lisp JVM. It was taking almost three hours to get to a Clojure REPL prompt. Profiling showed that huge chunks of that time were spent in invoke-special, buried deep in SBCL’s PCL implementation, trying to sort through thousands of methods for <init>(). The JVM constructor. Every Java class has one, and in Clojure every function is its own class.

The problem is a fundamental impedance mismatch: CLOS does multi-method dispatch — it considers the types of all arguments when selecting a method. Java does single dispatch — only the receiver object’s type matters. OpenLDK maps Java classes to CLOS classes and Java virtual methods to CLOS generic functions, which means every Java method call pays the full cost of CLOS’s multi-dispatch protocol. For a handful of classes, nobody notices. For the ~3000 classes Clojure loads at startup, it’s a catastrophe.

The fix was to reach into the CLOS Meta-Object Protocol and replace the dispatch machinery for Java methods with something that actually matches Java’s dispatch model.

A Quick Tour of CLOS Dispatch

SBCL’s PCL (Portable Common Loops) implements the MOP’s dispatch protocol. When you call a generic function, PCL routes the call through a discriminating function — an internal function that PCL attaches to each generic function object. The discriminating function embodies a discrimination net (dfun), a caching structure that tries to fast-path calls based on argument types it has seen before.

Standard CLOS dispatch flow

On a dfun cache miss, PCL falls back to the full MOP protocol: compute-applicable-methods-using-classes examines the class of every specializable argument against every defined method’s specializers, the matching methods are sorted by specificity across all argument positions, and PCL builds an effective method — a compiled function that chains the primary and auxiliary methods together with call-next-method support. The result is folded back into the dfun so the next call with the same types is fast.

In steady state this works fine. The problem is what happens when the method set changes. Every time you add a method or define a new class, PCL calls update-dfun to rebuild the discrimination net from scratch. During Clojure bootstrap, we’re loading classes in a tight loop — the dfun never gets a chance to stabilize. Each class definition triggers update-dfun on every generic function that has methods specializing on that class or its ancestors. With thousands of methods on <init>(), each rebuild is expensive, and it happens thousands of times.

What Java Actually Needs

Java’s virtual dispatch is simple: look at the receiver object’s class, walk up the class hierarchy until you find a matching method, done. There’s no consideration of the other arguments’ types. This is textbook single dispatch, and it has a textbook optimization: a cache keyed on the receiver’s class.

That’s exactly what I built.

The java-generic-function Metaclass

The MOP lets you subclass standard-generic-function to change how dispatch works. I defined a new metaclass with two hash-table caches and a lock:

(defclass java-generic-function (standard-generic-function)
  ((dispatch-cache :initform (make-hash-table :test 'eq)
                   :accessor java-gf-dispatch-cache)
   (invoke-special-cache :initform (make-hash-table :test 'eq)
                         :accessor java-gf-invoke-special-cache)
   (cache-lock :initform (bordeaux-threads:make-lock "java-gf-cache")
               :reader java-gf-cache-lock))
  (:metaclass sb-mop:funcallable-standard-class))

The dispatch-cache maps a receiver class to a pre-built effective method function. The invoke-special-cache does the same thing for invokespecial bytecode — Java’s mechanism for calling a specific parent class’s method (constructors, super calls). The lock is there because OpenLDK can load classes from multiple threads.

Replacing the Discriminating Function

The MOP protocol for customizing dispatch is compute-discriminating-function. I specialize it on java-generic-function to return a lambda that does a hash-table lookup instead of walking the full MOP dispatch chain:

(defmethod closer-mop:compute-discriminating-function
    ((gf java-generic-function))
  (let ((cache (java-gf-dispatch-cache gf))
        (lock (java-gf-cache-lock gf)))
    (lambda (&rest args)
      (let* ((receiver (first args))
             (class (class-of receiver))
             (emfun (gethash class cache)))
        (if emfun
            (funcall emfun args)
            (let ((new-emfun (%compute-java-effective-method gf class)))
              (bordeaux-threads:with-lock-held (lock)
                (setf (gethash class cache) new-emfun))
              (funcall new-emfun args)))))))

On a cache hit, it’s two operations: class-of and a hash lookup. On a miss (the first call for each receiver class), it computes the effective method, caches it, and calls it. After that, every subsequent call for that class is O(1).

Fast dispatch flow

The cache miss path calls %compute-java-effective-method, which does call compute-applicable-methods-using-classes — but only once per (generic-function, receiver-class) pair between invalidations:

(defun %compute-java-effective-method (gf class)
  (let* ((lambda-list (closer-mop:generic-function-lambda-list gf))
         (nargs (length lambda-list))
         (class-list (cons class
                          (make-list (max 0 (1- nargs))
                                     :initial-element (find-class 't)))))
    (multiple-value-bind (methods definitive-p)
        (closer-mop:compute-applicable-methods-using-classes gf class-list)
      (declare (ignore definitive-p))
      (if (null methods)
          (lambda (args)
            (apply #'no-applicable-method gf args))
          (let* ((around (remove-if-not
                          (lambda (m) (equal (method-qualifiers m) '(:around)))
                          methods))
                 (primary (remove-if-not
                           (lambda (m) (null (method-qualifiers m)))
                           methods))
                 (chain (append around primary)))
            (if (null chain)
                (lambda (args)
                  (apply #'no-applicable-method gf args))
                (let ((first-mf (closer-mop:method-function (first chain)))
                      (rest-chain (rest chain)))
                  (lambda (args)
                    (funcall first-mf args rest-chain)))))))))

The trick with class-list is key: I only specialize on the receiver’s actual class. Every other argument position gets T, the universal supertype. This tells the MOP “I don’t care about these arguments for dispatch purposes” — which is exactly Java’s single-dispatch semantics.

The Real Win: Short-Circuiting update-dfun

The custom discriminating function helps, but the single biggest performance win is intercepting PCL’s update-dfun. This is the internal function that rebuilds the discrimination net, and PCL calls it every time a method is added, removed, or a relevant class is defined. During Clojure bootstrap, that means thousands of times.

For java-generic-function instances, I replace the expensive rebuild with a cache clear:

(let ((original-update-dfun (fdefinition 'sb-pcl::update-dfun)))
  (sb-ext:without-package-locks
    (setf (fdefinition 'sb-pcl::update-dfun)
          (lambda (gf &rest args)
            (if (and (typep gf 'java-generic-function)
                     (slot-boundp gf 'dispatch-cache))
                (progn
                  (bordeaux-threads:with-lock-held ((java-gf-cache-lock gf))
                    (clrhash (java-gf-dispatch-cache gf))
                    (clrhash (java-gf-invoke-special-cache gf)))
                  (sb-mop:set-funcallable-instance-function
                   gf (closer-mop:compute-discriminating-function gf)))
                (apply original-update-dfun gf args))))))

Cache invalidation flow

When PCL triggers update-dfun on a java-generic-function, instead of rebuilding a discrimination net, I clear both hash tables and reinstall the fast discriminating function. The caches repopulate lazily on the next call to each class. For non-Java generic functions, the original PCL logic runs as usual.

This is the difference between O(n) work on every class definition (where n is the method count) and O(1) work followed by amortized lazy cache fills.

Pre-Creating Hot Generic Functions

Not all generic functions are equal. <init>() is the single hottest one — every Java object creation calls it. I pre-create it (and a few others) with the java-generic-function metaclass during bootstrap, before any classes are loaded:

(ensure-generic-function '|<init>()|
                         :generic-function-class 'java-generic-function
                         :lambda-list '(|this|))

(ensure-generic-function '|clone()|
                         :generic-function-class 'java-generic-function
                         :lambda-list '(|this|))

And in the code generator, every Java instance method gets its generic function pre-created with the fast metaclass:

(unless (and (fboundp method-name)
             (typep (symbol-function method-name) 'generic-function))
  (ensure-generic-function method-name
    :generic-function-class 'java-generic-function
    :lambda-list (cons 'this arg-names)))

This guarantees that no Java method ever falls through to PCL’s default dispatch machinery.

invokespecial Caching

Java’s invokespecial instruction is used for constructor chaining and super calls. It bypasses virtual dispatch and targets a specific class’s method. During Clojure bootstrap, constructor chains like Object.<init>AbstractCollection.<init>AbstractList.<init> → &mldr; fire constantly.

I added a second cache specifically for this pattern:

(defun invoke-special (method-symbol owner-symbol args)
  (let* ((gf (symbol-function method-symbol))
         (owner-class (find-class owner-symbol)))
    (when (typep gf 'java-generic-function)
      (let* ((cache (java-gf-invoke-special-cache gf))
             (entry (gethash owner-class cache)))
        (unless entry
          (setf entry (%compute-invoke-special-entry
                        gf owner-class (length args)))
          (when entry
            (bordeaux-threads:with-lock-held ((java-gf-cache-lock gf))
              (setf (gethash owner-class cache) entry))))
        (when entry
          (return-from invoke-special
            (funcall (car entry) args (cdr entry))))))
    ;; fallback to full MOP lookup for non-java-generic-functions
    ...))

Each cache entry is a cons of (method-function . next-methods), so calling a cached invokespecial is a single funcall.

The Payoff

This work was motivated by cl-clojure, a project that runs Clojure inside Common Lisp via OpenLDK. Clojure’s bootstrap loads a massive number of classes — the core library alone defines hundreds of IFn implementations, each with invoke methods at multiple arities. Without the MOP customization, PCL spent more time rebuilding discrimination nets than doing useful work. With it, getting to a Clojure REPL went from 2 hours 45 minutes to 2 minutes 40 seconds.

The whole fix is about 120 lines of Common Lisp in a single file (src/java-gf.lisp), plus a few ensure-generic-function calls sprinkled through the bootstrap code. It touches exactly one internal PCL function (update-dfun), and it leaves all non-Java generic functions completely untouched.

What I like about this solution is that it’s the MOP working as designed. Gregor Kiczales and the AMOP authors built the Meta-Object Protocol specifically so you could do this kind of thing — swap out pieces of the object system’s implementation without forking the runtime. I didn’t patch SBCL. I didn’t write a custom compiler. I subclassed standard-generic-function, specialized one MOP generic function, and intercepted one internal PCL function. The rest of CLOS keeps working exactly as before.

The source is at github.com/atgreen/openldk.

联系我们 contact @ memedata.com