Common Lisp 中的快速序列迭代

Common Lisp 中的快速序列迭代
Fast SEQUENCE iteration in Common Lisp

原始链接: https://world-playground-deceit.net/blog/2025/12/fast-sequence-iteration-in-common-lisp.html

## `do-sequence` 宏概要本文详细介绍了新的 Common Lisp 宏 `do-sequence` 的创建，旨在改进对序列（如列表和向量）的迭代，并使用关键字参数（`:start`、`:end`、`:key` 等）。使用 `reduce` 或 ANSI 的 `elt+length+loop` 等现有方法要么效率低下（在列表上为二次方），要么缺乏灵活性。 `do-sequence` 旨在提供一个统一且高性能的接口。它利用宏展开，根据序列类型和关键字参数专门化迭代循环，从而实现与 `reduce` 相当甚至更高的速度，尤其是在 SBCL 之外。它提供 `:with-index` 和 `:with-raw-elt`，分别用于访问迭代索引和元素，而无需应用 `:key` 函数。基准测试表明，在各种 Lisp 实现（SBCL、CCL、ECL、CLISP）和序列类型上，性能提升显著（高达 224% 更快），尤其是在复杂的迭代中。但是，由于深度特化，该宏会引入代码膨胀和编译时间增加，因此在内联之前需要仔细考虑。该代码符合 ANSI CL 标准，并采用 Zlib 许可。

最近一篇Hacker News上的帖子讨论了Common Lisp中快速序列迭代的问题，引发了关于优化和语言特性的讨论。原始文章（world-playground-deceit.net）强调了性能改进，有趣的是，SBCL（一种Common Lisp实现）的主要开发者在帖子发布后*立即*改进了`reduce`函数——这证明了该项目的积极开发。评论者争论是否依赖序列是缺少迭代器协议的变通方法，建议使用生成器和半协程作为更强大的解决方案。然而，现有的可扩展序列协议（如Trivial-Extensible-Sequences）由于实施和生态系统支持有限而面临采用挑战。最终，坚持使用经过良好优化且符合ANSI标准的解决方案仍然是一种实用方法。社区赞扬了SBCL的开发者持续改进该语言。

原文

If you don't know what sequences are in CL, here's the gist of it: either a linked list or a vector. The main idea being that some operations like element search, comparison or deletion should have the same interface for both those types.

Basically, sequences are a band-aid over the lack of real iterator protocol (yes, even considering the mildy supported extensible sequences thing). But well, that's what we have to work with and it's not so bad really; we could be in barren C-ountry after all.

One of the problems of sequences is that if you want to iterate over these while supporting the conventional keywords (:start, :end, :key) without writing entirely separate (and gnarly) versions, there are only two unified interfaces provided by ANSI:

elt+length+loop (or do): naïve and simple, what's actually used under the hood by iterate's in-sequence driver. No free keyword forwarding and quadratic (thus extremely slow) on lists.
reduce: way more practical as it handles all the keywords for you (even :from-end) and much faster since any non-toy implementation internally dispatches over the actual sequence type. Still not ideal as it has the overhead of repeat funcall of your closure and you don't have access to the often needed internal raw element (without :key applied) or iteration index.

So when I made a naïve max-elt (for this years' Advent of Code) then was forced to rewrite it to use reduce, I said to myself I must build that Lovecraftian macro to handle the required nested monomorphization/specialization… so here's the beast:

(defmacro do-sequence ((var seq &key (start 0) end key with-index with-raw-elt) &body body)
  "Iterate on a sequence with the usual keywords available. :WITH-INDEX (resp. :WITH-RAW-ELT)
takes an unevaluated symbol to use as index (resp. element without :KEY applied).

NB: since iteration is done via `(LOOP ... :DO (LOCALLY ,@BODY)), said body can use RETURN and
contain declarations."
  (once-only (seq start end key)
    (macrolet ((impl (type has-key-p &optional has-end-p)
                 `(let ((ivar (or with-index (gensym "IDX")))
                        (rvar (cond (with-raw-elt     with-raw-elt)
                                    (,(not has-key-p) var)
                                    (t                (gensym "RAW")))))
                    `(loop
                       ,@(ecase ,type
                           (list
                            `(,@(when (or ,has-end-p with-index)
                                  `(:for ,ivar :of-type (integer 0 #.array-dimension-limit)
                                    :from ,start ,@(when ,has-end-p `(:below ,end))))
                              :for ,rvar :in (nthcdr ,start ,seq)))
                           (vector
                            `(:for ,ivar :of-type (integer 0 #.array-dimension-limit)
                              :from ,start :below (or ,end (length ,seq))
                              :for ,rvar := (aref ,seq ,ivar))))
                       ,@(cond (,has-key-p   `(:for ,var := (funcall ,key ,rvar)))
                               (with-raw-elt `(:for ,var := ,rvar)))
                       :do (locally ,@body)))))
      `(etypecase ,seq
         (list
          (cond ((and ,key ,end) ,(impl 'list t   t))
                (,key            ,(impl 'list t   nil))
                (,end            ,(impl 'list nil t))
                (t               ,(impl 'list nil nil))))
         ((simple-array * (*))           (if ,key
              ,(impl 'vector t)
              ,(impl 'vector nil)))
         (vector
          (if ,key
              ,(impl 'vector t)
              ,(impl 'vector nil)))))))

Except for the de facto standard once-only macro, it is ANSI CL compliant. You can copy it under the Zlib license, if you wish so.

Now, for some benchmarks! Here I compare the aforementioned max-elt core loop written with reduce then with do-sequence against large sequences of different types; needing both the index and raw element somewhat complicates the result and illustrate why I included :with-index and :with-raw-elt.

(deftype index () `(integer 0 ,array-dimension-limit))
(deftype end ()   '(or null index))
(deftype key ()   '(or symbol (function (t) t) null))
(deftype test ()  '(or symbol (function (t t) t)))

(declaim (ftype (function (sequence test &key (:start index) (:end end) (:key key))
                          (values t (or null index)))
                max-elt-reduce max-elt-do-seq))

(defun max-elt-reduce (seq pred &key (start 0) end key)
  (let ((i start) maxi max)
    (declare (type index i))
    (reduce (if key                 (lambda (kmax el)
                  (incf i)
                  (let ((kel (funcall key el)))
                    (if (or (= i start) (funcall pred kel kmax))
                        (progn (setf maxi i max el) kel)
                        kmax)))
                (lambda (kmax el)
                  (incf i)
                  (if (or (= i start) (funcall pred el kmax))
                      (progn (setf maxi i max el) el)
                      kmax)))
            seq :start start :end end)
    (values max maxi)))

(defun max-elt-do-seq (seq pred &key (start 0) end key)
  (let (maxi rmax max)
    (do-sequence (el seq :start start :end end :key key :with-index i :with-raw-elt r)
      (when (or (= i start) (funcall pred el max))
        (setf maxi i rmax r max el)))
    (values rmax maxi)))


(defconstant +len+ 5000000)

(let ((l   (make-sequence 'list   +len+ :initial-element 0))
      (sv  (make-sequence 'vector +len+ :initial-element 0))
      (fsv (make-array +len+ :element-type 'fixnum :initial-element 0))
      (fv  (make-array +len+ :element-type 'fixnum :initial-element 0 :adjustable t)))
  (format t "Test,~A~%" (lisp-implementation-type))
  (loop :for (args name) :in
        `(((,l   ,#'>) "LIST")
          ((,l   ,#'> :start 100 :end ,(- +len+ 100) :key ,#'1+) "LIST (fiddly)")
          ((,sv  ,#'>) "SIMPLE-VECTOR")
          ((,fsv ,#'>) "(SIMPLE-ARRAY FIXNUM)")
          ((,fv  ,#'>) "(VECTOR FIXNUM)"))
        :do (let ((tref (nth-value 1 (measure (apply #'max-elt-reduce args))))
                  (tnew (nth-value 1 (measure (apply #'max-elt-do-seq args)))))
              (format t "~A,~D → ~D (~@D%)~%"
                      name
                      (round (* tref 1000))
                      (round (* tnew 1000))
                      (round (* 100 (- (/ tref tnew) 1)))))))

And here are the results on some popular implementations (durations in ms):

Test	SBCL	CCL	ECL	CLISP
LIST	39 → 26 (+50%)	124 → 64 (+93%)	979 → 499 (+96%)	372 → 251 (+48%)
LIST (fiddly)	46 → 39 (+18%)	129 → 77 (+69%)	1186 → 703 (+69%)	450 → 408 (+10%)
SIMPLE-VECTOR	44 → 32 (+38%)	140 → 81 (+71%)	966 → 600 (+61%)	417 → 314 (+33%)
(SIMPLE-ARRAY FIXNUM)	44 → 31 (+42%)	141 → 81 (+73%)	954 → 608 (+57%)	417 → 315 (+32%)
(VECTOR FIXNUM)	51 → 37 (+38%)	177 → 121 (+46%)	960 → 635 (+51%)	422 → 331 (+27%)

And another, simpler case (position-all, you can guess what it does) to exercise compilers a bit differently:

Code

(deftype queue () '(cons cons list))
(defun make-queue ()
  (let ((q (cons nil nil)))
    (setf (car q) q)
    q))

(declaim (ftype (function (t queue) queue) push-queue)
         (inline push-queue))
(defun push-queue (obj queue)
  (let ((new-tail (list obj)))
    (setf (cdar queue) new-tail
          (car queue)  new-tail))
  queue)


(deftype index () `(integer 0 ,array-dimension-limit))
(deftype end ()   '(or null index))
(deftype key ()   '(or symbol (function (t) t) null))
(deftype test ()  '(or symbol (function (t t) t)))

(declaim (ftype (function (t sequence &key (:start index) (:end end) (:key key) (:test test))
                          list)
                position-all-reduce position-all-do-seq))

(defun position-all-reduce (obj seq &key (start 0) end key (test #'eql))
  (let ((i start))
    (declare (type index i))
    (cdr
     (reduce (if key                  (lambda (acc el)
                   (incf i)
                   (when (funcall test (funcall key el) obj)
                     (push-queue i acc))
                   acc)
                 (lambda (acc el)
                   (incf i)
                   (when (funcall test el obj)
                     (push-queue i acc))
                   acc))
             seq :start start :end end :initial-value (make-queue)))))

(defun position-all-do-seq (obj seq &key (start 0) end key (test #'eql))
  (let ((acc (make-queue)))
    (do-sequence (el seq :start start :end end :key key :with-index i)
      (when (funcall test el obj)
        (push-queue i acc)))
    (cdr acc)))


(defconstant +len+ 5000000)

(let* ((sv  (let ((tmp (make-sequence 'vector +len+ :initial-element 0)))
              (loop :for i :from 0 :below +len+ :by 1000
                    :do (setf (aref tmp i) 42))
              tmp))
       (l   (coerce sv 'list))
       (fsv (make-array +len+ :element-type 'fixnum :initial-contents l))
       (fv  (make-array +len+ :element-type 'fixnum :initial-contents l :adjustable t)))
  (format t "Test,~A~%" (lisp-implementation-type))
  (loop :for (args name) :in
        `(((42   ,l) "LIST")
          ((42   ,l :start 100 :end ,(- +len+ 100) :key ,#'1+) "LIST (fiddly)")
          ((42  ,sv) "SIMPLE-VECTOR")
          ((42 ,fsv) "(SIMPLE-ARRAY FIXNUM)")
          ((42  ,fv) "(VECTOR FIXNUM)"))
        :do (let ((tref (nth-value 1 (measure (apply #'position-all-reduce args))))
                  (tnew (nth-value 1 (measure (apply #'position-all-do-seq args)))))
              (format t "~A,~D → ~D (~@D%)~%"
                      name
                      (round (* tref 1000))
                      (round (* tnew 1000))
                      (round (* 100 (- (/ tref tnew) 1)))))))

Test	SBCL	CCL	ECL	CLISP
LIST	30 → 11 (+173%)	58 → 18 (+224%)	879 → 351 (+150%)	291 → 147 (+98%)
LIST (fiddly)	33 → 21 (+57%)	63 → 28 (+122%)	907 → 538 (+69%)	352 → 301 (+17%)
SIMPLE-VECTOR	29 → 20 (+45%)	70 → 33 (+110%)	826 → 458 (+80%)	310 → 220 (+41%)
(SIMPLE-ARRAY FIXNUM)	29 → 21 (+38%)	71 → 35 (+106%)	855 → 477 (+79%)	314 → 219 (+43%)
(VECTOR FIXNUM)	40 → 26 (+54%)	108 → 70 (+54%)	835 → 468 (+78%)	321 → 228 (+41%)

Benchmarking details

Versions used: SBCL 2.5.11, CCL 1.13, ECL 24.5.10, CLISP 2.49.92 (bytecode compiler used)
Hardware: AMD 5900X, 64 GB DDR4
Software: Gentoo Linux, linux 6.12, gcc 15.2, glibc 2.41
Misc.: optimize was set to (speed 3) (safety 0) (debug 0) for the sake of benchmarking, speedups are comparable without. measure does a full GC before starting its timer.

Well, isn't that welcome? Both the performance and readability gains were worth the effort, in my opinion; especially outside of SBCL which seems to have a particularly well optimized reduce.

Beware code bloat and compilation slowdown though, such deep specialization with loop macros as leaves isn't free; so don't inline unless you intend on exploiting your compiler's DCE pass.

Common Lisp 中的快速序列迭代 Fast SEQUENCE iteration in Common Lisp

Common Lisp 中的快速序列迭代
Fast SEQUENCE iteration in Common Lisp