Rayforce

Original link: https://github.com/RayforceDB/rayforce

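Rayforce is a high-performance, zero-dependency C17 analytics engine that fuses columnar processing and graph traversal into a single optimized pipeline. Built for speed and efficiency, it uses a lazy operation DAG, a multi-pass optimizer, and a fused, morsel-driven bytecode executor that keeps data in L1/L2 cache.

Key technical features:

* **Execution:** a custom buddy allocator (bypassing the system `malloc`), thread-local memory pools, and parallel execution through a thread pool.
* **Graph engine:** double-indexed CSR storage with support for advanced algorithms (BFS, Dijkstra, PageRank, and others) and factorized execution.
* **Vector search:** a native HNSW index with filter-aware ANN for fast similarity queries.
* **Rayfall:** a Lisp-like query language with an interactive REPL that compiles expressions at runtime to the engine's bytecode VM.

Rayforce is designed for production use, with mmap-based storage, parallel CSV parsing, and efficient memory management. The project is released under the MIT license, has Python bindings, and is developed in partnership with Lynx to support demanding financial and analytics workloads. Full documentation is available at [rayforcedb.github.io/rayforce](https://rayforcedb.github.io/rayforce).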

Original text

Rayforce

Columnar analytics and graph traversal in one fused pipeline.


Rayforce is a pure C17 zero-dependency embeddable engine where columnar analytics and graph traversals share a single operation DAG, pass through a multi-pass optimizer, and execute as fused morsel-driven bytecode. No malloc.

make            # debug build (ASan + UBSan)
make release    # optimized build
make test       # run full test suite
./rayforce      # start the Rayfall REPL

Rayforce ships with Rayfall — a Lisp-like query language with a rich set of builtins. The REPL prompt is ‣:

‣ (set t (table [Symbol Side Qty]
    (list [AAPL GOOG MSFT AAPL GOOG]
          [Buy Sell Buy Sell Buy]
          [100 200 150 300 250])))

‣ (select {from:t by: Symbol Qty: (sum Qty)})
+--------+----------------------------+
| Symbol |            Qty             |
|  sym   |            i64             |
+--------+----------------------------+
| AAPL   | 400                        |
| GOOG   | 450                        |
| MSFT   | 150                        |
+-------------------------------------+
| 3 rows (3 shown) 2 columns (2 shown)|
+-------------------------------------+

‣ (pivot t 'Symbol 'Side 'Qty sum)
+--------+-----+----------------------+
| Symbol | Buy |         Sell         |
|  sym   | i64 |         i64          |
+--------+-----+----------------------+
| AAPL   | 100 | 300                  |
| GOOG   | 250 | 200                  |
| MSFT   | 150 | 0                    |
+-------------------------------------+
| 3 rows (3 shown) 3 columns (3 shown)|
+-------------------------------------+

Headers: include/rayforce.h (types, memory, atoms, vectors, tables, symbols), src/ops/ops.h (DAG construction, opcodes, optimizer, executor, graph algorithms), src/mem/heap.h (allocator lifecycle).

#include <rayforce.h>
#include "mem/heap.h"
#include "ops/ops.h"

int main(void) {
    ray_heap_init();
    ray_sym_init();

    /* Build a table */
    int64_t regions[] = {0, 0, 1, 1, 2, 2};
    int64_t amounts[] = {100, 200, 150, 300, 175, 225};
    ray_t* reg = ray_vec_from_raw(RAY_I64, regions, 6);
    ray_t* amt = ray_vec_from_raw(RAY_I64, amounts, 6);
    ray_t* tbl = ray_table_new(2);
    tbl = ray_table_add_col(tbl, ray_sym_intern("region", 6), reg);
    tbl = ray_table_add_col(tbl, ray_sym_intern("amount", 6), amt);
    ray_release(reg); ray_release(amt);

    /* Group by region, sum amounts */
    ray_graph_t* g = ray_graph_new(tbl);
    ray_op_t* keys[]    = { ray_scan(g, "region") };
    uint16_t  agg_ops[] = { OP_SUM };
    ray_op_t* agg_ins[] = { ray_scan(g, "amount") };
    ray_op_t* grp = ray_group(g, keys, 1, agg_ops, agg_ins, 1);

    ray_t* result = ray_execute(g, grp);

    if (result && !RAY_IS_ERR(result)) ray_release(result);
    ray_graph_free(g);
    ray_release(tbl);
    ray_sym_destroy();
    ray_heap_destroy();
}

Build — Construct a lazy DAG: scans, filters, joins, aggregations, window functions, graph traversals. Nothing executes yet.

Optimize — Multi-pass rewriting: type inference → constant folding → SIP (sideways information passing) → factorize → predicate pushdown → filter reorder → projection pushdown → partition pruning → fusion → DCE (dead-code elimination).

Execute — Fused morsel-driven bytecode processes 1024-element chunks that stay L1-resident. Radix-partitioned hash joins size partitions to fit L2. Thread pool dispatches morsels in parallel.
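
The execute step can be pictured with a small standalone sketch. It does not use the Rayforce API: `MORSEL` and `fused_filter_scale_sum` are hypothetical names, and the loop only illustrates the general idea of fusing a filter, a map, and a reduction into one pass over 1024-element chunks so that no column-sized intermediate is materialized.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical morsel size; the text above cites 1024-element chunks. */
static const size_t MORSEL = 1024;

/* Fused pipeline: keep values > threshold, multiply by factor, sum.
 * Each morsel is filtered, scaled, and reduced in a single pass, so the
 * working set is one chunk of the input plus a scalar accumulator. */
int64_t fused_filter_scale_sum(const int64_t *col, size_t n,
                               int64_t threshold, int64_t factor) {
    assert(col != NULL || n == 0);
    int64_t total = 0;
    for (size_t start = 0; start < n; start += MORSEL) {
        size_t end = (n - start < MORSEL) ? n : start + MORSEL;
        int64_t partial = 0;                 /* per-morsel accumulator */
        for (size_t i = start; i < end; i++) {
            if (col[i] > threshold)          /* filter */
                partial += col[i] * factor;  /* map + reduce */
        }
        total += partial;
    }
    return total;
}
```

In a parallel engine each morsel would be handed to a worker thread and the per-morsel partial sums combined at the end; this sketch runs the morsels sequentially.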

Execution engine

  • Lazy operation DAG — nothing runs until ray_execute
  • Multi-pass optimizer with sideways information passing
  • Fused morsel-driven bytecode — element-wise ops merged into single-pass chunks
  • Radix-partitioned hash joins sized for L2 cache
  • Thread pool with parallel morsel dispatch
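
To make "lazy DAG plus optimizer pass" concrete, here is a minimal sketch of a deferred expression graph with a constant-folding pass. All names (`node`, `make_bin`, `fold_constants`, `eval`) are hypothetical and unrelated to Rayforce's real `ray_op_t` API; the point is only that builders record operations, a pass rewrites the graph, and arithmetic happens when `eval` is called.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { N_CONST, N_INPUT, N_ADD, N_MUL } node_kind;

typedef struct node {
    node_kind kind;
    int64_t value;            /* used by N_CONST */
    struct node *lhs, *rhs;   /* used by N_ADD / N_MUL */
} node;

/* Builders only record the operation; no arithmetic happens here. */
node make_const(int64_t v) { return (node){ N_CONST, v, NULL, NULL }; }
node make_input(void)      { return (node){ N_INPUT, 0, NULL, NULL }; }
node make_bin(node_kind k, node *l, node *r) { return (node){ k, 0, l, r }; }

/* One optimizer pass: fold ADD/MUL nodes whose operands are both
 * constants. Returns the number of nodes folded. */
int fold_constants(node *n) {
    if (n->kind != N_ADD && n->kind != N_MUL) return 0;
    int folded = fold_constants(n->lhs) + fold_constants(n->rhs);
    if (n->lhs->kind == N_CONST && n->rhs->kind == N_CONST) {
        int64_t a = n->lhs->value, b = n->rhs->value;
        n->value = (n->kind == N_ADD) ? a + b : a * b;
        n->kind = N_CONST;
        n->lhs = n->rhs = NULL;
        folded++;
    }
    return folded;
}

/* Execution: evaluate the graph for one input value. */
int64_t eval(const node *n, int64_t input) {
    switch (n->kind) {
    case N_CONST: return n->value;
    case N_INPUT: return input;
    case N_ADD:   return eval(n->lhs, input) + eval(n->rhs, input);
    case N_MUL:   return eval(n->lhs, input) * eval(n->rhs, input);
    }
    assert(0 && "unknown node kind");
    return 0;
}
```

Building `x * (2 + 3)` and running `fold_constants` on the root rewrites the inner addition into the constant 5 without changing what `eval` returns.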

Graph engine

  • Double-indexed CSR storage (forward + reverse), mmap support
  • BFS, DFS, Dijkstra, A*, PageRank, Louvain, Betweenness, LFTJ, and more
  • Factorized execution avoids materializing cross-products
  • SIP propagates selection bitmaps backward through expand chains
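
As an illustration of double-indexed CSR, the sketch below builds a forward and a reverse index from the same edge list and runs BFS over either one. The `csr` struct and the `csr_build` / `csr_bfs` functions are hypothetical, not Rayforce's storage format, and the sketch uses libc `malloc` for brevity where Rayforce would use its own allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    size_t n;          /* number of vertices */
    size_t *offsets;   /* n + 1 entries; v's neighbours are
                          targets[offsets[v] .. offsets[v + 1]) */
    uint32_t *targets; /* one entry per edge */
} csr;

/* Build a CSR index with a counting sort on the source vertex.
 * Passing (dst, src) swapped yields the reverse index.
 * Returns 0 on success, -1 on allocation failure. */
int csr_build(csr *g, size_t n, const uint32_t *src,
              const uint32_t *dst, size_t m) {
    g->n = n;
    g->offsets = calloc(n + 1, sizeof *g->offsets);
    g->targets = malloc((m ? m : 1) * sizeof *g->targets);
    size_t *cursor = malloc((n ? n : 1) * sizeof *cursor);
    if (!g->offsets || !g->targets || !cursor) {
        free(g->offsets); free(g->targets); free(cursor);
        return -1;
    }
    for (size_t e = 0; e < m; e++) {          /* count out-degrees */
        assert(src[e] < n && dst[e] < n);
        g->offsets[src[e] + 1]++;
    }
    for (size_t v = 0; v < n; v++)            /* prefix sum */
        g->offsets[v + 1] += g->offsets[v];
    memcpy(cursor, g->offsets, n * sizeof *cursor);
    for (size_t e = 0; e < m; e++)            /* scatter targets */
        g->targets[cursor[src[e]]++] = dst[e];
    free(cursor);
    return 0;
}

void csr_free(csr *g) { free(g->offsets); free(g->targets); }

/* Breadth-first search from start; dist[v] receives the hop count,
 * or -1 if v is unreachable. Returns 0 on success. */
int csr_bfs(const csr *g, uint32_t start, int32_t *dist) {
    uint32_t *queue = malloc((g->n ? g->n : 1) * sizeof *queue);
    if (!queue) return -1;
    for (size_t v = 0; v < g->n; v++) dist[v] = -1;
    size_t head = 0, tail = 0;
    dist[start] = 0;
    queue[tail++] = start;
    while (head < tail) {
        uint32_t u = queue[head++];
        for (size_t i = g->offsets[u]; i < g->offsets[u + 1]; i++) {
            uint32_t v = g->targets[i];
            if (dist[v] < 0) { dist[v] = dist[u] + 1; queue[tail++] = v; }
        }
    }
    free(queue);
    return 0;
}
```

Running `csr_bfs` on the forward index answers "what can this vertex reach", while the same call on the reverse index answers "what can reach this vertex", which is why keeping both directions is convenient for traversals.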

Rayfall language

  • Arithmetic, string, aggregation, joins, higher-order, I/O builtins
  • Lambdas compile lazily to bytecode, run in computed-goto VM
  • select/update/pivot bridge to the DAG optimizer at runtime
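
For readers unfamiliar with computed-goto dispatch, the following is a minimal sketch of the technique on a toy stack machine. The opcodes (`VM_PUSH`, `VM_ADD`, `VM_MUL`, `VM_HALT`) and `vm_run` are hypothetical and are not Rayfall's bytecode. The code relies on the "labels as values" extension, so it assumes GCC or Clang rather than strict ISO C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcodes for a toy stack machine. */
enum { VM_PUSH, VM_ADD, VM_MUL, VM_HALT };

/* Runs a program and returns the value on top of the stack at VM_HALT.
 * Each handler jumps straight to the next handler through the dispatch
 * table instead of returning to a central switch. */
int64_t vm_run(const int64_t *code) {
    static void *const dispatch[] = {
        &&do_push, &&do_add, &&do_mul, &&do_halt
    };
    int64_t stack[64];
    size_t sp = 0;
    const int64_t *pc = code;

#define NEXT() goto *dispatch[*pc++]
    NEXT();
do_push:
    assert(sp < 64);
    stack[sp++] = *pc++;          /* operand follows the opcode */
    NEXT();
do_add:
    assert(sp >= 2);
    sp--;
    stack[sp - 1] += stack[sp];
    NEXT();
do_mul:
    assert(sp >= 2);
    sp--;
    stack[sp - 1] *= stack[sp];
    NEXT();
do_halt:
    assert(sp >= 1);
    return stack[sp - 1];
#undef NEXT
}
```

The program `{VM_PUSH, 2, VM_PUSH, 3, VM_ADD, VM_PUSH, 4, VM_MUL, VM_HALT}` computes (2 + 3) * 4. The usual argument for this layout is that every handler ends in its own indirect jump, which gives the branch predictor more context than one shared jump at the top of a switch.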

Memory

  • Buddy allocator with slab cache — O(1) for ~90% of allocations
  • Thread-local arenas, lock-free allocation, COW ref counting
  • No system allocator — ray_alloc/ray_free for everything
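
The core arithmetic behind a buddy allocator fits in a few lines. This is a sketch of the general technique only, not Rayforce's `ray_alloc` implementation: `MIN_ORDER` and the three helper functions are hypothetical, and free lists, the slab cache, and thread-local arenas are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical smallest block: 2^MIN_ORDER = 32 bytes. */
enum { MIN_ORDER = 5 };

/* Smallest order k with (1 << k) >= size, clamped to MIN_ORDER. */
unsigned buddy_order(size_t size) {
    assert(size > 0 && size <= SIZE_MAX / 2 + 1);
    unsigned order = MIN_ORDER;
    while (((size_t)1 << order) < size) order++;
    return order;
}

/* Offset (relative to the heap base) of the buddy of a block.
 * Blocks of order k are aligned to 2^k, so a block and its buddy
 * differ only in bit k of their offsets. */
size_t buddy_of(size_t offset, unsigned order) {
    assert((offset & (((size_t)1 << order) - 1)) == 0);
    return offset ^ ((size_t)1 << order);
}

/* Offset of the merged block of order + 1 once a block and its
 * buddy are both free. */
size_t buddy_parent(size_t offset, unsigned order) {
    return offset & ~((size_t)1 << order);
}
```

Because the buddy is found with a single XOR, splitting and coalescing need no search, which is one reason this family of allocators can serve common size classes in constant time.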

Vector search

  • Multi-metric HNSW index (cosine / L2 / inner-product) with save/load
  • Rayfall builtins: cos-dist / l2-dist / inner-prod / norm / knn and the HNSW lifecycle hnsw-build / ann / hnsw-save / hnsw-load / hnsw-free / hnsw-info
  • Filter-aware ANN via select ... where ... nearest (ann handle query) take k
  • Iterative streaming scan: the where predicate is pushed into HNSW's beam loop so rejected candidates don't consume result slots
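
The effect of pushing the predicate into the search loop can be shown with an exact, brute-force scan. This sketch is not HNSW and does not use Rayforce's builtins; `l2_sq` and `knn_filtered` are hypothetical names. What it shares with the filter-aware approach described above is that the predicate is checked before a row competes for one of the k result slots.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Squared Euclidean distance between two dim-element vectors. */
float l2_sq(const float *a, const float *b, size_t dim) {
    float acc = 0.0f;
    for (size_t i = 0; i < dim; i++) {
        float d = a[i] - b[i];
        acc += d * d;
    }
    return acc;
}

/* Exact filtered k-nearest-neighbour scan.
 * rows:     n vectors of dim floats, row-major
 * selected: one flag per row, true if the row passes the predicate
 * out_ids / out_dist: caller-provided arrays of k entries, filled in
 *                     ascending distance order
 * Returns the number of results written (at most k). */
size_t knn_filtered(const float *rows, size_t n, size_t dim,
                    const float *query, const bool *selected,
                    size_t k, size_t *out_ids, float *out_dist) {
    assert(rows && query && selected && out_ids && out_dist);
    if (k == 0) return 0;
    size_t found = 0;
    for (size_t id = 0; id < n; id++) {
        if (!selected[id]) continue;      /* rejected rows take no slot */
        float d = l2_sq(rows + id * dim, query, dim);
        if (found == k && d >= out_dist[k - 1]) continue;
        size_t pos = (found < k) ? found++ : k - 1;
        while (pos > 0 && out_dist[pos - 1] > d) {   /* insertion step */
            out_dist[pos] = out_dist[pos - 1];
            out_ids[pos] = out_ids[pos - 1];
            pos--;
        }
        out_dist[pos] = d;
        out_ids[pos] = id;
    }
    return found;
}
```

Filtering after the search instead would first pick the k nearest rows and then drop the ones that fail the predicate, which can return fewer than k results even when enough matching rows exist.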

Storage

  • Columnar files with mmap, splayed tables, date-partitioned tables
  • CSV reader with parallel mmap parse, type inference, null handling
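
A memory-mapped column file can be sketched with plain POSIX calls. The file layout here (a raw array of `int64_t` with no header) and the functions `column_write` and `column_sum_mmap` are assumptions made for illustration; Rayforce's actual column file format is not described in this document.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Write n int64 values as a raw, headerless column file.
 * Returns 0 on success, -1 on failure. */
int column_write(const char *path, const int64_t *vals, size_t n) {
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) return -1;
    size_t bytes = n * sizeof *vals;
    ssize_t written = write(fd, vals, bytes);
    int rc = (written == (ssize_t)bytes) ? 0 : -1;
    if (close(fd) != 0) rc = -1;
    return rc;
}

/* Map the column read-only and sum it in place; the kernel pages the
 * data in on demand, so nothing is copied into a user buffer.
 * Returns 0 on success, -1 on failure. */
int column_sum_mmap(const char *path, int64_t *out_sum) {
    assert(out_sum != NULL);
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;
    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size <= 0 ||
        (size_t)st.st_size % sizeof(int64_t) != 0) {
        close(fd);
        return -1;
    }
    size_t bytes = (size_t)st.st_size;
    void *map = mmap(NULL, bytes, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                    /* the mapping outlives the descriptor */
    if (map == MAP_FAILED) return -1;
    const int64_t *col = map;
    int64_t sum = 0;
    for (size_t i = 0; i < bytes / sizeof *col; i++) sum += col[i];
    munmap(map, bytes);
    *out_sum = sum;
    return 0;
}
```

A splayed table in this style would simply be a directory holding one such file per column, so a query that touches two columns maps two files and leaves the rest on disk.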

Project structure

include/rayforce.h         Single public header
src/mem/                    Buddy allocator, slab cache, arena, COW
src/core/                   Type system, platform abstraction, runtime
src/vec/                    Vector, list, string, selection bitmap ops
src/table/                  Table, symbol intern table
src/store/                  Column files, CSR, splayed/parted tables, HNSW
src/ops/                    DAG, optimizer, fused executor, LFTJ
src/io/                     CSV reader/writer (parallel mmap)
src/lang/                   Rayfall parser, evaluator, bytecode VM
src/app/                    REPL, terminal, pretty-printer
test/                       Test suites
examples/rfl/               Rayfall example scripts
examples/                   C API examples
website/                    Documentation site (GitHub Pages)

Full docs: rayforcedb.github.io/rayforce

Rayforce has Python bindings at rayforce-py — contributions welcome.

Contributions are welcome. You can help by:

  • Reporting bugs and requesting features via GitHub Issues
  • Submitting pull requests
  • Creating example scripts and use cases
  • Improving documentation

Rayforce is jointly developed with and sponsored by Lynx.

This partnership has been instrumental in making Rayforce a mature, production-ready engine. Lynx's active involvement in development and their commitment to innovative open-source technologies in the financial sector have enabled Rayforce to reach its full potential.

License: MIT
