Show HN:用于并行处理的Zig拓扑排序库
Show HN: Zig Topological Sort Library for Parallel Processing

原始链接: https://github.com/williamw520/toposort

TopoSort 是一个高效的 Zig 库,用于对依赖图进行拓扑排序。它提供图构建、拓扑排序、用于并行处理的无依赖子集生成、循环检测/报告等功能,并支持多种节点类型。 使用方法:使用 `zig fetch` 获取库,更新 `build.zig.zon` 和 `build.zig`,然后在你的 Zig 源代码中导入它。你可以使用节点类型和分配器初始化 `TopoSort`。使用 `add` 或 `add_dep` 函数添加依赖关系。使用 `sort` 对图进行排序,使用 `has_cycle()` 检查循环,并通过 `get_sorted_sets` 访问用于并行处理的排序子集。对于简单的节点类型,值会被复制;对于切片,内存由调用者管理。初始化时可以使用 `verbose` 和 `max_range` 等配置选项。还提供 CLI 工具 `toposort-cli` 和基准测试用于性能测试。

Hacker News 上的一篇文章展示了一个新的 Zig 库,用于拓扑排序,旨在辅助并行处理。这个库可在 GitHub 上找到,其功能包括构建依赖关系图、执行拓扑排序、生成无依赖子集以及检测循环。作者将其开发为一个学习项目,以熟悉 Zig 包的发布。 评论者 Cieric 称赞了 Zig 及其潜力,同时也指出了大型项目中缺少的一些功能。他们建议在一个 OpenCV 节点编辑器中使用该库来优化节点重新处理并避免冗余计算。Cieric 还询问了该库如何处理依赖关系图中的循环,并询问它是终止程序还是尝试尽力解决。
相关文章

原文

TopoSort is a highly efficient Zig library for performing topological sort on dependency graph. This small library is packed with the following features:

  • Building dependency graph from dependency data.
  • Performing topological sort on the dependency graph.
  • Generating dependence-free subsets for parallel processing.
  • Cycle detection and cycle reporting.
  • Support different node types.

Go to the Releases page. Pick a release to add to your project. Identify the file asset URL for the release version. E.g. https://github.com/williamw520/toposort/archive/refs/tags/1.0.2.tar.gz

Use zig fetch to add the TopoSort package to your Zig project. Run the following command to fetch the TopoSort package:

  zig fetch --save https://github.com/williamw520/toposort/archive/refs/tags/<VERSION>.tar.gz

zig fetch updates your build.zig.zon file with the URL with file hash added in the .dependency section of the file.

.{
    .name = "my-project",
    ...
    .dependencies = .{
+       .toposort = .{
+           .url = "zig fetch https://github.com/williamw520/toposort/archive/refs/tags/<VERSION>.tar.gz",
+           .hash = "toposort-...",
+       },
    },
}

Update your build.zig with the lines for toposort.

  pub fn build(b: *std.Build) void {
      ...
+     const opts = .{ .target = target, .optimize = optimize };
+     const toposort_module = b.dependency("toposort", opts).module("toposort");
      ...
      const exe = b.addExecutable(.{
          .name = "my_project",
          .root_module = exe_mod,
      });
+     exe.root_module.addImport("toposort", toposort_module);

The .addImport("toposort") call let you import the module into your Zig source files.

const toposort = @import("toposort");

Usage typically follows the following steps in your Zig source file.

const toposort = @import("toposort");
const TopoSort = toposort.TopoSort;
const SortResult = toposort.SortResult;

Initialization and memory management.

    const T = usize;  // node data type
    var tsort = try TopoSort(T).init(allocator, .{});
    defer tsort.deinit();

The data type of the node value is provided as a comptime type to TopoSort(T).

    try tsort.add(101, 102);    // node 102 depends on the leading node 101
    try tsort.add(102, 103);
    try tsort.add(101, 104);

Performing the topological sort

    const result = try tsort.sort();
    if (result.has_cycle()) {
        for (result.get_cycle_set().items) |id| {
            const cyclic_node = result.get_node(id);
            ...
        }
    }

Otherwise, process the sorted non-cyclic result

    const sets: ArrayList(ArrayList(T)) = result.get_sorted_sets();
    for (sets.items) |subset| {     // the node sets are in topological order
        for (subset.items) |node| { // nodes within each set are dependence free from each other.
            ...
        }
    }

TopoSort figures out the nodes that have no dependence with each other in the linear order of the topological sequence and groups them together as subsets. This allows you to run/process the nodes of each subset in parallel.

The subsets themselves are in topological order. If there's no need for parallel processing, the nodes in each subset can be processed sequentially, which fit in the overall topological order of all the nodes.

Nodes are passed in by value in add() and are stored by value in the TopoSort's Data struct. For simple type like integer (e.g. u16, u32), the node values are simply copied. For slice and pointer node type (e.g. []const u8), the memory for the nodes are not duplicated. Memory is owned and managed by the caller.

The Toposort.init() function takes in optional configurations. E.g.

    const T = usize;  // node data type
    var tsort = try TopoSort(T).init(allocator, .{
        verbose = true,
        max_range = 4000,
    });

Setting the verbose flag prints internal messages while sorting.

The max_range property sets the maximum value of the node item value. E.g. For node values ranging from 1, 2, 3, 20, 75, ... 100, 100 is the maximum value. If all your node values are positive integers, passing in a number type (u16, u32, u64, etc) for the node data type and setting the max_range let TopoSort use a simpler data structure with faster performance. Building a dependency tree can be more than 3X or 4X faster. Compare the 3rd benchmark and 4th benchmark in tests.zig.

To use a slice/string for the node type,

    const T = []const u8;
    var tsort = try TopoSort(T).init(allocator, .{});

To get a list of topologically sorted nodes.

    const T = []const u8;
    var list = ArrayList(T).init(allocator);    // list to hold the returned nodes.
    defer list.deinit();
    for ((try result.get_sorted_list(&list)).items) |node| {
        ...
    }

To add dependency similar to the Makefile rule format.

Add the dependent node A to the leading B node. A: B
Add the dependent node B to the leading C node. B: C
Add the dependent node B to a list of leading nodes. B: E F G

    const T = []const u8;
    var tsort = try TopoSort(T).init(allocator, .{});
    try tsort.add_dep("A", "B");    // A: B
    try tsort.add_dep("B", "C");    // B: C
    try tsort.add_dep("B", "D");    // B: D
    try tsort.add_deps("B", &[_]T{ "E", "F", "G" });    // B: E F G
    
    var nodes = ArrayList(T).init(allocator);
    try nodes.append("E");
    try nodes.append("F");
    try nodes.append("G");
    try tsort.add_deps("B", nodes.items);

To traverse the list of nodes in the graph,

    for (result.get_nodes().items) |node| {
        ...
    }

To traverse the dependency graph recursively,

    const T = usize;  // node data type
    var tsort = try TopoSort(T).init(allocator, .{});
    ...
    const result = try tsort.sort();
    visit_tree(result, null, result.get_root_set());

    fn visit_tree(result: SortResult(T), lead_id: ?u32, dependent_ids: ArrayList(u32)) {
        if (lead_id) |id| { // lead_id is optional since the root nodes have no leading nodes.
            const lead_node = result.get_node(lead_id);
            ...
        }
        for (dependent_ids.items) |node_id| {
            const dependent_node = result.get_node(node_id);
            ...
            visit_tree(result, node_id, result.get_dependents(node_id));
        }
    }

TopoSort comes with a command line interface (CLI) tool toposort-cli, which uses the TopoSort library internally. The data file it used follows the simple dependent rule format of Makefile. E.g.

Sample invocations on the test data:

  zig-out/bin/toposort-cli --data data/data.txt
  zig-out/bin/toposort-cli --data data/data.txt --verbose
  zig-out/bin/toposort-cli --data data/data2.txt
  zig-out/bin/toposort-cli --data data/data_cycle1.txt
  zig-out/bin/toposort-cli --data data/data_cycle2.txt
  zig-out/bin/toposort-cli --data data/data_num.txt --int

TopoSort comes with some benchmark tests.

Rnn zig build test -Doptimize=ReleaseFast to run the benchmarks.

TopoSort is MIT licensed.

For more information on the Zig build system, check out these resources:

联系我们 contact @ memedata.com