Colored Petri Nets, LLMs, and distributed applications

Original link: https://blog.sao.dev/cpns-llms-distributed-apps/

## Colored Petri Nets for Concurrent Applications: Summary

The author explores using **Colored Petri Nets (CPNs)**, an extension of Petri nets that allows tokens to carry data, to simplify and formally verify the development of concurrent applications, especially with the help of large language models (LLMs). CPNs provide a way to model states and transitions that may resemble Rust's typestate pattern, and they address the inherent difficulty of writing reliable concurrent code. Key advantages include the possibility of **compile-time verification**, simplified **state synchronization**, and automatic handling of concerns such as **deadlock avoidance**. The author envisions CPNs managing complex scenarios such as web scraping (proxy leasing, rate limiting, retries) and data-build processes, offering a more structured approach than traditional methods.

Implementation strategies under consideration include a traditional application backed by a Postgres database, or a high-performance in-memory Rust binary with snapshot persistence. A major challenge is how to **partition** when state grows too large for a single server to hold.

The author proposes validating the approach by **reimplementing the core decision logic of the `spider-rs` web scraper with CPN semantics** and comparing performance and code complexity against the original implementation. The core question is whether the CPN paradigm can reduce bugs and coordination overhead, ultimately making concurrent programming easier and more reliable.


Original article
CPNs, LLMs, and Distributed Applications

A big theme in LLM-enabled software dev is that verifiable correctness makes it much easier to take bigger leaps with LLMs: tests, compilers, state machines, etc. While researching for databuild, I recently came across colored petri nets and instantly saw an opportunity.

Colored petri nets (CPNs) are an extension of petri nets. Petri nets are essentially directed bipartite graphs where places can contain tokens, and places are connected by transitions (where the side effects happen). In petri nets, a single token contains no data; it represents an identity-less token's location in the net. Importantly, petri nets in which every transition has exactly one input and one output, and which contain a single token globally, are equivalent to finite state machines. Colored petri nets extend this model, allowing individual tokens to have data associated with them. This lets CPNs map closely to the Rust typestate pattern, and suggests Rust may be able to implement CPN semantics easily.
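To make that correspondence concrete, here is a minimal sketch (my own illustration, not from the post): each place becomes a distinct Rust type, a token's color is the data carried in that type, and a transition is a function that consumes one type and produces the next, so firing a transition twice on the same token fails to compile.

```rust
// Hypothetical illustration: one struct per place, with the token's "color" (its data)
// carried along. Moving the token through a transition consumes the old state.
struct Prioritized { url: String, retries: u32 }
struct Scraping { url: String, retries: u32 }
struct Scraped { url: String, body: String }

// A transition consumes a token from one place and produces it in the next.
fn start_scrape(t: Prioritized) -> Scraping {
    Scraping { url: t.url, retries: t.retries }
}

fn finish_scrape(t: Scraping, body: String) -> Scraped {
    Scraped { url: t.url, body }
}

fn main() {
    let prioritized = Prioritized { url: "https://example.com".into(), retries: 0 };
    let scraping = start_scrape(prioritized);
    // start_scrape(prioritized); // error[E0382]: the token was already moved into the transition above
    let done = finish_scrape(scraping, "<html>...</html>".into());
    println!("scraped {} ({} bytes)", done.url, done.body.len());
}
```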

CPNs are of particular interest because a) it is still hard to write concurrent applications, and b) they offer the potential for formally verifying concurrent programs at build time. Also, if we can implement a high performance datastore underneath, a CPN framework might be able to handle the hard parts of concurrent applications: state synchronization, conflict detection, deadlock avoidance, and coordinating access to shared resources. These are enabled via two other features of CPNs: guards and multi-token consumption/production.

A guard is a list of boolean conditions attached to a transition: things that must be true for a token to take that transition. For instance, there must be more than 0 connections available in the connection pool to claim one. Multi-token consumption/production is just what it sounds like: a fork in the network via a transition, where P1 -> T1 -> (P2, P3), so T1 consuming a token from P1 produces a token in each of P2 and P3 simultaneously. Conversely, a join in the network via a transition, where (P1, P2) -> T1 -> P3, requires both P1 and P2 to have tokens present that are able to transition via T1 into P3, which also happens simultaneously.
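A minimal in-memory sketch of those two patterns (my own illustration, using plain `Vec`s as places) might look like the following; a real CPN engine would also compute the output token's color from the consumed tokens.

```rust
// Hypothetical illustration. `fork` is P1 -> T1 -> (P2, P3); `join` is (P1, P2) -> T1 -> P3
// and only fires when both inputs hold a token and the guard holds.
#[derive(Clone, Debug)]
struct Token { id: u32 }

fn fork(p1: &mut Vec<Token>, p2: &mut Vec<Token>, p3: &mut Vec<Token>) -> bool {
    match p1.pop() {
        Some(t) => {
            p2.push(t.clone()); // one token appears in P2...
            p3.push(t);         // ...and one in P3, atomically from the caller's view
            true
        }
        None => false, // transition not enabled
    }
}

fn join(
    p1: &mut Vec<Token>,
    p2: &mut Vec<Token>,
    p3: &mut Vec<Token>,
    guard: impl Fn(&Token, &Token) -> bool,
) -> bool {
    // Enabled only if both input places hold a token and the guard is satisfied.
    let enabled = match (p1.last(), p2.last()) {
        (Some(a), Some(b)) => guard(a, b),
        _ => false,
    };
    if !enabled {
        return false;
    }
    let a = p1.pop().unwrap();
    let _b = p2.pop().unwrap();
    p3.push(a); // a real CPN would derive the output token's color from both inputs
    true
}

fn main() {
    let (mut p1, mut p2, mut p3) = (vec![Token { id: 1 }], Vec::new(), Vec::new());
    assert!(fork(&mut p1, &mut p2, &mut p3));

    let (mut q1, mut q2, mut q3) = (vec![Token { id: 2 }], vec![Token { id: 3 }], Vec::new());
    assert!(join(&mut q1, &mut q2, &mut q3, |a, b| a.id != b.id));
    println!("p2={:?} p3={:?} q3={:?}", p2, p3, q3);
}
```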

One application that has light concurrency involved is web scraping with leased proxies and scrape targets. You have a limited number of proxies to use to proxy your requests, and need to rate limit your usage of all of them to ensure they don't over-request a given target. Also, to be a responsible user, you want to avoid requesting the same target multiple times concurrently and avoid hitting the same domain too frequently. Traditionally, this is solved with leasing of resources via a central database, implemented via select for update style semantics in the database. We can imagine implementing this with CPN semantics as having the scrape_target transition be the join of the available_proxies and prioritized_targets places, with a scrape only starting when a proxy is available and a prioritized target is available. You can also imagine implementing other complex distributed scraper features with CPN semantics (a sketch of the resulting net follows the list):

  • Scrape target cooldowns: after being scraped, a target token transitions to a cool_down state, where using a timed petri net allows us to delay transitioning back to prioritized_targets after the delay period.
  • Domain-level rate limiting: add a separate domains token, so that the scrape_target transition is a 3-way join domains x available_proxies x prioritized_targets.
  • Retry with backoff: Failed scrapes fork: one token to failed_log, one back to prioritized_targets with incremented retry count and a longer cooldown. Guard prevents retries beyond max attempts.
  • Result pipeline: post-scrape, tokens flow through raw_html → parsed → validated → stored, each with its own concurrency limits, which naturally implements back pressure.
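Here is a hedged sketch of what that net's core transition could look like in memory. The place names follow the post, but the data layout and the fixed 10-second domain cooldown are my own assumptions.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

// Hypothetical illustration of scrape_target as a 3-way join:
// domains x available_proxies x prioritized_targets -> in_flight.
struct Proxy { id: u32 }
struct Target { url: String, domain: String, retries: u32 }
struct DomainToken { next_allowed: Instant }
struct InFlight { proxy: Proxy, target: Target }

struct ScraperNet {
    available_proxies: Vec<Proxy>,
    prioritized_targets: Vec<Target>,
    domains: HashMap<String, DomainToken>,
    in_flight: Vec<InFlight>,
}

impl ScraperNet {
    fn scrape_target(&mut self, now: Instant) -> bool {
        if self.available_proxies.is_empty() {
            return false; // no proxy token: transition not enabled
        }
        // Guard: the target's domain token must exist and be past its rate-limit cooldown.
        let Some(idx) = self.prioritized_targets.iter().position(|t| {
            self.domains
                .get(&t.domain)
                .map(|d| d.next_allowed <= now)
                .unwrap_or(false)
        }) else {
            return false;
        };
        // Consume one token from each input place...
        let proxy = self.available_proxies.pop().unwrap();
        let target = self.prioritized_targets.remove(idx);
        // ...and re-produce the domain token with an updated cooldown (timed-net flavour).
        if let Some(d) = self.domains.get_mut(&target.domain) {
            d.next_allowed = now + Duration::from_secs(10);
        }
        self.in_flight.push(InFlight { proxy, target });
        true
    }
}

fn main() {
    let now = Instant::now();
    let mut net = ScraperNet {
        available_proxies: vec![Proxy { id: 1 }],
        prioritized_targets: vec![Target {
            url: "https://example.com/page".into(),
            domain: "example.com".into(),
            retries: 0,
        }],
        domains: HashMap::from([("example.com".into(), DomainToken { next_allowed: now })]),
        in_flight: Vec::new(),
    };
    assert!(net.scrape_target(now));
    assert!(!net.scrape_target(now)); // proxies exhausted and the domain is cooling down
    println!("{} scrape(s) in flight", net.in_flight.len());
}
```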

Another application is databuild, where the network is made up of partitions (which are effectively leased to job runs), wants, and job runs, and we would benefit from some self-organizing dynamic process to propagate the data deps and solve user-stated partition wants in a safe, efficient, and fast manner. But more on that guy later.

I'm still figuring this part out, but it seems like there are a few sensible or interesting strategies for implementing CPNs:

  1. As a traditional app + postgres combo, with transactions implementing simultaneous token movement and state storage, and select for update enabling claiming of tokens for transitions to prevent double moves.
  2. As a single-process Rust binary, with all CPN state stored in memory, enabling move semantics and the typestate pattern to implement the CPN semantics (this could be extremely fast, but I need to figure out persistence - maybe a snapshotted event log? A rough sketch of that idea follows this list.)
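A std-only sketch of the snapshotted event log idea (my assumption about one way it could work, not a design from the post): every transition firing is appended to the log as an event, and the in-memory state is rebuilt by replaying events on top of the most recent snapshot. A real version would write the log and snapshots to disk.

```rust
// Hypothetical illustration: state changes only by applying events, so replay is deterministic.
#[derive(Clone, Debug)]
enum Event {
    TargetPrioritized { url: String },
    ScrapeStarted { url: String, proxy_id: u32 },
    ScrapeFinished { url: String },
}

#[derive(Clone, Debug, Default)]
struct NetState {
    prioritized: Vec<String>,
    in_flight: Vec<(String, u32)>,
    done: Vec<String>,
}

impl NetState {
    fn apply(&mut self, e: &Event) {
        match e {
            Event::TargetPrioritized { url } => self.prioritized.push(url.clone()),
            Event::ScrapeStarted { url, proxy_id } => {
                self.prioritized.retain(|u| u != url);
                self.in_flight.push((url.clone(), *proxy_id));
            }
            Event::ScrapeFinished { url } => {
                self.in_flight.retain(|(u, _)| u != url);
                self.done.push(url.clone());
            }
        }
    }
}

fn replay(snapshot: NetState, log: &[Event]) -> NetState {
    log.iter().fold(snapshot, |mut s, e| { s.apply(e); s })
}

fn main() {
    let log = vec![
        Event::TargetPrioritized { url: "https://example.com".into() },
        Event::ScrapeStarted { url: "https://example.com".into(), proxy_id: 1 },
    ];
    // Snapshot mid-log, then keep replaying only the tail.
    let snapshot = replay(NetState::default(), &log);
    let tail = vec![Event::ScrapeFinished { url: "https://example.com".into() }];
    let state = replay(snapshot, &tail);
    println!("{:?}", state);
}
```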

One thing I'm trying to figure out is how to solve the partitioning problem: if app state grows too large for one server's memory, is there a way to automatically partition the network/tokens to enable fearless concurrency and horizontal scalability? So far the answers seem to be either a) solving it in the application, with an archive place/transition being part of the network, or b) solving it at the database level. Or maybe something even crazier, where the whole network is made up of multiple CPN apps, which themselves expose query/consume interfaces?
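One concrete shape this could take (an assumption on my part, not something the post commits to): route every token to a shard by a partition key, such as the target's domain, so that any join involving only tokens that share that key can be evaluated entirely on one shard.

```rust
// Hypothetical illustration: tokens sharing a partition key always land on the same shard,
// so a domain-level join or rate limit can be evaluated locally without cross-shard coordination.
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

struct Token { partition_key: String, url: String }

fn shard_for(key: &str, num_shards: usize) -> usize {
    let mut h = DefaultHasher::new();
    key.hash(&mut h);
    (h.finish() as usize) % num_shards
}

fn main() {
    let shards = 4;
    let tokens = vec![
        Token { partition_key: "example.com".into(), url: "https://example.com/a".into() },
        Token { partition_key: "example.com".into(), url: "https://example.com/b".into() },
        Token { partition_key: "other.net".into(), url: "https://other.net/".into() },
    ];
    for t in &tokens {
        println!("{} -> shard {}", t.url, shard_for(&t.partition_key, shards));
    }
}
```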

Tying it all back, if we can figure out a CPN app framework with sensible persistence, we could impose more compile-time constraints on agent-authored code (good for dev velocity) while also providing correctness guarantees and simulation capabilities out of the box that unconstrained computation apps don't provide. Exciting!


Random Thoughts

  • !! Is there an existing open source service that I can use the test suite from or benchmark against to validate this? Especially if it means just setting claude code off against it?
  • Timed petri nets - could we emit metrics by default and simulate the net impact of a change to the whole system?
  • Join transitions as literal joins in the database, with transition conditions implying indexes, and token transitions as inserts and deletes inside a transaction
  • To what extent are we talking about just making a datastore or CPN database?
  • Distributed network of CPN apps? Or an app as a distributed network of CPN services? It's all about partitioning?
  • CPNs don't need explicit re-entrance? Every "step" is a movement of tokens, which is doable completely from the state of all tokens involved?

Potential Bets (LLM Written)

To validate this, we need a real hill to climb: reimplement something with CPN semantics and compare. The core question isn't "can CPNs be fast/correct"—of course they can. It's: does the paradigm make writing simple, correct, and fast concurrent programs easier? Does the formalism pay for itself in reduced bugs and less bespoke coordination code?

The bet: scraper scheduler (spider-rs)

Implement the decision-making core of a scraper—URL prioritization, domain rate limiting, proxy assignment, retry scheduling—using CPNs. Mock the HTTP layer; benchmark decisions/sec and compare coordination code complexity against the original.

Why scrapers? Traditional implementations are full of ad-hoc coordination: rate limit maps, cooldown trackers, retry counters, domain queues. This is exactly what CPNs formalize. Correctness matters (politeness violations get you banned), and timed transitions are central, not incidental. spider-rs is already Rust, so we can fork and compare directly.

Why not the others first?

| Option | Why not |
| --- | --- |
| Job queue (Faktory) | Coordination is incidental; well-understood pattern where CPN overhead may not pay off |
| Connection pooler (pgcat) | Too small—might not exercise enough CPN capabilities to be compelling |
| CI runner (Buildkite) | Too large—incidental complexity (artifacts, logs, shell execution) obscures the coordination story |

Implementation approach

In-memory Rust with SQLite snapshots: single-threaded, move semantics apply, extremely fast. Partitioning = separate binaries with disjoint token ownership (enforceable at net-definition time via partition key requirements). Simulation = same CPN execution with mocked effects, enabling correctness testing, failure injection, and time fast-forwarding.
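To illustrate the last point, here is my own sketch of "simulation = same CPN execution with mocked effects"; the `Effects` trait, `SimEffects`, and the retry logic are assumptions, not taken from spider-rs or the post. Writing transitions against an effects trait lets the same net run in production or under simulation, with failure injection and a fast-forwarded clock.

```rust
use std::time::Duration;

// Hypothetical illustration: transitions only see the world through this trait.
trait Effects {
    fn now_ms(&self) -> u64;
    fn fetch(&mut self, url: &str) -> Result<String, String>;
    fn sleep(&mut self, d: Duration);
}

// Simulated effects: failure injection and time fast-forwarding are trivial.
struct SimEffects { clock_ms: u64, fail_next: bool }

impl Effects for SimEffects {
    fn now_ms(&self) -> u64 { self.clock_ms }
    fn fetch(&mut self, url: &str) -> Result<String, String> {
        if std::mem::take(&mut self.fail_next) {
            Err(format!("injected failure for {url}"))
        } else {
            Ok(format!("<html>fake body for {url}</html>"))
        }
    }
    fn sleep(&mut self, d: Duration) {
        self.clock_ms += d.as_millis() as u64; // fast-forward instead of blocking
    }
}

// A transition written only in terms of Effects: scrape, retry once after a cooldown.
fn scrape_with_retry(fx: &mut impl Effects, url: &str) -> Result<String, String> {
    match fx.fetch(url) {
        Ok(body) => Ok(body),
        Err(_) => {
            fx.sleep(Duration::from_secs(10)); // cooldown before retry
            fx.fetch(url)
        }
    }
}

fn main() {
    let mut fx = SimEffects { clock_ms: 0, fail_next: true };
    let body = scrape_with_retry(&mut fx, "https://example.com").unwrap();
    println!("got {} bytes at t={}ms", body.len(), fx.now_ms());
}
```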
