At the core of every DE engine there’s some form of persistent durable execution log. You can think of it a bit like the write-ahead log of a database: it captures the intent to execute a given flow step, which makes it possible to retry that step with the same parameter values should it fail. Once a step has executed successfully, its result is also recorded in the log, so that it can be replayed from there if needed, without actually re-executing the step itself.
Broadly speaking, DE logs come in two flavours. One is an external state store which is accessed via some sort of SDK; example frameworks taking this approach include Temporal, Restate, Resonate, and Inngest. The other is to persist DE state in the local database of a given application or (micro)service; one solution in this category is DBOS, which implements DE on top of Postgres.
To keep things simple, I went with the local database model for Persistasaurus, using SQLite for storing the execution log. But as we’ll see later on, depending on your specific use case, SQLite might actually be a great choice for production scenarios too, for instance when building a self-contained agentic system.
The structure of the execution log table in SQLite is straightforward. It contains one entry for each durable execution step:
CREATE TABLE IF NOT EXISTS execution_log (
    flowId TEXT NOT NULL, (1)
    step INTEGER NOT NULL, (2)
    timestamp INTEGER NOT NULL, (3)
    class_name TEXT NOT NULL, (4)
    method_name TEXT NOT NULL, (5)
    delay INTEGER, (6)
    status TEXT (7)
        CHECK( status IN ('PENDING','WAITING_FOR_SIGNAL','COMPLETE') )
        NOT NULL,
    attempts INTEGER NOT NULL DEFAULT 1, (8)
    parameters BLOB, (9)
    return_value BLOB, (10)
    PRIMARY KEY (flowId, step)
)
| 1 | The UUID of the flow |
| 2 | The sequence number of the step within the flow, in the order of execution |
| 3 | The timestamp of first running this step |
| 4 | The name of the class defining the step method |
| 5 | The name of the step method (currently ignoring overloaded methods for this PoC) |
| 6 | For delayed steps, the delay in milliseconds |
| 7 | The current status of the step |
| 8 | A counter for keeping track of how many times the step has been tried |
| 9 | The serialized form of the step’s input parameters, if any |
| 10 | The serialized form of the step’s result, if any |
This log table stores all information needed to capture execution intent and persist results. More details on the notion of delays and signals follow further down.
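The `parameters` and `return_value` columns just need some byte-level representation of a step’s inputs and outputs. As a minimal sketch of how such BLOBs could be produced, assuming standard Java serialization (the `BlobCodec` helper is a made-up name for illustration, not necessarily what Persistasaurus does):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

// Hypothetical helpers for turning step parameters and results
// into the BLOBs stored in the parameters/return_value columns
public class BlobCodec {

    public static byte[] serialize(Object value) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(value);
        }
        catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    public static Object deserialize(byte[] bytes) {
        try (ObjectInputStream ois =
                new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return ois.readObject();
        }
        catch (IOException | ClassNotFoundException e) {
            throw new RuntimeException(e);
        }
    }
}
```

Any other format with a stable byte representation, such as JSON or Protocol Buffers, would work just as well here.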
When running a flow, the engine needs to know when a given step gets executed so it can be logged. One common way of doing so is via explicit API calls into the engine, e.g. like so with DBOS Transact:
@Workflow
public void workflow() {
    DBOS.runStep(() -> stepOne(), "stepOne");
    DBOS.runStep(() -> stepTwo(), "stepTwo");
}
This works, but it tightly couples workflows to the DE engine’s API. For Persistasaurus, I aimed to avoid this dependency as much as possible. Instead, the idea is to transparently intercept the invocations of all step methods and track them in the execution log, allowing for a very concise flow definition without any API dependencies:
@Flow
public void workflow() {
    stepOne();
    stepTwo();
}
In order for the DE engine to know when a flow or step method gets invoked, the proxy pattern is used: a proxy wraps the actual flow object and handles each of its method invocations, updating the state in the execution log before and after passing the call on to the flow itself. Thanks to Java’s dynamic nature, creating such a proxy is relatively easy, requiring just a little bit of bytecode generation. Unsurprisingly, I’m using the ByteBuddy library for this job:
private static <T> T getFlowProxy(Class<T> clazz, UUID id) {
    try {
        return new ByteBuddy()
                .subclass(clazz) (1)
                .method(ElementMatchers.any()) (2)
                .intercept( (3)
                    MethodDelegation.withDefaultConfiguration()
                        .withBinders(
                            Morph.Binder.install(OverrideCallable.class))
                        .to(new Interceptor(id)))
                .make()
                .load(Persistasaurus.class.getClassLoader()) (4)
                .getLoaded()
                .getDeclaredConstructor()
                .newInstance(); (5)
    }
    catch (Exception e) {
        throw new RuntimeException("Couldn't instantiate flow", e);
    }
}
| 1 | Create a sub-class proxy for the flow type |
| 2 | Intercept all method invocations on this proxy… |
| 3 | …and delegate them to an Interceptor object |
| 4 | Load the generated proxy class |
| 5 | Instantiate the flow proxy |
As an aside, Claude Code does an excellent job of creating code using the ByteBuddy API, which is not always self-explanatory.
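If flow types happened to be interfaces, the same interception idea could be sketched with the JDK’s built-in dynamic proxies, without any bytecode generation; the sub-class proxy via ByteBuddy is what makes it work for plain classes, too. A minimal sketch of the pattern itself, with a hypothetical `Greeter` interface standing in for a flow type:

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

public class ProxyDemo {

    // A hypothetical flow interface, just for illustration
    public interface Greeter {
        String greet(String name);
    }

    // Wraps the target in a JDK dynamic proxy which records each
    // invocation before and after delegating to the actual object
    public static Greeter tracingProxy(Greeter target, StringBuilder log) {
        InvocationHandler handler = (proxy, method, args) -> {
            log.append("before ").append(method.getName()).append('\n');
            Object result = method.invoke(target, args);
            log.append("after ").append(method.getName()).append('\n');
            return result;
        };

        return (Greeter) Proxy.newProxyInstance(
                Greeter.class.getClassLoader(),
                new Class<?>[] { Greeter.class },
                handler);
    }
}
```

A real DE engine would persist those before/after events to the execution log rather than a `StringBuilder`, but the control flow is the same.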
Now, whenever a method is invoked on the flow proxy, the call is delegated to the Interceptor class, which records the step in the execution log before invoking the actual flow method. I am going to spare you the complete details of the method interceptor implementation (you can find it here on GitHub), but the high-level logic looks like so:
public Object intercept(@This Object instance,
        @Origin Method method,
        @AllArguments Object[] args,
        @Morph OverrideCallable callable) throws Throwable {

    if (!isFlowOrStep(method)) {
        return callable.call(args);
    }

    Invocation loggedInvocation = executionLog.getInvocation(id, step);

    if (loggedInvocation != null &&
            loggedInvocation.status() == InvocationStatus.COMPLETE) { (1)
        step++;
        return loggedInvocation.returnValue();
    }
    else {
        executionLog.logInvocationStart(
                id, step, method.getName(), InvocationStatus.PENDING, args); (2)
        int currentStep = step;
        step++;

        Object result = callable.call(args); (3)
        executionLog.logInvocationCompletion(id, currentStep, result); (4)

        return result;
    }
}
| 1 | Replay completed step if present |
| 2 | Log invocation |
| 3 | Execute the actual step method |
| 4 | Log result |
Replaying completed steps from the log is essential for ensuring deterministic execution. Each step typically runs exactly once, capturing non-deterministic values such as the current time or random numbers while doing so.
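To make the replay aspect concrete, here is a toy sketch with an in-memory map standing in for the SQLite log table (all names hypothetical): a step that reads the clock yields the recorded value when the flow is rerun, instead of a fresh one.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Toy replay engine: an in-memory log stands in for the SQLite table
public class ReplayDemo {

    private final Map<Integer, Object> log = new HashMap<>();
    private int step = 0;

    // Runs the step body once and records its result; on replay,
    // returns the recorded value instead of re-executing the body
    public Object runStep(Supplier<Object> body) {
        int currentStep = step++;

        if (log.containsKey(currentStep)) {
            return log.get(currentStep);
        }

        Object result = body.get();
        log.put(currentStep, result);
        return result;
    }

    // Resets the step counter, simulating a rerun of the flow
    public void rewind() {
        step = 0;
    }
}
```

Even though `System.nanoTime()` returns a different value on every call, the replayed step returns the value captured in the log, keeping the rerun deterministic.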
There’s an important failure mode, though: if the system crashes after a step has been executed but before its result has been recorded in the log, that step will be repeated when rerunning the flow. The odds of this happening are pretty small, but whether it is acceptable depends on the particular use case. When executing steps with side effects, such as remote API calls, it may be a good idea to add idempotency keys to the requests, which lets the invoked services detect and ignore any potential duplicate calls.
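One simple way to obtain such an idempotency key is to derive it deterministically from the flow ID and step number, so that a repeated execution of the same step presents the same key to the remote service. A hypothetical helper sketching this idea:

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

public class IdempotencyKeys {

    // Derives a deterministic (name-based, type 3) UUID from flow ID
    // and step number, so a retried step sends the same key as the
    // original attempt and duplicates can be detected downstream
    public static UUID keyFor(UUID flowId, int step) {
        String input = flowId + "/" + step;
        return UUID.nameUUIDFromBytes(input.getBytes(StandardCharsets.UTF_8));
    }
}
```

The key would typically be sent along with the request, e.g. as an `Idempotency-Key` HTTP header, assuming the target service supports it.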
The actual execution log implementation isn’t that interesting; you can find its source code here. All it does is persist step invocations and their status in the execution_log SQLite table shown above.