Sounds familiar? This is how I historically have been running
benchmarks and other experiments requiring a repeated sequence of
commands — type them manually once, then rely on shell history (and
maybe some terminal splits) for reproduction. These past few years
I’ve arrived at a much better workflow pattern — make.ts.
I was forced to adapt it once I started working with multiprocess
applications, where manually entering commands is borderline
infeasible. In retrospect, I should have adapted the workflow years
earlier.
Use a consistent filename for the script. I use make.ts, and so there’s a make.ts in the root
of most projects I work on. Correspondingly, I have make.ts line in project’s .git/info/exclude
— the .gitignore file which is not shared. The fixed
name reduces fixed costs — whenever I need complex interactivity I
don’t need to come up with a name for a new file, I open my
pre-existing make.ts, wipe whatever was there and start
hacking. Similarly, I have ./make.ts in my shell
history, so
fish autosuggestions
work for me. At one point, I had a VS Code task to run make.ts, though I now use
terminal editor.
Start the script with hash bang,
#!/usr/bin/env -S deno run
--allow-all
in my case, and
chmod a+x make.ts
the file, to make it easy to run.
Write the script in a language that:
- you are comfortable with,
- doesn’t require huge setup,
- makes it easy to spawn subprocesses,
- has good support for concurrency.
For me, that is TypeScript. Modern JavaScript is sufficiently ergonomic, and structural, gradual typing is a sweet spot that gives you reasonable code completion, but still allows brute-forcing any problem by throwing enough stringly dicts at it.
JavaScript’s tagged template syntax is brilliant for scripting use-cases:
function $(literal, ...interpolated) {
console.log({ literal, interpolated });
}
const dir = "hello, world";
$`ls ${dir}`;
prints
{
literal: [ "ls ", "" ],
interpolated: [ "hello, world" ]
}
What happens here is that $ gets a list of literal
string fragments inside the backticks, and then, separately, a list
of values to be interpolated in-between. It could
concatenate everything to just a single string, but it doesn’t have
to. This is precisely what is required for process spawning, where
you want to pass an array of strings to the exec
syscall.
Specifically, I use dax library with Deno, which is excellent as a single-binary
batteries-included scripting environment (see <3 Deno). Bun has a dax-like library in the box and is a
good alternative (though I personally stick with Deno because of
deno fmt and deno lsp). You could also use
famous zx, though be mindful that it
uses your shell as a middleman, something I consider to be
sloppy (explanation).
While dax makes it convenient to spawn a single
program, async/await is excellent for herding a slither
of processes:
await Promise.all([
$`sleep 5`,
$`sleep 10`,
]);
Here’s how I applied this pattern earlier today. I wanted to measure how TigerBeetle cluster recovers from the crash of the primary. The manual way to do that would be to create a bunch of ssh sessions for several cloud machines, format datafiles, start replicas, and then create some load. I almost started to split my terminal up, but then figured out I can do it the smart way.
The first step was cross-compiling the binary, uploading it to the cloud machines, and running the cluster (using my box from the other week):
await $`./zig/zig build -Drelease -Dtarget=x86_64-linux`;
await $`box sync 0-5 ./tigerbeetle`;
await $`box run 0-5
./tigerbeetle format --cluster=0 --replica-count=6 --replica=?? 0_??.tigerbeetle`;
await $`box run 0-5
./tigerbeetle start --addresses=?0-5? 0_??.tigerbeetle`;
Running the above the second time, I realized that I need to kill the old cluster first, so two new commands are “interactively” inserted:
await $`./zig/zig build -Drelease -Dtarget=x86_64-linux`;
await $`box sync 0-5 ./tigerbeetle`;
await $`box run 0-5 rm 0_??.tigerbeetle`.noThrow();
await $`box run 0-5 pkill tigerbeetle`.noThrow();
await $`box run 0-5
./tigerbeetle format --cluster=0 --replica-count=6 --replica=?? 0_??.tigerbeetle`;
await $`box run 0-5
./tigerbeetle start --addresses=?0-5? 0_??.tigerbeetle`;
At this point, my investment in writing this file and not just entering the commands one-by-one already paid off!
The next step is to run the benchmark load in parallel with the cluster:
await Promise.all([
$`box run 0-5 ./tigerbeetle start --addresses=?0-5? 0_??.tigerbeetle`,
$`box run 6 ./tigerbeetle benchmark --addresses=?0-5?`,
])
I don’t need two terminals for two processes, and I get to copy-paste-edit the mostly same command.
For the next step, I actually want to kill one of the replicas, and
I also want to capture live logs, to see in real-time how the
cluster reacts. This is where 0-5 multiplexing syntax
of box falls short, but, given that this is JavaScript, I can just
write a for loop:
const replicas = range(6).map((it) =>
$`box run ${it}
./tigerbeetle start --addresses=?0-5? 0_??.tigerbeetle
&> logs/${it}.log`
.noThrow()
.spawn()
);
await Promise.all([
$`box run 6 ./tigerbeetle benchmark --addresses=?0-5?`,
(async () => {
await $.sleep("20s");
console.log("REDRUM");
await $`box run 1 pkill tigerbeetle`;
})(),
]);
replicas.forEach((it) => it.kill());
await Promise.all(replicas);
At this point, I do need two terminals. One runs ./make.ts and shows the log from the benchmark itself, the
other runs tail -f logs/2.log to watch the next replica
to become primary.
I have definitelly crossed the line where writing a script makes sense, but the neat thing is that the gradual evolution up to this point. There isn’t a discontinuity where I need to spend 15 minutes trying to shape various ad-hoc commands from five terminals into a single coherent script, it was in the file to begin with.
And then the script is easy to evolve. Once you realize that it’s a
good idea to also run the same benchmark against a different,
baseline version TigerBeetle, you replace ./tigerbeetle
with
./${tigerbeetle} and wrap everything into
async function benchmark(tigerbeetle: string) {
}
const tigerbeetle = Deno.args[0]
await benchmark(tigerbeetle);
$ ./make.ts tigerbeetle-baseline
$ ./make.ts tigerbeetle
A bit more hacking, and you end up with a repeatable benchmark schedule for a matrix of parameters:
for (const attempt of [0, 1])
for (const tigerbeetle of ["baseline", "tigerbeetle"])
for (const mode of ["normal", "viewchange"]) {
const results = $.path(
`./results/${tigerbeetle}-${mode}-${attempt}`,
);
await benchmark(tigerbeetle, mode, results);
}
That’s the gist of it. Don’t let the shell history be your source, capture it into the file first!