Why I built this: I was dealing with a really annoying bug in my M:N scheduler. Under heavy load, throughput would just flatline to zero. I ran ASAN and TSAN, but they came up empty because no memory was actually corrupted. It turned out to be a "lost wakeup"—coroutines were stuck forever waiting on a closed file descriptor. Traditional tools just can't catch these logical state machine breaks. I wrote coroTracer to track this exact issue down, and it worked.
coroTracer is an out-of-process tracer for M:N coroutine schedulers. It tracks down logical deadlocks, broken state machines, and coroutine leaks.
```
+-----------------------+                                 +-----------------------+
|  Target Application   |                                 |   Go Tracer Engine    |
|  (C++, Rust, Zig...)  |                                 |                       |
|                       |        [ Lock-Free SHM ]        |                       |
|  +-----------------+  |       +-----------------+       |  +-----------------+  |
|  |  cTP SDK Probe  |========> | StationData [N] | <========| Harvester Loop  |  |
|  +-----------------+  | Write +-----------------+ Read  |  +-----------------+  |
|                       |                ^                |                       |
|      [ Socket ]       |----(Wakeup)---UDS---(Listen)----|      [ File I/O ]     |
+-----------------------+                                 +-----------------------+
                                                                      | (Append)
                                                                      v
+-------------------------+     [ DeepDive ]                  +---------------+
| Interactive HTML Portal | <--- analyzer.go ---------------- |  trace.jsonl  |
+-------------------------+     (Heuristics)                  +---------------+
```
The main idea is simple: keep the tracer out of the target process's way.
- Execution Plane: The C++/Rust SDK writes state changes directly into pre-allocated shared memory using lock-free data structures.
- Observation Plane: A separate Go engine pulls this data in the background to build the topology. No network overhead, zero context switching.
- cTP Memory Contract: It runs on `mmap`. We force a strict 1024-byte alignment so different compilers don't mess things up with implicit padding.
- 64-byte Cache Line Alignment: Event slots match CPU cache lines exactly. This stops multi-threaded false sharing dead in its tracks during concurrent writes.
- Zero-Copy: Data moves purely via pointer offsets and hardware atomics. No RPCs, zero serialization.
- Smart UDS Wakeup:
  - When the Go engine is idle, it sets a `TracerSleeping` flag in the shared memory.
  - The SDK does a quick atomic load to check this flag before writing.
  - It only fires a 1-byte Unix Domain Socket (UDS) signal to wake the engine if it's actually asleep. This prevents syscall storms when throughput is high.
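The wakeup check described above can be sketched in a few lines of C++. This is a minimal illustration, not the SDK's actual code: `Slot`, `Shared`, `CountingNotifier`, and `publish` are hypothetical names, the slot payload is invented, and the notifier only counts signals where a real probe would write one byte to the UDS. The sketch publishes the event first and checks the flag afterwards, which is one common ordering; the SDK may order these steps differently.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical event slot. alignas(64) pads it to one cache line so that
// two writers on neighbouring slots never share a line.
struct alignas(64) Slot {
    std::atomic<std::uint64_t> seq{0};      // bumped once per published event
    std::atomic<std::uint64_t> coro_id{0};  // illustrative payload
    std::atomic<std::uint32_t> state{0};    // illustrative payload
};
static_assert(sizeof(Slot) == 64, "slot should fill exactly one cache line");

// Hypothetical stand-in for the mapped region: sleep flag plus one slot.
struct Shared {
    std::atomic<std::uint32_t> tracer_sleeping{0};  // 1 = engine is asleep
    Slot slot;
};

// Test double: counts wakeups instead of writing a byte to a socket.
struct CountingNotifier {
    int signals = 0;
    void signal() { ++signals; }
};

template <typename Notifier>
void publish(Shared& shm, Notifier& notifier,
             std::uint64_t coro_id, std::uint32_t state) {
    // Write the payload, then bump the sequence number to publish it.
    shm.slot.coro_id.store(coro_id, std::memory_order_relaxed);
    shm.slot.state.store(state, std::memory_order_relaxed);
    shm.slot.seq.fetch_add(1, std::memory_order_seq_cst);

    // Only pay for a syscall when the engine has flagged itself asleep.
    if (shm.tracer_sleeping.load(std::memory_order_seq_cst) == 1) {
        notifier.signal();
    }
}
```

For this pattern not to lose a wakeup of its own, the reader side would need to re-scan the slots after setting the flag and before actually blocking; that half is assumed here and not shown.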
The Go engine handles the SHM/UDS allocation and starts your app.

```bash
# Build the tracer
go build -o coroTracer main.go
# Run it
./coroTracer -n 256 -cmd "./your_target_app" -out trace.jsonl
```

Your app grabs the IPC config automatically from environment variables.
```cpp
#include "coroTracer.h"

int main() {
    corotracer::InitTracer(); // Sets up mmap and connections
    // ... start your scheduler
}
```

Inherit `PromiseMixin` to hook the lifecycle:
```cpp
struct promise_type : public corotracer::PromiseMixin {
    // Your code here. coroTracer handles await_suspend / await_resume under the hood.
};
```

```bash
# Markdown report (auto-detects SIGBUS and lost wakeups)
./coroTracer -deepdive -out trace.jsonl
# Interactive HTML dashboard
./coroTracer -html -out trace.jsonl
```

When I was testing my tiny_coro scheduler, it kept freezing under heavy load. Throughput dropped to zero, but the sanitizers said everything was fine.
I attached coroTracer, and the report showed exactly 47 coroutines permanently stuck in a Suspended state. Their instruction pointers were all parked at co_await AsyncRead(fd).
What went wrong:
During a massive spike of EOF/RST events, the worker thread correctly called close(fd), but it completely missed calling .resume() for the coroutines tied to that descriptor. The socket was gone, but the state machine logic was broken. Those coroutines were just stranded in the heap, waiting for a wakeup that would never happen.
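One way to make this class of bug harder to write is to route every close through a registry that wakes whoever is parked on the descriptor. The sketch below is not tiny_coro's actual fix: `FdWaiters`, `park`, `wake_all`, and `parked` are hypothetical names, and plain callbacks stand in for `std::coroutine_handle<>::resume()` so the example compiles without coroutine support.

```cpp
#include <cerrno>
#include <cstddef>
#include <functional>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical registry of waiters keyed by file descriptor.
class FdWaiters {
public:
    // Callback stands in for resuming a suspended coroutine with an error code.
    using Waiter = std::function<void(int error)>;

    // Called from the suspend path: remember who is waiting on this fd.
    void park(int fd, Waiter waiter) {
        waiters_[fd].push_back(std::move(waiter));
    }

    // Called by the worker right after close(fd): wake every waiter with an
    // error so none of them stays suspended on a dead descriptor.
    std::size_t wake_all(int fd, int error) {
        auto it = waiters_.find(fd);
        if (it == waiters_.end()) return 0;
        // Detach the list first so a waiter that re-parks does not
        // invalidate the container we are iterating over.
        std::vector<Waiter> pending = std::move(it->second);
        waiters_.erase(it);
        for (auto& waiter : pending) waiter(error);
        return pending.size();
    }

    // Number of waiters currently parked on fd.
    std::size_t parked(int fd) const {
        auto it = waiters_.find(fd);
        return it == waiters_.end() ? 0 : it->second.size();
    }

private:
    std::unordered_map<int, std::vector<Waiter>> waiters_;
};
```

In the EOF/RST path, the worker would call something like `wake_all(fd, ECANCELED)` immediately after `close(fd)`, so the resumed coroutines see an error instead of waiting forever. The sketch is single-threaded; a real scheduler would need its own synchronization around the map.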
| Offset | Field | Size | Description |
|---|---|---|---|
| 0x00 | MagicNumber | 8B | 0x434F524F54524352 |
| 0x14 | SleepFlag | 4B | Engine sleep flag (1 = Sleeping) |
| 0x40 | EventSlots | 512B | 8 ring buffers aligned to 64B |
Full spec in cTP.md
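If you are writing your own probe, compile-time checks are a cheap way to confirm that your struct matches the offsets in the table. The sketch below assumes the three listed fields live in one 1024-byte-aligned record; that is a guess from this README, not something taken from cTP.md. `HeaderSketch`, `EventSlot`, `kMagic`, and `magic_as_text` are hypothetical names, and the bytes between the listed fields are treated as opaque because the table does not describe them.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Magic number from the table above.
constexpr std::uint64_t kMagic = 0x434F524F54524352ULL;

// One 64-byte slot; the contents are opaque in this sketch.
struct alignas(64) EventSlot {
    unsigned char bytes[64];
};

// Assumed record layout. Only the three offsets come from the table;
// the gaps are placeholders for fields documented in cTP.md.
struct alignas(1024) HeaderSketch {
    std::uint64_t magic;              // 0x00, 8 bytes
    unsigned char unspecified_a[12];  // 0x08..0x13, not covered by the table
    std::uint32_t sleep_flag;         // 0x14, 1 = engine sleeping
    unsigned char unspecified_b[40];  // 0x18..0x3F, not covered by the table
    EventSlot slots[8];               // 0x40, 8 x 64 = 512 bytes
};

// Fail the build if a compiler inserts padding that shifts a field.
static_assert(offsetof(HeaderSketch, magic) == 0x00, "magic offset");
static_assert(offsetof(HeaderSketch, sleep_flag) == 0x14, "sleep flag offset");
static_assert(offsetof(HeaderSketch, slots) == 0x40, "event slots offset");
static_assert(sizeof(HeaderSketch::slots) == 512, "event slots size");
static_assert(sizeof(HeaderSketch) == 1024, "record size (assumed)");

// Decode the magic number as eight ASCII characters, most significant
// byte first, which makes it easy to recognise in a hex dump.
inline std::string magic_as_text(std::uint64_t magic) {
    std::string out;
    for (int shift = 56; shift >= 0; shift -= 8) {
        out.push_back(static_cast<char>((magic >> shift) & 0xFF));
    }
    return out;
}
```

Read most significant byte first, the magic number spells `COROTRCR`. How those bytes appear in the mapped file depends on the byte order the writer uses.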
Right now, I've provided a C++20 SDK. But since the core just relies on a strict memory mapping contract, you can easily write a probe for Rust, Zig, or C—basically anything that supports mmap.
Contact: [email protected]
