Show HN: I made a zero-copy coroutine tracer to find my scheduler's lost wakeups

Original link: https://github.com/lixiasky-back/coroTracer

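## coroTracer: a debugging tool for coroutine schedulers

coroTracer is an out-of-process tracer designed to find logic errors in M:N coroutine schedulers, the kind of problem that traditional tools such as ASAN/TSAN cannot detect. Its author built it to track down a stall in his scheduler: under load, throughput would unexpectedly drop to zero because of "lost wakeups", with coroutines waiting indefinitely on closed file descriptors.

The tool works through a lock-free shared memory segment (mmap) into which the target application (C++, Rust, Zig, etc.) writes state changes. A separate Go engine reads this data and reconstructs the coroutine execution topology with minimal overhead, without network calls or context switches. Key optimizations include cache-line alignment and a "smart wakeup" mechanism over a Unix domain socket that avoids unnecessary system calls.

coroTracer outputs a `trace.jsonl` file, which can be analyzed through a markdown report (which detects issues such as SIGBUS and lost wakeups) or an interactive HTML dashboard. The author used it to locate 47 stuck coroutines waiting to read from a closed socket, exposing a bug in which `close(fd)` was not followed by the required `.resume()` call.

The project provides a C++20 SDK and is designed to be adaptable to other languages that support `mmap`.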

Hacker News: Show HN: I made a zero-copy coroutine tracer to find my scheduler's lost wakeups (github.com/lixiasky-back) | 6 points by lixiasky, 1 hour ago

Original text


[Animation: UDSWakeupMechanics.gif]

Why I built this: I was dealing with a really annoying bug in my M:N scheduler. Under heavy load, throughput would just flatline to zero. I ran ASAN and TSAN, but they came up empty because no memory was actually corrupted. It turned out to be a "lost wakeup"—coroutines were stuck forever waiting on a closed file descriptor. Traditional tools just can't catch these logical state machine breaks. I wrote coroTracer to track this exact issue down, and it worked.

coroTracer is an out-of-process tracer for M:N coroutine schedulers. It tracks down logical deadlocks, broken state machines, and coroutine leaks.


+-----------------------+                               +-----------------------+
|   Target Application  |                               |    Go Tracer Engine   |
|  (C++, Rust, Zig...)  |                               |                       |
|                       |       [ Lock-Free SHM ]       |                       |
|  +-----------------+  |      +-----------------+      |  +-----------------+  |
|  |  cTP SDK Probe  |=======> | StationData [N] | <=======|  Harvester Loop   |  |
|  +-----------------+  |  Write +-----------------+ Read |  +-----------------+  |
|                       |               ^               |                       |
|       [ Socket ]      |---(Wakeup)---UDS---(Listen)---|      [ File I/O ]     |
+-----------------------+                               +-----------------------+
                                                                        | (Append)
                                                                        v
        +-------------------------+      [ DeepDive ]           +---------------+
        | Interactive HTML Portal | <--- analyzer.go ---------  |  trace.jsonl  |
        +-------------------------+      (Heuristics)           +---------------+

The main idea is simple: keep the tracer out of the target process's way.

  • Execution Plane: The C++/Rust SDK writes state changes directly into pre-allocated shared memory using lock-free data structures.
  • Observation Plane: A separate Go engine pulls this data in the background to build the topology. No network overhead, zero context switching.
  • cTP Memory Contract: It runs on mmap. We force a strict 1024-byte alignment so different compilers don't mess things up with implicit padding.
  • 64-byte Cache Line Alignment: Event slots match CPU cache lines exactly. This stops multi-threaded false sharing dead in its tracks during concurrent writes.
  • Zero-Copy: Data moves purely via pointer offsets and hardware atomics. No RPCs, zero serialization.
  • Smart UDS Wakeup:
    • When the Go engine is idle, it sets a TracerSleeping flag in the shared memory.
    • The SDK does a quick atomic load to check this flag before writing.
    • It only fires a 1-byte Unix Domain Socket (UDS) signal to wake the engine if it's actually asleep. This prevents syscall storms when throughput is high.
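The smart wakeup check above can be sketched roughly as follows. This is a minimal illustration, not coroTracer's actual SDK code: the function name `wake_if_sleeping`, the flag type, and the payload byte are assumptions made for the example, and in the real tool the flag would live inside the mmap'd segment rather than in an ordinary variable.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <unistd.h>  // ::write (POSIX)

namespace sketch {

// Hypothetical helper: signals the engine only when it has marked itself asleep.
// Returns true if a wakeup byte was actually written to wake_fd.
inline bool wake_if_sleeping(const std::atomic<std::uint32_t>& tracer_sleeping,
                             int wake_fd) {
    assert(wake_fd >= 0);

    // Fast path: a single atomic load and no syscall while the engine is awake.
    if (tracer_sleeping.load(std::memory_order_acquire) != 1) {
        return false;
    }

    // Slow path: one byte is enough to unblock a reader waiting on the socket.
    const std::uint8_t token = 1;  // payload value is an assumption
    return ::write(wake_fd, &token, 1) == 1;
}

}  // namespace sketch
```

While the engine is awake the call costs one load and returns `false`, which is where the syscall savings under high throughput come from. The list above does not say who clears the flag or how the check is ordered against the slot write, so both are left out of the sketch.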

1. Start the tracer

The Go engine allocates the shared memory segment and the Unix domain socket, then launches your app.

# Build the tracer
go build -o coroTracer main.go

# Run it
./coroTracer -n 256 -cmd "./your_target_app" -out trace.jsonl

2. Drop in the SDK (C++20 Example)

Your app grabs the IPC config automatically from environment variables.

#include "coroTracer.h"

int main() {
    corotracer::InitTracer(); // Sets up mmap and connections
    // ... start your scheduler
}

Inherit PromiseMixin to hook the lifecycle:

struct promise_type : public corotracer::PromiseMixin {
    // Your code here. coroTracer handles await_suspend / await_resume under the hood.
};

3. Analyze the trace

# Markdown report (auto-detects SIGBUS and lost wakeups)
./coroTracer -deepdive -out trace.jsonl

# Interactive HTML dashboard
./coroTracer -html -out trace.jsonl

When I was testing my tiny_coro scheduler, it kept freezing under heavy load. Throughput dropped to zero, but the sanitizers said everything was fine.

I attached coroTracer, and the report showed exactly 47 coroutines permanently stuck in a Suspended state. Their instruction pointers were all parked at co_await AsyncRead(fd).

What went wrong: During a massive spike of EOF/RST events, the worker thread correctly called close(fd), but it completely missed calling .resume() for the coroutines tied to that descriptor. The socket was gone, but the state machine logic was broken. Those coroutines were just stranded in the heap, waiting for a wakeup that would never happen.
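One way to make this class of bug hard to write is to tie "close the descriptor" and "resume its waiters" into a single operation. The sketch below is a simplified, single-threaded illustration and not tiny_coro's code: `WaiterTable`, `park`, and `close_and_wake` are hypothetical names, and a `std::function` callback stands in for a coroutine handle's `.resume()` so the example compiles without a scheduler.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <unordered_map>
#include <utility>
#include <vector>

namespace sketch {

// Stand-in for resuming a suspended coroutine with an error code.
using Resume = std::function<void(int /*error*/)>;

class WaiterTable {
public:
    // Called when a coroutine suspends on a read for fd.
    void park(int fd, Resume resume) {
        assert(resume);  // an empty callback could never be woken
        waiters_[fd].push_back(std::move(resume));
    }

    // Removes every waiter for fd and resumes each one with `error`.
    // A real scheduler would also call ::close(fd) here, so that closing
    // without waking is not expressible. Returns the number resumed.
    std::size_t close_and_wake(int fd, int error) {
        auto it = waiters_.find(fd);
        if (it == waiters_.end()) {
            return 0;
        }
        // Detach first, so a callback that parks again cannot invalidate the loop.
        std::vector<Resume> drained = std::move(it->second);
        waiters_.erase(it);
        for (auto& resume : drained) {
            resume(error);
        }
        return drained.size();
    }

    // Number of coroutines still parked across all descriptors.
    std::size_t parked() const {
        std::size_t total = 0;
        for (const auto& entry : waiters_) {
            total += entry.second.size();
        }
        return total;
    }

private:
    std::unordered_map<int, std::vector<Resume>> waiters_;
};

}  // namespace sketch
```

With this shape, the EOF/RST path calls `close_and_wake(fd, err)` and each stranded reader gets resumed with an error instead of waiting forever. A multi-threaded scheduler would need synchronization around the table, which is omitted here.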


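| Offset | Field       | Size  | Description                      |
|--------|-------------|-------|----------------------------------|
| 0x00   | MagicNumber | 8 B   | 0x434F524F54524352               |
| 0x14   | SleepFlag   | 4 B   | Engine sleep flag (1 = Sleeping) |
| 0x40   | EventSlots  | 512 B | 8 ring buffers aligned to 64 B   |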

Full spec in cTP.md
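To show how a probe in another language could pin these offsets down at compile time, here is a rough C++ mirror of the three fields listed above. It is only a sketch of this excerpt, not the full cTP layout: `LayoutExcerpt`, `EventSlot`, and the `unlisted_*` padding arrays are assumptions, and the bytes between the listed fields are treated as opaque because the excerpt does not describe them.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

namespace sketch {

constexpr std::uint64_t kMagic = 0x434F524F54524352ULL;
constexpr std::size_t kSlotCount = 8;

// One event slot sized and aligned to a 64-byte cache line.
struct EventSlot {
    alignas(64) std::uint8_t bytes[64];
};

// Mirrors only the fields named in the table; gaps are opaque padding.
struct LayoutExcerpt {
    std::uint64_t magic;            // 0x00
    std::uint8_t  unlisted_08[12];  // 0x08..0x13: not covered by the excerpt
    std::uint32_t sleep_flag;       // 0x14 (1 = engine sleeping)
    std::uint8_t  unlisted_18[40];  // 0x18..0x3F: not covered by the excerpt
    EventSlot     slots[kSlotCount];  // 0x40, 8 x 64 B = 512 B
};

// Fail the build if a compiler pads the struct differently.
static_assert(offsetof(LayoutExcerpt, magic) == 0x00, "magic offset");
static_assert(offsetof(LayoutExcerpt, sleep_flag) == 0x14, "sleep flag offset");
static_assert(offsetof(LayoutExcerpt, slots) == 0x40, "slots offset");
static_assert(sizeof(EventSlot) == 64, "slot size");
static_assert(sizeof(LayoutExcerpt::slots) == 512, "slots total size");

// Spells out the magic number, most significant byte first.
inline std::string magic_text() {
    std::string out;
    for (int shift = 56; shift >= 0; shift -= 8) {
        out.push_back(static_cast<char>((kMagic >> shift) & 0xFF));
    }
    return out;
}

// Bounds-checked access to one slot.
inline EventSlot& slot_at(LayoutExcerpt& layout, std::size_t index) {
    assert(index < kSlotCount);
    return layout.slots[index];
}

}  // namespace sketch
```

Read as ASCII from the most significant byte, the magic number spells `COROTRCR`. The `static_assert` lines are the useful part of the sketch: they turn any implicit padding difference between compilers into a build error rather than a corrupted trace.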


Right now, I've provided a C++20 SDK. But since the core just relies on a strict memory mapping contract, you can easily write a probe for Rust, Zig, or C—basically anything that supports mmap.

Contact: [email protected]
