Hung by a thread

Original link: https://campedersen.com/rayon-mutex-deadlock

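After eight hours of debugging, a robotics engineer found out why a sidewalk robot kept freezing: a deadlock caused by an unexpected interaction between their code and the `rerun` visualization SDK. The robot's core control loop runs at 100Hz, but it stopped 16 seconds after a LiDAR stream was connected over WebRTC.

The first attempts at a fix, changing the threading model and the mutex type, all failed. A heartbeat thread showed that the loop was not slowing down but was *blocked*, which led to the discovery of stray Rayon worker threads spawned inside `rerun`. The problem came from calling `rerun.log()` while holding a mutex, which triggered a Rayon work-stealing deadlock.

The fix was simple: hold the mutex for less time. The engineer's takeaways: GDB is essential for deadlocks, logs are not enough for analyzing thread state, and dependencies can introduce hidden threading complexity. They also recommend heartbeat threads for detecting stalled loops, and they submitted a PR to `rerun` to document the issue.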
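The fix described above, holding the mutex for less time, can be sketched roughly as follows. This is not the author's code and it does not reproduce the Rayon deadlock itself. `slow_log` is a hypothetical stand-in for a logging call into a dependency, and `log_without_holding_lock` is an illustrative name. The only point is that the guard is dropped before the slow call starts.

```rust
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;

/// Hypothetical stand-in for a logging call into a dependency that may
/// block or use its own thread pool internally.
fn slow_log(points: &[f32]) -> usize {
    thread::sleep(Duration::from_millis(20));
    points.len()
}

/// Copies the data out while holding the lock, releases the lock,
/// and only then makes the slow logging call.
fn log_without_holding_lock(shared: &Mutex<Vec<f32>>) -> usize {
    let snapshot = {
        let guard = shared.lock().unwrap();
        guard.clone()
    }; // `guard` is dropped here, so the mutex is free again
    slow_log(&snapshot)
}

fn main() {
    let shared = Arc::new(Mutex::new(vec![1.0_f32, 2.0, 3.0]));

    // Run the logging on another thread, as a visualization path might.
    let logger = {
        let shared = Arc::clone(&shared);
        thread::spawn(move || log_without_holding_lock(&shared))
    };

    // Meanwhile the "control loop" side can still take the lock and
    // update the data, because the logger only held it for the copy.
    shared.lock().unwrap().push(4.0);

    let logged = logger.join().unwrap();
    println!("logged {logged} points");
}
```

The trade-off is an extra copy of the data on every log call in exchange for never calling into foreign code with the lock held.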

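This Hacker News discussion revolves around a blog post describing a frustrating 8-hour robot debugging session. The core problem: the robot would inexplicably freeze, without crashing or reporting an error, and simply stop doing anything.

Commenters discuss debugging philosophy. Some advocate reaching for a debugger first rather than relying only on print statements, which is often a difference between Windows C++ and other development cultures. The incident prompted discussion of the need for meticulous code review in safety-critical systems, because of hidden assumptions.

Further discussion touches on the complexity of modern software, a desire for the safety of strongly typed languages like Haskell, and frustration with the blog post's presentation (in particular an annoying image flip effect and perceived AI-generated content). In the end, the fix was a simple two-line code change: hold the lock for less time.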
Original text
Late night debugging session with robot on workbench

It's 2am. My robot is frozen. Not crashed, not erroring, just... vibing. Sitting there. Motors off. Completely checked out.

I've been debugging for 8 hours and I'm about to mass delete my entire codebase and become a farmer.

The Setup

I'm building autonomous sidewalk robots. The control loop runs at 100Hz — every 10ms we read sensors, do math, send motor commands. It's the heartbeat. The one thing that absolutely cannot stop.
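For readers who want something concrete, here is a minimal sketch of what a fixed-rate loop like this can look like. It is not the robot's actual code: `read_sensors`, `compute_command`, `send_motor_command`, and `run_control_loop` are hypothetical names, and the deadline-based sleep is just one common way to hold a 10ms period.

```rust
use std::thread;
use std::time::{Duration, Instant};

// Hypothetical stand-ins for the robot's real sensor and motor code.
fn read_sensors() -> f64 {
    1.0
}

fn compute_command(reading: f64) -> f64 {
    reading * 0.5
}

fn send_motor_command(_cmd: f64) {}

/// Runs `iterations` ticks at a fixed period and returns how many ran.
fn run_control_loop(period: Duration, iterations: u64) -> u64 {
    let mut next_deadline = Instant::now() + period;
    let mut ticks = 0;

    for _ in 0..iterations {
        // Read sensors, do math, send motor commands.
        let reading = read_sensors();
        let cmd = compute_command(reading);
        send_motor_command(cmd);
        ticks += 1;

        // Sleep until the next deadline. If this tick overran its
        // budget, skip the sleep instead of falling further behind.
        let now = Instant::now();
        if next_deadline > now {
            thread::sleep(next_deadline - now);
        }
        next_deadline += period;
    }

    ticks
}

fn main() {
    // 100Hz means one tick every 10ms; run one second's worth.
    let ticks = run_control_loop(Duration::from_millis(10), 100);
    println!("ran {ticks} iterations");
}
```

Scheduling against a running deadline, rather than sleeping a flat 10ms after each tick, keeps the time spent on the work itself from accumulating as drift.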

It had been rock solid for weeks. Then I added LiDAR streaming over WebRTC.

Now, ~16 seconds after a client connects, the loop just stops. Doesn't crash. Doesn't throw. Just ghosts me. The watchdog starts barking, the robot coasts to a stop, and my laptop shows a beautiful 3D point cloud of a robot that has given up on life.

The Wrong Turns

I tried everything.

"It's tokio starving the loop" — switched to std::thread::sleep. Nope.

"It's the async mutex" — swapped for std::sync::Mutex. Nope.

"It's running on the wrong thread" — moved the whole loop to std::thread::spawn. Complete isolation. Nope nope nope.

Same freeze. Same spot. Iteration 1,615. Every single time.

The consistency was almost insulting. Like the bug was laughing at me.

The Breakthrough

Ok new plan. I add a heartbeat thread. Just a lil guy that watches a counter and screams if it stops: