rr – record and replay debugger for C/C++

Original link: https://rr-project.org/

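The page describes how to use the "rr" command-line tool to record and replay an application's execution for debugging. When a run fails, the entire execution is saved to disk for later analysis. The recorded trace can then be debugged under GDB (the GNU Debugger): set breakpoints, continue execution, and inspect variables. Restarting the session with GDB's run command replays the same recording from the beginning, with debugging state preserved. Because memory allocations are identical in every replay, addresses can be hard-coded in watchpoints, and reverse execution makes it possible to trace a wrong value back to where it was set. Since replay is deterministic, information gathered during debugging stays valid for the whole session, which makes root causes easier to pin down than with traditional methods. The tool also works well alongside fuzzers and randomized fault injectors, since a failing randomized run can be recorded and then debugged deterministically. Overall, "rr" lowers the cost of finding and fixing bugs and makes debugging more enjoyable.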

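"[[no_unique_address]]", introduced in C++20, allows multiple empty fields to occupy the same memory location within a struct, reducing the struct's overall size compared with giving each empty field its own storage. The feature is limited, however: two objects of the same type must still have distinct addresses, so only one empty field of a given type can be overlapped this way. In practice, its use therefore runs into problems similar to those encountered when inheriting from empty base classes before C++20. The author finds the feature cumbersome and unnecessary, seeing little practical use for empty objects and preferring non-empty ones. Rust, on the other hand, provides an empty type named "!", which denotes expressions that hold no value and lead to unreachable code paths. Rust also has several conversion traits, such as From, Into, TryFrom, and TryInto, where implementing From automatically provides implementations of the other three. These traits allow flexible conversions between types, including fallible ones that can be handled in various ways, which removes the need for explicit error handling in some cases. Finally, the author praises Rust's stricter borrow checker, claiming it helps prevent subtle bugs of the kind introduced during refactoring in languages such as C and C++, which have no borrow checker. While acknowledging that Rust's debugger support could be improved, they appreciate being able to write more complex borrowing logic with confidence, knowing the borrow checker will catch edge cases that could cause unexpected behavior. The author further suggests that language support across tools, including debuggers, should ideally provide a consistent experience whether the target code is compiled ahead of time, interpreted, or executed through a just-in-time (JIT) compiler.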

Original text

Start by using rr to record your application:

$ rr record /your/application --args
...
FAIL: oh no!

The entire execution, including the failure, was saved to disk. That recording can now be debugged.

$ rr replay
GNU gdb (GDB) ...
...
0x4cee2050 in _start () from /lib/ld-linux.so.2
(gdb)

Remember, you're debugging the recorded trace deterministically, not a live, nondeterministic execution. The replayed execution's address spaces, register contents, syscall data, etc. are exactly the same in every run.

Most of the common gdb commands can be used.

(gdb) break mozilla::dom::HTMLMediaElement::HTMLMediaElement
...
(gdb) continue
Continuing.
...
Breakpoint 1, mozilla::dom::HTMLMediaElement::HTMLMediaElement (this=0x61362f70, aNodeInfo=...)
...

If you need to restart the debugging session, for example because you missed breaking on some critical execution point, no problem. Just use gdb's run command to restart replay.

(gdb) run
The program being debugged has been started already.
Start it from the beginning? (y or n) y
...
Breakpoint 1, mozilla::dom::HTMLMediaElement::HTMLMediaElement (this=0x61362f70, aNodeInfo=...)
...
(gdb) 

The run command started another replay of your recording from the beginning. After the session restarted, the same execution was replayed again, and all your debugging state was preserved across the restart.

Note that the this pointer of the dynamically-allocated object was the same in both replay sessions. Memory allocations are exactly the same in each replay, meaning you can hard-code addresses you want to watch.

Even more powerful is reverse execution. Suppose we're debugging Firefox layout:

Breakpoint 1, nsCanvasFrame::BuildDisplayList (this=0x2aaadd7dbeb0, aBuilder=0x7fffffffaaa0, aDirtyRect=..., aLists=...)
    at /home/roc/mozilla-inbound/layout/generic/nsCanvasFrame.cpp:460
460   if (GetPrevInFlow()) {
(gdb) p mRect.width
12000

We happen to know that this value is wrong. We want to find out where it was set. rr makes that quick and easy.

(gdb) watch -l mRect.width
(gdb) reverse-cont
Continuing.
Hardware watchpoint 2: -location mRect.width
Old value = 12000
New value = 11220
0x00002aaab100c0fd in nsIFrame::SetRect (this=0x2aaadd7dbeb0, aRect=...)
    at /home/roc/mozilla-inbound/layout/base/../generic/nsIFrame.h:718
718       mRect = aRect;

This combination of hardware data watchpoints with reverse execution is extremely powerful!

rr's original motivation was to make debugging of intermittent failures easier. These failures are hard to debug because any given program run may not show the failure. We wanted to create a tool that would record program executions with low overhead, so you can record test executions until you see a failure, and then replay the failing execution repeatedly under a debugger until it has been completely understood.

We also hoped that deterministic replay would make debugging of any kind of bug easier. With normal debuggers, information you learn during the debugging session (e.g. the addresses of objects of interest, and the ordering of important events) often becomes obsolete when you have to rerun the testcase. With deterministic replay, that never needs to happen: your knowledge of what happens during the failing run increases monotonically.

Furthermore, since debugging is the process of tracing effects to their causes, it's much easier if your debugger can execute backwards in time. It's well-known that given a record/replay system which provides restartable checkpoints during replay, you can simulate reverse execution to a particular point in time by restoring the previous checkpoint and executing forwards to the desired point. So we hoped that if we built a low-overhead record-and-replay system that works well on the applications we care about (Firefox), we could build a really usable backend for gdb's reverse execution commands.
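
The checkpoint idea in the paragraph above can be illustrated with a toy model. The sketch below is not rr's implementation. ToyState, step_forward, and ToyReplayer are hypothetical names invented for this example, and the "program" is a single counter whose update cannot be inverted. It only shows the general technique: to go "backwards" to time t, restore the latest checkpoint at or before t and execute forwards from there.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <stdexcept>

// Hypothetical toy "replayed program": one value that evolves
// deterministically, standing in for a full recorded execution.
struct ToyState {
    std::uint64_t time = 0;   // number of steps executed so far
    std::uint64_t value = 1;  // the "memory" being debugged
};

// One deterministic forward step. Squaring modulo a constant is not
// injective, so the previous state cannot be computed from the current one.
inline void step_forward(ToyState& s) {
    s.value = (s.value * s.value + s.time) % 1000003;
    s.time += 1;
}

class ToyReplayer {
public:
    explicit ToyReplayer(std::uint64_t checkpoint_interval)
        : interval_(checkpoint_interval) {
        if (interval_ == 0) throw std::invalid_argument("interval must be > 0");
        checkpoints_[0] = state_;  // the start of the trace is always restorable
    }

    // Execute forwards to absolute time `target`, saving a checkpoint
    // every `interval_` steps along the way.
    void run_to(std::uint64_t target) {
        while (state_.time < target) {
            step_forward(state_);
            if (state_.time % interval_ == 0) checkpoints_[state_.time] = state_;
        }
    }

    // Simulated reverse execution: restore the latest checkpoint at or
    // before `target`, then replay forwards to `target`.
    void reverse_to(std::uint64_t target) {
        if (target > state_.time) throw std::out_of_range("target is in the future");
        auto it = checkpoints_.upper_bound(target);  // first checkpoint after target
        assert(it != checkpoints_.begin());          // checkpoint 0 always exists
        --it;
        state_ = it->second;
        run_to(target);
    }

    const ToyState& state() const { return state_; }

private:
    std::uint64_t interval_;
    ToyState state_;
    std::map<std::uint64_t, ToyState> checkpoints_;
};
```

For example, after constructing ToyReplayer r(10) and calling r.run_to(100), a call to r.reverse_to(37) restores the checkpoint taken at time 30 and re-executes seven steps. The checkpoint interval trades memory for the amount of forward replay needed per reverse step.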

These goals have all been met. rr is not only a working tool, but it's being used regularly by developers on many large and small projects.

rr records a group of Linux user-space processes and captures all inputs to those processes from the kernel, plus any nondeterministic CPU effects performed by those processes (of which there are very few). rr replay guarantees that execution preserves instruction-level control flow and memory and register contents. The memory layout is always the same, the addresses of objects don't change, register values are identical, syscalls return the same data, etc.
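
To make the record-then-replay principle concrete, here is a minimal sketch under heavy simplifying assumptions. It does not reflect how rr intercepts kernel inputs. InputSource, run_application, make_live_source, make_recorder, and make_replayer are hypothetical names for this illustration, and a random number generator stands in for data the kernel would hand to a process. The point is only that a computation fed the same recorded inputs in the same order produces the same result on every replay.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <random>
#include <utility>
#include <vector>

// A source of nondeterministic input, standing in for data the kernel
// would hand to a process (an assumption of this toy model).
using InputSource = std::function<std::uint32_t()>;

// Hypothetical "application": its result depends only on the inputs it reads.
inline std::uint64_t run_application(const InputSource& read_input) {
    std::uint64_t checksum = 0;
    for (int i = 0; i < 8; ++i) {
        checksum = checksum * 31 + read_input() % 1000;
    }
    return checksum;
}

// A live source whose values differ from run to run.
inline InputSource make_live_source() {
    return [rng = std::mt19937(std::random_device{}())]() mutable {
        return static_cast<std::uint32_t>(rng());
    };
}

// Recording: pass each input through to the application and append it to
// the trace. `trace` must outlive the returned source.
inline InputSource make_recorder(InputSource live, std::vector<std::uint32_t>& trace) {
    return [live = std::move(live), &trace]() {
        std::uint32_t v = live();
        trace.push_back(v);
        return v;
    };
}

// Replay: serve the recorded inputs back in their original order.
// `trace` must outlive the returned source.
inline InputSource make_replayer(const std::vector<std::uint32_t>& trace) {
    return [&trace, next = std::size_t{0}]() mutable {
        assert(next < trace.size() && "replay asked for more input than was recorded");
        return trace.at(next++);
    };
}
```

A caller would run the application once through make_recorder(make_live_source(), trace), then call run_application(make_replayer(trace)) as many times as needed. Each replay returns the same checksum as the recorded run.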

Tools like fuzzers and randomized fault injectors become even more powerful when used with rr. Those tools are very good at triggering some intermittent failure, but it's often hard to reproduce that same failure again to debug it. With rr, the randomized execution can simply be recorded. If the execution failed, then the saved recording can be used to deterministically debug the problem.

rr lowers the cost of fixing bugs. rr helps produce higher-quality software for the same cost. rr also makes debugging more fun.
