rr

rr features:

  • Low overhead compared to other similar tools, especially on mostly-single-threaded workloads
  • Supports recording and replay of all kinds of applications: Firefox, Chrome, QEMU, LibreOffice,
    Go programs, …
  • Record, replay and debug multiple-process workloads, including entire containers
  • Works with gdb scripting and IDE integration
  • Durable, compact traces that can be ported between machines
  • Chaos mode to make intermittent bugs more reproducible

the rr debugging experience

Start by using rr to record your application:

$ rr record /your/application --args
...
FAIL: oh no!

The entire execution, including the failure, was saved to disk.
That recording can now be debugged.

$ rr replay
GNU gdb (GDB) ...
...
0x4cee2050 in _start () from /lib/ld-linux.so.2
(gdb)

Remember, you’re debugging the recorded trace
deterministically, not a live, nondeterministic
execution. The replayed execution’s address spaces, register
contents, syscall data, etc. are exactly the same in every run.

Most of the common gdb commands can be used.

(gdb) break mozilla::dom::HTMLMediaElement::HTMLMediaElement
...
(gdb) continue
Continuing.
...
Breakpoint 1, mozilla::dom::HTMLMediaElement::HTMLMediaElement (this=0x61362f70, aNodeInfo=...)
...

If you need to restart the debugging session, for example
because you missed breaking on some critical execution point, no
problem. Just use gdb’s run command to restart
replay.

(gdb) run
The program being debugged has been started already.
Start it from the beginning? (y or n) y
...
Breakpoint 1, mozilla::dom::HTMLMediaElement::HTMLMediaElement (this=0x61362f70, aNodeInfo=...)
...
(gdb) 

The run command started another replay run of your
recording from the beginning. Because replay is deterministic,
the same execution was replayed again after the restart, and all
your debugging state (such as the breakpoint) was preserved.

Note that the this pointer of the
dynamically-allocated object was the same in both replay
sessions. Memory allocations are exactly the same in each
replay, meaning you can hard-code addresses you want to watch.

Even more powerful is reverse execution. Suppose we’re debugging Firefox layout:

Breakpoint 1, nsCanvasFrame::BuildDisplayList (this=0x2aaadd7dbeb0, aBuilder=0x7fffffffaaa0, aDirtyRect=..., aLists=...)
    at /home/roc/mozilla-inbound/layout/generic/nsCanvasFrame.cpp:460
460   if (GetPrevInFlow()) {
(gdb) p mRect.width
12000

We happen to know that that value is wrong. We want to find out where it was set.
rr makes that quick and easy.

(gdb) watch -l mRect.width
(gdb) reverse-cont
Continuing.
Hardware watchpoint 2: -location mRect.width
Old value = 12000
New value = 11220
0x00002aaab100c0fd in nsIFrame::SetRect (this=0x2aaadd7dbeb0, aRect=...)
    at /home/roc/mozilla-inbound/layout/base/../generic/nsIFrame.h:718
718       mRect=aRect;

This combination of hardware data watchpoints with reverse execution is extremely powerful!

video

This video shows a quick demo of rr recording and replaying Firefox.

This video demonstrates rr’s basic capabilities in a bit more
detail.

This video is a high-level technical talk by Robert O’Callahan about rr.

getting started

Build from source

Follow these instructions.
Recommended if the packages don’t work for you: kernel changes and OS updates sometimes require rr changes.

Or in Fedora:

cd /tmp
wget https://github.com/rr-debugger/rr/releases/download/5.5.0/rr-5.5.0-Linux-$(uname -m).rpm
sudo dnf install rr-5.5.0-Linux-$(uname -m).rpm

Or in Ubuntu:

cd /tmp
wget https://github.com/rr-debugger/rr/releases/download/5.5.0/rr-5.5.0-Linux-$(uname -m).deb
sudo dpkg -i rr-5.5.0-Linux-$(uname -m).deb

background and motivation

rr’s original motivation was to make debugging of intermittent failures
easier. These failures are hard to debug because any given program run
may not show the failure. We wanted to create a tool that would record
program executions with low overhead, so you can record test executions
until you see a failure, and then replay the failing execution
repeatedly under a debugger until it has been completely understood.
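
The record-until-failure workflow described above can be sketched as a small shell helper. This is only an illustration, not part of rr: the names record_until_failure and RR_CMD are hypothetical, and the loop assumes that rr record exits with the recorded program's exit status.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with rr): record a command repeatedly
# until one run fails. RR_CMD is an illustrative variable so that another
# binary, or a stub, can be substituted for rr.
RR_CMD="${RR_CMD:-rr}"

record_until_failure() {
    max_runs="$1"   # give up after this many passing runs
    shift           # the remaining arguments are passed to `rr record`
    run=1
    while [ "$run" -le "$max_runs" ]; do
        # Assumption: `rr record` exits non-zero when the recorded program does.
        if ! "$RR_CMD" record "$@"; then
            echo "run $run failed; debug it with: $RR_CMD replay"
            return 0
        fi
        run=$((run + 1))
    done
    echo "no failure in $max_runs runs"
    return 1
}
```

For example, `record_until_failure 50 --chaos ./flaky_test` (where ./flaky_test stands in for your own test binary) would record up to 50 runs in chaos mode and stop at the first failing one. Each passing run also leaves a trace on disk, so you may want to clean those up afterwards.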

We also hoped that deterministic replay would make debugging of any kind of bug
easier. With normal debuggers, information you learn during the debugging
session (e.g. the addresses of objects of interest, and the ordering of important
events) often becomes obsolete when you have to rerun the testcase.
With deterministic replay, that never needs to happen: your knowledge of
what happens during the failing run increases monotonically.

Furthermore, since debugging is the process of tracing effects to
their causes, it’s much easier if your debugger can execute backwards in time.
It’s well-known that
given a record/replay system which provides restartable checkpoints during replay,
you can simulate reverse execution to a particular point in time by restoring
the previous checkpoint and executing forwards to the desired point. So we hoped
that if we built a low-overhead record-and-replay system that works well on the
applications we care about (Firefox), we could build a really usable backend for
gdb’s reverse execution commands.
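
The checkpoint idea can be illustrated with a toy model. The sketch below is not how rr is implemented: the "program" is just a deterministic arithmetic step on one integer, and all names (step, replay_with_checkpoints, reverse_to, CHECKPOINT_EVERY) are made up for the example. It only shows the general technique of going "back" by restoring an earlier checkpoint and replaying forwards.

```shell
#!/bin/sh
# Toy model of reverse execution via checkpoints (illustrative only).
CHECKPOINT_EVERY=4

# One deterministic step of the "program": update state, advance time.
step() {
    state=$(( (state * 3 + 1) % 101 ))
    now=$((now + 1))
}

# Replay from the start up to time $1, saving a "time:state" checkpoint
# every CHECKPOINT_EVERY steps.
replay_with_checkpoints() {
    state=7
    now=0
    checkpoints=""
    while [ "$now" -lt "$1" ]; do
        if [ $((now % CHECKPOINT_EVERY)) -eq 0 ]; then
            checkpoints="$checkpoints $now:$state"
        fi
        step
    done
}

# "Reverse" to time $1: restore the latest checkpoint at or before the
# target, then execute forwards until the target time is reached.
reverse_to() {
    target="$1"
    for cp in $checkpoints; do
        t=${cp%%:*}
        if [ "$t" -le "$target" ]; then
            now=$t
            state=${cp#*:}
        fi
    done
    while [ "$now" -lt "$target" ]; do
        step
    done
}
```

After `replay_with_checkpoints 10`, calling `reverse_to 6` restores the checkpoint taken at time 4 and runs two steps forwards. Because every step is deterministic, the state reached is exactly the one the original run had at time 6.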

These goals have all been met. rr is not only a working
tool, but it’s being used regularly by developers on many large and small projects.

rr records a group of Linux user-space processes and captures all
inputs to those processes from the kernel, plus any nondeterministic CPU
effects performed by those processes (of which there are very few).
rr replay guarantees that execution preserves instruction-level control flow
and memory and register contents.
The memory layout is always the same, the addresses of objects
don’t change, register values are identical, syscalls return the
same data, etc.

Tools like fuzzers and randomized fault injectors become even
more powerful when used with rr. Those tools are very good at
triggering some intermittent failure, but it’s often
hard to reproduce that same failure again to debug it.
With rr, the randomized execution can simply be recorded. If
the execution failed, then the saved recording can be used to
deterministically debug the problem.

rr lowers the cost of fixing bugs. rr helps produce
higher-quality software for the same cost. rr also makes
debugging more fun.

rr in context

Record-and-replay debugging is an old idea; many systems
preceded rr. What makes rr different are the
design goals:

  • Initial focus on Firefox. Many record
    and replay techniques require specific programming languages or
    don’t scale well and thus can’t handle Firefox — or were just
    experimental and were never fleshed out. Firefox is
    a complex application, so if rr is useful for debugging
    Firefox, it is likely to be generally useful.
  • Deployability. rr runs on stock
    Linux kernels, on commodity hardware, and requires no
    system configuration changes. Many record and replay techniques
    require kernel changes. Many rely on running the OS in a virtual
    machine.
  • Low run-time overhead. We want rr to replace
    gdb in your workflow. That means you need to start getting
    results with rr about as quickly as you would if you were
    using gdb. Low overhead also means less perturbation of tests.
  • Simplicity of design. We didn’t have a lot of resources
    to develop rr, so we avoided approaches that rely on complex techniques
    such as dynamic binary instrumentation. This simplicity has also made
    rr more robust and lower overhead.

The overhead of rr depends on your application’s workload. On
Firefox test suites, rr’s recording performance is quite usable.
Slowdowns can be as low as 1.2x or below. A 1.2x slowdown means that if
the suite takes 10 minutes to run by itself, it will take around
12 minutes to be recorded by rr. However, overhead can vary dramatically
depending on the workload. For mostly-single-threaded programs, rr has
much lower overhead than any competing record-and-replay system we know of.

limitations

rr …

  • emulates a single-core machine. So, parallel programs incur
    the slowdown of running on a single core. This is an inherent
    feature of the design.
  • cannot record processes that share memory with processes
    outside the recording tree. This is an inherent feature of the
    design. rr automatically disables features such as X shared
    memory for recorded processes to avoid this problem.
  • requires a reasonably modern x86 CPU. It depends on certain
    performance counter features that are not available in older
    CPUs.
  • requires knowledge of every system call executed by the
    recorded processes. It already supports a wide range of
    syscalls — those needed by Firefox and other applications people
    have tackled with rr — but support
    isn’t complete, so running rr on your application may
    uncover a syscall that needs to be implemented. Please file
    GitHub issues for unsupported system calls.
  • sometimes needs to be updated in response to kernel changes,
    updates to system libraries, or new CPU families. If rr isn’t working
    for you (and the above caveats do not apply), please file an issue.

further reference

The Extended Technical Report is our best overview of how rr works and performs.

The rr wiki contains pages that cover technical topics related to rr.

Ask on the mailing list or on #rr on chat.mozilla.org if you have questions about rr.
