震动代码警告 – 一份个人案例研究

原文

A stateful SWD protocol implementation for debugging RP2350 RISC-V cores (Hazard3) from any Raspberry Pi Pico2 (target) using GPIO's on another Pico (probe).

0. VIBE CODE WARNING (WRITTEN BY HUMAN)

About 80% of the code is vibe coded; The readme is almost completely generated (except the whole vibe-code-warning section). I spent many nights with the oscilloscope and the docs and made a working prototype that was able ti do sba/read/write regs and do abstract commands and progbuf, the rest was done with claude code. The tests are quite comprehensive test suite and I use the core of the library in my own projects, but, as they say, "hic sunt dracones". I also read the readme and the code didn't notice anything wrong (and removed the wrong/unclear parts).

This project was my casestudy of vibecoding a more complicated project that I dont understand 100% and there is no obvious existing code that can be "used". It started as ~1000 loc that I have written and knew very well, reading the rp2350, arm swd and riscv debug docs, capturing data with oscilloscope and openocd then decoding it and analyzing the wakeup sequence and then read/write commands. After I got it working I gave it to claude to make it into a library that I can use in other projects, and then I slowly built it up.

After about 3-4k lines of code I completely lost track of what is going on, and I woudn't consider this code that I have written, but adding more and more tests felt "nice", or at least reassuring.

There was a some gaslighting, particularly when it misunderstood dap_read_mem32 thinking it is reading from ram and not MEM-AP TAR/DRW/RDBUFF protocol, which lead to incredible amount of nonsense.

Overall I would say it was a horrible experience, even though it took 10 hours to write close to 10000 lines of code, I don't consider this my project, and I have no sense of acomplishment or growth.

In contrast, using AI to read all the docs (which are thousands of pages) and write helpful scripts to decode the oscilloscope data, create packed C structs from docs and etc, was very nice, and I did feel good after. The moment I read the first register and then when I was able to read memory via SBA I felt amazing.

The main issue is taste, when I write code I feel if its good or bad, as I am writing it, I know if its wrong, but using claude code I get desensitized very quickly and I just can't tell, it "reads" OK, but I don't know how it feels. In this case it happened when the code grew about 4x, from 1k to 4k lines. And worse of all, my mental model of the code is completely gone, and with it my ownership.

The tokens have no reason or purpose, which makes reading code ridiculously difficult, as and each token can be complete nonsense. When reading human code the symbols have a purpose, someone thought "I will put this in a variable, later I will check its status.", so I pretend I am them, and think why would they have written this? Shortly after I understand, as they are human and I am human. But the AI symbols have no reason, and worse of all, they all look deceptively correct, so I have to think 10 times harder if it is wrong. With any human code (including your own) it is quite easy to gauge how much you can trust it, and it is quite consistent, with the AI code, one function can be much better than what you woudld've written, and the code 2 lines below can be cargo culted gunk that looks incredibly good, but is structurally wrong.

In the end I would say I have gained good understanding of the wires, timings, and the lower level ap/dp mechanics, sba and progbuf, but I regret not writing the whole thing myself, even if it would've taken 10x the time.

I fucking hate this.

And I can not help, but feel dusgust and shame. Is this what programming is now? I really hope this is some intermediate stage and it changes for the better, the problem is I dont know what "better" is, it seems for some people is not writing the code, for others is not modeling the problem and for third is not having to think. For me, I am not sure, I do want to make things, and many times I dont want to know something, but I want to use it, e.g. the rp2350 usb host controller the way you have to re-arm interrupts and the way the epx register is shared is super annoying, for good reasons probably, but I just want to use it to make my CBI driver.

I guess the question is what is the thing I want to make, because you can go way up the stack, from the USB chip registers to CBI to UFI to FAT16 to the OS of the old school computer I am making, but why stop? make the schematics, the pcbs, the cad files, maybe automatically send it to the factory? and then just ship it to me? but why stop? make my webshop, start selling, make a community, ads, marketing, generate some unboxing videos, maybe some viral memes? process the orders directly to the factory, on demand, if there is an issue, it is ready with customer support.

What do I do in the meanwhile? Sit on the beach? I hate the beach.

Where does it stop?

1. ARCHITECTURAL OVERVIEW

This library implements a complete three-layer abstraction for Serial Wire Debug protocol communication with RP2350's RISC-V Debug Module, modeled after the Debug Access Port specification and informed by ARM Debug Interface Architecture Specification v5.2.

┌────────────────────────────────────────┐
│  Application Layer                     │
│  (User Code)                           │
└────────────────┬───────────────────────┘
                 │
┌────────────────▼───────────────────────┐
│  Debug Module Layer (rp2350.c)         │
│  - RISC-V Debug Specification v0.13    │
│  - Hart control via DMCONTROL          │
│  - Abstract commands for GPR access    │
│  - System Bus Access (non-intrusive)   │
│  - PROGBUF execution for CSR access    │
└────────────────┬───────────────────────┘
                 │
┌────────────────▼───────────────────────┐
│  Debug Access Port Layer (dap.c)       │
│  - DP/AP register transactions         │
│  - RP2350-specific DP_SELECT encoding  │
│  - Bank selection caching              │
│  - Memory-mapped debug register access │
└────────────────┬───────────────────────┘
                 │
┌────────────────▼───────────────────────┐
│  Serial Wire Debug Layer (swd*.c)      │
│  - 2-wire bidirectional protocol       │
│  - PIO state machine bit-banging       │
│  - Request/ACK/Data phase handling     │
│  - Parity computation and verification │
│  - Line reset and dormant sequences    │
└────────────────────────────────────────┘

The separation of concerns follows classical protocol stack design: each layer exposes a well-defined interface and maintains independent state, with lower layers unaware of higher-layer semantics.

2. RISC-V DEBUG ARCHITECTURE: A FORMAL MODEL

Before examining the protocol implementation, we must establish the theoretical foundations of RISC-V external debugging. This section develops the debug architecture from first principles, following the RISC-V External Debug Support Specification v0.13.

2.A The Hart State Machine

A RISC-V hart (hardware thread) exists in one of three abstract states:

                    ┌─────────────┐
                    │   RUNNING   │
                    │  (Normal)   │
                    └──────┬──────┘
                           │
                  halt_request, ebreak,
                  trigger_fire, step_complete
                           │
                           ▼
                    ┌─────────────┐
                    │   HALTED    │
                    │  (Debug)    │
                    └──────┬──────┘
                           │
                     resume_request
                           │
                           ▼
                    ┌─────────────┐
                    │  RESUMING   │
                    │ (Transient) │
                    └──────┬──────┘
                           │
                           ▼
                    ┌─────────────┐
                    │   RUNNING   │
                    └─────────────┘

State 1: RUNNING - The hart executes instructions from main memory. PC advances according to program flow. All architectural state (GPRs, CSRs, memory) is accessible to the executing program.

State 2: HALTED - The hart has entered debug mode. No instructions from main memory execute. The hart is "parked" in a special debug ROM or implicit debug loop within the Debug Module. Debug-specific CSRs (DPC, DCSR, DSCRATCH) become accessible.

State 3: RESUMING - A transient state where the hart has received a resume request but has not yet returned to normal execution. This state exists to model the asynchronous nature of resume operations.

2.B The Debug Module: An Independent Controller

The Debug Module (DM) is a hardware block separate from the hart itself. It acts as a "shadow controller" that can:

Observe hart state without halting (DMSTATUS register)
Command hart transitions (halt, resume, reset via DMCONTROL)
Access hart registers when halted (abstract commands)
Access system memory independently of hart state (System Bus Access)

The DM is itself controlled by an external debugger via a Debug Transport Module (DTM). In our case, the DTM is the SWD interface.

┌──────────────────────────────────────────────────┐
│  External Debugger (Host CPU)                    │
└────────────────────┬─────────────────────────────┘
                     │
                     ▼ SWD Protocol
┌──────────────────────────────────────────────────┐
│  Debug Transport Module (DTM)                    │
│  - Exposes DM registers as memory-mapped space   │
└────────────────────┬─────────────────────────────┘
                     │
                     ▼ Internal Bus
┌──────────────────────────────────────────────────┐
│  Debug Module (DM)                               │
│  ┌──────────────┐  ┌──────────────┐              │
│  │  Abstract    │  │  System Bus  │              │
│  │  Command     │  │  Master      │              │
│  │  Engine      │  │              │              │
│  └──────┬───────┘  └──────┬───────┘              │
│         │                 │                      │
└─────────┼─────────────────┼──────────────────────┘
          │                 │
          ▼                 ▼
    ┌─────────────┐   ┌──────────────┐
    │  Hart 0     │   │ System Bus   │
    │  (Hazard3)  │   │              │
    ├─────────────┤   └──────────────┘
    │  Hart 1     │
    │  (Hazard3)  │
    └─────────────┘

2.C Debug Mode: A Privileged Exception Context

When a hart enters debug mode, it is not simply "stopped." Rather, it enters a special execution context analogous to an exception handler:

PC is saved to DPC (Debug Program Counter, CSR 0x7b1)
Privilege level is elevated to M-mode (Machine mode, highest privilege)
DCSR.cause records the entry reason (halt request, ebreak, trigger, etc.)
Hart begins executing from the debug exception vector (typically in debug ROM)

The debug exception vector contains a tight polling loop that repeatedly checks for commands from the Debug Module. This loop is architecturally invisible to the debugger—we simply observe the hart as "halted."

2.D Abstract Commands: The Debug Module API

The Abstract Command mechanism provides a hardware-implemented function call interface. Each abstract command is a 32-bit word written to the COMMAND register that encodes:

31                            24 23                            0
┌────────────────────────────────┬────────────────────────────────┐
│         cmdtype                │         command-specific       │
└────────────────────────────────┴────────────────────────────────┘

cmdtype=0: Access Register

31      24 23          20 19 18 17 16 15                          0
┌──────────┬─────────────┬─┬──┬──┬──┬──────────────────────────────┐
│    0     │   aarsize   │0│pc│tr│wr│         regno                │
└──────────┴─────────────┴─┴──┴──┴──┴──────────────────────────────┘

aarsize: Access size (2 = 32-bit)
aarpostincrement: Ignored
postexec: Execute program buffer after transfer
transfer: Perform the transfer (1=yes)
write: Direction (1=write, 0=read)
regno: Register number (0x1000-0x101f for GPRs x0-x31)

The Debug Module hardware interprets this command and performs the register access autonomously. From the debugger's perspective, this is a synchronous operation: write COMMAND, poll ABSTRACTCS.busy until clear, read result from DATA0.

2.E The Program Buffer: A Programmable Exception Handler

The Program Buffer (PROGBUF) is a small instruction memory (2-16 entries) within the Debug Module. When abstract commands cannot accomplish a task (e.g., accessing debug-only CSRs), the debugger can:

Write RISC-V instructions to PROGBUF
Issue an abstract command with the postexec bit set
The hart executes PROGBUF instructions while still in debug mode
The final ebreak instruction returns control to the Debug Module

This is not a "code injection" attack—the hart never leaves debug mode. It's analogous to a debugger writing instructions into a trap handler's stack frame.

2.F System Bus Access: A Parallel Execution Path

SBA provides a second path to memory that bypasses the hart entirely:

         Debugger Commands
                │
                ▼
         ┌─────────────┐
         │     DM      │
         └──┬───────┬──┘
            │       │
   ┌────────┘       └───────┐
   │                        │
   ▼ Abstract Cmd           ▼ SBA
┌──────┐                ┌─────────┐
│ Hart │───────────────▶│ Memory  │
└──────┘  Hart Accesses └─────────┘

The hart and SBA compete for memory bus bandwidth. The hart's view of memory may differ from SBA's view due to:

Cache: Hart caches writes; SBA sees stale memory
MMU/PMP: Hart accesses are translated/protected; SBA bypasses
Atomicity: Hart's atomic operations (LR/SC) are invisible to SBA

This is not a bug—it's a fundamental architectural trade-off. SBA provides speed and non-intrusiveness at the cost of coherency guarantees.

2.G The Debugging Contract

RISC-V debugging rests on several invariants:

Invariant 1: Debug Mode is Atomic While in debug mode, the hart executes no instructions from main memory. The debugger has exclusive control.

Invariant 2: Architectural Transparency Entering and exiting debug mode does not change architected state (except DPC/DCSR). The program cannot detect it was halted (modulo real-time constraints).

Invariant 3: Debug Privilege Debug mode executes at maximum privilege (M-mode). All memory is accessible, all CSRs are readable.

Invariant 4: No Interrupts in Debug Interrupts are masked while in debug mode. The debugger must explicitly re-enable them.

These invariants enable reproducible debugging: halting twice at the same PC should show identical state.

3. THE SERIAL WIRE DEBUG PROTOCOL

Serial Wire Debug (SWD) is a 2-wire replacement for JTAG's 5-wire interface, developed by ARM. The protocol operates over two signals:

SWCLK: Clock signal driven by the debugger (host)
SWDIO: Bidirectional data signal with turnaround phases

3.A Protocol Packet Structure

Each SWD transaction consists of three phases:

Request Phase (8 bits, host drives SWDIO):

Bit 0:     Start (always 1)
Bit 1:     APnDP (0=DP access, 1=AP access)
Bit 2:     RnW (0=Write, 1=Read)
Bit 3-4:   A[3:2] (register address bits)
Bit 5:     Parity (even parity of bits 1-4)
Bit 6:     Stop (always 0)
Bit 7:     Park (always 1)

Acknowledge Phase (3 bits, target drives SWDIO):

OK    (001): Transaction accepted
WAIT  (010): Target requests retry
FAULT (100): Error condition

Data Phase (33 bits, direction depends on RnW):
```
Bits 0-31: Data word
Bit 32:    Parity bit
```

Turnaround cycles (host releases SWDIO, target can drive) occur between request→ack and during data phase direction changes.

Our implementation of the packet construction is in swd_protocol.c:97-113:

static uint8_t make_swd_request(bool APnDP, bool RnW, uint8_t addr) {
    uint8_t a2 = (addr >> 2) & 1;
    uint8_t a3 = (addr >> 3) & 1;
    uint8_t parity = (APnDP + RnW + a2 + a3) & 1;

    uint8_t request = 0;
    request |= (1 << 0);          // Start bit
    request |= (APnDP << 1);      // AP/DP select
    request |= (RnW << 2);        // Read/Write
    request |= (a2 << 3);         // Address bit 2
    request |= (a3 << 4);         // Address bit 3
    request |= (parity << 5);     // Parity
    request |= (0 << 6);          // Stop bit
    request |= (1 << 7);          // Park bit
    return request;
}

3.B PIO-Based Physical Layer

Unlike software bit-banging (which suffers from timing jitter and CPU overhead), this implementation uses RP2040/RP2350's Programmable I/O (PIO) blocks for deterministic timing.

The PIO program (swd.pio) implements a command-based interface where each FIFO entry encodes either a command or data payload. Command format:

Bits 0-7:   Bit count - 1
Bit 8:      Direction (0=input, 1=output)
Bits 9-13:  Target instruction address

The state machine operates at 4 cycles per clock period, providing precise SWCLK generation independent of system clock frequency. See swd.pio:45-68 for the complete implementation.

Clock divider calculation (swd_protocol.c:313-330) accounts for this 4-cycle period:

uint32_t clk_sys_khz = clock_get_hz(clk_sys) / 1000;
uint32_t divider = (((clk_sys_khz + freq_khz - 1) / freq_khz) + 3) / 4;

3.C The Dormant State and Protocol Selection

ARM Debug Interface Architecture v6 introduces a Dormant State to enable coexistence of multiple debug protocols (JTAG and SWD) on the same pins. At power-up, RP2350's SW-DP enters the Dormant state, requiring explicit activation before SWD operations can proceed.

The dormant state solves a fundamental problem: JTAG uses 5 signals (TMS, TCK, TDI, TDO, TRST), while SWD uses 2 (SWCLK, SWDIO). When both protocols share physical pins, the debug port must determine which protocol the debugger intends to use. The solution is to require a protocol-specific "unlock" sequence that:

Cannot be generated accidentally by non-debug traffic on the pins
Is sufficiently long to avoid false positives (128 bits)
Uniquely identifies the target protocol (JTAG vs SWD)

3.C.1 The State Transition Model

The SW-DP implements a finite state machine with three protocol modes:

Power-On → [Default State] → Dormant
                                 │
                    ┌────────────┼────────────┐
                    │                         │
         JTAG Activation              SWD Activation
         Sequence (0x33bbbbba)        Sequence (0x1a)
                    │                         │
                    ▼                         ▼
              ┌──────────┐              ┌──────────┐
              │   JTAG   │              │   SWD    │
              │  Active  │              │  Active  │
              └────┬─────┘              └────┬─────┘
                   │                         │
         JTAG-to-Dormant              SWD-to-Dormant
         Sequence                     Sequence
                   │                         │
                   └──────────┬──────────────┘
                              ▼
                         ┌──────────┐
                         │ Dormant  │
                         └──────────┘

Once activated, the debug port remains in the selected protocol mode until:

A transition-to-dormant sequence is sent
Power is cycled
The external reset (RUN) pin is asserted

3.C.2 The Selection Alert Sequence

Before sending a protocol-specific activation code, ARM requires transmission of a 128-bit Selection Alert Sequence. This sequence serves as a "wake-up call" that:

Synchronizes the target's bit-stream parser
Ensures the target is listening for an activation sequence
Provides sufficient entropy to avoid accidental activation

The Selection Alert Sequence is a fixed 128-bit pattern defined in the ADI v6 specification:

0x19bc0ea2_e3ddafe9_86852d95_6209f392 (transmitted LSB-first)

This constant was chosen for its Hamming distance properties—it is unlikely to occur in normal signal traffic or be generated by crosstalk, glitches, or other non-debug activity.

3.C.3 Implementation: Robust Activation Strategy

Our implementation (swd_protocol.c:357-382) uses a defensive activation strategy that ensures reliable connection regardless of the SW-DP's initial state:

// Phase 1: Exit any prior protocol mode
static const uint8_t seq_jtag_to_dormant[] = {
    0xff,0xff,0xff,0xff,0xff,0xff,0xff, 0xbc,0xe3
};

// Phase 2: Activate SWD mode
static const uint8_t seq_dormant_to_swd[] = {
    0xff,                                        // Line reset (8 ones)
    0x92,0xf3,0x09,0x62,0x95,0x2d,0x85,0x86,     // Selection Alert
    0xe9,0xaf,0xdd,0xe3,0xa2,0x0e,0xbc,0x19,     //   (128 bits)
    0xa0,0xf1,0xff,                              // SWD Activation Code (0x1a)
    0xff,0xff,0xff,0xff,0xff,0xff,0xff, 0xff,    // Line reset (>50 ones)
    0x00                                         // Idle low
};

Why this two-phase approach?

The problem is that we don't know the SW-DP's current state:

Fresh power-up: SW-DP is in Dormant mode (default)
Prior debug session: SW-DP may be in SWD or JTAG mode
Failed connection attempt: SW-DP may be in an undefined state

If the SW-DP is already in SWD or JTAG mode, sending the Selection Alert Sequence will be interpreted as data transactions, potentially putting the DP into an error state. Our solution:

Phase 1: Force transition to Dormant

Send the JTAG-to-Dormant sequence (56 ones followed by 0xbc, 0xe3). This sequence:

If in JTAG mode: transitions to Dormant
If in SWD mode: interpreted as line reset + invalid transactions (harmless)
If already Dormant: has no effect (dormant state ignores invalid input)

The sequence consists of:

56 clock cycles high (JTAG TMS=1 → Test-Logic-Reset state)
0x3cbe (0xbc, 0xe3 LSB-first): JTAG-specific exit pattern

Phase 2: Activate SWD from Dormant

Now that we're guaranteed to be in Dormant mode (or already in SWD mode where line reset is idempotent), we send:

Line reset (8 ones): Clears any pending SWD transactions
Selection Alert Sequence (128 bits): Wakes dormant state machine
SWD Activation Code (0x1a, 8 bits): Selects SWD protocol
Line reset (>50 ones): Enters SWD Reset state, clearing sticky errors
Idle cycles: Ensures clean transition

The SWD Activation Code 0x1a decodes as:

Bits[7:0] = 0x1a = 0b00011010

This specific bit pattern was chosen to be distinct from valid JTAG TMS sequences, ensuring protocol disambiguation.

3.C.4 Why Not Use the RP2350 Datasheet Sequence?

The RP2350 datasheet (Section 3.5.1) describes a simpler connection sequence:

1. At least 8 × SWCLK cycles with SWDIO high
2. The 128-bit Selection Alert sequence
3. Four SWCLK cycles with SWDIO low
4. SWD activation code: 0x1a, LSB first
5. At least 50 × SWCLK cycles with SWDIO high (line reset)
6. A DPIDR read to exit the Reset state

This sequence assumes the SW-DP is in Dormant mode at power-up. However, in real-world scenarios:

The target may have been previously debugged (SW-DP in SWD mode)
A debugger crash may have left the SW-DP in an error state
Multi-drop SWD configurations may require explicit state reset

Our JTAG→Dormant→SWD sequence provides universal robustness: it works regardless of the SW-DP's initial state. The cost is negligible—approximately 100 extra clock cycles, taking ~100µs at 1 MHz SWCLK—while the benefit is reliable connection without manual power-cycling.

3.C.5 Post-Activation Verification

After activation, we immediately read DP_IDCODE (swd_protocol.c:386-397):

uint32_t idcode = 0;
err = swd_read_dp_raw(target, DP_IDCODE, &idcode);
if (err != SWD_OK) {
    swd_set_error(target, err, "Failed to read IDCODE");
    return err;
}

if ((idcode & 0x0fffffff) == 0) {
    swd_set_error(target, SWD_ERROR_PROTOCOL, "Invalid IDCODE: 0x%08x", idcode);
    return SWD_ERROR_PROTOCOL;
}

A successful IDCODE read confirms:

SWD protocol is active
The SW-DP is responding to transactions
The SWCLK frequency is within tolerance
SWDIO signal integrity is sufficient

For RP2350, the IDCODE is 0x4c013477

This defensive activation strategy, while not strictly necessary for fresh power-up scenarios, ensures our library works reliably across the full range of real-world debug connection scenarios—a critical property for a reusable debug library.

4. DEBUG ACCESS PORT ARCHITECTURE

The Debug Access Port (DAP) provides memory-mapped access to debug resources through two register banks:

The Debug Port (DP) manages power domains and AP selection:

DP_IDCODE (0x0): Designer and part number identification
DP_CTRL_STAT (0x4): Power control and status flags
DP_SELECT (0x8): AP and register bank selection
DP_RDBUFF (0xC): Read buffer for pipelined AP reads

4.B Access Port Registers

Access Ports (AP) provide interfaces to debug resources. RP2350 implements multiple APs:

AP 0x0: ROM Table
AP 0x2: ARM Core 0 AHB-AP
AP 0x4: ARM Core 1 AHB-AP
AP 0x8: RP2350-specific AP
AP 0xA: RISC-V APB-AP (target of this library)

Each AP has standardized registers:

AP_CSW (0x00): Control/Status Word
AP_TAR (0x04): Transfer Address Register
AP_DRW (0x0C): Data Read/Write Register
AP_IDR (0xFC): Identification Register

4.C RP2350-Specific DP_SELECT Encoding

Standard ARM DP_SELECT format uses bits[31:24] for APSEL and bits[7:4] for APBANKSEL. RP2350 implements a non-standard encoding (dap.c:18-22):

uint32_t make_dp_select_rp2350(uint8_t apsel, uint8_t bank, bool ctrlsel) {
    // [15:12] = APSEL, [11:8] = 0xD, [7:4] = bank, [0] = ctrlsel
    return ((apsel & 0xF) << 12) | (0xD << 8) | ((bank & 0xF) << 4) | (ctrlsel ? 1 : 0);
}

The magic constant 0xD in bits[11:8] is undocumented but required for correct AP selection.

4.D Bank Selection Caching

AP registers are accessed through a banking mechanism where DP_SELECT must be written before each AP access. To minimize SWD transactions, the library maintains a cache of the current bank selection (dap.c:28-55):

static swd_error_t select_ap_bank(swd_target_t *target, uint8_t apsel, uint8_t bank) {
    if (target->dap.current_apsel == apsel &&
        target->dap.current_bank == bank &&
        target->dap.ctrlsel == true) {
        return SWD_OK;  // Already selected
    }
    // Write DP_SELECT...
    target->dap.current_apsel = apsel;
    target->dap.current_bank = bank;
    // ...
}

This caching reduces transaction count by approximately 50% in typical debug sessions.

5. DEBUG DOMAIN POWER SEQUENCING

Before any debug operations can proceed, the Debug Power Domain (DPD) and System Power Domain (SPD) must be powered up. This is not a physical power operation but rather clock and reset domain enabling.

The power-up sequence (dap.c:61-110) follows the ARM Debug Interface specification:

Clear sticky errors: Write 0 to DP_CTRL_STAT
Request power-up: Set CDBGPWRUPREQ (bit 28) and CSYSPWRUPREQ (bit 30)
Poll acknowledgment: Wait for CDBGPWRUPACK (bit 29) and CSYSPWRUPACK (bit 31)

uint32_t ctrl_stat = (1 << 28) | (1 << 30);
swd_write_dp_raw(target, DP_CTRL_STAT, ctrl_stat);

for (int i = 0; i < 10; i++) {
    swd_read_dp_raw(target, DP_CTRL_STAT, &status);
    bool cdbgpwrupack = (status >> 29) & 1;
    bool csyspwrupack = (status >> 31) & 1;
    if (cdbgpwrupack && csyspwrupack) {
        return SWD_OK;
    }
    sleep_ms(20);
}

Failure to complete this sequence results in all subsequent debug operations returning WAIT responses indefinitely.

6. RP2350 DEBUG MODULE INITIALIZATION

After DAP power-up, the RP2350-specific Debug Module must be initialized through an undocumented activation handshake. This sequence was reverse-engineered from OpenOCD's RP2350 support with an oscilloscope and patience.

The activation sequence (rp2350.c:106-194) consists of:

6.A AP Selection and CSW Configuration

uint32_t sel_bank0 = make_dp_select_rp2350(AP_RISCV, 0, true);
dap_write_dp(target, DP_SELECT, sel_bank0);

uint32_t csw = 0xA2000002;  // 32-bit access, auto-increment disabled
dap_write_ap(target, AP_RISCV, AP_CSW, csw);

6.B Bank 1 Activation Handshake

The Debug Module registers are normally accessed through Bank 0, but activation requires Bank 1:

uint32_t sel_bank1 = make_dp_select_rp2350(AP_RISCV, 1, true);
dap_write_dp(target, DP_SELECT, sel_bank1);

// Three-phase handshake
dap_write_ap(target, AP_RISCV, AP_CSW, 0x00000000);  // Reset
dap_read_dp(target, DP_RDBUFF);
sleep_ms(50);

dap_write_ap(target, AP_RISCV, AP_CSW, 0x00000001);  // Activate
dap_read_dp(target, DP_RDBUFF);
sleep_ms(50);

dap_write_ap(target, AP_RISCV, AP_CSW, 0x07FFFFC1);  // Configure
dap_read_dp(target, DP_RDBUFF);
sleep_ms(50);

The expected status response is 0x04010001.

7. RISC-V DEBUG MODULE INTERFACE

The RISC-V Debug Module implements the RISC-V External Debug Support specification v0.13. Debug Module registers are memory-mapped at base address 0x40 (register addresses are byte offsets × 4).

7.A Debug Module Registers

Key registers (rp2350.c:17-29):

#define DM_DMCONTROL   (0x10 * 4)  // Hart control
#define DM_DMSTATUS    (0x11 * 4)  // Hart status
#define DM_ABSTRACTCS  (0x16 * 4)  // Abstract command status
#define DM_COMMAND     (0x17 * 4)  // Abstract command execution
#define DM_DATA0       (0x04 * 4)  // Data transfer register
#define DM_PROGBUF0    (0x20 * 4)  // Program buffer word 0
#define DM_PROGBUF1    (0x21 * 4)  // Program buffer word 1
#define DM_SBCS        (0x38 * 4)  // System Bus Access Control
#define DM_SBADDRESS0  (0x39 * 4)  // SBA Address
#define DM_SBDATA0     (0x3C * 4)  // SBA Data

7.B Hart Control via DMCONTROL

Hart (hardware thread) execution is controlled through DMCONTROL register fields:

dmactive (bit 0): Debug Module active (must be 1)
haltreq (bit 31): Request hart halt
resumereq (bit 30): Request hart resume

Halt sequence (rp2350.c:205-240):

uint32_t dmcontrol = (1 << 31) | (1 << 0);  // haltreq | dmactive
dap_write_mem32(target, DM_DMCONTROL, dmcontrol);

// Poll DMSTATUS.allhalted (bit 9)
for (int i = 0; i < 10; i++) {
    swd_result_t result = dap_read_mem32(target, DM_DMSTATUS);
    bool allhalted = (result.value >> 9) & 1;
    if (allhalted) {
        target->rp2350.hart_halted = true;
        return SWD_OK;
    }
    sleep_ms(10);
}

7.C Abstract Commands for Register Access

Abstract commands provide a high-level interface to hart state without halting. The COMMAND register format for GPR access:

Bits 0-15:   regno (0x1000 + reg_num for GPRs)
Bit 16:      write (1=write, 0=read)
Bit 17:      transfer (1=execute transfer)
Bits 20-22:  aarsize (2=32-bit access)

GPR read implementation (rp2350.c:333-389):

uint32_t command = 0;
command |= (0x1000 + reg_num) << 0;    // regno
command |= (1 << 17);                  // transfer
command |= (2 << 20);                  // aarsize=32-bit

dap_write_mem32(target, DM_COMMAND, command);
wait_abstract_command(target);  // Poll ABSTRACTCS.busy
result = dap_read_mem32(target, DM_DATA0);

7.D Program Buffer Execution Model

The Program Buffer (PROGBUF) is a 16-entry instruction memory within the Debug Module that enables execution of arbitrary RISC-V code in the debug context. Understanding its operation requires examining the execution model, register preservation semantics, and synchronization mechanisms.

7.D.1 The Dual-Context Execution Model

A RISC-V hart operates in one of two contexts:

Normal Context: The hart executes from main memory, PC advances sequentially, and all architectural state is visible to the program.
Debug Context: Upon entering debug mode (via halt request, ebreak, or trigger), the hart:
- Saves PC to DPC (Debug Program Counter, CSR 0x7b1)
- Enters a special execution mode where PROGBUF instructions execute
- Maintains all GPRs and CSRs in their pre-halt state
- Cannot access main memory without explicit instructions

The Debug Module provides a "scratch pad" where debugger-supplied instructions execute with full access to hart state, but without disturbing that state beyond explicit modifications.

7.D.2 PROGBUF Entry Layout

RP2350's Debug Module provides 2 program buffer entries (PROGBUF0 and PROGBUF1), though the specification allows up to 16. Each entry holds one 32-bit RISC-V instruction:

#define DM_PROGBUF0  (0x20 * 4)  // First instruction
#define DM_PROGBUF1  (0x21 * 4)  // Second instruction (typically ebreak)

The execution model assumes the final instruction is ebreak (0x00100073), which returns control to the Debug Module and makes the hart available for further debug operations.

7.D.3 The Abstract Command Postexec Mechanism

Abstract commands can trigger PROGBUF execution through the postexec bit (bit 18 of the COMMAND register). This creates a transactional execution model:

┌──────────────────────────────────────────┐
│ 1. Debugger writes PROGBUF instructions  │
├──────────────────────────────────────────┤
│ 2. Debugger writes DATA0 (optional)      │
├──────────────────────────────────────────┤
│ 3. Abstract command with postexec=1      │
│   - Transfers DATA0 → GPR (if transfer=1)│
│   - Executes PROGBUF[0]..PROGBUF[N]      │
│   - Executes ebreak (returns to DM)      │
│   - Transfers GPR → DATA0 (if transfer=1)│
└──────────────────────────────────────────┘

This mechanism eliminates race conditions: the data transfer and program execution form an atomic operation from the debugger's perspective.

7.D.4 Case Study: Reading Debug CSR (DPC)

The Debug Program Counter (DPC, CSR 0x7b1) cannot be accessed via abstract commands—it exists only in debug context and abstract commands target normal context registers. Reading DPC requires PROGBUF execution (rp2350.c:804-833):

Phase 1: Preserve scratch register

swd_result_t saved_s0 = rp2350_read_reg(target, hart_id, 8);  // x8 = s0

The RISC-V ABI designates s0 (x8) as a saved register, but we must preserve it because our PROGBUF code will clobber it.

Phase 2: Write PROGBUF instructions

dap_write_mem32(target, DM_PROGBUF0, 0x7b102473);  // csrr s0, dpc
dap_write_mem32(target, DM_PROGBUF1, 0x00100073);  // ebreak

The instruction csrr s0, dpc (CSR Read) has the encoding:

31      20 19   15 14  12 11    7 6      0
┌─────────┬───────┬──────┬───────┬────────┐
│ 0x7b1   │ 0x00  │ 0x2  │ 0x08  │ 0x73   │
│ CSR addr│ rs1   │funct3│  rd   │ opcode │
│  DPC    │  x0   │CSRRS │  s0   │ SYSTEM │
└─────────┴───────┴──────┴───────┴────────┘

funct3=0x2 (CSRRS): CSR Read and Set. Since rs1=x0, no bits are set (read-only operation).
CSR 0x7b1: DPC is defined in RISC-V Debug Spec v0.13, section 4.8.2

Phase 3: Execute with postexec

uint32_t command = (1 << 18);  // postexec=1, transfer=0
dap_write_mem32(target, DM_COMMAND, command);
wait_abstract_command(target);  // Poll ABSTRACTCS.busy

The hart now executes:

csrr s0, dpc → DPC value loaded into s0
ebreak → Return to Debug Module, s0 contains DPC

Phase 4: Extract result via abstract command

result = rp2350_read_reg(target, hart_id, 8);  // Read s0 (now contains DPC)

Phase 5: Restore architectural state

rp2350_write_reg(target, hart_id, 8, saved_s0.value);  // Restore s0

This five-phase sequence is invisible to the hart's normal execution: when resumed, all registers appear unchanged.

7.D.5 Writing Debug CSRs: The Inverse Operation

Writing DPC uses the inverse data flow (rp2350.c:879-909):

// Phase 1: Transfer new PC value to s0
err = rp2350_write_reg(target, hart_id, 8, new_pc_value);

// Phase 2: Write PROGBUF to copy s0 → DPC
dap_write_mem32(target, DM_PROGBUF0, 0x7b141073);  // csrw dpc, s0
dap_write_mem32(target, DM_PROGBUF1, 0x00100073);  // ebreak

// Phase 3: Execute
uint32_t command = (1 << 18);  // postexec=1
dap_write_mem32(target, DM_COMMAND, command);
wait_abstract_command(target);

The instruction csrw dpc, s0 (CSR Write) has encoding 0x7b141073:

31      20 19   15 14  12 11    7 6      0
┌─────────┬───────┬──────┬───────┬────────┐
│ 0x7b1   │ 0x08  │ 0x1  │ 0x00  │ 0x73   │
│ CSR addr│ rs1   │funct3│  rd   │ opcode │
│  DPC    │  s0   │CSRRW │  x0   │ SYSTEM │
└─────────┴───────┴──────┴───────┴────────┘

funct3=0x1 (CSRRW): CSR Read and Write. The old CSR value is discarded (rd=x0), and s0's value is written to DPC.

7.D.6 PROGBUF Execution Constraints

The PROGBUF execution environment imposes several constraints:

Memory Access Limitation: PROGBUF instructions execute in debug mode, where memory access depends on Debug Module configuration. Standard loads/stores may fault.
Instruction Count: With only 2 entries, complex operations require multiple PROGBUF sequences. Each sequence incurs the cost of abstract command execution (~100µs typical).
No Branching: PROGBUF is linear. Conditional execution requires host-side logic to decide which PROGBUF sequence to execute.
Register Pressure: Only one scratch register (s0) is conventionally used. More complex operations require additional saves/restores.
Ebreak Requirement: The final instruction must be ebreak. Omitting it causes the hart to hang in debug mode.

This execution model provides a "remote procedure call" mechanism where the host supplies short instruction sequences that execute atomically on the hart, providing a window into debug-only architectural state.

8. SYSTEM BUS ACCESS: NON-INTRUSIVE MEMORY OPERATIONS

System Bus Access (SBA) represents a fundamental departure from the traditional halt-based debugging model. Where classical debugging requires stopping the hart, transferring data through GPRs, and resuming, SBA provides a "back door" to the memory subsystem that operates concurrently with hart execution.

The Debug Module contains a bus master that can initiate memory transactions on the system bus independently of the harts. This master has the following characteristics:

Separate Bus Master: SBA transactions do not consume hart resources or execution time
Concurrent Operation: Memory reads/writes occur while harts execute normally
Cache Coherency Dependency: SBA bypasses hart caches; coherency is NOT guaranteed
Bus Arbitration: SBA competes with harts for bus bandwidth

The SBA interface consists of three memory-mapped registers in the Debug Module:

#define DM_SBCS        (0x38 * 4)  // System Bus Access Control and Status
#define DM_SBADDRESS0  (0x39 * 4)  // System Bus Address (32-bit)
#define DM_SBDATA0     (0x3C * 4)  // System Bus Data (32-bit)

8.B SBCS: Control and Status Word

The SBCS register (offset 0x38) contains configuration and status fields defined in RISC-V Debug Spec v0.13.2, section 3.12.18:

31:29 sbversion        (read-only)  SBA version
28:23 (reserved)       0
   22 sbbusyerror      (W1C)        Bus error occurred
   21 sbbusy           (read-only)  Bus master is busy
   20 sbreadonaddr     (read-write) Auto-read on SBADDRESS0 write
19:17 sbaccess         (read-write) Access width: 0=8-bit, 1=16-bit, 2=32-bit
   16 sbautoincrement  (read-write) Auto-increment address after access
   15 sbreadondata     (read-write) Auto-read on SBDATA0 read
14:12 sberror          (W1C)        Error status (0=none, 1=timeout, 2=bad addr, 3=alignment, 4=size, 7=other)
11:5  sbasize          (read-only)  Address width in bits (32 for RP2350)

8.C SBA Initialization: Capability Discovery

The SBA subsystem initialization (rp2350.c:958-992) follows a capability discovery pattern:

Phase 1: Read SBCS to detect supported features

swd_result_t result = dap_read_mem32(target, DM_SBCS);

Phase 2: Verify SBA capability

// Check sbasize field (bits [11:5]) to verify SBA is present
uint32_t sbasize = (result.value >> 5) & 0x7F;
if (sbasize == 0) {
    return SWD_ERROR_INVALID_STATE;  // SBA not available
}

The sbasize field indicates the system bus address width (32 bits for RP2350). RP2350 supports 8-bit, 16-bit, and 32-bit access widths. We configure for 32-bit:

Phase 3: Configure access mode

uint32_t sbcs = 0;
sbcs |= (2 << 17);  // sbaccess = 2 (32-bit)
sbcs |= (1 << 20);  // sbreadonaddr = 1 (auto-read trigger)
dap_write_mem32(target, DM_SBCS, sbcs);

The sbreadonaddr flag is critical: it converts the address write into an atomic read-trigger operation.

8.D The Auto-Read Mechanism

Without sbreadonaddr, a memory read requires three transactions:

1. Write address to SBADDRESS0
2. Write SBCS with read trigger
3. Read data from SBDATA0

With sbreadonaddr=1, the middle step is eliminated:

1. Write address to SBADDRESS0  ← Triggers bus read automatically
2. Read data from SBDATA0       ← Data is ready

Implementation (rp2350.c:1013-1020):

dap_write_mem32(target, DM_SBADDRESS0, addr);  // Write triggers read
result = dap_read_mem32(target, DM_SBDATA0);   // Data is already valid

The Debug Module's state machine looks like:

IDLE → [SBADDRESS0 written] → BUSY → [bus read completes] → DATA_READY
                                ↓
                           [bus timeout] → SBERROR=1

8.E SBA Write Transactions

Memory writes use SBDATA0 as the trigger register:

dap_write_mem32(target, DM_SBADDRESS0, addr);   // Set address
dap_write_mem32(target, DM_SBDATA0, value);     // Write triggers bus write

The write to SBDATA0 initiates the system bus write transaction. The debugger should poll SBCS.sbbusyerror to detect completion (though in practice, pipelined writes are often used).

The SBCS.sberror field reports transaction failures:

0: No error
1: Timeout (bus did not respond)
2: Bad address (unmapped region)
3: Bad alignment (misaligned access)
4: Bad size (unsupported width)
7: Other error

Errors are sticky and must be explicitly cleared by writing 1 to SBCS.sberror (W1C = Write-1-to-Clear).

9. STATE MANAGEMENT AND CACHING

The library maintains comprehensive state tracking to avoid redundant SWD transactions:

typedef struct {
    bool connected;
    uint32_t idcode;
    bool resource_registered;
    // ...
} swd_target_t;

typedef struct {
    uint8_t current_apsel;
    uint8_t current_bank;
    bool ctrlsel;
    uint32_t select_cache;
    bool powered;
    uint retry_count;
} dap_state_t;

9.C Per-Hart State Tracking

RP2350 contains two RISC-V harts (hardware threads) that execute independently. The library maintains per-hart state to avoid redundant operations and enable concurrent debugging:

typedef struct {
    bool halt_state_known;  // false after resume, true after halt/read status
    bool halted;            // true if hart is currently halted

    // Register cache
    bool cache_valid;       // true if cached values are current
    uint32_t cached_pc;
    uint32_t cached_gprs[32];
    uint64_t cache_timestamp;  // For LRU if needed
} hart_state_t;

The top-level RP2350 state maintains an array of hart states:

#define RP2350_NUM_HARTS 2

typedef struct {
    bool initialized;
    bool sba_initialized;

    // Per-hart state
    hart_state_t harts[RP2350_NUM_HARTS];

    // Shared cache configuration
    bool cache_enabled;
} rp2350_state_t;

9.C.1 Halt State Tracking

The halt_state_known flag implements a three-state model:

Unknown (halt_state_known=false): Hart state is uncertain (after resume or initialization)
Known Halted (halt_state_known=true, halted=true): Hart is confirmed halted
Known Running (halt_state_known=true, halted=false): Hart is confirmed running

This prevents expensive DMSTATUS polls when the state is known. State transitions:

                ┌─────────────┐
                │   Unknown   │
                └──────┬──────┘
                       │
         ┌─────────────┼─────────────┐
         │                           │
    halt_request()             read_dmstatus()
         │                           │
         ▼                           ▼
   ┌────────────┐             ┌──────────────┐
   │   Halted   │             │   Running    │
   └─────┬──────┘             └──────┬───────┘
         │                           │
         │         resume()          │
         └───────────────────────────┘
                      │
                      ▼
                 ┌─────────┐
                 │ Unknown │  (state invalidated)
                 └─────────┘

When cache_enabled=true, the library caches register values after reads. This optimization benefits:

Repeated reads of the same register (e.g., polling loop variables)
Bulk register dumps where rp2350_read_all_regs() populates the cache
Reduced SWD traffic (each register read requires ~6 SWD transactions)

Cache invalidation occurs on:

Hart resume (execution changes registers)
Register write (specific register invalidated)
Hart halt request (conservative invalidation)

The cache is per-hart, allowing concurrent debugging of both harts without interference.

PIO resources are scarce: RP2040/RP2350 provide 2 PIO blocks with 4 state machines each. The library implements a global resource tracker for multi-target support.

10.A Global Resource Tracking

typedef struct {
    swd_target_t *pio0_sm_owners[4];
    swd_target_t *pio1_sm_owners[4];
    uint active_count;
} resource_tracker_t;

extern resource_tracker_t g_resources;

10.B Automatic Allocation

When SWD_PIO_AUTO or SWD_SM_AUTO is specified in configuration, the library scans for free resources (swd.c:105-125):

swd_error_t allocate_pio_sm(PIO *pio, uint *sm) {
    for (uint i = 0; i < 4; i++) {
        if (g_resources.pio0_sm_owners[i] == NULL) {
            *pio = pio0;
            *sm = i;
            return SWD_OK;
        }
    }
    // Try PIO1...
}

Up to 8 simultaneous target connections are supported (limited by hardware resources).

11. ERROR HANDLING AND RECOVERY

The library provides comprehensive error reporting through enumerated error codes and detailed message strings.

typedef enum {
    SWD_OK = 0,
    SWD_ERROR_TIMEOUT,        // Transaction timeout
    SWD_ERROR_FAULT,          // Target FAULT response
    SWD_ERROR_PROTOCOL,       // Malformed packet
    SWD_ERROR_PARITY,         // Parity check failure
    SWD_ERROR_WAIT,           // WAIT response retry exhausted
    SWD_ERROR_NOT_CONNECTED,  // No active connection
    SWD_ERROR_NOT_HALTED,     // Operation requires halted hart
    SWD_ERROR_ALREADY_HALTED, // Hart already halted (informational)
    // ...
} swd_error_t;

Each target maintains a 128-byte error detail buffer for formatted diagnostic messages (swd.c:67-84):

void swd_set_error(swd_target_t *target, swd_error_t error,
                   const char *detail, ...) {
    target->last_error = error;
    va_list args;
    va_start(args, detail);
    vsnprintf(target->error_detail, sizeof(target->error_detail),
              detail, args);
    va_end(args);
}

11.C ACK Response Mapping

SWD protocol ACK responses are mapped to error codes (swd.c:91-99):

swd_error_t swd_ack_to_error(uint8_t ack) {
    switch (ack) {
        case 0x1: return SWD_OK;            // OK
        case 0x2: return SWD_ERROR_WAIT;    // WAIT
        case 0x4: return SWD_ERROR_FAULT;   // FAULT
        default:  return SWD_ERROR_PROTOCOL;
    }
}

WAIT responses trigger automatic retry with backoff (swd_protocol.c:197-208):

for (uint retry = 0; retry < target->dap.retry_count; retry++) {
    err = swd_io_raw(target, request, value, false);
    if (err != SWD_ERROR_WAIT) break;
    sleep_us(100);
}

Default retry count is 5, configurable via swd_config_t.

12.A Target Creation and Connection

swd_config_t config = swd_config_default();
config.pin_swclk = 2;
config.pin_swdio = 3;
config.freq_khz = 1000;
config.enable_caching = true;

swd_target_t *target = swd_target_create(&config);
swd_connect(target);
rp2350_init(target);

// Halt hart 0
rp2350_halt(target, 0);

// Read program counter
swd_result_t pc = rp2350_read_pc(target, 0);
if (pc.error == SWD_OK) {
    printf("PC: 0x%08x\n", pc.value);
}

// Read all registers
uint32_t regs[32];
rp2350_read_all_regs(target, 0, regs);

// Single-step execution
rp2350_step(target, 0);

// Resume execution
rp2350_resume(target, 0);

// Reset hart
rp2350_reset(target, 0, true);  // Reset and halt

// Read memory (non-intrusive via SBA)
swd_result_t result = rp2350_read_mem32(target, 0x20000000);

// Write memory
rp2350_write_mem32(target, 0x20000000, 0xDEADBEEF);

// Block operations
uint32_t buffer[256];
rp2350_read_mem_block(target, 0x20000000, buffer, 256);

const uint32_t program[] = {
    0x200415b7,  // lui  a1, 0x20040
    0xabcd0537,  // lui  a0, 0xabcd0
    0x23450513,  // addi a0, a0, 0x234
    0x00a5a223,  // sw   a0, 4(a1)
    0x0000006f,  // j    0 (infinite loop)
};

rp2350_execute_code(target, 0, 0x20000000, program, 5);

// Trace callback receives each executed instruction
bool trace_callback(const trace_record_t *record, void *user_data) {
    printf("PC: 0x%08x  Instruction: 0x%08x\n",
           record->pc, record->instruction);

    // Optional: inspect registers
    if (record->regs) {
        printf("  x5=0x%08x\n", record->regs[5]);
    }

    return true;  // Continue tracing (false = stop)
}

// Trace 100 instructions on hart 0, capturing registers
int count = rp2350_trace(target, 0, 100, trace_callback, NULL, true);
printf("Traced %d instructions\n", count);

12.F Dual-Hart Operations

// Operate on both harts independently
rp2350_halt(target, 0);
rp2350_halt(target, 1);

// Read registers from both harts
uint32_t h0_regs[32], h1_regs[32];
rp2350_read_all_regs(target, 0, h0_regs);
rp2350_read_all_regs(target, 1, h1_regs);

// Execute different programs on each hart
rp2350_execute_code(target, 0, 0x20000000, program0, len0);
rp2350_execute_code(target, 1, 0x20001000, program1, len1);

// Trace hart 1 while hart 0 runs
rp2350_resume(target, 0);
rp2350_trace(target, 1, 50, trace_callback, NULL, false);

13. BUILDING AND INTEGRATION

add_subdirectory(lib/pico2-swd-riscv)
target_link_libraries(your_application pico2_swd_riscv)

13.B Debug Level Configuration

Set compile-time debug verbosity:

target_compile_definitions(your_application PRIVATE PICO2_SWD_DEBUG_LEVEL=3)

Levels: 0 (none), 1 (warnings), 2 (info), 3 (debug).

ARM Debug Interface Architecture Specification v5.2
ARM CoreSight SWD-DP Technical Reference Manual
RISC-V External Debug Support version 0.13
RP2350 Datasheet, Chapter 3.5: Debug
ADIv5.2 Supplement for Multi-drop SWD
IEEE 1149.1-2001 (JTAG)

15. RISC-V SINGLE-STEP EXECUTION

Single-step execution enables instruction-level debugging by executing exactly one instruction before re-entering debug mode. This is implemented via the DCSR.step bit (Debug Control and Status Register, bit 2).

15.A The Step Bit Mechanism

When DCSR.step=1, the hart executes one instruction after resumereq, then immediately re-halts:

Debug Mode → [resumereq + DCSR.step=1] → Execute 1 instruction → Debug Mode

Implementation (rp2350.c:431-504):

Phase 1: Read current DCSR

swd_result_t dcsr_result = read_dcsr(target, hart_id);

DCSR must be read via PROGBUF (see Section 7.D) because it's a debug-only CSR.

Phase 2: Set step bit

uint32_t dcsr_stepped = dcsr_result.value | (1 << 2);
write_dcsr(target, hart_id, dcsr_stepped);

Phase 3: Resume hart

uint32_t dmcontrol = make_dmcontrol(hart_id, false, true, false);
dap_write_mem32(target, DM_DMCONTROL, dmcontrol);

The hart now executes exactly one instruction, then:

PC is saved to DPC
Hart re-enters debug mode
DCSR.cause = 0x4 (single-step)

Phase 4: Wait for automatic halt

poll_dmstatus_halted(target, hart_id, true);

Phase 5: Clear step bit

write_dcsr(target, hart_id, dcsr_result.value);  // Restore original DCSR

This ensures subsequent rp2350_resume() calls don't single-step.

16. INSTRUCTION TRACING VIA ITERATED SINGLE-STEPPING

The library implements software instruction tracing by repeatedly single-stepping and recording each instruction. This provides a "time-travel" debugging capability at the cost of execution speed.

16.A The Trace Callback Model

Tracing uses a callback function to process each instruction (rp2350.c:1262-1337):

typedef struct {
    uint32_t pc;
    uint32_t instruction;
    uint32_t regs[32];  // Valid if capture_regs=true
} trace_record_t;

typedef bool (*trace_callback_t)(const trace_record_t *record, void *user_data);

The callback returns true to continue or false to stop.

16.B Trace Implementation

int rp2350_trace(swd_target_t *target, uint8_t hart_id,
                 uint32_t max_instructions,
                 trace_callback_t callback, void *user_data,
                 bool capture_regs);

For each instruction:

Read PC (via PROGBUF): Current instruction address
Read memory at PC: Fetch the instruction word
Optional: Read all GPRs: If capture_regs=true
Invoke callback: User processes the record
Single-step: Execute one instruction
Repeat until max_instructions or callback returns false

Loop Detection:

bool detect_loop(const trace_record_t *record, void *user_data) {
    static uint32_t entry_pc = 0;
    static int count = 0;

    if (count == 0) entry_pc = record->pc;
    if (record->pc == entry_pc && count > 0) {
        printf("Loop detected at PC=0x%08x\n", record->pc);
        return false;  // Stop trace
    }
    count++;
    return true;
}

Register State History:

bool capture_state(const trace_record_t *record, void *user_data) {
    printf("%08x: %08x  x5=%08x x6=%08x\n",
           record->pc, record->instruction,
           record->regs[5], record->regs[6]);
    return true;
}

rp2350_trace(target, 0, 100, capture_state, NULL, true);

Speed: ~5ms per instruction (200 instructions/second)
Interrupt Masking: Tracing should occur with interrupts disabled (clear mstatus.MIE)
Memory Consistency: Instructions are fetched via SBA; ensure I-cache coherency
No Hardware Triggers: Trace starts immediately; no "trace until condition"

17. HART RESET OPERATIONS

Hart reset (rp2350_reset) implements a controlled reset sequence via DMCONTROL.ndmreset (non-debug module reset, bit 1).

swd_error_t rp2350_reset(swd_target_t *target, uint8_t hart_id,
                         bool halt_on_reset);

Phase 1: Assert ndmreset

uint32_t dmcontrol = make_dmcontrol(hart_id, halt_on_reset, false, true);
dap_write_mem32(target, DM_DMCONTROL, dmcontrol);
sleep_ms(10);  // Hold reset

Phase 2: Deassert ndmreset

dmcontrol = make_dmcontrol(hart_id, halt_on_reset, false, false);
dap_write_mem32(target, DM_DMCONTROL, dmcontrol);
sleep_ms(50);  // Wait for reset completion

If halt_on_reset=true, the DMCONTROL.haltreq bit remains set, causing the hart to enter debug mode immediately after reset, with PC set to the reset vector.

This reset is equivalent to a power-on reset for the hart:

PC → reset vector (typically 0x00000000 for RP2350 RISC-V cores)
All CSRs → architectural reset values
GPRs → undefined
Cache → invalidated

Unlike a full chip reset, peripherals and other harts are unaffected.

18. DUAL-HART ARCHITECTURE

RP2350's two RISC-V harts (Hazard3 cores) are symmetric and independently controllable. The library provides full per-hart state tracking and concurrent operation.

18.A Hart Selection via DMCONTROL

Each Debug Module operation targets a specific hart via DMCONTROL.hartsel[9:0]:

static inline uint32_t make_dmcontrol(uint8_t hart_id, bool haltreq,
                                       bool resumereq, bool ndmreset) {
    uint32_t dmcontrol = (1 << 0);  // dmactive = 1
    dmcontrol |= ((uint32_t)hart_id << 16);  // hartsello[9:0] at bits 25:16
    if (haltreq) dmcontrol |= (1 << 31);
    if (resumereq) dmcontrol |= (1 << 30);
    if (ndmreset) dmcontrol |= (1 << 1);
    return dmcontrol;
}

Before any hart-specific operation (halt, resume, register read), the library writes DMCONTROL with the correct hart_id, switching the Debug Module's attention to that hart.

18.B Independent Hart Control

The test suite validates that harts operate independently:

// Halt hart 0, keep hart 1 running
rp2350_halt(target, 0);
rp2350_resume(target, 1);

// Read hart 0 registers while hart 1 executes
uint32_t h0_regs[32];
rp2350_read_all_regs(target, 0, h0_regs);

This enables debugging one hart while the other maintains real-time operation.

Each hart maintains independent register state. Writing x5 on hart 0 does not affect x5 on hart 1. This is validated by test_dual_hart.c:69-115:

rp2350_write_reg(target, 0, 5, 0xAAAAAAAA);
rp2350_write_reg(target, 1, 5, 0x55555555);

assert(rp2350_read_reg(target, 0, 5).value == 0xAAAAAAAA);
assert(rp2350_read_reg(target, 1, 5).value == 0x55555555);

18.D Shared Memory, Independent Caches

Both harts share the same physical memory space but maintain independent caches. This creates coherency considerations:

SBA Writes: Visible to both harts (after cache invalidation)
Hart 0 Writes: Not immediately visible to Hart 1 if cached
Explicit Synchronization: Required for inter-hart communication

The test suite exercises memory access while both harts run concurrently (test_mem.c:291-347).

This implementation does not currently support:

19.A Hardware Breakpoints (Trigger Module)

RISC-V Debug Specification defines a Trigger Module for hardware breakpoints. Implementation was removed due to complexity. Workaround: Use single-step + PC comparison in software.

The SWD protocol supports multiple targets on one bus via unique addresses. This library assumes a single target. Physical wiring for multi-target is possible but requires additional multiplexing logic.

19.C Compressed Instruction (RVC) Extension

RP2350's Hazard3 cores support the C extension (16-bit compressed instructions). The library:

Correctly reads compressed instructions during tracing
Does NOT decode compressed instruction mnemonics
Assumes 4-byte alignment for code upload

19.D Performance Profiling

No cycle-accurate performance counters are exposed. Implementing this requires:

Access to mcycle/minstret CSRs
Periodic sampling without halting (not possible with current SBA coherency)

No routines for RP2350 flash programming. This requires:

Loading flash programmer stub to SRAM
Executing stub via rp2350_execute_code()
Monitoring completion via polling

The architecture supports this; implementation is left to applications.

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.