A stateful SWD protocol implementation for debugging RP2350 RISC-V cores (Hazard3) from any Raspberry Pi Pico2 (target) using GPIO's on another Pico (probe).
About 80% of the code is vibe coded; The readme is almost completely generated (except the whole vibe-code-warning section). I spent many nights with the oscilloscope and the docs and made a working prototype that was able ti do sba/read/write regs and do abstract commands and progbuf, the rest was done with claude code. The tests are quite comprehensive test suite and I use the core of the library in my own projects, but, as they say, "hic sunt dracones". I also read the readme and the code didn't notice anything wrong (and removed the wrong/unclear parts).
This project was my casestudy of vibecoding a more complicated project that I dont understand 100% and there is no obvious existing code that can be "used". It started as ~1000 loc that I have written and knew very well, reading the rp2350, arm swd and riscv debug docs, capturing data with oscilloscope and openocd then decoding it and analyzing the wakeup sequence and then read/write commands. After I got it working I gave it to claude to make it into a library that I can use in other projects, and then I slowly built it up.
After about 3-4k lines of code I completely lost track of what is going on, and I woudn't consider this code that I have written, but adding more and more tests felt "nice", or at least reassuring.
There was a some gaslighting, particularly when it misunderstood dap_read_mem32 thinking it is reading from ram and not MEM-AP TAR/DRW/RDBUFF protocol, which lead to incredible amount of nonsense.
Overall I would say it was a horrible experience, even though it took 10 hours to write close to 10000 lines of code, I don't consider this my project, and I have no sense of acomplishment or growth.
In contrast, using AI to read all the docs (which are thousands of pages) and write helpful scripts to decode the oscilloscope data, create packed C structs from docs and etc, was very nice, and I did feel good after. The moment I read the first register and then when I was able to read memory via SBA I felt amazing.
The main issue is taste, when I write code I feel if its good or bad, as I am writing it, I know if its wrong, but using claude code I get desensitized very quickly and I just can't tell, it "reads" OK, but I don't know how it feels. In this case it happened when the code grew about 4x, from 1k to 4k lines. And worse of all, my mental model of the code is completely gone, and with it my ownership.
The tokens have no reason or purpose, which makes reading code ridiculously difficult, as and each token can be complete nonsense. When reading human code the symbols have a purpose, someone thought "I will put this in a variable, later I will check its status.", so I pretend I am them, and think why would they have written this? Shortly after I understand, as they are human and I am human. But the AI symbols have no reason, and worse of all, they all look deceptively correct, so I have to think 10 times harder if it is wrong. With any human code (including your own) it is quite easy to gauge how much you can trust it, and it is quite consistent, with the AI code, one function can be much better than what you woudld've written, and the code 2 lines below can be cargo culted gunk that looks incredibly good, but is structurally wrong.
In the end I would say I have gained good understanding of the wires, timings, and the lower level ap/dp mechanics, sba and progbuf, but I regret not writing the whole thing myself, even if it would've taken 10x the time.
I fucking hate this.
And I can not help, but feel dusgust and shame. Is this what programming is now? I really hope this is some intermediate stage and it changes for the better, the problem is I dont know what "better" is, it seems for some people is not writing the code, for others is not modeling the problem and for third is not having to think. For me, I am not sure, I do want to make things, and many times I dont want to know something, but I want to use it, e.g. the rp2350 usb host controller the way you have to re-arm interrupts and the way the epx register is shared is super annoying, for good reasons probably, but I just want to use it to make my CBI driver.
I guess the question is what is the thing I want to make, because you can go way up the stack, from the USB chip registers to CBI to UFI to FAT16 to the OS of the old school computer I am making, but why stop? make the schematics, the pcbs, the cad files, maybe automatically send it to the factory? and then just ship it to me? but why stop? make my webshop, start selling, make a community, ads, marketing, generate some unboxing videos, maybe some viral memes? process the orders directly to the factory, on demand, if there is an issue, it is ready with customer support.
What do I do in the meanwhile? Sit on the beach? I hate the beach.
Where does it stop?
This library implements a complete three-layer abstraction for Serial Wire Debug protocol communication with RP2350's RISC-V Debug Module, modeled after the Debug Access Port specification and informed by ARM Debug Interface Architecture Specification v5.2.
┌────────────────────────────────────────┐
│ Application Layer │
│ (User Code) │
└────────────────┬───────────────────────┘
│
┌────────────────▼───────────────────────┐
│ Debug Module Layer (rp2350.c) │
│ - RISC-V Debug Specification v0.13 │
│ - Hart control via DMCONTROL │
│ - Abstract commands for GPR access │
│ - System Bus Access (non-intrusive) │
│ - PROGBUF execution for CSR access │
└────────────────┬───────────────────────┘
│
┌────────────────▼───────────────────────┐
│ Debug Access Port Layer (dap.c) │
│ - DP/AP register transactions │
│ - RP2350-specific DP_SELECT encoding │
│ - Bank selection caching │
│ - Memory-mapped debug register access │
└────────────────┬───────────────────────┘
│
┌────────────────▼───────────────────────┐
│ Serial Wire Debug Layer (swd*.c) │
│ - 2-wire bidirectional protocol │
│ - PIO state machine bit-banging │
│ - Request/ACK/Data phase handling │
│ - Parity computation and verification │
│ - Line reset and dormant sequences │
└────────────────────────────────────────┘
The separation of concerns follows classical protocol stack design: each layer exposes a well-defined interface and maintains independent state, with lower layers unaware of higher-layer semantics.
Before examining the protocol implementation, we must establish the theoretical foundations of RISC-V external debugging. This section develops the debug architecture from first principles, following the RISC-V External Debug Support Specification v0.13.
A RISC-V hart (hardware thread) exists in one of three abstract states:
┌─────────────┐
│ RUNNING │
│ (Normal) │
└──────┬──────┘
│
halt_request, ebreak,
trigger_fire, step_complete
│
▼
┌─────────────┐
│ HALTED │
│ (Debug) │
└──────┬──────┘
│
resume_request
│
▼
┌─────────────┐
│ RESUMING │
│ (Transient) │
└──────┬──────┘
│
▼
┌─────────────┐
│ RUNNING │
└─────────────┘
State 1: RUNNING - The hart executes instructions from main memory. PC advances according to program flow. All architectural state (GPRs, CSRs, memory) is accessible to the executing program.
State 2: HALTED - The hart has entered debug mode. No instructions from main memory execute. The hart is "parked" in a special debug ROM or implicit debug loop within the Debug Module. Debug-specific CSRs (DPC, DCSR, DSCRATCH) become accessible.
State 3: RESUMING - A transient state where the hart has received a resume request but has not yet returned to normal execution. This state exists to model the asynchronous nature of resume operations.
The Debug Module (DM) is a hardware block separate from the hart itself. It acts as a "shadow controller" that can:
- Observe hart state without halting (DMSTATUS register)
- Command hart transitions (halt, resume, reset via DMCONTROL)
- Access hart registers when halted (abstract commands)
- Access system memory independently of hart state (System Bus Access)
The DM is itself controlled by an external debugger via a Debug Transport Module (DTM). In our case, the DTM is the SWD interface.
┌──────────────────────────────────────────────────┐
│ External Debugger (Host CPU) │
└────────────────────┬─────────────────────────────┘
│
▼ SWD Protocol
┌──────────────────────────────────────────────────┐
│ Debug Transport Module (DTM) │
│ - Exposes DM registers as memory-mapped space │
└────────────────────┬─────────────────────────────┘
│
▼ Internal Bus
┌──────────────────────────────────────────────────┐
│ Debug Module (DM) │
│ ┌──────────────┐ ┌──────────────┐ │
│ │ Abstract │ │ System Bus │ │
│ │ Command │ │ Master │ │
│ │ Engine │ │ │ │
│ └──────┬───────┘ └──────┬───────┘ │
│ │ │ │
└─────────┼─────────────────┼──────────────────────┘
│ │
▼ ▼
┌─────────────┐ ┌──────────────┐
│ Hart 0 │ │ System Bus │
│ (Hazard3) │ │ │
├─────────────┤ └──────────────┘
│ Hart 1 │
│ (Hazard3) │
└─────────────┘
When a hart enters debug mode, it is not simply "stopped." Rather, it enters a special execution context analogous to an exception handler:
- PC is saved to DPC (Debug Program Counter, CSR 0x7b1)
- Privilege level is elevated to M-mode (Machine mode, highest privilege)
- DCSR.cause records the entry reason (halt request, ebreak, trigger, etc.)
- Hart begins executing from the debug exception vector (typically in debug ROM)
The debug exception vector contains a tight polling loop that repeatedly checks for commands from the Debug Module. This loop is architecturally invisible to the debugger—we simply observe the hart as "halted."
The Abstract Command mechanism provides a hardware-implemented function call interface. Each abstract command is a 32-bit word written to the COMMAND register that encodes:
31 24 23 0
┌────────────────────────────────┬────────────────────────────────┐
│ cmdtype │ command-specific │
└────────────────────────────────┴────────────────────────────────┘
cmdtype=0: Access Register
31 24 23 20 19 18 17 16 15 0
┌──────────┬─────────────┬─┬──┬──┬──┬──────────────────────────────┐
│ 0 │ aarsize │0│pc│tr│wr│ regno │
└──────────┴─────────────┴─┴──┴──┴──┴──────────────────────────────┘
aarsize: Access size (2 = 32-bit)
aarpostincrement: Ignored
postexec: Execute program buffer after transfer
transfer: Perform the transfer (1=yes)
write: Direction (1=write, 0=read)
regno: Register number (0x1000-0x101f for GPRs x0-x31)
The Debug Module hardware interprets this command and performs the register access autonomously. From the debugger's perspective, this is a synchronous operation: write COMMAND, poll ABSTRACTCS.busy until clear, read result from DATA0.
The Program Buffer (PROGBUF) is a small instruction memory (2-16 entries) within the Debug Module. When abstract commands cannot accomplish a task (e.g., accessing debug-only CSRs), the debugger can:
- Write RISC-V instructions to PROGBUF
- Issue an abstract command with the
postexecbit set - The hart executes PROGBUF instructions while still in debug mode
- The final
ebreakinstruction returns control to the Debug Module
This is not a "code injection" attack—the hart never leaves debug mode. It's analogous to a debugger writing instructions into a trap handler's stack frame.
SBA provides a second path to memory that bypasses the hart entirely:
Debugger Commands
│
▼
┌─────────────┐
│ DM │
└──┬───────┬──┘
│ │
┌────────┘ └───────┐
│ │
▼ Abstract Cmd ▼ SBA
┌──────┐ ┌─────────┐
│ Hart │───────────────▶│ Memory │
└──────┘ Hart Accesses └─────────┘
The hart and SBA compete for memory bus bandwidth. The hart's view of memory may differ from SBA's view due to:
- Cache: Hart caches writes; SBA sees stale memory
- MMU/PMP: Hart accesses are translated/protected; SBA bypasses
- Atomicity: Hart's atomic operations (LR/SC) are invisible to SBA
This is not a bug—it's a fundamental architectural trade-off. SBA provides speed and non-intrusiveness at the cost of coherency guarantees.
RISC-V debugging rests on several invariants:
Invariant 1: Debug Mode is Atomic While in debug mode, the hart executes no instructions from main memory. The debugger has exclusive control.
Invariant 2: Architectural Transparency Entering and exiting debug mode does not change architected state (except DPC/DCSR). The program cannot detect it was halted (modulo real-time constraints).
Invariant 3: Debug Privilege Debug mode executes at maximum privilege (M-mode). All memory is accessible, all CSRs are readable.
Invariant 4: No Interrupts in Debug Interrupts are masked while in debug mode. The debugger must explicitly re-enable them.
These invariants enable reproducible debugging: halting twice at the same PC should show identical state.
Serial Wire Debug (SWD) is a 2-wire replacement for JTAG's 5-wire interface, developed by ARM. The protocol operates over two signals:
- SWCLK: Clock signal driven by the debugger (host)
- SWDIO: Bidirectional data signal with turnaround phases
Each SWD transaction consists of three phases:
-
Request Phase (8 bits, host drives SWDIO):
Bit 0: Start (always 1) Bit 1: APnDP (0=DP access, 1=AP access) Bit 2: RnW (0=Write, 1=Read) Bit 3-4: A[3:2] (register address bits) Bit 5: Parity (even parity of bits 1-4) Bit 6: Stop (always 0) Bit 7: Park (always 1) -
Acknowledge Phase (3 bits, target drives SWDIO):
OK (001): Transaction accepted WAIT (010): Target requests retry FAULT (100): Error condition -
Data Phase (33 bits, direction depends on RnW):
Bits 0-31: Data word Bit 32: Parity bit
Turnaround cycles (host releases SWDIO, target can drive) occur between request→ack and during data phase direction changes.
Our implementation of the packet construction is in swd_protocol.c:97-113:
static uint8_t make_swd_request(bool APnDP, bool RnW, uint8_t addr) {
uint8_t a2 = (addr >> 2) & 1;
uint8_t a3 = (addr >> 3) & 1;
uint8_t parity = (APnDP + RnW + a2 + a3) & 1;
uint8_t request = 0;
request |= (1 << 0); // Start bit
request |= (APnDP << 1); // AP/DP select
request |= (RnW << 2); // Read/Write
request |= (a2 << 3); // Address bit 2
request |= (a3 << 4); // Address bit 3
request |= (parity << 5); // Parity
request |= (0 << 6); // Stop bit
request |= (1 << 7); // Park bit
return request;
}Unlike software bit-banging (which suffers from timing jitter and CPU overhead), this implementation uses RP2040/RP2350's Programmable I/O (PIO) blocks for deterministic timing.
The PIO program (swd.pio) implements a command-based interface where each FIFO entry encodes either a command or data payload. Command format:
Bits 0-7: Bit count - 1
Bit 8: Direction (0=input, 1=output)
Bits 9-13: Target instruction address
The state machine operates at 4 cycles per clock period, providing precise SWCLK generation independent of system clock frequency. See swd.pio:45-68 for the complete implementation.
Clock divider calculation (swd_protocol.c:313-330) accounts for this 4-cycle period:
uint32_t clk_sys_khz = clock_get_hz(clk_sys) / 1000;
uint32_t divider = (((clk_sys_khz + freq_khz - 1) / freq_khz) + 3) / 4;ARM Debug Interface Architecture v6 introduces a Dormant State to enable coexistence of multiple debug protocols (JTAG and SWD) on the same pins. At power-up, RP2350's SW-DP enters the Dormant state, requiring explicit activation before SWD operations can proceed.
The dormant state solves a fundamental problem: JTAG uses 5 signals (TMS, TCK, TDI, TDO, TRST), while SWD uses 2 (SWCLK, SWDIO). When both protocols share physical pins, the debug port must determine which protocol the debugger intends to use. The solution is to require a protocol-specific "unlock" sequence that:
- Cannot be generated accidentally by non-debug traffic on the pins
- Is sufficiently long to avoid false positives (128 bits)
- Uniquely identifies the target protocol (JTAG vs SWD)
The SW-DP implements a finite state machine with three protocol modes:
Power-On → [Default State] → Dormant
│
┌────────────┼────────────┐
│ │
JTAG Activation SWD Activation
Sequence (0x33bbbbba) Sequence (0x1a)
│ │
▼ ▼
┌──────────┐ ┌──────────┐
│ JTAG │ │ SWD │
│ Active │ │ Active │
└────┬─────┘ └────┬─────┘
│ │
JTAG-to-Dormant SWD-to-Dormant
Sequence Sequence
│ │
└──────────┬──────────────┘
▼
┌──────────┐
│ Dormant │
└──────────┘
Once activated, the debug port remains in the selected protocol mode until:
- A transition-to-dormant sequence is sent
- Power is cycled
- The external reset (RUN) pin is asserted
Before sending a protocol-specific activation code, ARM requires transmission of a 128-bit Selection Alert Sequence. This sequence serves as a "wake-up call" that:
- Synchronizes the target's bit-stream parser
- Ensures the target is listening for an activation sequence
- Provides sufficient entropy to avoid accidental activation
The Selection Alert Sequence is a fixed 128-bit pattern defined in the ADI v6 specification:
0x19bc0ea2_e3ddafe9_86852d95_6209f392 (transmitted LSB-first)
This constant was chosen for its Hamming distance properties—it is unlikely to occur in normal signal traffic or be generated by crosstalk, glitches, or other non-debug activity.
Our implementation (swd_protocol.c:357-382) uses a defensive activation strategy that ensures reliable connection regardless of the SW-DP's initial state:
// Phase 1: Exit any prior protocol mode
static const uint8_t seq_jtag_to_dormant[] = {
0xff,0xff,0xff,0xff,0xff,0xff,0xff, 0xbc,0xe3
};
// Phase 2: Activate SWD mode
static const uint8_t seq_dormant_to_swd[] = {
0xff, // Line reset (8 ones)
0x92,0xf3,0x09,0x62,0x95,0x2d,0x85,0x86, // Selection Alert
0xe9,0xaf,0xdd,0xe3,0xa2,0x0e,0xbc,0x19, // (128 bits)
0xa0,0xf1,0xff, // SWD Activation Code (0x1a)
0xff,0xff,0xff,0xff,0xff,0xff,0xff, 0xff, // Line reset (>50 ones)
0x00 // Idle low
};Why this two-phase approach?
The problem is that we don't know the SW-DP's current state:
- Fresh power-up: SW-DP is in Dormant mode (default)
- Prior debug session: SW-DP may be in SWD or JTAG mode
- Failed connection attempt: SW-DP may be in an undefined state
If the SW-DP is already in SWD or JTAG mode, sending the Selection Alert Sequence will be interpreted as data transactions, potentially putting the DP into an error state. Our solution:
Phase 1: Force transition to Dormant
Send the JTAG-to-Dormant sequence (56 ones followed by 0xbc, 0xe3). This sequence:
- If in JTAG mode: transitions to Dormant
- If in SWD mode: interpreted as line reset + invalid transactions (harmless)
- If already Dormant: has no effect (dormant state ignores invalid input)
The sequence consists of:
- 56 clock cycles high (JTAG TMS=1 → Test-Logic-Reset state)
- 0x3cbe (0xbc, 0xe3 LSB-first): JTAG-specific exit pattern
Phase 2: Activate SWD from Dormant
Now that we're guaranteed to be in Dormant mode (or already in SWD mode where line reset is idempotent), we send:
- Line reset (8 ones): Clears any pending SWD transactions
- Selection Alert Sequence (128 bits): Wakes dormant state machine
- SWD Activation Code (0x1a, 8 bits): Selects SWD protocol
- Line reset (>50 ones): Enters SWD Reset state, clearing sticky errors
- Idle cycles: Ensures clean transition
The SWD Activation Code 0x1a decodes as:
Bits[7:0] = 0x1a = 0b00011010
This specific bit pattern was chosen to be distinct from valid JTAG TMS sequences, ensuring protocol disambiguation.
The RP2350 datasheet (Section 3.5.1) describes a simpler connection sequence:
1. At least 8 × SWCLK cycles with SWDIO high
2. The 128-bit Selection Alert sequence
3. Four SWCLK cycles with SWDIO low
4. SWD activation code: 0x1a, LSB first
5. At least 50 × SWCLK cycles with SWDIO high (line reset)
6. A DPIDR read to exit the Reset state
This sequence assumes the SW-DP is in Dormant mode at power-up. However, in real-world scenarios:
- The target may have been previously debugged (SW-DP in SWD mode)
- A debugger crash may have left the SW-DP in an error state
- Multi-drop SWD configurations may require explicit state reset
Our JTAG→Dormant→SWD sequence provides universal robustness: it works regardless of the SW-DP's initial state. The cost is negligible—approximately 100 extra clock cycles, taking ~100µs at 1 MHz SWCLK—while the benefit is reliable connection without manual power-cycling.
After activation, we immediately read DP_IDCODE (swd_protocol.c:386-397):
uint32_t idcode = 0;
err = swd_read_dp_raw(target, DP_IDCODE, &idcode);
if (err != SWD_OK) {
swd_set_error(target, err, "Failed to read IDCODE");
return err;
}
if ((idcode & 0x0fffffff) == 0) {
swd_set_error(target, SWD_ERROR_PROTOCOL, "Invalid IDCODE: 0x%08x", idcode);
return SWD_ERROR_PROTOCOL;
}A successful IDCODE read confirms:
- SWD protocol is active
- The SW-DP is responding to transactions
- The SWCLK frequency is within tolerance
- SWDIO signal integrity is sufficient
For RP2350, the IDCODE is 0x4c013477
This defensive activation strategy, while not strictly necessary for fresh power-up scenarios, ensures our library works reliably across the full range of real-world debug connection scenarios—a critical property for a reusable debug library.
The Debug Access Port (DAP) provides memory-mapped access to debug resources through two register banks:
The Debug Port (DP) manages power domains and AP selection:
- DP_IDCODE (0x0): Designer and part number identification
- DP_CTRL_STAT (0x4): Power control and status flags
- DP_SELECT (0x8): AP and register bank selection
- DP_RDBUFF (0xC): Read buffer for pipelined AP reads
Access Ports (AP) provide interfaces to debug resources. RP2350 implements multiple APs:
- AP 0x0: ROM Table
- AP 0x2: ARM Core 0 AHB-AP
- AP 0x4: ARM Core 1 AHB-AP
- AP 0x8: RP2350-specific AP
- AP 0xA: RISC-V APB-AP (target of this library)
Each AP has standardized registers:
- AP_CSW (0x00): Control/Status Word
- AP_TAR (0x04): Transfer Address Register
- AP_DRW (0x0C): Data Read/Write Register
- AP_IDR (0xFC): Identification Register
Standard ARM DP_SELECT format uses bits[31:24] for APSEL and bits[7:4] for APBANKSEL. RP2350 implements a non-standard encoding (dap.c:18-22):
uint32_t make_dp_select_rp2350(uint8_t apsel, uint8_t bank, bool ctrlsel) {
// [15:12] = APSEL, [11:8] = 0xD, [7:4] = bank, [0] = ctrlsel
return ((apsel & 0xF) << 12) | (0xD << 8) | ((bank & 0xF) << 4) | (ctrlsel ? 1 : 0);
}The magic constant 0xD in bits[11:8] is undocumented but required for correct AP selection.
AP registers are accessed through a banking mechanism where DP_SELECT must be written before each AP access. To minimize SWD transactions, the library maintains a cache of the current bank selection (dap.c:28-55):
static swd_error_t select_ap_bank(swd_target_t *target, uint8_t apsel, uint8_t bank) {
if (target->dap.current_apsel == apsel &&
target->dap.current_bank == bank &&
target->dap.ctrlsel == true) {
return SWD_OK; // Already selected
}
// Write DP_SELECT...
target->dap.current_apsel = apsel;
target->dap.current_bank = bank;
// ...
}This caching reduces transaction count by approximately 50% in typical debug sessions.
Before any debug operations can proceed, the Debug Power Domain (DPD) and System Power Domain (SPD) must be powered up. This is not a physical power operation but rather clock and reset domain enabling.
The power-up sequence (dap.c:61-110) follows the ARM Debug Interface specification:
- Clear sticky errors: Write 0 to DP_CTRL_STAT
- Request power-up: Set CDBGPWRUPREQ (bit 28) and CSYSPWRUPREQ (bit 30)
- Poll acknowledgment: Wait for CDBGPWRUPACK (bit 29) and CSYSPWRUPACK (bit 31)
uint32_t ctrl_stat = (1 << 28) | (1 << 30);
swd_write_dp_raw(target, DP_CTRL_STAT, ctrl_stat);
for (int i = 0; i < 10; i++) {
swd_read_dp_raw(target, DP_CTRL_STAT, &status);
bool cdbgpwrupack = (status >> 29) & 1;
bool csyspwrupack = (status >> 31) & 1;
if (cdbgpwrupack && csyspwrupack) {
return SWD_OK;
}
sleep_ms(20);
}Failure to complete this sequence results in all subsequent debug operations returning WAIT responses indefinitely.
After DAP power-up, the RP2350-specific Debug Module must be initialized through an undocumented activation handshake. This sequence was reverse-engineered from OpenOCD's RP2350 support with an oscilloscope and patience.
The activation sequence (rp2350.c:106-194) consists of:
uint32_t sel_bank0 = make_dp_select_rp2350(AP_RISCV, 0, true);
dap_write_dp(target, DP_SELECT, sel_bank0);
uint32_t csw = 0xA2000002; // 32-bit access, auto-increment disabled
dap_write_ap(target, AP_RISCV, AP_CSW, csw);The Debug Module registers are normally accessed through Bank 0, but activation requires Bank 1:
uint32_t sel_bank1 = make_dp_select_rp2350(AP_RISCV, 1, true);
dap_write_dp(target, DP_SELECT, sel_bank1);
// Three-phase handshake
dap_write_ap(target, AP_RISCV, AP_CSW, 0x00000000); // Reset
dap_read_dp(target, DP_RDBUFF);
sleep_ms(50);
dap_write_ap(target, AP_RISCV, AP_CSW, 0x00000001); // Activate
dap_read_dp(target, DP_RDBUFF);
sleep_ms(50);
dap_write_ap(target, AP_RISCV, AP_CSW, 0x07FFFFC1); // Configure
dap_read_dp(target, DP_RDBUFF);
sleep_ms(50);The expected status response is 0x04010001.
The RISC-V Debug Module implements the RISC-V External Debug Support specification v0.13. Debug Module registers are memory-mapped at base address 0x40 (register addresses are byte offsets × 4).
Key registers (rp2350.c:17-29):
#define DM_DMCONTROL (0x10 * 4) // Hart control
#define DM_DMSTATUS (0x11 * 4) // Hart status
#define DM_ABSTRACTCS (0x16 * 4) // Abstract command status
#define DM_COMMAND (0x17 * 4) // Abstract command execution
#define DM_DATA0 (0x04 * 4) // Data transfer register
#define DM_PROGBUF0 (0x20 * 4) // Program buffer word 0
#define DM_PROGBUF1 (0x21 * 4) // Program buffer word 1
#define DM_SBCS (0x38 * 4) // System Bus Access Control
#define DM_SBADDRESS0 (0x39 * 4) // SBA Address
#define DM_SBDATA0 (0x3C * 4) // SBA DataHart (hardware thread) execution is controlled through DMCONTROL register fields:
- dmactive (bit 0): Debug Module active (must be 1)
- haltreq (bit 31): Request hart halt
- resumereq (bit 30): Request hart resume
Halt sequence (rp2350.c:205-240):
uint32_t dmcontrol = (1 << 31) | (1 << 0); // haltreq | dmactive
dap_write_mem32(target, DM_DMCONTROL, dmcontrol);
// Poll DMSTATUS.allhalted (bit 9)
for (int i = 0; i < 10; i++) {
swd_result_t result = dap_read_mem32(target, DM_DMSTATUS);
bool allhalted = (result.value >> 9) & 1;
if (allhalted) {
target->rp2350.hart_halted = true;
return SWD_OK;
}
sleep_ms(10);
}Abstract commands provide a high-level interface to hart state without halting. The COMMAND register format for GPR access:
Bits 0-15: regno (0x1000 + reg_num for GPRs)
Bit 16: write (1=write, 0=read)
Bit 17: transfer (1=execute transfer)
Bits 20-22: aarsize (2=32-bit access)
GPR read implementation (rp2350.c:333-389):
uint32_t command = 0;
command |= (0x1000 + reg_num) << 0; // regno
command |= (1 << 17); // transfer
command |= (2 << 20); // aarsize=32-bit
dap_write_mem32(target, DM_COMMAND, command);
wait_abstract_command(target); // Poll ABSTRACTCS.busy
result = dap_read_mem32(target, DM_DATA0);The Program Buffer (PROGBUF) is a 16-entry instruction memory within the Debug Module that enables execution of arbitrary RISC-V code in the debug context. Understanding its operation requires examining the execution model, register preservation semantics, and synchronization mechanisms.
A RISC-V hart operates in one of two contexts:
-
Normal Context: The hart executes from main memory, PC advances sequentially, and all architectural state is visible to the program.
-
Debug Context: Upon entering debug mode (via halt request, ebreak, or trigger), the hart:
- Saves PC to DPC (Debug Program Counter, CSR 0x7b1)
- Enters a special execution mode where PROGBUF instructions execute
- Maintains all GPRs and CSRs in their pre-halt state
- Cannot access main memory without explicit instructions
The Debug Module provides a "scratch pad" where debugger-supplied instructions execute with full access to hart state, but without disturbing that state beyond explicit modifications.
RP2350's Debug Module provides 2 program buffer entries (PROGBUF0 and PROGBUF1), though the specification allows up to 16. Each entry holds one 32-bit RISC-V instruction:
#define DM_PROGBUF0 (0x20 * 4) // First instruction
#define DM_PROGBUF1 (0x21 * 4) // Second instruction (typically ebreak)The execution model assumes the final instruction is ebreak (0x00100073), which returns control to the Debug Module and makes the hart available for further debug operations.
Abstract commands can trigger PROGBUF execution through the postexec bit (bit 18 of the COMMAND register). This creates a transactional execution model:
┌──────────────────────────────────────────┐
│ 1. Debugger writes PROGBUF instructions │
├──────────────────────────────────────────┤
│ 2. Debugger writes DATA0 (optional) │
├──────────────────────────────────────────┤
│ 3. Abstract command with postexec=1 │
│ - Transfers DATA0 → GPR (if transfer=1)│
│ - Executes PROGBUF[0]..PROGBUF[N] │
│ - Executes ebreak (returns to DM) │
│ - Transfers GPR → DATA0 (if transfer=1)│
└──────────────────────────────────────────┘
This mechanism eliminates race conditions: the data transfer and program execution form an atomic operation from the debugger's perspective.
The Debug Program Counter (DPC, CSR 0x7b1) cannot be accessed via abstract commands—it exists only in debug context and abstract commands target normal context registers. Reading DPC requires PROGBUF execution (rp2350.c:804-833):
Phase 1: Preserve scratch register
swd_result_t saved_s0 = rp2350_read_reg(target, hart_id, 8); // x8 = s0The RISC-V ABI designates s0 (x8) as a saved register, but we must preserve it because our PROGBUF code will clobber it.
Phase 2: Write PROGBUF instructions
dap_write_mem32(target, DM_PROGBUF0, 0x7b102473); // csrr s0, dpc
dap_write_mem32(target, DM_PROGBUF1, 0x00100073); // ebreakThe instruction csrr s0, dpc (CSR Read) has the encoding:
31 20 19 15 14 12 11 7 6 0
┌─────────┬───────┬──────┬───────┬────────┐
│ 0x7b1 │ 0x00 │ 0x2 │ 0x08 │ 0x73 │
│ CSR addr│ rs1 │funct3│ rd │ opcode │
│ DPC │ x0 │CSRRS │ s0 │ SYSTEM │
└─────────┴───────┴──────┴───────┴────────┘
- funct3=0x2 (CSRRS): CSR Read and Set. Since rs1=x0, no bits are set (read-only operation).
- CSR 0x7b1: DPC is defined in RISC-V Debug Spec v0.13, section 4.8.2
Phase 3: Execute with postexec
uint32_t command = (1 << 18); // postexec=1, transfer=0
dap_write_mem32(target, DM_COMMAND, command);
wait_abstract_command(target); // Poll ABSTRACTCS.busyThe hart now executes:
csrr s0, dpc→ DPC value loaded into s0ebreak→ Return to Debug Module, s0 contains DPC
Phase 4: Extract result via abstract command
result = rp2350_read_reg(target, hart_id, 8); // Read s0 (now contains DPC)Phase 5: Restore architectural state
rp2350_write_reg(target, hart_id, 8, saved_s0.value); // Restore s0This five-phase sequence is invisible to the hart's normal execution: when resumed, all registers appear unchanged.
Writing DPC uses the inverse data flow (rp2350.c:879-909):
// Phase 1: Transfer new PC value to s0
err = rp2350_write_reg(target, hart_id, 8, new_pc_value);
// Phase 2: Write PROGBUF to copy s0 → DPC
dap_write_mem32(target, DM_PROGBUF0, 0x7b141073); // csrw dpc, s0
dap_write_mem32(target, DM_PROGBUF1, 0x00100073); // ebreak
// Phase 3: Execute
uint32_t command = (1 << 18); // postexec=1
dap_write_mem32(target, DM_COMMAND, command);
wait_abstract_command(target);The instruction csrw dpc, s0 (CSR Write) has encoding 0x7b141073:
31 20 19 15 14 12 11 7 6 0
┌─────────┬───────┬──────┬───────┬────────┐
│ 0x7b1 │ 0x08 │ 0x1 │ 0x00 │ 0x73 │
│ CSR addr│ rs1 │funct3│ rd │ opcode │
│ DPC │ s0 │CSRRW │ x0 │ SYSTEM │
└─────────┴───────┴──────┴───────┴────────┘
funct3=0x1 (CSRRW): CSR Read and Write. The old CSR value is discarded (rd=x0), and s0's value is written to DPC.
The PROGBUF execution environment imposes several constraints:
-
Memory Access Limitation: PROGBUF instructions execute in debug mode, where memory access depends on Debug Module configuration. Standard loads/stores may fault.
-
Instruction Count: With only 2 entries, complex operations require multiple PROGBUF sequences. Each sequence incurs the cost of abstract command execution (~100µs typical).
-
No Branching: PROGBUF is linear. Conditional execution requires host-side logic to decide which PROGBUF sequence to execute.
-
Register Pressure: Only one scratch register (s0) is conventionally used. More complex operations require additional saves/restores.
-
Ebreak Requirement: The final instruction must be
ebreak. Omitting it causes the hart to hang in debug mode.
This execution model provides a "remote procedure call" mechanism where the host supplies short instruction sequences that execute atomically on the hart, providing a window into debug-only architectural state.
System Bus Access (SBA) represents a fundamental departure from the traditional halt-based debugging model. Where classical debugging requires stopping the hart, transferring data through GPRs, and resuming, SBA provides a "back door" to the memory subsystem that operates concurrently with hart execution.
The Debug Module contains a bus master that can initiate memory transactions on the system bus independently of the harts. This master has the following characteristics:
- Separate Bus Master: SBA transactions do not consume hart resources or execution time
- Concurrent Operation: Memory reads/writes occur while harts execute normally
- Cache Coherency Dependency: SBA bypasses hart caches; coherency is NOT guaranteed
- Bus Arbitration: SBA competes with harts for bus bandwidth
The SBA interface consists of three memory-mapped registers in the Debug Module:
#define DM_SBCS (0x38 * 4) // System Bus Access Control and Status
#define DM_SBADDRESS0 (0x39 * 4) // System Bus Address (32-bit)
#define DM_SBDATA0 (0x3C * 4) // System Bus Data (32-bit)The SBCS register (offset 0x38) contains configuration and status fields defined in RISC-V Debug Spec v0.13.2, section 3.12.18:
31:29 sbversion (read-only) SBA version
28:23 (reserved) 0
22 sbbusyerror (W1C) Bus error occurred
21 sbbusy (read-only) Bus master is busy
20 sbreadonaddr (read-write) Auto-read on SBADDRESS0 write
19:17 sbaccess (read-write) Access width: 0=8-bit, 1=16-bit, 2=32-bit
16 sbautoincrement (read-write) Auto-increment address after access
15 sbreadondata (read-write) Auto-read on SBDATA0 read
14:12 sberror (W1C) Error status (0=none, 1=timeout, 2=bad addr, 3=alignment, 4=size, 7=other)
11:5 sbasize (read-only) Address width in bits (32 for RP2350)
The SBA subsystem initialization (rp2350.c:958-992) follows a capability discovery pattern:
Phase 1: Read SBCS to detect supported features
swd_result_t result = dap_read_mem32(target, DM_SBCS);Phase 2: Verify SBA capability
// Check sbasize field (bits [11:5]) to verify SBA is present
uint32_t sbasize = (result.value >> 5) & 0x7F;
if (sbasize == 0) {
return SWD_ERROR_INVALID_STATE; // SBA not available
}The sbasize field indicates the system bus address width (32 bits for RP2350). RP2350 supports 8-bit, 16-bit, and 32-bit access widths. We configure for 32-bit:
Phase 3: Configure access mode
uint32_t sbcs = 0;
sbcs |= (2 << 17); // sbaccess = 2 (32-bit)
sbcs |= (1 << 20); // sbreadonaddr = 1 (auto-read trigger)
dap_write_mem32(target, DM_SBCS, sbcs);The sbreadonaddr flag is critical: it converts the address write into an atomic read-trigger operation.
Without sbreadonaddr, a memory read requires three transactions:
1. Write address to SBADDRESS0
2. Write SBCS with read trigger
3. Read data from SBDATA0
With sbreadonaddr=1, the middle step is eliminated:
1. Write address to SBADDRESS0 ← Triggers bus read automatically
2. Read data from SBDATA0 ← Data is ready
Implementation (rp2350.c:1013-1020):
dap_write_mem32(target, DM_SBADDRESS0, addr); // Write triggers read
result = dap_read_mem32(target, DM_SBDATA0); // Data is already validThe Debug Module's state machine looks like:
IDLE → [SBADDRESS0 written] → BUSY → [bus read completes] → DATA_READY
↓
[bus timeout] → SBERROR=1
Memory writes use SBDATA0 as the trigger register:
dap_write_mem32(target, DM_SBADDRESS0, addr); // Set address
dap_write_mem32(target, DM_SBDATA0, value); // Write triggers bus writeThe write to SBDATA0 initiates the system bus write transaction. The debugger should poll SBCS.sbbusyerror to detect completion (though in practice, pipelined writes are often used).
The SBCS.sberror field reports transaction failures:
0: No error
1: Timeout (bus did not respond)
2: Bad address (unmapped region)
3: Bad alignment (misaligned access)
4: Bad size (unsupported width)
7: Other error
Errors are sticky and must be explicitly cleared by writing 1 to SBCS.sberror (W1C = Write-1-to-Clear).
The library maintains comprehensive state tracking to avoid redundant SWD transactions:
typedef struct {
bool connected;
uint32_t idcode;
bool resource_registered;
// ...
} swd_target_t;typedef struct {
uint8_t current_apsel;
uint8_t current_bank;
bool ctrlsel;
uint32_t select_cache;
bool powered;
uint retry_count;
} dap_state_t;RP2350 contains two RISC-V harts (hardware threads) that execute independently. The library maintains per-hart state to avoid redundant operations and enable concurrent debugging:
typedef struct {
bool halt_state_known; // false after resume, true after halt/read status
bool halted; // true if hart is currently halted
// Register cache
bool cache_valid; // true if cached values are current
uint32_t cached_pc;
uint32_t cached_gprs[32];
uint64_t cache_timestamp; // For LRU if needed
} hart_state_t;The top-level RP2350 state maintains an array of hart states:
#define RP2350_NUM_HARTS 2
typedef struct {
bool initialized;
bool sba_initialized;
// Per-hart state
hart_state_t harts[RP2350_NUM_HARTS];
// Shared cache configuration
bool cache_enabled;
} rp2350_state_t;The halt_state_known flag implements a three-state model:
- Unknown (
halt_state_known=false): Hart state is uncertain (after resume or initialization) - Known Halted (
halt_state_known=true, halted=true): Hart is confirmed halted - Known Running (
halt_state_known=true, halted=false): Hart is confirmed running
This prevents expensive DMSTATUS polls when the state is known. State transitions:
┌─────────────┐
│ Unknown │
└──────┬──────┘
│
┌─────────────┼─────────────┐
│ │
halt_request() read_dmstatus()
│ │
▼ ▼
┌────────────┐ ┌──────────────┐
│ Halted │ │ Running │
└─────┬──────┘ └──────┬───────┘
│ │
│ resume() │
└───────────────────────────┘
│
▼
┌─────────┐
│ Unknown │ (state invalidated)
└─────────┘
When cache_enabled=true, the library caches register values after reads. This optimization benefits:
- Repeated reads of the same register (e.g., polling loop variables)
- Bulk register dumps where
rp2350_read_all_regs()populates the cache - Reduced SWD traffic (each register read requires ~6 SWD transactions)
Cache invalidation occurs on:
- Hart resume (execution changes registers)
- Register write (specific register invalidated)
- Hart halt request (conservative invalidation)
The cache is per-hart, allowing concurrent debugging of both harts without interference.
PIO resources are scarce: RP2040/RP2350 provide 2 PIO blocks with 4 state machines each. The library implements a global resource tracker for multi-target support.
typedef struct {
swd_target_t *pio0_sm_owners[4];
swd_target_t *pio1_sm_owners[4];
uint active_count;
} resource_tracker_t;
extern resource_tracker_t g_resources;When SWD_PIO_AUTO or SWD_SM_AUTO is specified in configuration, the library scans for free resources (swd.c:105-125):
swd_error_t allocate_pio_sm(PIO *pio, uint *sm) {
for (uint i = 0; i < 4; i++) {
if (g_resources.pio0_sm_owners[i] == NULL) {
*pio = pio0;
*sm = i;
return SWD_OK;
}
}
// Try PIO1...
}Up to 8 simultaneous target connections are supported (limited by hardware resources).
The library provides comprehensive error reporting through enumerated error codes and detailed message strings.
typedef enum {
SWD_OK = 0,
SWD_ERROR_TIMEOUT, // Transaction timeout
SWD_ERROR_FAULT, // Target FAULT response
SWD_ERROR_PROTOCOL, // Malformed packet
SWD_ERROR_PARITY, // Parity check failure
SWD_ERROR_WAIT, // WAIT response retry exhausted
SWD_ERROR_NOT_CONNECTED, // No active connection
SWD_ERROR_NOT_HALTED, // Operation requires halted hart
SWD_ERROR_ALREADY_HALTED, // Hart already halted (informational)
// ...
} swd_error_t;Each target maintains a 128-byte error detail buffer for formatted diagnostic messages (swd.c:67-84):
void swd_set_error(swd_target_t *target, swd_error_t error,
const char *detail, ...) {
target->last_error = error;
va_list args;
va_start(args, detail);
vsnprintf(target->error_detail, sizeof(target->error_detail),
detail, args);
va_end(args);
}SWD protocol ACK responses are mapped to error codes (swd.c:91-99):
swd_error_t swd_ack_to_error(uint8_t ack) {
switch (ack) {
case 0x1: return SWD_OK; // OK
case 0x2: return SWD_ERROR_WAIT; // WAIT
case 0x4: return SWD_ERROR_FAULT; // FAULT
default: return SWD_ERROR_PROTOCOL;
}
}WAIT responses trigger automatic retry with backoff (swd_protocol.c:197-208):
for (uint retry = 0; retry < target->dap.retry_count; retry++) {
err = swd_io_raw(target, request, value, false);
if (err != SWD_ERROR_WAIT) break;
sleep_us(100);
}Default retry count is 5, configurable via swd_config_t.
swd_config_t config = swd_config_default();
config.pin_swclk = 2;
config.pin_swdio = 3;
config.freq_khz = 1000;
config.enable_caching = true;
swd_target_t *target = swd_target_create(&config);
swd_connect(target);
rp2350_init(target);// Halt hart 0
rp2350_halt(target, 0);
// Read program counter
swd_result_t pc = rp2350_read_pc(target, 0);
if (pc.error == SWD_OK) {
printf("PC: 0x%08x\n", pc.value);
}
// Read all registers
uint32_t regs[32];
rp2350_read_all_regs(target, 0, regs);
// Single-step execution
rp2350_step(target, 0);
// Resume execution
rp2350_resume(target, 0);
// Reset hart
rp2350_reset(target, 0, true); // Reset and halt// Read memory (non-intrusive via SBA)
swd_result_t result = rp2350_read_mem32(target, 0x20000000);
// Write memory
rp2350_write_mem32(target, 0x20000000, 0xDEADBEEF);
// Block operations
uint32_t buffer[256];
rp2350_read_mem_block(target, 0x20000000, buffer, 256);const uint32_t program[] = {
0x200415b7, // lui a1, 0x20040
0xabcd0537, // lui a0, 0xabcd0
0x23450513, // addi a0, a0, 0x234
0x00a5a223, // sw a0, 4(a1)
0x0000006f, // j 0 (infinite loop)
};
rp2350_execute_code(target, 0, 0x20000000, program, 5);// Trace callback receives each executed instruction
bool trace_callback(const trace_record_t *record, void *user_data) {
printf("PC: 0x%08x Instruction: 0x%08x\n",
record->pc, record->instruction);
// Optional: inspect registers
if (record->regs) {
printf(" x5=0x%08x\n", record->regs[5]);
}
return true; // Continue tracing (false = stop)
}
// Trace 100 instructions on hart 0, capturing registers
int count = rp2350_trace(target, 0, 100, trace_callback, NULL, true);
printf("Traced %d instructions\n", count);// Operate on both harts independently
rp2350_halt(target, 0);
rp2350_halt(target, 1);
// Read registers from both harts
uint32_t h0_regs[32], h1_regs[32];
rp2350_read_all_regs(target, 0, h0_regs);
rp2350_read_all_regs(target, 1, h1_regs);
// Execute different programs on each hart
rp2350_execute_code(target, 0, 0x20000000, program0, len0);
rp2350_execute_code(target, 1, 0x20001000, program1, len1);
// Trace hart 1 while hart 0 runs
rp2350_resume(target, 0);
rp2350_trace(target, 1, 50, trace_callback, NULL, false);add_subdirectory(lib/pico2-swd-riscv)
target_link_libraries(your_application pico2_swd_riscv)Set compile-time debug verbosity:
target_compile_definitions(your_application PRIVATE PICO2_SWD_DEBUG_LEVEL=3)Levels: 0 (none), 1 (warnings), 2 (info), 3 (debug).
- ARM Debug Interface Architecture Specification v5.2
- ARM CoreSight SWD-DP Technical Reference Manual
- RISC-V External Debug Support version 0.13
- RP2350 Datasheet, Chapter 3.5: Debug
- ADIv5.2 Supplement for Multi-drop SWD
- IEEE 1149.1-2001 (JTAG)
Single-step execution enables instruction-level debugging by executing exactly one instruction before re-entering debug mode. This is implemented via the DCSR.step bit (Debug Control and Status Register, bit 2).
When DCSR.step=1, the hart executes one instruction after resumereq, then immediately re-halts:
Debug Mode → [resumereq + DCSR.step=1] → Execute 1 instruction → Debug Mode
Implementation (rp2350.c:431-504):
Phase 1: Read current DCSR
swd_result_t dcsr_result = read_dcsr(target, hart_id);DCSR must be read via PROGBUF (see Section 7.D) because it's a debug-only CSR.
Phase 2: Set step bit
uint32_t dcsr_stepped = dcsr_result.value | (1 << 2);
write_dcsr(target, hart_id, dcsr_stepped);Phase 3: Resume hart
uint32_t dmcontrol = make_dmcontrol(hart_id, false, true, false);
dap_write_mem32(target, DM_DMCONTROL, dmcontrol);The hart now executes exactly one instruction, then:
- PC is saved to DPC
- Hart re-enters debug mode
- DCSR.cause = 0x4 (single-step)
Phase 4: Wait for automatic halt
poll_dmstatus_halted(target, hart_id, true);Phase 5: Clear step bit
write_dcsr(target, hart_id, dcsr_result.value); // Restore original DCSRThis ensures subsequent rp2350_resume() calls don't single-step.
The library implements software instruction tracing by repeatedly single-stepping and recording each instruction. This provides a "time-travel" debugging capability at the cost of execution speed.
Tracing uses a callback function to process each instruction (rp2350.c:1262-1337):
typedef struct {
uint32_t pc;
uint32_t instruction;
uint32_t regs[32]; // Valid if capture_regs=true
} trace_record_t;
typedef bool (*trace_callback_t)(const trace_record_t *record, void *user_data);The callback returns true to continue or false to stop.
int rp2350_trace(swd_target_t *target, uint8_t hart_id,
uint32_t max_instructions,
trace_callback_t callback, void *user_data,
bool capture_regs);For each instruction:
- Read PC (via PROGBUF): Current instruction address
- Read memory at PC: Fetch the instruction word
- Optional: Read all GPRs: If
capture_regs=true - Invoke callback: User processes the record
- Single-step: Execute one instruction
- Repeat until
max_instructionsor callback returns false
Loop Detection:
bool detect_loop(const trace_record_t *record, void *user_data) {
static uint32_t entry_pc = 0;
static int count = 0;
if (count == 0) entry_pc = record->pc;
if (record->pc == entry_pc && count > 0) {
printf("Loop detected at PC=0x%08x\n", record->pc);
return false; // Stop trace
}
count++;
return true;
}Register State History:
bool capture_state(const trace_record_t *record, void *user_data) {
printf("%08x: %08x x5=%08x x6=%08x\n",
record->pc, record->instruction,
record->regs[5], record->regs[6]);
return true;
}
rp2350_trace(target, 0, 100, capture_state, NULL, true);- Speed: ~5ms per instruction (200 instructions/second)
- Interrupt Masking: Tracing should occur with interrupts disabled (clear mstatus.MIE)
- Memory Consistency: Instructions are fetched via SBA; ensure I-cache coherency
- No Hardware Triggers: Trace starts immediately; no "trace until condition"
Hart reset (rp2350_reset) implements a controlled reset sequence via DMCONTROL.ndmreset (non-debug module reset, bit 1).
swd_error_t rp2350_reset(swd_target_t *target, uint8_t hart_id,
bool halt_on_reset);Phase 1: Assert ndmreset
uint32_t dmcontrol = make_dmcontrol(hart_id, halt_on_reset, false, true);
dap_write_mem32(target, DM_DMCONTROL, dmcontrol);
sleep_ms(10); // Hold resetPhase 2: Deassert ndmreset
dmcontrol = make_dmcontrol(hart_id, halt_on_reset, false, false);
dap_write_mem32(target, DM_DMCONTROL, dmcontrol);
sleep_ms(50); // Wait for reset completionIf halt_on_reset=true, the DMCONTROL.haltreq bit remains set, causing the hart to enter debug mode immediately after reset, with PC set to the reset vector.
This reset is equivalent to a power-on reset for the hart:
- PC → reset vector (typically 0x00000000 for RP2350 RISC-V cores)
- All CSRs → architectural reset values
- GPRs → undefined
- Cache → invalidated
Unlike a full chip reset, peripherals and other harts are unaffected.
RP2350's two RISC-V harts (Hazard3 cores) are symmetric and independently controllable. The library provides full per-hart state tracking and concurrent operation.
Each Debug Module operation targets a specific hart via DMCONTROL.hartsel[9:0]:
static inline uint32_t make_dmcontrol(uint8_t hart_id, bool haltreq,
bool resumereq, bool ndmreset) {
uint32_t dmcontrol = (1 << 0); // dmactive = 1
dmcontrol |= ((uint32_t)hart_id << 16); // hartsello[9:0] at bits 25:16
if (haltreq) dmcontrol |= (1 << 31);
if (resumereq) dmcontrol |= (1 << 30);
if (ndmreset) dmcontrol |= (1 << 1);
return dmcontrol;
}Before any hart-specific operation (halt, resume, register read), the library writes DMCONTROL with the correct hart_id, switching the Debug Module's attention to that hart.
The test suite validates that harts operate independently:
// Halt hart 0, keep hart 1 running
rp2350_halt(target, 0);
rp2350_resume(target, 1);
// Read hart 0 registers while hart 1 executes
uint32_t h0_regs[32];
rp2350_read_all_regs(target, 0, h0_regs);This enables debugging one hart while the other maintains real-time operation.
Each hart maintains independent register state. Writing x5 on hart 0 does not affect x5 on hart 1. This is validated by test_dual_hart.c:69-115:
rp2350_write_reg(target, 0, 5, 0xAAAAAAAA);
rp2350_write_reg(target, 1, 5, 0x55555555);
assert(rp2350_read_reg(target, 0, 5).value == 0xAAAAAAAA);
assert(rp2350_read_reg(target, 1, 5).value == 0x55555555);Both harts share the same physical memory space but maintain independent caches. This creates coherency considerations:
- SBA Writes: Visible to both harts (after cache invalidation)
- Hart 0 Writes: Not immediately visible to Hart 1 if cached
- Explicit Synchronization: Required for inter-hart communication
The test suite exercises memory access while both harts run concurrently (test_mem.c:291-347).
This implementation does not currently support:
RISC-V Debug Specification defines a Trigger Module for hardware breakpoints. Implementation was removed due to complexity. Workaround: Use single-step + PC comparison in software.
The SWD protocol supports multiple targets on one bus via unique addresses. This library assumes a single target. Physical wiring for multi-target is possible but requires additional multiplexing logic.
RP2350's Hazard3 cores support the C extension (16-bit compressed instructions). The library:
- Correctly reads compressed instructions during tracing
- Does NOT decode compressed instruction mnemonics
- Assumes 4-byte alignment for code upload
No cycle-accurate performance counters are exposed. Implementing this requires:
- Access to mcycle/minstret CSRs
- Periodic sampling without halting (not possible with current SBA coherency)
No routines for RP2350 flash programming. This requires:
- Loading flash programmer stub to SRAM
- Executing stub via
rp2350_execute_code() - Monitoring completion via polling
The architecture supports this; implementation is left to applications.
Copyright (c) 2025
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.