SYSTEM DESCRIPTION
Illustrate the bus fabric and single cycle I/O of RP2040
COURSE OUTCOME
Apply
Bus Fabric:
Writes
Processor Subsystem:
Unit 2
You’ll learn
Bus Fabric:
Detail the structure of the bus fabric
4 AHB-Lite masters
Pipelined bus
Top Section – Masters (Initiators)
The title “4 AHB-Lite masters” refers to the four components that can initiate data transfers on the AHB-Lite bus: processor core 0, processor core 1, the DMA read port and the DMA write port.
AHB-Lite Crossbar (4:10)
AHB-Lite Slaves (Memory and High-Speed Peripherals)
Connected below the AHB-Lite crossbar are memory and high-speed modules:
APB Bridge and Splitter (for Low-Speed Peripherals)
System DMA (Direct Memory Access)
AHB-LITE CROSSBAR
Splitters
◦ Perform coarse address decode
◦ Route requests (addresses, write data) to the downstream port indicated by the initial address decode
◦ Route responses (read data, bus errors) from the correct arbiter back to the upstream port
Arbiters
◦ Manage concurrent requests to a downstream port
◦ Route responses (read data, bus errors) to the correct splitter
◦ Implement bus priority rules
Bus Priority
When there are multiple simultaneous accesses to the same arbiter, requests from high-priority masters (priority level 1) are considered before any requests from low-priority masters (priority level 0). If multiple masters of the same priority level attempt to access the same slave simultaneously, a round-robin tie break is applied, i.e. the arbiter grants access to each master in turn.
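The priority rule and the round-robin tie break can be sketched as a small host-side model. This is only an illustration: `arbiter_t`, `arbiter_grant` and the choice to share one round-robin pointer across both priority levels are assumptions made here, not a description of the actual hardware.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_MASTERS 4

/* Hypothetical model of one arbiter (not the RP2040 hardware itself). */
typedef struct {
    uint8_t priority[NUM_MASTERS]; /* 1 = high priority, 0 = low priority */
    int last_granted;              /* master granted most recently */
} arbiter_t;

/* Decide which requesting master is granted this cycle.
 * Bit i of `requests` is set when master i is requesting.
 * Returns the granted master index, or -1 if nobody is requesting. */
int arbiter_grant(arbiter_t *arb, uint32_t requests)
{
    assert(arb != 0);
    /* High-priority requests (level 1) are considered first. */
    for (int level = 1; level >= 0; --level) {
        /* Round-robin: start searching just after the last granted master. */
        for (int step = 1; step <= NUM_MASTERS; ++step) {
            int m = (arb->last_granted + step) % NUM_MASTERS;
            if ((requests & (1u << m)) && arb->priority[m] == level) {
                arb->last_granted = m;
                return m;
            }
        }
    }
    return -1;
}
```

With masters 2 and 3 set to high priority and all four requesting, the model alternates between 2 and 3; masters 0 and 1 are only granted once the high-priority masters stop requesting.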
Bus Priority
When accessing a slave with zero wait states, such as SRAM (i.e. one that can be accessed once per system clock cycle), high-priority masters will never observe any slowdown. It does, however, mean that a low-priority master may be stalled until there is a free cycle.
zero wait states
BUS_PRIORITY Register
Bus Performance Counters
BUSCTRL_PERFCTRx Register
You’ll learn:
AHB-Lite Crossbar
Atomic Register Access
Discern the 4 types of atomic register access
Atomic Register Access
Each peripheral register block is allocated 4kB of address space, with registers accessed using one of 4 methods, selected by address decode.
• Addr + 0x0000 : normal read write access
• Addr + 0x1000 : atomic XOR on write
• Addr + 0x2000 : atomic bitmask set on write
• Addr + 0x3000 : atomic bitmask clear on write
This allows individual fields of a control register to be modified without performing a read-modify-write sequence in software:
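A minimal host-side sketch of the four access methods is shown here. The `ALIAS_*` names and the `alias_write` function are hypothetical; `reg` stands in for a hardware register, and on the real chip the operation is carried out by the peripheral when the write arrives at the aliased address.

```c
#include <assert.h>
#include <stdint.h>

/* Alias offsets taken from the list above. */
#define ALIAS_RW  0x0000u /* normal read/write access   */
#define ALIAS_XOR 0x1000u /* atomic XOR on write        */
#define ALIAS_SET 0x2000u /* atomic bitmask set on write */
#define ALIAS_CLR 0x3000u /* atomic bitmask clear on write */

/* Model of how a write is applied, depending on the alias that the
 * address decodes to. */
void alias_write(uint32_t *reg, uint32_t alias, uint32_t value)
{
    assert(reg != 0);
    switch (alias) {
    case ALIAS_RW:  *reg  = value;  break;
    case ALIAS_XOR: *reg ^= value;  break;
    case ALIAS_SET: *reg |= value;  break;
    case ALIAS_CLR: *reg &= ~value; break;
    default:        assert(!"unknown alias"); break;
    }
}
```

For example, writing 0x0f through the SET alias to a register holding 0xf0 leaves 0xff, without software ever reading the register first.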
16kB
You’ll learn:
APB Bridge & Narrow IO Register Writes
Summarize the APB bridge and narrow IO register writes
Instruction Cycles
Advanced Peripheral Bus (APB Bridge)
The APB bridge interfaces the high-speed main AHB-Lite interconnect to the lower-bandwidth peripherals.
AHB-Lite: zero cycle penalty | APB: cycle penalty
APB bus
• APB bus accesses take 2 cycles minimum (setup phase and access phase)
• The bridge adds an additional cycle to read accesses, as the bus request and response are registered
• The bridge adds two additional cycles to write accesses, as the APB setup phase can not begin until the AHB-Lite write data is valid
Throughput: APB bus < AHB Lite
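Reading the bullets above as additive, a read through the bridge costs at least 2 + 1 = 3 cycles and a write at least 2 + 2 = 4 cycles. The sketch below just encodes that arithmetic; the function names are hypothetical, and it assumes the peripheral inserts no extra wait states.

```c
#include <assert.h>

enum {
    APB_MIN_CYCLES     = 2, /* setup phase + access phase */
    BRIDGE_READ_EXTRA  = 1, /* bus request and response are registered */
    BRIDGE_WRITE_EXTRA = 2  /* setup waits for valid AHB-Lite write data */
};

/* Minimum cycles for one read or one write through the APB bridge. */
unsigned apb_read_cycles(void)  { return APB_MIN_CYCLES + BRIDGE_READ_EXTRA; }
unsigned apb_write_cycles(void) { return APB_MIN_CYCLES + BRIDGE_WRITE_EXTRA; }

/* Minimum cycles for a mix of back-to-back accesses. */
unsigned apb_total_cycles(unsigned reads, unsigned writes)
{
    /* Keep the totals comfortably inside the range of unsigned. */
    assert(reads <= 1000000u && writes <= 1000000u);
    return reads * apb_read_cycles() + writes * apb_write_cycles();
}
```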
Instruction Cycles
Narrow IO Register Writes
Memory-mapped IO registers on RP2040 ignore the width of bus read/write accesses.
They treat all writes as though they were 32 bits in size.
To update part of an IO register, without a read-modify-write sequence, the best solution on RP2040 is atomic set/clear/XOR
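The effect of a narrow write can be sketched as follows. This model rests on an assumption that is not stated in these notes: that the bus master replicates byte and halfword write data across all byte lanes, so the register receives the replicated 32-bit pattern. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Value an IO register would hold after a write of the given width,
 * ASSUMING narrow write data is replicated across the 32-bit bus and
 * the register treats every write as a full 32-bit write. */
uint32_t io_reg_after_write(uint32_t value, unsigned width_bits)
{
    assert(width_bits == 8 || width_bits == 16 || width_bits == 32);
    if (width_bits == 8) {
        return (value & 0xffu) * 0x01010101u;   /* byte copied to 4 lanes */
    }
    if (width_bits == 16) {
        return (value & 0xffffu) * 0x00010001u; /* halfword copied twice */
    }
    return value;
}
```

Under that assumption, a byte write of 0xa5 overwrites the whole register rather than one byte of it, which is why the atomic set/clear/XOR aliases are the better tool for partial updates.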
List of Registers
Processor Subsystem:
Define processor subsystem with a block diagram
Unit 2
Processor Subsystem
NVIC (Nested Vectored Interrupt Controller)
DAP (Debug Access Port)
Define processor subsystem with a block diagram
Intended Learning Outcome
Processor Subsystem
The processors use a number of interfaces to communicate with the rest of the system:
• Each processor uses its own independent 32-bit AHB-Lite bus to access memory and memory-mapped peripherals
• The single-cycle IO block provides high-speed, deterministic access to GPIOs via each processor’s IOPORT
• 26 system-level interrupts are routed to both processors
• A multi-drop Serial Wire Debug bus provides debug access to both processors from an external debug host
Processor Subsystem
• 32-bit AHB-Lite bus to access memory
• Single-cycle IO access to GPIOs
• 26 system-level interrupts
• Serial Wire Debug
SIO – Single Cycle IO
IOPORT and other special operation
Address range: 0xd0000000 to 0xd000017c
Discuss the SIO registers, how software can tell the two cores apart, and how to control the GPIOs
Intended Learning Outcome
CPUID
The register CPUID is the first register in the IOPORT space. Core 0 reads a value of 0 when accessing this address, and core 1 reads a value of 1.
GPIO Control
The processors have access to GPIO registers for fast and direct control of pins with GPIO functionality
GPIO_x
GPIO_HI_x
Hardware Spinlocks
Manage mutually-exclusive access to shared software resources.
• Read: attempt to claim the lock
• Write (any value): release the lock
Spinlocks should generally be used only to protect short critical sections.
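The read-to-claim, write-to-release behaviour can be modelled on a host as below. The type and function names are hypothetical, and the return convention (nonzero when the claim succeeds, 0 when the lock is already held) is an assumption made for this sketch.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_SPINLOCKS 32

/* Hypothetical model: bit n of `locked` is set while lock n is held. */
typedef struct {
    uint32_t locked;
} spinlocks_t;

/* Model of a READ from spinlock n: try to claim it.
 * Returns nonzero if the lock was free and is now claimed, 0 otherwise. */
uint32_t spinlock_read(spinlocks_t *s, unsigned n)
{
    assert(s != 0 && n < NUM_SPINLOCKS);
    uint32_t bit = 1u << n;
    if (s->locked & bit) {
        return 0;          /* already held by someone else */
    }
    s->locked |= bit;      /* claim and report success in one step */
    return bit;
}

/* Model of a WRITE to spinlock n: release it. The value is ignored. */
void spinlock_write(spinlocks_t *s, unsigned n, uint32_t value)
{
    assert(s != 0 && n < NUM_SPINLOCKS);
    (void)value;
    s->locked &= ~(1u << n);
}
```

A core that wants the lock keeps reading until the read succeeds, runs its short critical section, then writes any value to release.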
Bit 31 | Bit 30 | Bit 29 | Bit 28 | Bit 27 | Bit 26 | Bit 25 | ... | Bit 3 | Bit 2 | Bit 1 | Bit 0 |
– | – | GPIO29 | GPIO28 | GPIO27 | GPIO26 | GPIO25 | ... | GPIO3 | GPIO2 | GPIO1 | GPIO0 |
Bit 31 | Bit 30 | ... | Bit 7 | Bit 6 | Bit 5 | Bit 4 | Bit 3 | Bit 2 | Bit 1 | Bit 0 |
– | – | ... | – | – | – | – | GPIO29 | GPIO28 | GPIO27 | GPIO26 |
Inter-processor FIFOs (Mailboxes)
The SIO contains two FIFOs for passing data, messages or ordered events between the two cores.
Each FIFO is 32 bits wide and eight entries deep.
One of the FIFOs can only be written by core 0 and read by core 1; the other can only be written by core 1 and read by core 0.
FIFO_WR / FIFO_RD / FIFO_ST
• Incoming FIFO contains data (VLD)
• Outgoing FIFO has room for more data (RDY)
• The incoming FIFO was read from while empty at some point in the past (ROE)
• The outgoing FIFO was written to while full at some point in the past (WOF)
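The four status conditions can be illustrated with a host-side model of a single one-directional FIFO. All names here are hypothetical; in particular, returning 0 from a read while empty and dropping a write while full are assumptions of this sketch, not documented hardware behaviour.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FIFO_DEPTH 8

/* Hypothetical model of one mailbox FIFO (one direction only). */
typedef struct {
    uint32_t data[FIFO_DEPTH];
    unsigned head;   /* index of the oldest entry */
    unsigned count;  /* number of valid entries */
    bool wof;        /* sticky: written while full at some point */
    bool roe;        /* sticky: read while empty at some point */
} mailbox_t;

/* VLD as seen by the reader: the FIFO contains data. */
bool mailbox_vld(const mailbox_t *f) { return f->count > 0; }

/* RDY as seen by the writer: the FIFO has room for more data. */
bool mailbox_rdy(const mailbox_t *f) { return f->count < FIFO_DEPTH; }

void mailbox_write(mailbox_t *f, uint32_t value)
{
    assert(f != 0);
    if (!mailbox_rdy(f)) {
        f->wof = true;   /* record the overflow; the value is dropped */
        return;
    }
    f->data[(f->head + f->count) % FIFO_DEPTH] = value;
    f->count++;
}

uint32_t mailbox_read(mailbox_t *f)
{
    assert(f != 0);
    if (!mailbox_vld(f)) {
        f->roe = true;   /* record the read-on-empty */
        return 0;
    }
    uint32_t value = f->data[f->head];
    f->head = (f->head + 1) % FIFO_DEPTH;
    f->count--;
    return value;
}
```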
FIFO_ST
write
read
The SIO has a FIFO IRQ output for each core, mapped to system IRQ numbers 15 (core 0) and 16 (core 1).
Integer Divider
The SIO provides one 8-cycle signed/unsigned divide/modulo module to each of the cores.
Calculation is started by writing a dividend and divisor to the two argument registers, DIVIDEND and DIVISOR. The divider calculates the quotient / and remainder % of this division over the next 8 cycles, and on the 9th cycle the results can be read from the two result registers DIV_QUOTIENT and DIV_REMAINDER.
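The relationship between the two argument registers and the two result registers can be sketched with C's own `/` and `%` operators. This is a host-side model with hypothetical names; it assumes the hardware's signed results match C's truncate-toward-zero behaviour, and it does not model divide-by-zero or the 8-cycle latency.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical pair standing in for DIV_QUOTIENT and DIV_REMAINDER. */
typedef struct {
    int32_t quotient;
    int32_t remainder;
} div_result_t;

/* Signed divide/modulo of DIVIDEND by DIVISOR. */
div_result_t sdiv_model(int32_t dividend, int32_t divisor)
{
    /* Inputs that are undefined in C are excluded from this model. */
    assert(divisor != 0);
    assert(!(dividend == INT32_MIN && divisor == -1));
    div_result_t r;
    r.quotient  = dividend / divisor;
    r.remainder = dividend % divisor;
    return r;
}
```

Whatever the rounding convention, the results satisfy dividend = quotient × divisor + remainder.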
Interpolator
Each core is equipped with two interpolators (INTERP0 and INTERP1) which can accelerate tasks by combining certain preconfigured operations into a single processor cycle.
Examine how interpolator is used in RP2040
Intended Learning Outcome
Interpolator : Lane operations
Each lane performs these three operations, in sequence:
• A right shift by CTRL_LANEx_SHIFT (0 to 31 bits)
• A mask of bits from CTRL_LANEx_MASK_LSB to CTRL_LANEx_MASK_MSB inclusive (each ranging from bit 0 to bit 31)
• A sign extension from the top of the mask, i.e. take bit CTRL_LANEx_MASK_MSB and OR it into all more-significant bits, if CTRL_LANEx_SIGNED is set
For example, if:
• ACCUM0 = 0xdeadbeef
• CTRL_LANE0_SHIFT = 8
• CTRL_LANE0_MASK_LSB = 4
• CTRL_LANE0_MASK_MSB = 7
• CTRL_LANE0_SIGNED = 1
Then lane 0 would produce the following results at each stage:
• Right shift by 8 to produce 0x00deadbe
• Mask bits 7 to 4 to produce 0x00deadbe & 0x000000f0 = 0x000000b0
• Sign-extend up from bit 7 to produce 0xffffffb0
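The three lane operations can be reproduced in a few lines of C. The function name and parameters are hypothetical, and the sketch assumes MASK_LSB is not greater than MASK_MSB.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Shift, mask and optionally sign-extend an accumulator value,
 * following the three lane operations listed above. */
uint32_t lane_result(uint32_t accum, unsigned shift,
                     unsigned mask_lsb, unsigned mask_msb, bool is_signed)
{
    assert(shift <= 31 && mask_lsb <= mask_msb && mask_msb <= 31);

    /* 1. Right shift. */
    uint32_t shifted = accum >> shift;

    /* 2. Keep bits mask_lsb..mask_msb inclusive. */
    uint32_t upto_msb = (mask_msb == 31) ? 0xffffffffu
                                         : ((1u << (mask_msb + 1)) - 1u);
    uint32_t below_lsb = (1u << mask_lsb) - 1u;
    uint32_t masked = shifted & (upto_msb & ~below_lsb);

    /* 3. If signed and bit mask_msb is set, OR it into all higher bits. */
    if (is_signed && (masked & (1u << mask_msb))) {
        masked |= ~upto_msb;
    }
    return masked;
}
```

Calling `lane_result(0xdeadbeef, 8, 4, 7, true)` walks through 0x00deadbe and 0x000000b0 and returns 0xffffffb0, matching the stages above.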
The same example in binary:
Original value (ACCUM0 = 0xdeadbeef):
  11011110 10101101 10111110 11101111
Right shift by 8 (CTRL_LANE0_SHIFT = 8):
  00000000 11011110 10101101 10111110 (0x00deadbe)
Mask bits 7 to 4:
  00000000 11011110 10101101 10111110 (0x00deadbe)
& 00000000 00000000 00000000 11110000 (0x000000f0)
------------------------------------
  00000000 00000000 00000000 10110000 (0x000000b0)
Sign extension (CTRL_LANE0_SIGNED = 1): bit 7 of the masked value is 1, so it is copied into bits 31 to 8:
  11111111 11111111 11111111 10110000 (0xffffffb0)
Interpolator : Blend Mode
Blend mode is available on INTERP0 on each core, and is enabled by the CTRL_LANE0_BLEND control flag. It performs linear interpolation between BASE0 and BASE1, using a fraction α taken from the lower 8 bits of the lane 1 result.
Blend mode has the following differences from normal mode:
Blend mode linearly interpolates between two values, which is useful for smoothly transitioning from one value to the other.
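Linear interpolation between two values x0 and x1 with a fraction α in the range 0 ≤ α < 1 is x = x0 + α(x1 − x0). A minimal sketch with an 8-bit fraction (α = alpha8 / 256) follows. The function name is hypothetical, the inputs are treated as unsigned, and the rounding of the hardware is not modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Linear interpolation x0 + alpha * (x1 - x0), with alpha = alpha8 / 256.
 * 64-bit intermediates avoid overflow and allow x1 < x0. */
uint32_t blend_u32(uint32_t x0, uint32_t x1, uint8_t alpha8)
{
    int64_t diff = (int64_t)x1 - (int64_t)x0;
    int64_t result = (int64_t)x0 + (diff * (int64_t)alpha8) / 256;
    /* The result always lies between x0 and x1. */
    assert(result >= 0 && result <= (int64_t)UINT32_MAX);
    return (uint32_t)result;
}
```

With alpha8 = 0 the result is x0; with alpha8 = 128 it is the midpoint; the result approaches but never quite reaches x1, because α is always below 1.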
Interpolator : Clamp Mode
Clamp mode is available on INTERP1 on each core, and is enabled by the CTRL_LANE0_CLAMP control flag. In clamp mode, the PEEK0/POP0 result is the lane value (shifted, masked, sign-extended ACCUM0) clamped between BASE0 and BASE1.
In other words, if the lane value is greater than BASE1, a value of BASE1 is produced; if less than BASE0, a value of BASE0 is produced.
This mode restricts the result to the range BASE0 to BASE1.
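The clamp rule is short enough to state directly in C. The function name is hypothetical, and the sketch assumes a signed comparison with BASE0 not greater than BASE1.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp a lane value to the range base0..base1 inclusive. */
int32_t clamp_lane(int32_t lane_value, int32_t base0, int32_t base1)
{
    assert(base0 <= base1);
    if (lane_value > base1) {
        return base1;   /* above the range: produce BASE1 */
    }
    if (lane_value < base0) {
        return base0;   /* below the range: produce BASE0 */
    }
    return lane_value;  /* already inside the range */
}
```

One use is keeping a computed array index inside valid bounds, e.g. `clamp_lane(index, 0, length - 1)`.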
Interpolator : Clamp Mode
Blend Mode 🎚️ | Clamp Mode 🛑 |
Smoothly interpolates between BASE0 and BASE1 | Keeps accumulator value within a defined min–max range |
Uses lower 8 bits of Lane 1 as fraction (α) for mixing | Prevents overflow/underflow when generating indices (safe for arrays) |
Available only on INTERP0 (enabled by CTRL_LANE0_BLEND) | Available only on INTERP1 (enabled by CTRL_LANE0_CLAMP) |
Priority order is established by two factors:
Interrupts
Illustrate interrupts and event signals
Intended Learning Outcome
Interrupts – Event Signals –Debug
Interrupts are signals that inform a processor to temporarily pause its current activities and start executing a specific set of instructions related to the interrupt.
Let's assume:
Interrupts 0, 1 and 2 are in use on core 0.
To set the priority of these three interrupts, we use the NVIC_IPR0 register:
00000000 00000000 01000000 10000000
Here, interrupt 2 has the highest priority because bits 23 and 22 are “00”; interrupt 1 (bits 15 and 14 = “01”) comes second, and interrupt 0 (bits 7 and 6 = “10”) comes last.
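The NVIC_IPR0 bit pattern in this example can be built from the individual priorities. The helper below is a hypothetical sketch; it assumes each interrupt's 2-bit priority sits in the top two bits of its byte (bits 7:6, 15:14, 23:22 and 31:30), which is the layout the example relies on, and that a lower number means a higher priority.

```c
#include <assert.h>
#include <stdint.h>

/* Pack the priorities of four consecutive interrupts into one
 * NVIC_IPRn word. prio[i] is the 2-bit priority of interrupt i
 * within the group; 0 is the highest priority. */
uint32_t nvic_ipr_pack(const uint8_t prio[4])
{
    uint32_t word = 0;
    for (int i = 0; i < 4; ++i) {
        assert(prio[i] <= 3);
        /* Priority field occupies the top two bits of byte i. */
        word |= (uint32_t)prio[i] << (8 * i + 6);
    }
    return word;
}
```

Priorities {2, 1, 0, 0} for interrupts 0 to 3 give 0x00004080, which is the binary value shown in the example.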
On RP2040, only the lower 26 IRQ signals are connected on the NVIC, and IRQs 26 to 31 are tied to zero (never firing). The core can still be forced to enter the relevant interrupt handler by writing bits 26 to 31 in the NVIC ISPR register
Interrupt Set-pending Register
As before, interrupts 0, 1 and 2 are in use on core 0, and NVIC_IPR0 sets their priorities:
00000000 00000000 01000000 10000000
Interrupt 2 has the highest priority because bits 23 and 22 are “00”.
However, to make interrupt 1 the non-maskable interrupt (NMI), we have to use the PROC0_NMI_MASK register.
Existing NMI mask value (an arbitrary example of interrupts already masked on core 0):
00000000 10000000 01000100 00100000
Mask for interrupt 1 (bit 1 set):
00000000 00000000 00000000 00000010
Result of OR-ing the two:
00000000 10000000 01000100 00100010
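The OR step can be written as a one-line helper. The function name is hypothetical, and 0x00804420 is used only as an arbitrary existing mask value, as in the example.

```c
#include <assert.h>
#include <stdint.h>

/* Return the NMI mask with the bit for `irq` added, leaving every
 * bit that was already set untouched. */
uint32_t nmi_mask_add_irq(uint32_t current_mask, unsigned irq)
{
    assert(irq < 32);
    return current_mask | (1u << irq);
}
```

For interrupt 1, `nmi_mask_add_irq(0x00804420, 1)` gives 0x00804422. On hardware, the same effect could be had without a read by writing `1u << irq` through the atomic bitmask-set alias of the register.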
Event Signals
When both processors are sleeping, and the DMA is inactive, RP2040 as a whole can enter a sleep state, disabling clocks on unused infrastructure such as the bus fabric, and waking automatically when one of the processors wakes.
While in a WFE (or WFI) sleep state, the processor can shut off its internal clock gates, consuming much less power.
0.2 milliwatts
Debug
The 2-wire Serial Wire Debug (SWD) port provides access to hardware and software debug features including:
Each DAP will only respond to debug commands if correctly addressed by a SWD TARGETSEL command
• Core 0: 0x01002927
• Core 1: 0x11002927
• Rescue DP: 0xf1002927
Additionally, a Rescue DP is available which is connected to system control features
2-wire Serial Wire Debug (SWD)
Rescue DP
The Rescue DP (debug port) is available over the SWD bus and is only intended for use in the specific case where the chip has locked up, for example if code has been programmed into flash which permanently halts the system clock
CDBGPWRUPREQ
Debug power-up request
CDBGPWRUPACK
Debug power-up acknowledge
Software control of SWD pins
The SWD pins for Core 0 and Core 1 can be bit-banged via registers in syscfg