Parallax Propeller 2

Documentation

2021-05-18

v35 (Rev B/C silicon)

P2X8C4M64PES

LPD1941 (Rev B) or LHU2019 (Rev C)

PHILIPPINES

(not yet updated: Boot ROM)

Design Status

Date	Progress
2018_04_25	Verilog design files sent to ON Semi for Rev A silicon (8 cogs, 512KB hub, 64 smart pins)
2018_05_29	Final ROM data sent to ON Semi
2018_07_09	Final sign-off with ON Semi, reticles being made
2018_09_11	Wafers done! Only took 9 weeks, instead of 14.
2018_09_27	Received 10 glob-top prototype chips from ON Semi. Chips are functional, but sign-extension problems in Verilog source files caused the following problems: Cogs' IQ modulators' outputs are nonsensical. Smart pin measurement modes which are supposed to count by +1/-1 are counting by +1/+3. ALTx instructions aren't sign-extending S[17:09] before adding into D. These sign-extension problems have already been fixed in the Verilog source files and tested on the FPGA. There is also a low-glitch-on-high-to-float problem on some I/O pins due to a race condition between DIR and OUT signals. This will be fixed by timing constraints in the next silicon. A respin of the silicon is planned after more testing.
2018_11_13	Received 135 Amkor-packaged prototype chips from ON Semi. These chips will have better heat dissipation than the glob-top prototypes.
2019_04_11	Rev B respin entered the fab and is due out July 15. Ten glob-top prototypes should arrive on August 1, with 2,400 production chips to follow in a few weeks. The following improvements were made to the chip: All known prior bugs fixed. Clock-gating implemented, reduces power by ~40%. PLL filter modified to reduce jitter and improve lock. System counter extended to 64 bits. GETCT WC retrieves upper 32-bits. Streamer has many new modes with SINC1/SINC2 ADC conversions for Goertzel mode. HDMI mode added to streamer with ascending and descending pinouts for easy PCB layout. SINC2/SINC3 filters added to smart pins for improving ENOB in ADC conversions. Each cog has four 8-bit sample-per-clock ADC channels that feed from new smart pin 'SCOPE' modes. BITL/BITH/BITC/BITNC/BITZ/BITNZ/BITRND/BITNOT can now work on a span of bits (+S[9:5] bits). Prior SETQ overrides S[9:5]. DIRx/OUTx/FLTx/DRVx can now work on a span of pins (+D[10:6] pins). Prior SETQ overrides D[10:6]. WRPIN/WXPIN/WYPIN/AKPIN can now work on a span of pins (+S[10:6] pins). Prior SETQ overrides S[10:6]. BIT_DAC output now has two 4-bit settings for low and high states, instead of one 8-bit high-state setting. RDxxxx/WRxxxx+PTRx expressions now index -16..+16 with updating and -32..+31 without updating. Sensible PTRx behavior implemented for 'SETQ(2) + RDLONG/WRLONG/WMLONG' operations. RDLUT/WRLUT can now handle PTRx expressions. Cog LUT sharing is now glitch-free. POP now returns Z=1 if result=0, used to return result[30]. XORO32 improved. Main PRNG upgraded to "Xoroshiro128**". The core logic increased by a net 15%, even with significant logic reductions resulting from clock-gating. Fortunately, ON Semi was able to make it all fit within the original die area.
2019_07_13	Wafers out of fab. Packaging underway.
2019_08_01	Received 10 glob-top prototype chips from ON Semi. All bugs from prior silicon are fixed. All new features work as expected. PLL jitter is <2ns @100us at all divide/multiply settings. Power is reduced by ~50%. The new silicon works much better than expected with the improved PLL filter and new clock gating. At room temperature, the silicon runs at 390MHz and is barely warm to the touch, with the PLL now being the speed limiter, instead of the logic.
2019_08_19	One of the six new wafers exhibits frequent VIO-to-GND shorts in the 5-20 ohm range. ON Semi is looking into the cause. We know that the design is good, so we are anxious to see ON Semi resume yield testing on the other wafers, in order to get as many Amkor-packaged parts as soon as possible. The new P2 Eval board is ready to be built.
2019_08_29	ON Semi has done failure analysis on the new chips which were exhibiting VIO shorts and it's been determined that there are latch-up problems originating from differently-biased N-wells that lie adjacent to each other. The relatively low resistivity of the new wafers caused this latent design defect to emerge. We will need to modify the full-custom pad ring to fix these N-well problems. We will soon discuss with ON Semi how many reticles this is going to involve. We will need another fab run, as well, to realize the changes.
2019_09_13	ON Semi recently discovered that a voltage-stress test had been applied to the new silicon which was driving the VDD and VIO pins to +40% nominal voltages. The 4.62V on VIO was triggering the latch-up problem. The first two wafers which had been probed with this new test had developed many bad dies, as a result. ON Semi probed six remaining virgin wafers without the voltage-stress test and yielded over 1,000 good dies. These have been sent off to Amkor for packaging. From these chips, we will be able to build new P2 Eval boards and supply low volumes of chips. As for the latch-up problem, it was determined by ON Semi that latch-up was occurring as early as 4.3V on VIO. Rather than do a respin, we could lower the voltage-stress test from +40% to +25%, which would result in a peak VIO test voltage of 4.125V. Depending on what we see in the field with these new chips, we may do a respin to accommodate ON Semi's standard +40% voltage-stress test, or just lower the voltage-stress test to +25%. ON Semi's standard of +40% is quite exceptional and some other vendors only guarantee +20%. So, +25% may be just fine. We need to get the new silicon out to customers and see if anyone experiences any trouble with VIO-triggered latch-up. ON Semi is also going to run a standard latch-up test on the new silicon to ensure there is no other latent problem. The silicon has already passed ESD tests with 4kV human body model and 2kV machine model.
2019_10_16	We will be receiving about 1,000 Rev B P2 chips on 10/22. Our plan is to build 191 more P2 Eval boards and supply small quantities of P2 chips to interested customers.
2019_10_23	We received 1,000 Rev B chips. Aside from building 191 more P2 Eval boards, we will offer 125 packs of four P2 chips for $100 to interested customers. If anyone needs more than four chips, please contact Ken Gracey (kgracey@parallax.com).
2020_02_24	Received 10 Rev C chips which fix the adjacent-pin ADC crosstalk problem on prior revisions. Smart pin mode %100010_OHHHLLL no longer connects the ADC to the adjacent pin, but floats the ADC input. This mode is now useful for determining the floating bias point of the ADC. Several thousand Rev C chips will be arriving from ON Semi over the next two months.
2020_06_01	Received 7,000 Rev C chips from ON Semi.

KNOWN SILICON BUGS

Intervening ALTx/AUGS/AUGD instructions between SETQ/SETQ2 and RDLONG/WRLONG/WMLONG-PTRx instructions will cancel the special-case block-size PTRx deltas. The expected number of longs will transfer, but PTRx will only be modified according to normal PTRx expression behavior:

SETQ #16-1 'ready to load 16 longs

ALTD start_reg 'alter start register (ALTD cancels block-size PTRx deltas)

RDLONG 0,ptra++ 'ptra will only be incremented by 4 (1 long), not 16*4 as anticipated!!!

Intervening ALTx instructions with an immediate #S operand, between AUGS and the AUGS' intended target instruction (which would have an immediate #S operand), will use the AUGS value, but not cancel it. So, the intended AUGS target instruction will use and cancel the AUGS value, as expected, but the intervening ALTx instruction will also use the AUGS value (if it has an immediate #S operand). To avoid problems in these circumstances, use a register for the S operand of the ALTx instruction, and not an immediate #S operand.

AUGS #$FFFFF123 'This AUGS is intended for the ADD instruction.

ALTD index,#base 'Look out! AUGS will affect #base, too. Use a register, instead.

ADD 0-0,#$123 '#$123 will be augmented by the AUGS and cancel the AUGS.
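As a rough model of why the intervening ALTD in the example above is disturbed, the following sketch (in Python, not PASM2) assumes that a pending AUGS supplies bits 31..9 of the next immediate #S value while the instruction's own 9-bit S field supplies bits 8..0. The function name and the register address chosen for #base are hypothetical and only serve the illustration.

```python
def augmented_s(augs_value: int, s9: int) -> int:
    """Combine a pending AUGS value with a 9-bit immediate S field.

    Assumption: AUGS contributes bits 31..9 and the instruction's own
    S field contributes bits 8..0.
    """
    return (augs_value & 0xFFFFFE00) | (s9 & 0x1FF)


AUGS = 0xFFFFF123   # from 'AUGS #$FFFFF123' in the example above
BASE = 0x080        # hypothetical register address standing in for #base

# Intended target: 'ADD 0-0,#$123' sees the full 32-bit constant.
add_s = augmented_s(AUGS, 0x123)

# Silicon bug: the intervening 'ALTD index,#base' also sees an augmented S.
altd_s = augmented_s(AUGS, BASE)
altd_s_17_9 = (altd_s >> 9) & 0x1FF   # non-zero, so ALTD would also modify 'index'

print(f"ADD  S = ${add_s:08X}")
print(f"ALTD S = ${altd_s:08X}, S[17:9] = ${altd_s_17_9:03X}")
```

With a register used for the ALTD S operand instead of an immediate, no augmentation takes place and S[17:9] comes from the register's own contents.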

OVERVIEW

The Propeller 2 is a microcontroller architecture consisting of 1, 2, 4, 8, or 16 identical 32-bit processors (called cogs), each with their own RAM, which connect to a common hub. The hub provides up to 1 MB of shared RAM, a CORDIC math solver, and housekeeping facilities. The architecture supports up to 64 smart I/O pins, each capable of many autonomous analog and digital functions.

The P2X8C4M64P silicon contains 8 cogs, 512 KB of hub RAM, and 64 smart I/O pins in an exposed-pad TQFP-100 package.

P2X	8C	4M	64P	ES
Propeller 2	8 cogs (processors)	4 Mb hub RAM (512 KB)	64 smart I/O pins	Engineering Sample

Each cog has:

Access to all I/O pins, plus four fast DAC output channels and four fast ADC input channels
512 longs of dual-port register RAM for code and fast variables
512 longs of dual-port lookup RAM for code, streamer lookup, and variables
Ability to execute code directly from register RAM, lookup RAM, and hub RAM
~350 unique instructions for math, logic, timing, and control operations
2-clock execution for all math and logic instructions, including 16 x 16 multiply
6-clock custom-bytecode executor for interpreted languages
Ability to stream hub RAM and/or lookup RAM to DACs and pins or HDMI modulator
Ability to stream pins and/or ADCs to hub RAM
Live colorspace conversion using a 3 x 3 matrix with 8-bit signed/unsigned coefficients
Pixel blending instructions for 8:8:8:8 data
16 unique event trackers that can be polled and waited upon
3 prioritized interrupts that trigger on selectable events
Hidden debug interrupt for single-stepping, breakpoint, and polling
8-level hardware stack for fastest subroutine calls/returns and push/pop operations
Carry and Zero flags

The hub provides the cogs with:

Up to 1 MB of contiguous RAM in a 20-bit address space (P2X8C4M64P contains 512 KB)

32-bits-per-clock sequential read/write for all cogs, simultaneously
readable and writable as bytes, words, or longs in little-endian format
last 16KB of RAM also appears at the end of the 1MB map and is write-protectable

32-bit, pipelined CORDIC solver with scale-factor correction

32-bit x 32-bit unsigned multiply with 64-bit result
64-bit / 32-bit unsigned divide with 32-bit quotient and 32-bit remainder
64-bit → 32-bit square root
Rotate (X32,Y32) by Theta32 → (X32,Y32)
(Rho32,Theta32) → (X32,Y32) polar-to-cartesian
(X32,Y32) → (Rho32,Theta32) cartesian-to-polar
32 → 5.27 unsigned-to-logarithm
5.27 → 32 logarithm-to-unsigned
Cogs can start CORDIC operations every 1/2/4/8/16 (#cogs) clocks and get results 55 clocks later

16 semaphore bits with atomic read-modify-write operations
64-bit free-running counter which increments every clock, cleared on reset
High-quality pseudo-random number generator (Xoroshiro128**), true-random seeded at start-up, updates every clock, provides unique data to each cog and pin
Mechanisms for starting, polling, and stopping cogs
16KB boot ROM

Loads into last 16 KB of hub RAM on boot-up
SPI loader for automatic startup from 8-pin flash or SD card
Serial loader for startup from host

Hex and Base64 download protocols
Terminal monitor invocable via "> " (greater than followed by a space) and then CTRL+D
TAQOZ Forth invocable via "> " (greater than followed by a space) and then ESC

Each smart I/O pin has the following functions:

8-bit, 120-ohm (3ns) and 1k-ohm DACs with 16-bit oversampling, noise, and high/low digital modes
Delta-sigma ADC with 5 ranges, 2 sources, and VIO/GIO calibration
Several ADC sampling modes: automatic 2^n SINC2, adjustable SINC2/SINC3, oscilloscope
Logic, Schmitt, pin-to-pin-comparator, and 8-bit-level-comparator input modes
2/3/5/8-bit-unanimous input filtering with selectable sample rate
Incorporation of inputs from relative pins, -3 to +3
Negative or positive local feedback, with or without clocking
Separate drive modes for high and low output: logic / 1.5 k / 15 k / 150 k / 1 mA / 100 µA / 10 µA / float
Programmable 32-bit clock output, transition output, NCO/duty output
Triangle/sawtooth/SMPS PWM output, 16-bit frame with 16-bit prescaler
Quadrature decoding with 32-bit counter, both position and velocity modes
16 different 32-bit measurements involving one or two signals
USB full-speed and low-speed (via odd/even pin pairs)
Synchronous serial transmit and receive, 1 to 32 bits, up to clock/2 baud rate
Asynchronous serial transmit and receive, 1 to 32 bits, up to clock/3 baud rate

Six different clock modes, all under software control with glitch-free switching between sources:

Internal 20+ MHz RC oscillator, nominally 24 MHz, used as initial clock source
Crystal oscillator with internal loading caps for 7.5 pF/15 pF crystals, can feed PLL
Clock input, can feed PLL
Fractional PLL with 1..64 crystal divider --> 1..1024 VCO multiplier --> optional (1..15)*2 VCO post-divider
Internal ~20 kHz RC oscillator for low-power operation (130 µA)
Clock can be stopped for lowest power until reset (100 µA, due to leakage)
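The PLL chain listed above (crystal divider, then VCO multiplier, then optional post-divider) can be sketched as simple arithmetic. This is only an illustration in Python: the function name is hypothetical, the 20 MHz input frequency is an assumed example value, and VCO frequency limits are not modeled.

```python
def pll_output_hz(xi_hz: float, divider: int, multiplier: int,
                  post_divider: int = 1) -> float:
    """Estimate the PLL output frequency from the three stages listed above.

    divider:      1..64   crystal divider
    multiplier:   1..1024 VCO multiplier
    post_divider: 1 (no post-divide) or (1..15)*2, i.e. 2, 4, ..., 30
    """
    if not 1 <= divider <= 64:
        raise ValueError("crystal divider must be 1..64")
    if not 1 <= multiplier <= 1024:
        raise ValueError("VCO multiplier must be 1..1024")
    valid_post = {1} | {n * 2 for n in range(1, 16)}
    if post_divider not in valid_post:
        raise ValueError("post-divider must be 1 or an even value 2..30")
    vco_hz = xi_hz / divider * multiplier
    return vco_hz / post_divider


# Assumed example: 20 MHz crystal, divide by 1, multiply by 10, no post-divide.
print(pll_output_hz(20e6, 1, 10))
```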

PIN DESCRIPTIONS

Pin Name	Direction	V(typ)	Description
TEST			Tied to ground
VDD		1.8	Core power
VSS			Ground
VIO_{x}_{y}		3.3	Power for smart pins {x} through {y}
GIO_{x}_{y}			Ground for smart pins {x} through {y} and other related circuits
P0-63	I/O	0 to 3.3	Smart pins
P58-P63			Boot source(s). See BOOT PROCESS.
XI			Crystal Input. Can be connected to output of crystal/oscillator pack (with XO left disconnected), or to one leg of crystal (with XO connected to other leg of crystal or resonator) depending on CLK Register settings. No external resistors or capacitors are required.
XO			Crystal Output. Provides feedback for an external crystal, or may be left disconnected depending on CLK Register settings. No external resistors or capacitors are required.
RESn			Reset (active low). When low, resets the Propeller chip: all cogs disabled and I/O pins floating. Propeller restarts 3 ms after RESn transitions from low to high.

MEMORIES

There are three memory regions: cog RAM, lookup RAM, and hub RAM. Each cog has its own cog RAM and lookup RAM, while the hub RAM is shared by all cogs.

Memory Region	Memory Width	Memory Depth	Instruction D/S Address Ranges	Program Counter Address Ranges
COG	32 bits	512	$000..$1FF	$00000..$001FF
LOOKUP	32 bits	512	$000..$1FF	$00200..$003FF
HUB	8 bits	1,048,576 (*)	$00000..$FFFFF	$00400..$FFFFF

(*) 1,048,576 bytes is the maximum size supported. However, some variants may have less available. See the Hub Memory section below for more details.

COGS

The Propeller contains multiple processors, called "cogs". Each cog has its own RAM and can start, stop, and execute instructions independently of the others. All active cogs share the same system clock, hub RAM, and I/O pins.

Cogs employ a five-stage pipelined execution architecture. When the execution pipeline is full, each instruction effectively takes as little as two clock cycles to execute. If an instruction stalls for additional clock cycles, all following instructions in the pipeline are also stalled. Any instruction that is conditionally canceled still moves through the pipeline without stalling or executing, and still takes two clock cycles. Branch instructions cause the pipeline to be flushed, so the first instruction following the branch will take at least five clock cycles.

The available instruction set can be found at Parallax Propeller 2 Instruction Set. When reading the "Encoding" column, the following table may help:

Key	Description
EEEE	Conditional test (see "Instruction Prefix" list at bottom of the instruction set spreadsheet)
C	0: Do not update the "C" flag; 1: Update the "C" flag. In the instruction syntax, this is denoted by "WC" or "WCZ".
Z	0: Do not update the "Z" flag; 1: Update the "Z" flag. In the instruction syntax, this is denoted by "WZ" or "WCZ".
I	0: Source field is a register address; 1: Source field is a literal value. In the instruction syntax, this is denoted by the "#" character.
L	0: Destination field is a register address; 1: Destination field is a literal value. In the instruction syntax, this is denoted by the "#" character.
R	0: 20-bit Address field is absolute; 1: 20-bit Address field is relative to current PC.
WW	Index of special register (PA, PB, PTRA, or PTRB) to write.
DDDDDDDDD	Destination field
SSSSSSSSS	Source field
AAAAAAA...	20-bit Address field
nnnnnn...	23-bit augment number field
N,NN,NNN	Index number. This is only used for instructions with a third operand to specify word, byte, or nibble.
cccc	conditional test used to update C (%0000=clear, %1111=set, all others per EEEE)
zzzz	conditional test used to update Z (%0000=clear, %1111=set, all others per EEEE)
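To make the key above concrete, here is a minimal Python sketch that splits a 32-bit instruction word into its fields. The bit positions are an assumption based on the layout used for most two-operand rows in the instruction set spreadsheet (EEEE, a 7-bit opcode, C, Z, I, then the 9-bit D and S fields); they are not stated in the table above and do not apply to every encoding. The function name and the test word are hypothetical.

```python
def decode_fields(instr: int) -> dict:
    """Split a 32-bit instruction word into fields.

    Assumed layout (common two-operand form):
      [31:28] EEEE, [27:21] opcode, [20] C, [19] Z, [18] I,
      [17:9] DDDDDDDDD, [8:0] SSSSSSSSS
    """
    return {
        "EEEE": (instr >> 28) & 0xF,
        "opcode": (instr >> 21) & 0x7F,
        "C": (instr >> 20) & 1,
        "Z": (instr >> 19) & 1,
        "I": (instr >> 18) & 1,
        "D": (instr >> 9) & 0x1FF,
        "S": instr & 0x1FF,
    }


# Arbitrary test pattern: EEEE=%1111, I=1, D=$1FC, S=$0FF, everything else 0.
word = (0xF << 28) | (1 << 18) | (0x1FC << 9) | 0x0FF
fields = decode_fields(word)
print({name: hex(value) for name, value in fields.items()})
```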

INSTRUCTION MODES

Cogs use 20-bit addresses for program counters (PC). This affords an execution space of up to 1MB. Depending on the value of a cog's PC, an instruction will be fetched from either its register RAM, its lookup RAM, or the hub RAM.

PC Address	Instruction Source	Memory Width	PC Increment
$00000..$001FF	cog register RAM	32 bits	1
$00200..$003FF	cog lookup RAM	32 bits	1
$00400..$FFFFF	hub RAM	8 bits	4
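The table above amounts to a simple range check on the 20-bit PC. A small Python sketch follows; the function name is hypothetical.

```python
def fetch_source(pc: int) -> tuple:
    """Return (instruction source, PC increment) for a 20-bit program counter."""
    pc &= 0xFFFFF                      # PCs are 20 bits wide
    if pc <= 0x001FF:
        return ("cog register RAM", 1)
    if pc <= 0x003FF:
        return ("cog lookup RAM", 1)
    return ("hub RAM", 4)              # hub RAM is byte-addressed, 4 bytes per instruction


for pc in (0x00000, 0x001FF, 0x00200, 0x003FF, 0x00400, 0xFFFFF):
    print(f"${pc:05X}", fetch_source(pc))
```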

REGISTER EXECUTION

When the PC is in the range $00000..$001FF, the cog is fetching instructions from cog register RAM. This is commonly referred to as "cog execution mode." No special consideration is needed when branching to a cog register address.

LOOKUP EXECUTION

When the PC is in the range $00200..$003FF, the cog is fetching instructions from cog lookup RAM. This is commonly referred to as "LUT execution mode." No special consideration is needed when branching to a cog lookup address.

HUB EXECUTION

When the PC is in the range $00400..$FFFFF, the cog is fetching instructions from hub RAM. This is commonly referred to as "hub execution mode." When executing from hub RAM, the cog employs the FIFO hardware to spool up instructions so that a stream of instructions will be available for continuous execution. Branching to a hub address takes a minimum of 13 clock cycles. If the instruction being branched to is not long-aligned, one additional clock cycle is required. A branch must occur to get from cog to hub, since rolling from $3FF to $400 will not initiate hub execution.

While in hub execution mode, the FIFO cannot be used for anything else. So, during hub execution these instructions cannot be used:

RDFAST / WRFAST / FBLOCK

RFBYTE / RFWORD / RFLONG / RFVAR / RFVARS

WFBYTE / WFWORD / WFLONG

XINIT / XZERO / XCONT - when the streamer mode engages the FIFO

It is not possible to execute code from hub addresses $00000 through $003FF, as the cog will instead read instructions from the cog register or lookup RAM as indicated above.

STARTING AND STOPPING COGS

Any cog can start or stop any other cog, or restart or stop itself. Each of the eight cogs has a unique three-bit ID which can be used to start or stop it. It's also possible to start free (stopped or never started) cogs without needing to know their IDs. This way, entire applications can be written which simply start free cogs as needed. As those cogs retire, by stopping themselves or getting stopped by others, they return to the pool of free cogs and become available again for restarting.

The COGINIT instruction is used to start cogs:

COGINIT D/#,S/# {WC}

D/# = %0_x_xxxx The target cog loads its own registers $000..$1F7 from the hub, starting at address S/#, then begins execution at register address $000.

%1_x_xxxx The target cog begins execution at register/LUT/hub address S/#.

%x_0_CCCC The target cog's ID is %CCCC.

%x_1_xxx0 If a cog is free (stopped), then start it. To know if this succeeded, D must be a register and WC must be used. If successful, C will be cleared and D will be overwritten with the target cog's ID. Otherwise, C will be set and D will be overwritten with $F.

%x_1_xxx1 If an even/odd cog pair is free (stopped), then start them. To know if this succeeded, D must be a register and WC must be used. If successful, C will be cleared and D will be overwritten with the even/lower target cog's ID. Otherwise, C will be set and D will be overwritten with $F.

S/# = address This value is either the hub address from which the target cog will load, or the cog/hub address at which the target cog will begin executing, depending on D[5]. This 32-bit value will be written into the target cog's PTRB register.

If COGINIT is preceded by SETQ, the SETQ value will be written into the target cog's PTRA register. This is intended as a convenient means of pointing the target cog's program to some runtime data structure or passing it a 32-bit parameter. If no SETQ is used, the target cog's PTRA register will be cleared to zero.

COGINIT #1,#$100 'load and start cog 1 from $100

COGINIT #%1_0_0101,PTRA 'start cog 5 at PTRA

SETQ ptra_val 'ptra_val will go into target cog's PTRA register

COGINIT #%0_1_0000,addr 'load and start a free cog at addr

COGINIT #%1_1_0001,addr 'start a pair of free cogs at addr (lookup RAM sharing)

COGINIT id,addr WC '(id=$30) start a free cog at addr, C=0 and id=cog if okay

COGID myID 'reload and restart me at PTRB

COGINIT myID,PTRB
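The D/# bit patterns used in the COGINIT examples above can be unpacked mechanically. The following Python sketch is only a reading aid for the %L_F_CCCC layout described earlier; the function name and the wording of the returned strings are hypothetical.

```python
def decode_coginit_d(d: int) -> dict:
    """Interpret the low six bits of COGINIT's D/# operand."""
    execute = bool(d & 0b100000)     # D[5]: 0 = load from hub then run, 1 = execute at S
    find_free = bool(d & 0b010000)   # D[4]: 0 = use cog ID in D[3:0], 1 = find free cog(s)
    low = d & 0b1111                 # D[3:0]

    info = {"action": "execute at S" if execute
            else "load $000..$1F7 from hub at S, then run from $000"}
    if not find_free:
        info["target"] = f"cog {low}"
    elif low & 1:
        info["target"] = "free even/odd cog pair"
    else:
        info["target"] = "any free cog"
    return info


for d in (1, 0b1_0_0101, 0b0_1_0000, 0b1_1_0001, 0x30):
    print(f"%{d:06b}", decode_coginit_d(d))
```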

The COGSTOP instruction is used to stop cogs. The 4 LSBs of the D/# operand supply the target cog ID.

COGSTOP #0 'stop cog 0

COGID myID 'stop me

COGSTOP myID

A cog can discover its own ID by doing a COGID instruction, which will return its ID into D[3:0], with upper bits cleared. This is useful, in case the cog wants to restart or stop itself, as shown above.

If COGID is used with WC, it will not overwrite D, but will return the status of cog D/# into C, where C=0 indicates the cog is free (stopped or never started) and C=1 indicates the cog is busy (started).

COGID ThatCog WC 'C=1 if ThatCog is busy

COG RAM

Each cog has a primary 512 x 32-bit dual-port RAM, which can be used in multiple ways:

Direct/Register access
As a source of program instructions (see COGS > INSTRUCTION MODES > REGISTER EXECUTION)

GENERAL PURPOSE REGISTERS

RAM registers $000 through $1EF are general-purpose registers for code and data usage.

DUAL-PURPOSE REGISTERS

RAM registers $1F0 through $1F7 may either be used as general-purpose registers, or may be used as special-purpose registers if their associated functions are enabled.

$1F0 RAM / IJMP3 interrupt call address for INT3

$1F1 RAM / IRET3 interrupt return address for INT3

$1F2 RAM / IJMP2 interrupt call address for INT2

$1F3 RAM / IRET2 interrupt return address for INT2

$1F4 RAM / IJMP1 interrupt call address for INT1

$1F5 RAM / IRET1 interrupt return address for INT1

$1F6 RAM / PA CALLD-imm return, CALLPA parameter, or LOC address

$1F7 RAM / PB CALLD-imm return, CALLPB parameter, or LOC address

SPECIAL-PURPOSE REGISTERS

Each cog contains 8 special-purpose registers that are mapped into the RAM register address space from $1F8 to $1FF. In general, when specifying an address between $1F8 and $1FF, the instruction is accessing a special-purpose register, not just the underlying RAM.

$1F8 PTRA pointer A to hub RAM

$1F9 PTRB pointer B to hub RAM

$1FA DIRA output enables for P31..P0

$1FB DIRB output enables for P63..P32

$1FC OUTA output states for P31..P0

$1FD OUTB output states for P63..P32

$1FE INA * input states for P31..P0

$1FF INB ** input states for P63..P32

* also debug interrupt call address

** also debug interrupt return address
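The three register ranges described above (general-purpose, dual-purpose, and special-purpose) can be summarized as a lookup on the 9-bit register address. This Python sketch just restates the lists above; the function and table names are hypothetical.

```python
DUAL_PURPOSE = {
    0x1F0: "IJMP3", 0x1F1: "IRET3", 0x1F2: "IJMP2", 0x1F3: "IRET2",
    0x1F4: "IJMP1", 0x1F5: "IRET1", 0x1F6: "PA", 0x1F7: "PB",
}
SPECIAL_PURPOSE = {
    0x1F8: "PTRA", 0x1F9: "PTRB", 0x1FA: "DIRA", 0x1FB: "DIRB",
    0x1FC: "OUTA", 0x1FD: "OUTB", 0x1FE: "INA", 0x1FF: "INB",
}


def classify_register(addr: int) -> tuple:
    """Return (class, name) for a cog register address $000..$1FF."""
    addr &= 0x1FF
    if addr in SPECIAL_PURPOSE:
        return ("special-purpose", SPECIAL_PURPOSE[addr])
    if addr in DUAL_PURPOSE:
        return ("dual-purpose", DUAL_PURPOSE[addr])
    return ("general-purpose", None)   # $000..$1EF


for addr in (0x000, 0x1EF, 0x1F0, 0x1F6, 0x1F8, 0x1FF):
    print(f"${addr:03X}", classify_register(addr))
```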

LOOKUP RAM

Each cog has a secondary 512 x 32-bit dual-port RAM, which can be used in multiple ways:

Load/Store access
As a source or destination for the streamer hardware
As a lookup table for bytecode execution
As a data source for smart pins
As a "RAM sharing" mechanism between paired cogs
As a source of program instructions (see COGS > INSTRUCTION MODES > LOOKUP EXECUTION)

NOTE: The term "lookup" (and "LUT", which is short for "look-up table") is due to historical usage in the original Propeller microcontroller. This RAM can still be used in a "lookup" context, but can also be used for many other purposes, as indicated above.

LOAD/STORE ACCESS

Unlike cog RAM, the cog cannot directly use the lookup RAM in the majority of its instructions. Instead, lookup RAM must be read into cog RAM using the RDLUT instruction and cog RAM must be written into the lookup RAM using the WRLUT instruction. In other hardware architectures, these instructions would be synonymous with "LOAD" and "STORE" instructions, respectively. When using the RDLUT and WRLUT instructions, the 32-bit words are addressable from $000 to $1FF.

STREAMER ACCESS

(to be completed.)

BYTECODE EXECUTION LOOKUP TABLE

(to be completed.)

RAM SHARING BETWEEN PAIRED COGS

Adjacent cogs whose ID numbers differ by only the LSB (cogs 0 and 1, 2 and 3, 4 and 5, etc.) can each allow their lookup RAMs to be written by the other cog via its local lookup RAM writes. This allows adjacent cogs to share data very quickly through their lookup RAMs.
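Since paired cogs differ only in the LSB of their IDs, a cog's sharing partner can be found by flipping that bit. A one-line Python illustration, with a hypothetical function name:

```python
def lut_partner(cog_id: int) -> int:
    """Return the ID of the cog that shares lookup RAM writes with cog_id."""
    return cog_id ^ 1   # flip the LSB: 0<->1, 2<->3, 4<->5, 6<->7


# Enumerate the unique pairs on an 8-cog chip.
pairs = sorted({tuple(sorted((c, lut_partner(c)))) for c in range(8)})
print(pairs)
```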

The 'SETLUTS D/#' instruction is used to enable the lookup RAM to receive writes from the adjacent cog:

SETLUTS #0 'disallow writes from other cog (default)

SETLUTS #1 'allow writes from other cog

Lookup-RAM writes from the adjacent cog are implemented on the 2nd port of the lookup RAM. The 2nd port is also shared by the streamer in DDS/LUT modes. If an external write occurs on the same clock as a streamer read, the external write gets priority. It is not intended that external writes would be enabled at the same time the streamer is in DDS/LUT mode.

In order to find and start two adjacent cogs with which this write-sharing scheme can be used, the COGINIT instruction has a mechanism for finding an even/odd pair and then starting them both with the same parameters. It will be necessary for the program to differentiate between even and odd cogs and possibly restart one, or both, with the final, intended program. To have COGINIT find and start two adjacent cogs, use %x_1_xxx1 for the D/# operand.

To facilitate handshaking between cogs sharing lookup RAM, the SETSE1...4 instructions can be used to set up lookup RAM read and write events.

REGISTER INDIRECTION

Cog registers can be accessed indirectly most easily by using the ALTS/ALTD/ALTR instructions. These instructions sum their D[8:0] and S/#[8:0] values to compute an address that is directly substituted into the next instruction's S field, D field, or result register address (normally, this is the same as the D field). This all happens within the pipeline and does not affect the actual program code. The idea is that S/# can serve as a register base address and D can be used as an index.

Additionally, S[17:9] is always sign-extended and added to the D register for index updating. Normally, a nine-bit #address will be used for S, causing S[17:9] to be zero, so that D is unaffected:

ALTS index,#table 'set next S field to table+index

MOV OUTA,0 'output register[table+index] to OUTA

ALTD index,#table 'set next D field to table+index

MOV 0,INA 'write INA to register[table+index]

ALTR index,#table 'set next write to table+index

XOR INA,INB 'write INA^INB to register[table+index]

For cases where base+index is not required, and a register holds the desired address, the S/# field can be omitted and it will be set to '#0' by the assembler:

ALTS pointer 'set next S field to pointer

MOV OUTA,0 'output register[pointer] to OUTA

ALTD pointer 'set next D field to pointer

MOV 0,INA 'write INA to register[pointer]

ALTR pointer 'set next write to pointer

XOR INA,INB 'write INA^INB to register[pointer]
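The address arithmetic shared by ALTS, ALTD, and ALTR can be modeled in a few lines. This Python sketch follows the description above (sum of D[8:0] and S[8:0] for the substituted address, sign-extended S[17:9] added into D); wrapping the sum to nine bits is an assumption, and the function name and sample values are hypothetical.

```python
def alt_substitute(d: int, s: int) -> tuple:
    """Model ALTS/ALTD/ALTR: return (substituted address, updated D)."""
    # Address substituted into the next instruction (assumed to wrap at 9 bits).
    address = ((d & 0x1FF) + (s & 0x1FF)) & 0x1FF

    # S[17:9] is sign-extended and added to D for index updating.
    step = (s >> 9) & 0x1FF
    if step & 0x100:
        step -= 0x200
    new_d = (d + step) & 0xFFFFFFFF
    return address, new_d


TABLE = 0x040   # hypothetical register base address

print(alt_substitute(3, TABLE))                    # plain #table: D is unchanged
print(alt_substitute(3, (1 << 9) | TABLE))         # S[17:9] = +1: index post-increments
print(alt_substitute(3, (0x1FF << 9) | TABLE))     # S[17:9] = -1: index post-decrements
```

A value with non-zero S[17:9] does not fit in a nine-bit immediate, so in practice it would come from a register or an augmented immediate.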

For accessing bit fields that span multiple registers, there is the ALTB instruction which sums D[13:5] and S/#[8:0] values to compute an address which is substituted into the next instruction's D field. It can be used with and without S/#:

ALTB bitindex,#base 'set next D field to base+bitindex[13:5]

BITC 0,bitindex 'write C to bit[bitindex[4:0]]

ALTB bitindex 'set next D field to bitindex[13:5]

TESTB 0,bitindex WC 'read bit[bitindex[4:0]] into C
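The ALTB split of a bit index into a register address and a bit number can be sketched as follows. This is a Python illustration of the arithmetic described above; the function name and sample values are hypothetical, and wrapping the register address to nine bits is an assumption.

```python
def altb_target(bitindex: int, base: int = 0) -> tuple:
    """Return (register address, bit number) selected by ALTB plus a bit instruction."""
    register = (((bitindex >> 5) & 0x1FF) + (base & 0x1FF)) & 0x1FF   # base + bitindex[13:5]
    bit = bitindex & 0x1F                                             # bitindex[4:0]
    return register, bit


# Bit 70 of a bit field starting at register $100 lives in register $102, bit 6.
print(altb_target(70, 0x100))
```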

There are also ALTxx instructions for facilitating nibble (4-bit), byte (8-bit), and word (16-bit) sub-addressing of registers. They modify either the S or D field, as well as the N field of their associated and subsequent nibble, byte, or word instruction. Like the other ALTx instructions, they can be used with or without S/#. Note that the associated nibble, byte, or word instruction can be a shortened-syntax alias of the full instruction, since two of its three fields will be filled in by the ALTxx instruction.

Nibble addressing:

ALTSN index,#base 'set next D field to base+index[11:3], next N to index[2:0]

SETNIB value 'set nibble to value ('SETNIB S/#' = 'SETNIB 0,S/#,#0')

ALTGN index,#base 'set next S field to base+index[11:3], next N to index[2:0]

GETNIB value 'get nibble into value ('GETNIB D' = 'GETNIB D,0,#0')

ALTGN index,#base 'set next S field to base+index[11:3], next N to index[2:0]

ROLNIB value 'ROL nibble into value ('ROLNIB D' = 'ROLNIB D,0,#0')

Byte addressing:

ALTSB index,#base 'set next D field to base+index[10:2], next N to index[1:0]

SETBYTE value 'set byte to value ('SETBYTE S/#' = 'SETBYTE 0,S/#,#0')

ALTGB index,#base 'set next S field to base+index[10:2], next N to index[1:0]

GETBYTE value 'get byte into value ('GETBYTE D' = 'GETBYTE D,0,#0')

ALTGB index,#base 'set next S field to base+index[10:2], next N to index[1:0]

ROLBYTE value 'ROL byte into value ('ROLBYTE D' = 'ROLBYTE D,0,#0')

Word addressing:

ALTSW index,#base 'set next D field to base+index[9:1], next N to index[0]

SETWORD value 'set word to value ('SETWORD S/#' = 'SETWORD 0,S/#,#0')

ALTGW index,#base 'set next S field to base+index[9:1], next N to index[0]

GETWORD value 'get word into value ('GETWORD D' = 'GETWORD D,0,#0')

ALTGW index,#base 'set next S field to base+index[9:1], next N to index[0]

ROLWORD value 'ROL word into value ('ROLWORD D' = 'ROLWORD D,0,#0')
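The index splitting performed by the ALTSN/ALTGN, ALTSB/ALTGB, and ALTSW/ALTGW instructions follows one pattern, which the Python sketch below models. The names `alt_subaddress` and `SHIFTS` are hypothetical, and the 9-bit wrap of the register address is an assumption.

```python
# Number of index bits that select the element within a 32-bit register
SHIFTS = {"nibble": 3, "byte": 2, "word": 1}


def alt_subaddress(index, base, unit):
    """Return (register address, N) as the ALTxx instruction would derive them."""
    shift = SHIFTS[unit]
    reg = ((base & 0x1FF) + ((index >> shift) & 0x1FF)) & 0x1FF  # 9-bit wrap is assumed
    n = index & ((1 << shift) - 1)
    return reg, n


for unit in ("nibble", "byte", "word"):
    reg, n = alt_subaddress(19, 0x40, unit)
    print(f"{unit:6} index 19 -> register ${reg:03X}, N = {n}")
```

The same index therefore reaches further into the register array as the element size grows: 8 nibbles, 4 bytes, or 2 words fit in each register.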

For more complex S field, D field, and result register substitutions, there is the ALTI instruction. ALTI actually does a few different things. First, ALTI can be used to individually increment or decrement three different nine-bit fields within a register. Second, ALTI can substitute each of those fields (before incrementing or decrementing) into the next instruction's S field, D field, or result register address, in the same way ALTS, ALTD, and ALTR do. Lastly, ALTI can substitute D[31..18] into the next instruction's upper bits [31..18] to enable full instruction substitution with a register's contents.

ALTI D,S/# 'modify D and/or next instruction's fields according to S/#

S/# = %rrr_ddd_sss_RRR_DDD_SSS

%rrr Result register field D[27..19] increment/decrement masking

%ddd D register field D[17..9] increment/decrement masking

%sss S register field D[8..0] increment/decrement masking

%rrr/%ddd/%sss:

000 = 9 bits increment/decrement (default, full span)

001 = 8 LSBs increment/decrement (256-register looped buffer)

010 = 7 LSBs increment/decrement (128-register looped buffer)

011 = 6 LSBs increment/decrement (64-register looped buffer)

100 = 5 LSBs increment/decrement (32-register looped buffer)

101 = 4 LSBs increment/decrement (16-register looped buffer)

110 = 3 LSBs increment/decrement (8-register looped buffer)

111 = 2 LSBs increment/decrement (4-register looped buffer)

%RRR result register / instruction modification:

000 = D[27..19] stays same, no result register substitution

001 = D[27..19] stays same, but result register writing is canceled

010 = D[27..19] decrements per %rrr, no result register substitution

011 = D[27..19] increments per %rrr, no result register substitution

100 = D[27..19] sets next instruction's result register, stays same

101 = D[31..18] substitutes into next instruction's [31..18] (execute D)

110 = D[27..19] sets next instruction's result register, decrements per %rrr

111 = D[27..19] sets next instruction's result register, increments per %rrr

%DDD D field modification:

x0x = D[17..9] stays same

x10 = D[17..9] decrements per %ddd

x11 = D[17..9] increments per %ddd

0xx = no D field substitution

1xx = D[17..9] substitutes into next instruction's D field [17..9]

%SSS S field modification:

x0x = D[8..0] stays same

x10 = D[8..0] decrements per %sss

x11 = D[8..0] increments per %sss

0xx = no S field substitution

1xx = D[8..0] substitutes into next instruction's S field [8..0]

Here are some examples of ALTI usage:

ALTI ptrs,#%111_111 'set next D and S fields, increment ptrs[17:9] and ptrs[8:0]

ADD 0,0 'add registers

ALTI inst,#%101_100_100 'execute inst (same as 'ALTI inst')

NOP 'NOP becomes inst
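As a rough aid to reading the ALTI control fields, here is a Python sketch of the S-field and D-field handling only. It is a simplified model under stated assumptions: the names `step_field` and `alti_ds` are hypothetical, and the %RRR behaviors (result substitution, write canceling, instruction execution) are deliberately not modeled.

```python
def step_field(value, mode, action):
    """Step a 9-bit field. mode is %rrr/%ddd/%sss; action is the low 2 bits of %DDD/%SSS."""
    if action & 0b10 == 0:          # x0x = field stays the same
        return value
    width = 9 - mode                # 000 = 9 bits ... 111 = 2 LSBs
    mask = (1 << width) - 1
    delta = 1 if action & 1 else -1 # x11 = increment, x10 = decrement
    # Only the masked LSBs change, which yields a looped buffer.
    return (value & ~mask & 0x1FF) | ((value + delta) & mask)


def alti_ds(reg, ctl):
    """Return (next D field or None, next S field or None, updated register)."""
    s_field, d_field = reg & 0x1FF, (reg >> 9) & 0x1FF
    sss, ddd = ctl & 7, (ctl >> 3) & 7
    s_mode, d_mode = (ctl >> 9) & 7, (ctl >> 12) & 7
    next_s = s_field if sss & 0b100 else None   # substitution uses the pre-step value
    next_d = d_field if ddd & 0b100 else None
    new_s = step_field(s_field, s_mode, sss & 3)
    new_d = step_field(d_field, d_mode, ddd & 3)
    new_reg = (reg & 0xFFFC0000) | (new_d << 9) | new_s
    return next_d, next_s, new_reg


# 'ALTI ptrs,#%111_111' with ptrs holding D pointer 5 and S pointer 7
print(alti_ds((5 << 9) | 7, 0b111_111))
```

Setting %sss to %111 in the same call would confine the S pointer to a 4-register looped buffer, since only its two LSBs would step.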

The SETS/SETD/SETR instructions allow you to write the S field, D field and instruction field of a register without affecting other bits. They copy the lower 9 bits of S/# into their respective 9-bit field within D. These instructions are useful for establishing the fields that will be used by ALTI:

SETS D,S/# 'set D[8:0] to S/#[8:0]

SETD D,S/# 'set D[17:9] to S/#[8:0]

SETR D,S/# 'set D[27:19] to S/#[8:0]

SETS/SETD/SETR can also be used in self-modifying cog-register code. Due to pipelining, at least two instructions must execute between modifying a cog register and executing it:

SETR inst,op 'set register[27:19] to op[8:0]

NOP 'first spacer instruction, could be anything

NOP 'second spacer instruction, could be anything

inst MOV x,y 'operate on x using y, MOV can become AND/OR/XOR/etc.
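The field writes performed by SETS/SETD/SETR amount to masked inserts, as this Python sketch shows. The helper name `_set_field` is hypothetical and values are treated as 32-bit unsigned integers.

```python
def _set_field(d, s, shift):
    """Insert S[8:0] into the 9-bit field of D that starts at bit 'shift'."""
    mask = 0x1FF << shift
    return ((d & ~mask) | ((s & 0x1FF) << shift)) & 0xFFFFFFFF


def sets(d, s):
    return _set_field(d, s, 0)    # D[8:0]


def setd(d, s):
    return _set_field(d, s, 9)    # D[17:9]


def setr(d, s):
    return _set_field(d, s, 19)   # D[27:19]


print(f"${sets(0x12345678, 0x1AB):08X}")   # only the low 9 bits change
```

Bits outside the selected field, including bit 18 and bits 31..28, are left untouched by all three operations.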

BRANCH ADDRESSING

The following are branch instructions which use D[19:0] as an absolute address:

EEEE 1101011 CZ0 DDDDDDDDD 000101100 JMP D

EEEE 1101011 CZ0 DDDDDDDDD 000101101 CALL D

EEEE 1101011 CZ0 DDDDDDDDD 000101110 CALLA D

EEEE 1101011 CZ0 DDDDDDDDD 000101111 CALLB D

The JMPREL instruction uses D as a relative address that steps whole instructions. In cog mode, D[19:0] is added to the program counter and in hub mode, D[17:0] << 2 is added to the program counter. This instruction is unique in its ability to make a relative jump (as opposed to an absolute jump) based on a register value. If #D is used, the relative address will be a positive 9-bit value:

EEEE 1101011 00L DDDDDDDDD 000110000 JMPREL {#}D

These next branch instructions use S[19:0] as an absolute address, or, if S is immediate, they sign-extend the 9-bit S field and use that value as a relative address that steps whole instructions (in hub mode, the value gets shifted left two bits before being added to the program counter). This means that their immediate range is -256 to +255 instructions, relative to the instruction following the branch:

EEEE 1011010 0LI DDDDDDDDD SSSSSSSSS CALLPA {#}D,{#}S

EEEE 1011010 1LI DDDDDDDDD SSSSSSSSS CALLPB {#}D,{#}S

EEEE 1011001 CZI DDDDDDDDD SSSSSSSSS CALLD D,{#}S

EEEE 1011011 00I DDDDDDDDD SSSSSSSSS DJZ D,{#}S

EEEE 1011011 01I DDDDDDDDD SSSSSSSSS DJNZ D,{#}S

EEEE 1011011 10I DDDDDDDDD SSSSSSSSS DJF D,{#}S

EEEE 1011011 11I DDDDDDDDD SSSSSSSSS DJNF D,{#}S

EEEE 1011100 00I DDDDDDDDD SSSSSSSSS IJZ D,{#}S

EEEE 1011100 01I DDDDDDDDD SSSSSSSSS IJNZ D,{#}S

EEEE 1011100 10I DDDDDDDDD SSSSSSSSS TJZ D,{#}S

EEEE 1011100 11I DDDDDDDDD SSSSSSSSS TJNZ D,{#}S

EEEE 1011101 00I DDDDDDDDD SSSSSSSSS TJF D,{#}S

EEEE 1011101 01I DDDDDDDDD SSSSSSSSS TJNF D,{#}S

EEEE 1011101 10I DDDDDDDDD SSSSSSSSS TJS D,{#}S

EEEE 1011101 11I DDDDDDDDD SSSSSSSSS TJNS D,{#}S

EEEE 1011110 00I DDDDDDDDD SSSSSSSSS TJV D,{#}S

EEEE 1011110 01I 00000VVVV SSSSSSSSS Jevent {#}S

EEEE 1011110 01I 00001VVVV SSSSSSSSS JNevent {#}S
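The immediate form of these branches can be modeled as a 9-bit sign extension followed by an add to the address of the next instruction. The Python sketch below assumes addresses are wrapped to 20 bits; the names `sext9` and `rel9_target` are hypothetical.

```python
def sext9(value):
    """Sign-extend a 9-bit field to a Python integer in -256..+255."""
    value &= 0x1FF
    return value - 0x200 if value & 0x100 else value


def rel9_target(pc, s_imm, hub_exec):
    """Branch target for an immediate S field, relative to the next instruction."""
    step = 4 if hub_exec else 1            # hub addresses count bytes, cog addresses count longs
    next_pc = pc + step
    return (next_pc + sext9(s_imm) * step) & 0xFFFFF   # 20-bit wrap is assumed


# A 'DJNZ D,#$' style loop that branches to itself from cog address $010
print(hex(rel9_target(0x010, 0x1FF, hub_exec=False)))
```

An S field of $1FF is -1, so the branch lands back on the branch instruction itself.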

There are five branch instructions and one 'locate' instruction which involve 20-bit immediate addresses. Their addresses can be either relative to the program counter (R=1) or absolute (R=0):

EEEE 1101100 RAA AAAAAAAAA AAAAAAAAA JMP #{\}A

EEEE 1101101 RAA AAAAAAAAA AAAAAAAAA CALL #{\}A

EEEE 1101110 RAA AAAAAAAAA AAAAAAAAA CALLA #{\}A

EEEE 1101111 RAA AAAAAAAAA AAAAAAAAA CALLB #{\}A

EEEE 11100WW RAA AAAAAAAAA AAAAAAAAA CALLD PA/PB/PTRA/PTRB,#{\}A

EEEE 11101WW RAA AAAAAAAAA AAAAAAAAA LOC PA/PB/PTRA/PTRB,#{\}A

Relative addressing is convenient for relocatable code, or code which can run from either cog RAM or hub RAM. Relative addressing is the default when cog code references cog labels or hub code references hub labels. On the other hand, absolute addressing is highly recommended, and forced by the assembler, when crossing between cog and hub domains.

Absolute addressing can be forced by the use of "\" after the "#".

The "@" operator can be used before an address label to return the hub address of that label, in case it was defined under an ORG directive to generate cog code and would normally return the cog address.

The cases below illustrate use of the 20-bit immediate-address instructions and "\" and "@":

ORGH $01000

ORG 0 'cog code

cog JMP #cog '$FD9FFFFC cog to cog, relative

JMP #\cog '$FD800000 cog to cog, force absolute

JMP #@cog '$FD801000 cog to hub, always absolute

JMP #\@cog '$FD801000 cog to hub, always absolute

JMP #hub '$FD802000 cog to hub, always absolute

JMP #\hub '$FD802000 cog to hub, always absolute

JMP #@hub '$FD802000 cog to hub, always absolute

JMP #\@hub '$FD802000 cog to hub, always absolute

ORGH $02000 'hub code

hub JMP #cog '$FD800000 hub to cog, always absolute

JMP #\cog '$FD800000 hub to cog, always absolute

JMP #@cog '$FD9FEFF4 hub to hub, relative

JMP #\@cog '$FD801000 hub to hub, force absolute

JMP #hub '$FD9FFFEC hub to hub, relative

JMP #\hub '$FD802000 hub to hub, force absolute

JMP #@hub '$FD9FFFE4 hub to hub, relative

JMP #\@hub '$FD802000 hub to hub, force absolute
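The encodings listed in the cases above can be reproduced with a short Python sketch. It assumes an always-execute condition field (EEEE = %1111) and infers from the listed values that relative offsets are byte-scaled in cog code as well as hub code. The function name `encode_jmp` is hypothetical, and the caller chooses relative or absolute, whereas the assembler makes that choice itself when crossing domains.

```python
def encode_jmp(pc, target, hub_exec, absolute):
    """Encode 'JMP #A' (EEEE 1101100 R A[19:0]) located at address pc."""
    opcode = 0xFD800000                      # EEEE=%1111, %1101100, R=0
    if absolute:
        return opcode | (target & 0xFFFFF)
    if hub_exec:
        offset = target - (pc + 4)           # hub addresses are byte addresses
    else:
        offset = (target - (pc + 1)) * 4     # cog offsets appear to be byte-scaled too
    return opcode | (1 << 20) | (offset & 0xFFFFF)


print(f"${encode_jmp(0x00000, 0x00000, hub_exec=False, absolute=False):08X}")  # cog to cog
print(f"${encode_jmp(0x02008, 0x01000, hub_exec=True, absolute=False):08X}")   # hub to hub
```

Both printed values match the corresponding comments in the listing ($FD9FFFFC and $FD9FEFF4).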

INSTRUCTION REPEATING

Single or multiple instructions can be repeated without branching delays in cog/LUT memory using the REP instruction:

REP {#}D,{#}S 'execute {#}D[8:0] instructions {#}S[31:0] times

If D[8:0] = 0, nothing will be repeated. If D[8:0] > 0 and S[31:0] = 0 then D[8:0] instructions will be repeated indefinitely.

In the example below, changing ##1000 to #0 would cause the DRVNOT instruction to be repeated indefinitely:

REP #1,##1000 'toggle pin 0 1000 times (1 instruction x 1000)

DRVNOT #0 'output and toggle pin 0 (2 clocks per toggle)

In cases where you'd rather have the assembler keep track of the number of instructions, @label can be used:

REP @.end,reps 'repeat instruction block 'reps' times

WFBYTE x 'write x to next byte in hub

ADD x,#1 'increment x

.end

REP works in hub memory, as well, but executes a hidden jump to get back to the top of the repeated instructions.

Any branch within the repeating instruction block will cancel REP activity. Interrupts will be ignored during REP looping.

INSTRUCTION SKIPPING

Cogs can initiate skipping sequences to selectively skip any of the next 32 instructions encountered. Skipping is accomplished by either canceling instructions as they come through the pipeline from hub or cog/LUT memory (effectively turning them into 2-clock NOP instructions) or by leaping over them in cog/LUT memory (no clock penalty). Skipping only works outside of interrupt service routines; i.e. in main code.

There are three instructions that initiate skipping:

SKIP {#}D 'skip by cancelling instructions sequentially per D[0]..D[31]

SKIPF {#}D 'like SKIP, but fast due to PC steps of 1..8 - cog/LUT only!

EXECF {#}D 'jump to D[9:0] in cog/LUT and initiate SKIPF using D[31:10]

In each case, D provides a bit pattern which is used LSB-first to determine whether the next instruction is cancelled/skipped (bit=1) or executed (bit=0). The D bit pattern is initially captured and subsequently shifted right by one bit for each instruction encountered.

Within a skipping sequence, a CALL/CALLPA/CALLPB that is not skipped will execute all its nested subroutines normally, with the skipping sequence resuming after the returning RET/_RET_. This allows subroutines to be skipped or entirely executed without affecting the top-level skip sequence. As well, an interrupt service routine will execute normally during a skipping sequence, with the skipping sequence resuming upon its completion.

While SKIP-initiated skipping can take place in both hub and cog/LUT memory, SKIPF-initiated and EXECF-initiated skipping can only take place in cog/LUT memory. This is because the PC can be randomly stepped in cog/LUT memory, whereas the hub memory FIFO can only provide the next instruction, unless a full branch takes place, triggering a FIFO reload.

Here is a simplistic example of SKIP:

SKIP #%010110 'initiate skip sequence (skip 2nd, 3rd, 5th instruction)

DRVNOT #0 'drive and invert pin 0 (executes)

DRVNOT #1 'drive and invert pin 1 (NOP)

DRVNOT #2 'drive and invert pin 2 (NOP)

DRVNOT #3 'drive and invert pin 3 (executes)

DRVNOT #4 'drive and invert pin 4 (NOP)

DRVNOT #5 'drive and invert pin 5 (executes)
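The LSB-first consumption of the skip pattern can be checked with a small Python model. This is a sketch of the rule stated above, not of the pipeline itself, and the function name `executed_after_skip` is hypothetical.

```python
def executed_after_skip(pattern, instructions):
    """Return the instructions that still execute after 'SKIP pattern'."""
    pattern &= 0xFFFFFFFF               # only 32 pattern bits exist
    executed = []
    for instruction in instructions:
        if pattern & 1 == 0:            # bit=0 executes, bit=1 is skipped
            executed.append(instruction)
        pattern >>= 1                   # one pattern bit is used per instruction
    return executed


# The six pin toggles from the example, identified by pin number
print(executed_after_skip(0b010110, [0, 1, 2, 3, 4, 5]))
```

Once the pattern has shifted down to zero, every later instruction executes normally.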

Skipping is very useful for getting increased functionality out of an otherwise-static sequence of instructions. Consider this sequence, which contains all the instructions needed to realize 36 different address calculations:

addr RFBYTE m 'offset - one of these three (3 possibilities)

RFWORD m

RFLONG m

ADD m,pbase 'base - one of these three (3 possibilities)

ADD m,vbase

ADD m,dbase

SHL i,#1 'index - zero to two of these three (4 possibilities)

SHL i,#2

ADD m,i

In the above sequence, the intention is to compute an address using an offset, a base, and an optional index. There are 3 x 3 x 4, or 36, useful permutations. If you wanted to use a byte offset, pbase, and a long index, you would want to execute only these four instructions from the 'addr' sequence:

RFBYTE m 'offset

ADD m,pbase 'base

SHL i,#2 'index

ADD m,i

The skip pattern for just those four instructions would be %001_110_110. Assuming 'pat' holds that pattern, here is what the execution would look like using SKIP. Note that the 'addr' instruction sequence, shown above, follows the SKIP instruction and skipped instructions in the 'addr' sequence are now shown as NOPs:

SKIP pat 'initiate skip sequence (%001_110_110 in this case)

addr RFBYTE m 'offset

NOP

NOP

ADD m,pbase 'base

NOP

NOP

NOP 'index

SHL i,#2

ADD m,i

If this code were located in cog/LUT memory, SKIPF could be used to speed things up by stepping over skipped instructions, instead of canceling them in the pipeline. Here is what the execution would look like using SKIPF:

SKIPF pat 'initiate skip sequence (%001_110_110 in this case)

addr RFBYTE m 'offset

ADD m,pbase 'base

SHL i,#2 'index

ADD m,i

Now things are very efficient, with no cycles being wasted on NOPs. If SKIPF is used in hub exec, it will revert to SKIP behavior, canceling instructions in the pipeline, instead of stepping over them.
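Skip patterns like the one used here can be derived mechanically from the list of instructions that should run. The Python sketch below does this for the 'addr' sequence; the names `ADDR_SEQUENCE` and `skip_pattern` are hypothetical, and instructions are matched by their text purely for illustration.

```python
ADDR_SEQUENCE = [
    "RFBYTE m", "RFWORD m", "RFLONG m",            # offset
    "ADD m,pbase", "ADD m,vbase", "ADD m,dbase",   # base
    "SHL i,#1", "SHL i,#2", "ADD m,i",             # index
]


def skip_pattern(sequence, wanted):
    """Build a SKIP/SKIPF pattern: bit n = 1 skips the nth instruction."""
    if len(sequence) > 32:
        raise ValueError("a skip pattern covers at most 32 instructions")
    pattern = 0
    for position, instruction in enumerate(sequence):
        if instruction not in wanted:
            pattern |= 1 << position
    return pattern


pat = skip_pattern(ADDR_SEQUENCE, {"RFBYTE m", "ADD m,pbase", "SHL i,#2", "ADD m,i"})
print(f"%{pat:09b}")
```

For a byte offset, pbase, and a long index this yields %001110110, the pattern quoted in the text.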

Both SKIP and SKIPF can be preceded by _RET_ for an automatic branch before skipping commences:

PUSH #addr 'point to the addr routine

_RET_ SKIPF pat 'jump to addr and begin skipping fast using pat

The EXECF instruction performs a JMP and a SKIPF at the same time, getting a 10-bit branch address from D[9:0] and a 22-bit skip pattern from D[31:10]. Here is the heart of a simple bytecode interpreter which uses EXECF:

REP #1,#8 'pre-stuff 8-level hardware stack with 'loop' address

PUSH #loop 'all RETs without CALLs will branch to 'loop'

loop RFBYTE i 'get a bytecode

RDLUT e,i 'lookup long in LUT

EXECF e 'jump to e[9:0] and SKIPF e[31:10], RETs branch to 'loop'

That bytecode interpreter takes only 2+3+4, or 9, clocks to get the next bytecode, look it up, then execute that bytecode's routine in cog/LUT memory with a custom 22-bit SKIPF pattern. If that bytecode's routine is just a 2-clock instruction preceded by a _RET_, it will take 4 clocks, due to the _RET_, for a total of 13 clocks, looping. Those 13 clocks can be reduced to only 8 clocks by using XBYTE, which is explained in the next section.
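Each long consumed by EXECF packs a 10-bit address and a 22-bit skip pattern. A minimal Python sketch of building and splitting such a long follows; the names `execf_long` and `execf_fields` are hypothetical.

```python
def execf_long(address, pattern=0):
    """Pack an EXECF operand: D[9:0] = cog/LUT address, D[31:10] = SKIPF pattern."""
    if not 0 <= address < (1 << 10):
        raise ValueError("address must fit in 10 bits")
    if not 0 <= pattern < (1 << 22):
        raise ValueError("skip pattern must fit in 22 bits")
    return (pattern << 10) | address


def execf_fields(value):
    """Split an EXECF operand back into (address, pattern)."""
    return value & 0x3FF, (value >> 10) & 0x3FFFFF


entry = execf_long(0x123, 0b110)   # jump to $123, skip the 2nd and 3rd instructions
print(f"${entry:08X}", execf_fields(entry))
```

A table of such longs, one per bytecode, is what the RDLUT in the interpreter loop reads.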

While SKIPF and EXECF normally step over skipped instructions in cog/LUT memory, there are some circumstances where they must cancel an instruction, instead, since it is already in the pipeline:

The first instruction is being skipped after the SKIPF/EXECF instruction (the LSB of the skip pattern is '1')
The 8th instruction in a row is being skipped (only 7 instructions can be stepped over at once)

Each of these cancellations results in a 2-clock NOP instruction.

SKIP is fully compatible with REP, since SKIP only cancels instructions, allowing REP to maintain accurate instruction counts.

SKIPF would only work with REP if all SKIPF patterns resulted in the same instruction counts, which REP would have to be initiated with, as opposed to just length-of-code.

Special SKIPF Branching Rules

Within SKIPF sequences where CALL/CALLPA/CALLPB are used to execute subroutines in which skipping will be suspended until after RET, all CALL/CALLPA/CALLPB immediate (#) branch addresses must be absolute in cases where the instruction after the CALL/CALLPA/CALLPB might be skipped. This is not possible for CALLPA/CALLPB but CALL can use '#\address' syntax to achieve absolute immediate addressing. CALL/CALLPA/CALLPB can all use registers as branch addresses, since they are absolute.

For non-CALL/CALLPA/CALLPB branches within SKIPF sequences, SKIPF will work through all immediate-relative branches, which are the default for immediate branches within cog/LUT memory. If an absolute-address branch is being used (#\label, register, or RET, for example), you must not skip the first instruction after the branch. This is not a problem with immediate-relative branches, however, since the variable PC stepping works to advantage, by landing the PC at the first instruction of interest at, or beyond, the branch address.

BYTECODE EXECUTION (XBYTE)

Cogs can execute custom bytecodes from hub RAM using XBYTE. XBYTE is like a phantom instruction and it executes on a hardware stack return (RET/_RET_) to $1FF. Such a return does not pop the stack, so that each additional RET/_RET_ causes another bytecode to be fetched and executed. This process has a total overhead of only 6 clocks, excluding the bytecode routine. The bytecode routine could be as short as a single 2-clock instruction with a _RET_ prefix, making the total XBYTE loop take only 8 clocks.

XBYTE performs the following steps to make a complete bytecode executor:

Clock	Phase	XBYTE Activity	Description
1	go	RFBYTE bytecode; SKIPF #0	Last clock of the RET/_RET_ to $1FF. Fetch bytecode from FIFO (initialized via prior RDFAST). Cancel any SKIPF pattern in progress (from prior bytecode).
2	get	MOV PA,bytecode; RDLUT (per bytecode)	1st clock of 1st canceled instruction. Write bytecode to PA ($1F6). Read lookup-table RAM according to bytecode and mode.
3	go	RDLUT (data → D)	2nd clock of 1st canceled instruction. Get lookup RAM long into D for EXECF.
4	get	EXECF D (begin)	1st clock of 2nd canceled instruction. Execute EXECF.
5	go	MOV PB,(GETPTR); MODCZ bit1,bit0 {WCZ}; EXECF D (branch)	2nd clock of 2nd canceled instruction. Write FIFO pointer to PB ($1F7). Write C,Z with bit1,bit0 of RDLUT address, if enabled. Do EXECF branch.
6	get	flush pipeline	1st clock of 3rd canceled instruction.
7	go	reload pipeline	2nd clock of 3rd canceled instruction.
8	get	<none>	1st clock of 1st instruction of bytecode routine. Loop to clock 1 if _RET_ or RET.

The bytecode translation table in LUT memory must consist of long data which EXECF would use, where the 10 LSBs are an address to jump to in cog/LUT RAM and the 22 MSBs are a SKIPF pattern to be applied.

Starting XBYTE and establishing its operating mode is done all at once by a '_RET_ SETQ {#}D' instruction, with the top of the hardware stack holding $1FF.

Additional '_RET_ SETQ {#}D' instructions can be executed to alter the XBYTE mode for subsequent bytecodes.

To alter the XBYTE mode for the next bytecode only, a '_RET_ SETQ2 {#}D' instruction can be executed. This is useful for engaging singular bytecodes from alternate sets, without having to restore the original XBYTE mode afterwards.

Bits	SETQ/SETQ2 {#}D value	LUT base address	LUT index (b = bytecode)	LUT EXECF address
8	%A000000xF	%A00000000	I = b[7:0]	%AIIIIIIII
8	%ABBBB00xF (%BBBB > 0)	%A00000000	if b[7:4] < %BBBB then I = b[7:0]; if b[7:4] >= %BBBB then I = b[7:4] - %BBBB	%AIIIIIIII (first case); %ABBBBIIII (second case)
7	%AAxx0010F	%AA0000000	I = b[6:0]	%AAIIIIIII
7	%AAxx0011F	%AA0000000	I = b[7:1]	%AAIIIIIII
6	%AAAx1010F	%AAA000000	I = b[5:0]	%AAAIIIIII
6	%AAAx1011F	%AAA000000	I = b[7:2]	%AAAIIIIII
5	%AAAAx100F	%AAAA00000	I = b[4:0]	%AAAAIIIII
5	%AAAAx101F	%AAAA00000	I = b[7:3]	%AAAAIIIII
4	%AAAAA110F	%AAAAA0000	I = b[3:0]	%AAAAAIIII
4	%AAAAA111F	%AAAAA0000	I = b[7:4]	%AAAAAIIII

The %ABBBB00xF setting allows sets of 16 bytecodes, which would use identical LUT values, to be represented by a single LUT value, effectively compressing blocks of 16 LUT values into single LUT values. This is useful when the bytecode, which is always written to PA, is used as an operand within the bytecode routine.

The %F bit of the SETQ/SETQ2 {#}D value enables C and Z to receive bits 1 and 0 of the index field of the bytecode. This is useful for having the flags differentiate behavior within a bytecode routine, especially in cases of conditional looping, where a SKIPF pattern would have been insufficient, on its own:

SETQ/SETQ2 {#}D value	Flag Writing
%xxxxxxxx0	Do not affect flags on XBYTE
%xxxxxxxx1	Write the bytecode's index LSBs to C and Z
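The mode table above can be read as a decoding procedure. The Python sketch below is one interpretation of that table, not a description of the silicon: the names `xbyte_lut_address` and `xbyte_flags` are hypothetical, and the flag function follows the statement that C and Z receive bits 1 and 0 of the RDLUT address.

```python
def xbyte_lut_address(setq, bytecode):
    """Return the 9-bit LUT address that XBYTE would read for a bytecode."""
    setq &= 0x1FF
    b = bytecode & 0xFF
    select = (setq >> 1) & 0b111                 # SETQ bits 3..1 pick the mode
    if select in (0b000, 0b001):                 # 8-bit modes
        a = setq & 0x100
        bbbb = (setq >> 4) & 0xF
        if bbbb == 0 or (b >> 4) < bbbb:
            return a | b                         # %AIIIIIIII
        return a | (bbbb << 4) | ((b >> 4) - bbbb)   # %ABBBBIIII (compressed block)
    if select in (0b010, 0b011):
        bits = 6 if setq & 0x10 else 7           # bit 4 separates 6-bit from 7-bit
    elif select in (0b100, 0b101):
        bits = 5
    else:
        bits = 4
    base = setq & ~((1 << bits) - 1) & 0x1FF
    # The low select bit chooses the bytecode's MSBs (1) or LSBs (0) as index.
    index = (b >> (8 - bits)) if select & 1 else (b & ((1 << bits) - 1))
    return base | index


def xbyte_flags(setq, bytecode):
    """Return (C, Z) if the F bit enables flag writing, else None."""
    if setq & 1 == 0:
        return None
    address = xbyte_lut_address(setq, bytecode)
    return (address >> 1) & 1, address & 1


print(hex(xbyte_lut_address(0x100, 4)))    # full 8-bit lookup at LUT $100
print(hex(xbyte_lut_address(0x140, 0x57))) # compressed mode with %BBBB = 4
```

In the compressed mode, every bytecode from $40 upward shares one LUT entry per block of 16, while bytecodes below $40 keep individual entries.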

To start executing bytecodes, use the following instruction sequence, but with the appropriate SETQ operand:

PUSH #$1FF 'push #$1FF onto the hardware stack

_RET_ SETQ #$100 '256-long EXECF table at LUT $100, start XBYTE

con _clkfreq = 10_000_000

' ** XBYTE Demo **

' Automatically executes bytecodes via RET/_RET_ to $1FF.

' Overhead is 6 clocks, including _RET_ at the end of each bytecode routine.

dat org

asmclk 'set clock up

setq2 #$FF 'load bytecode table into LUT $100..$1FF

rdlong $100,#bytetable

rdfast #0,#bytecodes 'init fifo read at start of bytecodes

push #$1FF 'push $1FF for xbyte

_ret_ setq #$100 'start xbyte with LUT base = $100, no stack pop

' Bytecode routines

r0 _ret_ drvnot #0 'toggle pin 0

r1 _ret_ drvnot #1 'toggle pin 1

r2 _ret_ drvnot #2 'toggle pin 2

r3 _ret_ drvnot #3 'toggle pin 3

r4 rfvars pa 'get offset

add pb,pa 'add offset

_ret_ rdfast #0,pb 'init fifo read at new address

' Bytecodes that form the XBYTE program in hub

orgh

bytecodes byte 0 'toggle pin 0

byte 1 'toggle pin 1

byte 2 'toggle pin 2

byte 3 'toggle pin 3

byte 4,(bytecodes-$) & $7F 'relative branch, loop to bytecodes

' Bytecode EXECF data, moved into lut $100..$1FF (no SKIPF patterns are used in this example)

bytetable long r0 '#0 toggle pin 0

long r1 '#1 toggle pin 1

long r2 '#2 toggle pin 2

long r3 '#3 toggle pin 3

long r4 '#4 relative branch

{

clock phase hidden description

-------------------------------------------------------------------------------------------------

1 go RFBYTE byte last clock of instruction which is executing a

RET/_RET_ to $1FF

2 get RDLUT @byte, write byte to PA 1st clock of 1st canceled instruction

3 go LUT long --> next D 2nd clock of 1st canceled instruction

4 get EXECF D, 1st clock of 2nd canceled instruction

5 go EXECF D, write GETPTR to PB 2nd clock of 2nd canceled instruction

6 get flush pipe 1st clock of 3rd canceled instruction

7 go flush pipe 2nd clock of 3rd canceled instruction

8 get 1st clock of 1st instruction of bytecode routine,

loop to (clock) 1 if _RET_

}

While developing XBYTE code, you may want to single-step the bytecode execution, in order to inspect what is happening. To do this, you must simulate normal XBYTE operation using a small program. Below is an example of how to do this for the simplest case of the full-8-bit mode which doesn't write the LSBs of the LUT address to C and Z.

' Normal XBYTE or single-step bytecode executor (must run from registers or LUT)

rdfast #0,bytecodes 'start FIFO read at bytecodes

' push #$1FF 'start xbyte UNCOMMENT FOR NORMAL XBYTE

' _ret_ setq #$000 '(full 8-bit lookup at LUT $000) UNCOMMENT FOR NORMAL XBYTE

rep @.r,#8 'prepare to single-step by stuffing stack with byteloop address

push ##byteloop '(bottom stack value gets copied each _RET_ / RET)

.r 'end of the REP block referenced by 'rep @.r' above

byteloop nop '21-NOP landing strip for any trailing skip pattern

nop 'that XBYTE would have canceled on _RET_ / RET

nop

rfbyte pa 'get next bytecode into pa

getptr pb 'get next bytecode address into pb

debug(uhex_byte(pa),uhex_long(pb)) 'show bytecode and next bytecode address

rdlut temp,pa 'lookup EXECF long from LUT

execf temp 'do EXECF to execute bytecode, returns to byteloop

SETQ CONSIDERATIONS

The SETQ and SETQ2 instructions write to the Q register and are intended to precede a companion instruction. The value written to the Q register by SETQ/SETQ2 will persist until any of these events occur:

XORO32 executes - Q is set to the XORO32 result.
RDLUT executes - Q is set to the data read from the lookup RAM.
GETXACC executes - Q is set to the Goertzel sine accumulator value.
CRCNIB executes - Q gets shifted left by four bits.
COGINIT/QDIV/QFRAC/QROTATE executes without a preceding SETQ instruction - Q is set to zero.

CRCNIB is the only instruction which both inputs Q and outputs Q, requiring it to not be disrupted between the initial SETQ and subsequent CRCNIB(s). For that reason, CRCNIB sequences should be protected from interrupts by STALLI/ALLOWI instructions or by being placed within a REP block, which is automatically shielded from interrupts, including non-stallable debug interrupts.

It is possible to retrieve the current Q value by the following sequence:

MOV qval,#0 'reset qval

MUXQ qval,##$FFFFFFFF 'for each '1' bit in Q, set the same bit in qval
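The reason this sequence recovers Q follows from the MUXQ bit-copy rule, sketched here in Python. The function name `muxq` is hypothetical and Q is passed as an explicit argument, since the model has no hidden register.

```python
def muxq(d, s, q):
    """For each '1' bit in Q, copy the corresponding bit of S into D."""
    return ((d & ~q) | (s & q)) & 0xFFFFFFFF


q = 0xDEADBEEF                       # stand-in for whatever Q currently holds
qval = muxq(0, 0xFFFFFFFF, q)        # MOV qval,#0 then MUXQ qval,##$FFFFFFFF
print(f"${qval:08X}")
```

Starting from zero and copying from an all-ones source leaves exactly the '1' bits of Q in qval.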

SETQ/SETQ2 shields the next instruction from interruption to prevent an interrupt service routine from inadvertently altering Q before the intended instruction can utilize its value.

PIXEL OPERATIONS

Each cog has a pixel mixer which can combine one pixel with another pixel in many different ways. A pixel consists of four byte fields within a 32-bit cog register. Pixel operations occur between each pair of D and S bytes, and they take seven clock cycles to complete:

ADDPIX D,S/# 'add bytes with saturation

MULPIX D,S/# 'multiply bytes ($FF = 1.0)

BLNPIX D,S/# 'alpha-blend bytes according to SETPIV value

MIXPIX D,S/# 'mix bytes according to SETPIX/SETPIV value

There are two pixel mixer setup instructions:

SETPIV D/# 'set blend factor V[7:0] to D/#[7:0]

SETPIX D/# 'set MIXPIX mode M[5:0] to D/#[5:0]

When a pixel mixer instruction executes, a sum-of-products-with-saturation computation is performed on each D and S byte pair:

D[31:24] = ((D[31:24] * DMIX + S[31:24] * SMIX + $FF) >> 8) max $FF

D[23:16] = ((D[23:16] * DMIX + S[23:16] * SMIX + $FF) >> 8) max $FF

D[15:08] = ((D[15:08] * DMIX + S[15:08] * SMIX + $FF) >> 8) max $FF

D[07:00] = ((D[07:00] * DMIX + S[07:00] * SMIX + $FF) >> 8) max $FF

Here are the DMIX and SMIX terms, according to each instruction:

	DMIX	SMIX
ADDPIX	$FF	$FF
MULPIX	S[byte]	$00
BLNPIX	!V	V
MIXPIX	selected by M[5:3]	selected by M[2:0]

MIXPIX selector (M[5:3] for DMIX, M[2:0] for SMIX)	Term
%000	$00
%001	$FF
%010	V
%011	!V
%100	S[byte]
%101	!S[byte]
%110	D[byte]
%111	!D[byte]
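The sum-of-products formula and the DMIX/SMIX terms can be exercised with a Python model. This is a sketch built directly from the formulas above, with hypothetical helper names; V and M are passed as arguments instead of being set up by SETPIV/SETPIX.

```python
def _mix_byte(d, s, dmix, smix):
    """One byte lane: ((D*DMIX + S*SMIX + $FF) >> 8), limited to $FF."""
    return min((d * dmix + s * smix + 0xFF) >> 8, 0xFF)


def _term(selector, d, s, v):
    """MIXPIX term chosen by a 3-bit selector."""
    return (0x00, 0xFF, v, v ^ 0xFF, s, s ^ 0xFF, d, d ^ 0xFF)[selector & 7]


def _per_byte(d, s, terms):
    result = 0
    for shift in (0, 8, 16, 24):
        db, sb = (d >> shift) & 0xFF, (s >> shift) & 0xFF
        dmix, smix = terms(db, sb)
        result |= _mix_byte(db, sb, dmix, smix) << shift
    return result


def addpix(d, s):
    return _per_byte(d, s, lambda db, sb: (0xFF, 0xFF))


def mulpix(d, s):
    return _per_byte(d, s, lambda db, sb: (sb, 0x00))


def blnpix(d, s, v):
    v &= 0xFF
    return _per_byte(d, s, lambda db, sb: (v ^ 0xFF, v))


def mixpix(d, s, v, m):
    v &= 0xFF
    return _per_byte(d, s, lambda db, sb: (_term(m >> 3, db, sb, v), _term(m, db, sb, v)))


print(f"${addpix(0x1080FF00, 0x20800100):08X}")   # per-byte saturating add
```

In this model, BLNPIX with V = $FF returns S unchanged and with V = $00 returns D unchanged, which is the expected behavior of an alpha blend at its two extremes.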

DACs

Each cog outputs four 8-bit DAC channels that can directly drive the DACs within the pins. For this to work, the pins of interest will need to be configured for DAC-channel output.

DAC0 can drive the DACs of all pins numbered %XXXX00.

DAC1 can drive the DACs of all pins numbered %XXXX01.

DAC2 can drive the DACs of all pins numbered %XXXX10.

DAC3 can drive the DACs of all pins numbered %XXXX11.

The background state of these four 8-bit channels can be established by SETDACS:

SETDACS D/# - Write bytes 3/2/1/0 of D/# to DAC3/DAC2/DAC1/DAC0

The DAC values established by SETDACS will be constantly output, except at times when the streamer and/or colorspace converter override them.
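The pin-to-channel mapping and the SETDACS byte layout reduce to simple bit operations, shown in this Python sketch. Both function names are hypothetical.

```python
def dac_channel_for_pin(pin):
    """Cog DAC channel (0..3) that can drive a given pin: the pin number's two LSBs."""
    return pin & 0b11


def setdacs_bytes(value):
    """Split a SETDACS operand into [DAC0, DAC1, DAC2, DAC3] levels."""
    return [(value >> (8 * channel)) & 0xFF for channel in range(4)]


print(dac_channel_for_pin(13))                      # pin 13 = %001101
print([hex(level) for level in setdacs_bytes(0x44332211)])
```

Pins 1, 5, 9, 13 and so on therefore all share DAC1, so a single channel value can appear on many pins at once if they are configured for DAC-channel output.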

STREAMER

Each cog has a streamer which can automatically output timed state sequences to pins and DACs. It can also capture pin and ADC readings to hub RAM and perform Goertzel computations from smart pins configured as ADCs.

There are five instructions directly associated with the streamer:

SETXFRQ D/# - Set NCO frequency

XINIT D/#,S/# - Issue command immediately, zeroing phase

XZERO D/#,S/# - Issue command on final NCO rollover (waits), zeroing phase

XCONT D/#,S/# - Issue command on final NCO rollover (waits), continuing phase

GETXACC D - Get Goertzel X into D and Y into next S, clear X and Y

The streamer uses a numerically-controlled oscillator (NCO) to time its operation. On every clock while the streamer is active, it adds a 32-bit frequency value into a 32-bit phase accumulator, while masking the MSB of the original phase. The NCO can be understood as such:

phase = (phase & $7FFF_FFFF) + frequency

The MSB of the resultant phase value indicates NCO rollover and is used as a trigger to advance the state of the streamer. This is true for every mode except DDS/Goertzel, in which case the streamer runs continuously.
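The NCO update rule can be stepped in Python to see when rollovers occur. This sketch assumes the phase starts at zero and that the sum wraps at 32 bits; the function name `nco_rollovers` is hypothetical.

```python
def nco_rollovers(frequency, clocks, phase=0):
    """Return the 1-based clock numbers on which the NCO phase MSB is set."""
    rollovers = []
    for clock in range(1, clocks + 1):
        # Mask the previous MSB, then add the frequency (32-bit wrap is assumed).
        phase = ((phase & 0x7FFF_FFFF) + frequency) & 0xFFFF_FFFF
        if phase & 0x8000_0000:
            rollovers.append(clock)
    return rollovers


print(nco_rollovers(0x4000_0000, 8))   # half the system clock
print(nco_rollovers(0x2AAA_AAAB, 9))   # one third, rounded up
print(nco_rollovers(0x2AAA_AAAA, 9))   # one third, truncated
```

With the truncated one-third value the first rollover arrives on clock 4 instead of clock 3, which illustrates why fractional frequency values are rounded up.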

The frequency of the streamer's NCO rollover is set by the 'SETXFRQ D/#' instruction, where D/# is the desired fraction of the system clock (0 to 1) multiplied by $8000_0000. Here are some system clock multipliers and the D/# values that realize them:

1 $8000_0000 (default value on cog start)

1 / 2 $4000_0000

1 / 3 $2AAA_AAAA+1 *

1 / 4 $2000_0000

1 / 5 $1999_9999+1 *

1 / 6 $1555_5555+1 *

1 / 7 $1249_2492+1 *

1 / 8 $1000_0000

* For fractions with remainders, the computed D/# value should be incremented, in order to produce proper initial rollover behavior.
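The values in the list above follow from dividing $8000_0000 by the clock divider and rounding up when there is a remainder. A minimal Python sketch, with the hypothetical name `setxfrq_for_divider`:

```python
def setxfrq_for_divider(divider):
    """SETXFRQ value for an NCO rollover rate of (system clock / divider)."""
    if divider < 1:
        raise ValueError("divider must be 1 or greater")
    quotient, remainder = divmod(0x8000_0000, divider)
    # Increment when the division is inexact, as the note above recommends.
    return quotient + (1 if remainder else 0)


for divider in range(1, 9):
    value = setxfrq_for_divider(divider)
    print(f"1 / {divider}  ${value >> 16:04X}_{value & 0xFFFF:04X}")
```

Dividers that are powers of two divide evenly and need no adjustment.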

The NCO frequency may also be set/changed via a 'SETQ D/#' instruction immediately preceding an XINIT/XZERO/XCONT instruction. When the streamer command executes, the new frequency will be set during the first clock of the command. If no SETQ is used before the instruction, the frequency will remain the same when the command executes.

The streamer may be activated by a command from an XINIT/XZERO/XCONT instruction. For these instructions, D/# expresses the streamer mode and duration, while S/# supplies various data, or is ignored, depending upon the mode expressed in D/#.

There is a single-level command buffer in the streamer, enabling you to give it two initial commands before it makes you wait for the first command to finish before accepting another. This command buffer enables you to coordinate streamer activity with smart pin activity. By executing an XINIT and then an XCONT, you get time during the XINIT command to instantiate a smart pin to perform some operation which will then correlate with the queued XCONT command. Think of tossing a ball up gently, so that you can then hit it with a bat.

For the XINIT/XZERO/XCONT instructions, D/#[31:16] conveys the command, while D/#[15:0] conveys the number of NCO rollovers that the command will be active for. S/# is used to select sub-modes for some commands:

D/#[31:16] (Mode DACs Pins Misc), S/#, Description, Pins, DAC Channels ($X3_X2_X1_X0)

Immediate ⇢ LUT ⇢ Pins/DACs

0000 dddd eppp bbbb <long> imm -> 32 x 1-bit LUT 32 out %PONMLKJI_HGFEDCBA_ponmlkji_hgfedcba

0001 dddd eppp bbbb <long> imm -> 16 x 2-bit LUT 32 out %PONMLKJI_HGFEDCBA_ponmlkji_hgfedcba

0010 dddd eppp bbbb <long> imm -> 8 x 4-bit LUT 32 out %PONMLKJI_HGFEDCBA_ponmlkji_hgfedcba

0011 dddd eppp bbbb <long> imm -> 4 x 8-bit LUT 32 out %PONMLKJI_HGFEDCBA_ponmlkji_hgfedcba

Immediate ⇢ Pins/DACs

0100 dddd eppp pppa <long> imm 32 x 1 -> 1-pin + 1-DAC1 1 out %00000000_00000000_00000000_aaaaaaaa

0101 dddd eppp pp0a <long> imm 16 x 2 -> 2-pin + 2-DAC1 2 out %00000000_00000000_bbbbbbbb_aaaaaaaa

0101 dddd eppp pp1a <long> imm 16 x 2 -> 2-pin + 1-DAC2 2 out %00000000_00000000_00000000_babababa

0110 dddd eppp p00a <long> imm 8 x 4 -> 4-pin + 4-DAC1 4 out %dddddddd_cccccccc_bbbbbbbb_aaaaaaaa

0110 dddd eppp p01a <long> imm 8 x 4 -> 4-pin + 2-DAC2 4 out %00000000_00000000_dcdcdcdc_babababa

0110 dddd eppp p10a <long> imm 8 x 4 -> 4-pin + 1-DAC4 4 out %00000000_00000000_00000000_dcbadcba

0110 dddd eppp 0110 <long> imm 4 x 8 -> 8-pin + 4-DAC2 8 out %hghghghg_fefefefe_dcdcdcdc_babababa

0110 dddd eppp 0111 <long> imm 4 x 8 -> 8-pin + 2-DAC4 8 out %00000000_00000000_hgfehgfe_dcbadcba

0110 dddd eppp 1110 <long> imm 4 x 8 -> 8-pin + 1-DAC8 8 out %00000000_00000000_00000000_hgfedcba

0110 dddd eppp 1111 <long> imm 2 x 16 -> 16-pin + 4-DAC4 16 out %ponmponm_lkjilkji_hgfehgfe_dcbadcba

0111 dddd eppp 0000 <long> imm 2 x 16 -> 16-pin + 2-DAC8 16 out %00000000_00000000_ponmlkji_hgfedcba

0111 dddd eppp 0001 <long> imm 1 x 32 -> 32-pin + 4-DAC8 32 out %PONMLKJI_HGFEDCBA_ponmlkji_hgfedcba

RDFAST ⇢ LUT ⇢ Pins/DACs

0111 dddd eppp 001a bbbb RFLONG -> 32 x 1-bit LUT 32 out %PONMLKJI_HGFEDCBA_ponmlkji_hgfedcba

0111 dddd eppp 010a bbbb RFLONG -> 16 x 2-bit LUT 32 out %PONMLKJI_HGFEDCBA_ponmlkji_hgfedcba

0111 dddd eppp 011a bbbb RFLONG -> 8 x 4-bit LUT 32 out %PONMLKJI_HGFEDCBA_ponmlkji_hgfedcba

0111 dddd eppp 1000 bbbb RFLONG -> 4 x 8-bit LUT 32 out %PONMLKJI_HGFEDCBA_ponmlkji_hgfedcba

RDFAST ⇢ Pins/DACs

1000 dddd eppp pppa - 1/8 RFBYTE -> 1-pin + 1-DAC1 1 out %00000000_00000000_00000000_aaaaaaaa

1001 dddd eppp pp0a - 1/4 RFBYTE -> 2-pin + 2-DAC1 2 out %00000000_00000000_bbbbbbbb_aaaaaaaa

1001 dddd eppp pp1a - 1/4 RFBYTE -> 2-pin + 1-DAC2 2 out %00000000_00000000_00000000_babababa

1010 dddd eppp p00a - 1/2 RFBYTE -> 4-pin + 4-DAC1 4 out %dddddddd_cccccccc_bbbbbbbb_aaaaaaaa

1010 dddd eppp p01a - 1/2 RFBYTE -> 4-pin + 2-DAC2 4 out %00000000_00000000_dcdcdcdc_babababa

1010 dddd eppp p10a - 1/2 RFBYTE -> 4-pin + 1-DAC4 4 out %00000000_00000000_00000000_dcbadcba

1010 dddd eppp 0110 - RFBYTE -> 8-pin + 4-DAC2 8 out %hghghghg_fefefefe_dcdcdcdc_babababa

1010 dddd eppp 0111 - RFBYTE -> 8-pin + 2-DAC4 8 out %00000000_00000000_hgfehgfe_dcbadcba

1010 dddd eppp 1110 - RFBYTE -> 8-pin + 1-DAC8 8 out %00000000_00000000_00000000_hgfedcba

1010 dddd eppp 1111 - RFWORD -> 16-pin + 4-DAC4 16 out %ponmponm_lkjilkji_hgfehgfe_dcbadcba

1011 dddd eppp 0000 - RFWORD -> 16-pin + 2-DAC8 16 out %00000000_00000000_ponmlkji_hgfedcba

1011 dddd eppp 0001 - RFLONG -> 32-pin + 4-DAC8 32 out %PONMLKJI_HGFEDCBA_ponmlkji_hgfedcba

RDFAST ⇢ RGB ⇢ Pins/DACs

1011 dddd eppp 0010 rgb RFBYTE -> 24-pin + LUMA8 32 out %rrrrrrrr_gggggggg_bbbbbbbb_00000000

1011 dddd eppp 0011 - RFBYTE -> 24-pin + RGBI8 32 out %rrrrrrrr_gggggggg_bbbbbbbb_00000000

1011 dddd eppp 0100 - RFBYTE -> 24-pin + RGB8 (3:3:2) 32 out %rrrrrrrr_gggggggg_bbbbbbbb_00000000

1011 dddd eppp 0101 - RFWORD -> 24-pin + RGB16 (5:6:5) 32 out %rrrrrrrr_gggggggg_bbbbbbbb_00000000

1011 dddd eppp 0110 - RFLONG -> 24-pin + RGB24 (8:8:8) 32 out %rrrrrrrr_gggggggg_bbbbbbbb_00000000

Pins ⇢ DACs/WRFAST

1100 dddd wppp pppa - 1-pin -> 1-DAC1 + 1/8 WFBYTE 1 in %00000000_00000000_00000000_aaaaaaaa

1101 dddd wppp pp0a - 2-pin -> 2-DAC1 + 1/4 WFBYTE 2 in %00000000_00000000_bbbbbbbb_aaaaaaaa

1101 dddd wppp pp1a - 2-pin -> 1-DAC2 + 1/4 WFBYTE 2 in %00000000_00000000_00000000_babababa

1110 dddd wppp p00a - 4-pin -> 4-DAC1 + 1/2 WFBYTE 4 in %dddddddd_cccccccc_bbbbbbbb_aaaaaaaa

1110 dddd wppp p01a - 4-pin -> 2-DAC2 + 1/2 WFBYTE 4 in %00000000_00000000_dcdcdcdc_babababa

1110 dddd wppp p10a - 4-pin -> 1-DAC4 + 1/2 WFBYTE 4 in %00000000_00000000_00000000_dcbadcba

1110 dddd wppp 0110 - 8-pin -> 4-DAC2 + WFBYTE 8 in %hghghghg_fefefefe_dcdcdcdc_babababa

1110 dddd wppp 0111 - 8-pin -> 2-DAC4 + WFBYTE 8 in %00000000_00000000_hgfehgfe_dcbadcba

1110 dddd wppp 1110 - 8-pin -> 1-DAC8 + WFBYTE 8 in %00000000_00000000_00000000_hgfedcba

1110 dddd wppp 1111 - 16-pin -> 4-DAC4 + WFWORD 16 in %ponmponm_lkjilkji_hgfehgfe_dcbadcba

1111 dddd wppp 0000 - 16-pin -> 2-DAC8 + WFWORD 16 in %00000000_00000000_ponmlkji_hgfedcba

1111 dddd wppp 0001 - 32-pin -> 4-DAC8 + WFLONG 32 in %PONMLKJI_HGFEDCBA_ponmlkji_hgfedcba

ADCs/Pins ⇢ DACs/WRFAST

1111 dddd w--- 0010 ss 1-ADC8 -> 1-DAC8 + WFBYTE 8 in %00000000_00000000_00000000_hgfedcba

1111 dddd wppp 0011 ss 1-ADC8 + 8-pin -> 2-DAC8 + WFWORD 16 in %00000000_00000000_ponmlkji_hgfedcba

1111 dddd w--- 0100 s- 2-ADC8 -> 2-DAC8 + WFWORD 16 in %00000000_00000000_ponmlkji_hgfedcba

1111 dddd wppp 0101 s- 2-ADC8 + 16-pin -> 4-DAC8 + WFLONG 32 in %PONMLKJI_HGFEDCBA_ponmlkji_hgfedcba

1111 dddd w--- 0110 -- 4-ADC8 -> 4-DAC8 + WFLONG 32 in %PONMLKJI_HGFEDCBA_ponmlkji_hgfedcba

DDS/Goertzel

1111 dddd 0ppp p111 <config> DDS/Goertzel LUT SINC1 * 4 in ADC %PONMLKJI_HGFEDCBA_ponmlkji_hgfedcba

1111 dddd 1ppp p111 <config> DDS/Goertzel LUT SINC2 * 4 in ADC %PONMLKJI_HGFEDCBA_ponmlkji_hgfedcba

Each of these modes requires explanation, but there are some overlapping matters that can be covered first.

The 16-bit D[15:0] field expresses an initial counter value which will be decremented on each subsequent NCO rollover, with each rollover causing new streamer data to be output or input. When the counter equals 1 and the NCO is rolling over for the last time for the current command, a new command may be seamlessly begun by a buffered XZERO/XCONT instruction. If no XZERO/XCONT instruction is buffered, the counter goes to 0. When the counter reaches 0, or is set to 0, streamer operation stops and all streamer DAC overrides and streamer pin outputs cease.

Setting the D[15:0] count to its maximum value of $FFFF makes a streamer command run perpetually.

XINIT (re)starts the streamer, no matter what state it is in. 'XINIT #0,#0' will always stop the streamer immediately. XSTOP (no operands) is an alias for 'XINIT #0,#0'.

XZERO and XCONT are used to maintain seamless streamer I/O, from command to command. They wait for the prior command's last clock cycle. If the streamer count has already run down to 0, there is no waiting. Also, if the prior command used $FFFF for its initial count, in which case the streamer is running perpetually without decrementing its counter, a new XZERO/XCONT command will only wait for the next NCO rollover, at which point the streamer will begin executing the new command.

XZERO clears out the phase accumulator when it executes. This clearing is desirable when, say, pixels are being output at 1/3 Fclk and you don't want a 1-clock delay (glitch) every ~30 seconds, due to imperfect fractions like %5555_5555 = ~1/3. In such a case, it would be good to use XZERO to initiate the horizontal sync pulse, while using XCONT everywhere else. It may also be desirable to increment such frequency values by 1, so that the initial NCO rollover occurs on the Nth clock, and not on the Nth+1 clock.

XCONT is like XZERO, but does not affect the phase accumulator. XCONT is useful in cases where NCO phase and frequency should be strictly maintained and streamer activity should ride along with it.

The streamer has four DAC output channels, X0, X1, X2 and X3, which can selectively override the four SETDACS values on a per-DAC basis. To bring out the data as a voltage on a pin, that pin must be set to DAC mode with the COGID embedded, via WRPIN, and DIR must be set high.

The %dddd field in D[27:24] selects which streamer DAC channels will override which SETDACS values during active streamer operation. In the table below, "--" indicates no-override and "!" indicates one's-complement:

dddd	DAC3	DAC2	DAC1	DAC0	Description
0000	--	--	--	--	no streamer DAC output
0001	X0	X0	X0	X0	output X0 on all four DAC channels
0010	--	--	X0	X0	output X0 on DAC channels 1 and 0
0011	X0	X0	--	--	output X0 on DAC channels 3 and 2
0100	--	--	--	X0	output X0 on DAC channel 0
0101	--	--	X0	--	output X0 on DAC channel 1
0110	--	X0	--	--	output X0 on DAC channel 2
0111	X0	--	--	--	output X0 on DAC channel 3
1000	!X0	X0	!X0	X0	output X0 diff pairs on all four DAC channels
1001	--	--	!X0	X0	output X0 diff pairs on DAC channels 1 and 0
1010	!X0	X0	--	--	output X0 diff pairs on DAC channels 3 and 2
1011	X1	X0	X1	X0	output X1, X0 pairs on all four DAC channels
1100	--	--	X1	X0	output X1, X0 on DAC channels 1 and 0
1101	X1	X0	--	--	output X1, X0 on DAC channels 3 and 2
1110	!X1	X1	!X0	X0	output X1, X0 diff pairs on all four DAC channels
1111	X3	X2	X1	X0	output X3, X2, X1, X0 on all four DAC channels

Modes which can output to pins logically OR the streamer pin-output bus with {OUTB, OUTA} to produce the final 64 pin output states on each clock for the cog. For these modes, %e in D[23] must be '1' to enable pin output.

Modes which input from pins read {INB, INA} and can optionally write the pin data to hub RAM. For these modes, %w in D[23] must be '1' to enable automatic WFBYTE/WFWORD/WFLONG operations.

In every mode, the three %ppp bits in D[22:20] select the pin group, in 8-pin increments, which will be used as outputs or inputs, for up to 32-pin transfers. The selection wraps around:

%ppp : 000 = select pins 31..0

001 = select pins 39..8

010 = select pins 47..16

011 = select pins 55..24

100 = select pins 63..32

101 = select pins 7..0, 63..40

110 = select pins 15..0, 63..48

111 = select pins 23..0, 63..56
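The wrap-around in the %ppp table can be modelled in a few lines of Python. This is only a sketch: the function name is hypothetical, and it assumes that bit 0 of the 32-bit streamer bus lands on the lowest-numbered pin of the selected group.

```python
def streamer_pin_group(ppp: int) -> list[int]:
    """Return the 32 pin numbers selected by %ppp, streamer bit 0 first.

    Assumption: streamer bit 0 maps to the first pin of the group.
    """
    if not 0 <= ppp <= 7:
        raise ValueError("ppp is a 3-bit field")
    base = ppp * 8                                  # groups start on 8-pin boundaries
    return [(base + i) % 64 for i in range(32)]     # selection wraps past P63


group = streamer_pin_group(0b101)
print(group[0], group[23], group[24], group[31])    # 40 63 0 7
```

For %ppp = %101 the group starts at pin 40, runs up to pin 63, and then wraps to pins 0..7, matching the "7..0, 63..40" entry above.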

For modes which involve fewer than 8 pins, lower-order %p bit(s) in D[19:19..17] are used to further resolve the pin number(s).

Modes which shift data use bits bottom-first, by default. Some of these modes have the %a bit in D[16] to reorder the data sequence within the individual bytes to top-first when %a = 1.

For RDFAST modes, it is necessary to do a RDFAST sometime beforehand, to ensure that the hub RAM FIFO is ready to deliver data.

For WRFAST modes, it is necessary to do a WRFAST sometime beforehand, to ensure that the hub RAM FIFO is ready to receive data.

Immediate ⇢ LUT ⇢ Pins/DACs

S/# supplies 32 bits of data which form a set of 1/2/4/8-bit values that are shifted by 1/2/4/8 bits on each subsequent NCO rollover, with the last value repeating. Each value gets used as an offset address into lookup RAM, with the %bbbb bits in D[19:16] furnishing the base address of %bbbb00000. The resulting 32 bits of data read from lookup RAM (at %bbbb00000 + 1/2/4/8-bit value) are output.

Immediate ⇢ Pins/DACs

S/# supplies 32 bits of data which form a set of 1/2/4/8/16/32-bit values that are shifted by 1/2/4/8/16/32 bits on each subsequent NCO rollover, with the last value repeating. Each value is output in sequence.
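As a rough illustration of the shifting described here, the Python sketch below lists the values that would appear on successive NCO rollovers. The function name is hypothetical, and the model assumes the default bottom-first order (%a = 0).

```python
def immediate_values(data: int, bits: int, count: int) -> list[int]:
    """List the values output on `count` successive NCO rollovers.

    Assumption: default bottom-first order (%a = 0).
    """
    if bits not in (1, 2, 4, 8, 16, 32):
        raise ValueError("unsupported value width")
    mask = (1 << bits) - 1
    per_long = 32 // bits                   # values held in one 32-bit immediate
    values = []
    for n in range(count):
        index = min(n, per_long - 1)        # the last value repeats once data runs out
        values.append((data >> (index * bits)) & mask)
    return values


print([hex(v) for v in immediate_values(0x12345678, 8, 6)])
# ['0x78', '0x56', '0x34', '0x12', '0x12', '0x12']
```

With four 8-bit values in the long, the fifth and sixth rollovers simply repeat the top byte.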

RDFAST ⇢ LUT ⇢ Pins/DACs

Automatic RFLONG operations are done to read 32 bits at a time from hub RAM. The data are treated as a set of 1/2/4/8-bit values that are shifted by 1/2/4/8 bits on each subsequent NCO rollover, with the last value triggering a new RFLONG. Each value gets used as an offset address into lookup RAM, with the %bbbb bits in S[3:0] furnishing the base address of %bbbb00000. The resultant 32 bits of data read from lookup RAM (at %bbbb00000 + 1/2/4/8-bit value) are output.

RDFAST ⇢ Pins/DACs

Automatic RFBYTE/RFWORD/RFLONG operations are done to read 8/16/32 bits at a time from hub RAM. The data are treated as a set of 1/2/4/8/16/32-bit values that are shifted by 1/2/4/8/16/32 bits on each subsequent NCO rollover, with the last value triggering a new RFBYTE/RFWORD/RFLONG. Each value is output in sequence.

RDFAST ⇢ RGB ⇢ Pins/DACs

RFBYTE/RFWORD/RFLONG operations, done initially and on each subsequent NCO rollover, read 8/16/32-bit pixel values from hub RAM. The pixel values P[31/15/7:0] are translated into {R[7:0], G[7:0], B[7:0], 8'b0} values and output to X3, X2, X1, and X0.

LUMA8 mode uses three bits in S[2:0] as colors and the 8-bit pixels as luminance values:

S[2:0]	Color	X3	X2	X1	X0
%000	Orange	P[7:0]	%0, P[7:1]	$00	$00
%001	Blue	$00	$00	P[7:0]	$00
%010	Green	$00	P[7:0]	$00	$00
%011	Cyan	$00	P[7:0]	P[7:0]	$00
%100	Red	P[7:0]	$00	$00	$00
%101	Magenta	P[7:0]	$00	P[7:0]	$00
%110	Yellow	P[7:0]	P[7:0]	$00	$00
%111	White	P[7:0]	P[7:0]	P[7:0]	$00

RGBI8 mode uses the top three bits of the 8-bit pixel values as colors and the bottom 5 bits as luminance values:

P[7:5]	Color	X3	X2	X1	X0
%000	Orange	P[4,3,2,1,0,4,3,2]	%0, P[4,3,2,1,0,4,3]	$00	$00
%001	Blue	$00	$00	P[4,3,2,1,0,4,3,2]	$00
%010	Green	$00	P[4,3,2,1,0,4,3,2]	$00	$00
%011	Cyan	$00	P[4,3,2,1,0,4,3,2]	P[4,3,2,1,0,4,3,2]	$00
%100	Red	P[4,3,2,1,0,4,3,2]	$00	$00	$00
%101	Magenta	P[4,3,2,1,0,4,3,2]	$00	P[4,3,2,1,0,4,3,2]	$00
%110	Yellow	P[4,3,2,1,0,4,3,2]	P[4,3,2,1,0,4,3,2]	$00	$00
%111	White	P[4,3,2,1,0,4,3,2]	P[4,3,2,1,0,4,3,2]	P[4,3,2,1,0,4,3,2]	$00

RGB8 mode uses the top three bits of the 8-bit pixel values for red, the next three for green, and the last two for blue:

X3	X2	X1	X0
P[7,6,5,7,6,5,7,6]	P[4,3,2,4,3,2,4,3]	P[1,0,1,0,1,0,1,0]	$00

RGB16 mode uses the top five bits of the 16-bit pixel values for red, the next six for green, and the last five for blue:

X3	X2	X1	X0
P[15:11], P[15:13]	P[10:5], P[10:9]	P[4:0], P[4:2]	$00

RGB24 mode uses the top three bytes of the 32-bit pixel values for red, green, and blue:

X3	X2	X1	X0
P[31:24]	P[23:16]	P[15:8]	$00
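The bit replication in the RGB8 and RGB16 tables can be checked with a small Python model. The function names are hypothetical; the bit patterns are taken directly from the tables above.

```python
def rgb16_to_channels(p: int) -> tuple[int, int, int, int]:
    """Expand a 5:6:5 pixel into (X3, X2, X1, X0) bytes."""
    r5 = (p >> 11) & 0x1F
    g6 = (p >> 5) & 0x3F
    b5 = p & 0x1F
    x3 = (r5 << 3) | (r5 >> 2)      # P[15:11], P[15:13]
    x2 = (g6 << 2) | (g6 >> 4)      # P[10:5], P[10:9]
    x1 = (b5 << 3) | (b5 >> 2)      # P[4:0], P[4:2]
    return x3, x2, x1, 0x00


def rgb8_to_channels(p: int) -> tuple[int, int, int, int]:
    """Expand a 3:3:2 pixel into (X3, X2, X1, X0) bytes."""
    r3 = (p >> 5) & 0x7
    g3 = (p >> 2) & 0x7
    b2 = p & 0x3
    x3 = (r3 << 5) | (r3 << 2) | (r3 >> 1)      # P[7,6,5,7,6,5,7,6]
    x2 = (g3 << 5) | (g3 << 2) | (g3 >> 1)      # P[4,3,2,4,3,2,4,3]
    x1 = b2 * 0x55                              # P[1,0,1,0,1,0,1,0]
    return x3, x2, x1, 0x00


print(rgb16_to_channels(0xF800))    # (255, 0, 0, 0)
```

Repeating the top bits in the low positions means a full-scale pixel field expands to $FF rather than stopping short at, say, $F8.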

Pins ⇢ DACs/WRFAST

Initially, and on each subsequent NCO rollover, 1/2/4/8/16/32 pins are read from {INB, INA} and X3, X2, X1, and X0 are updated using the read data. If the %w bit in D[23] is high, WFBYTE/WFWORD/WFLONG operations will be done automatically to record the pin data. In the case of 1/2/4-pin modes, a WFBYTE will be done each time 8 bits of pin data accrue.

ADCs/Pins ⇢ DACs/WRFAST

This mode captures SCOPE channel data, along with optional pin data from {INB, INA}.

It will be necessary to use the SETSCP instruction beforehand to select the block of four pins which will feed the four 8-bit SCOPE channels. Any pins, within that block of four, that will be used as the ADC8 input(s) for this mode, must be put into "ADC sample" or "ADC scope" smart pin mode and enabled.

For the 1-ADC8 modes, where one of four SCOPE channels will be captured, the %ss bits in S[1:0] select the channel.

For the 2-ADC8 modes, where two of four SCOPE channels will be captured, the %s bit in S[1] selects the upper two or lower two channels.

For the 4-ADC8 mode, all four SCOPE channels will be captured.

For modes which also capture pin data, the lower 8 or 16 pins of the 32 pins selected by the %ppp bits in D[22:20] will be captured and placed into the lower half of the word/long, while the one or two SCOPE channels will be placed into the upper half.

Initially, and on each subsequent NCO rollover, SCOPE channel data and optional pin data are read and X3, X2, X1, and X0 are updated. If the %w bit in D[23] is high, WFBYTE/WFWORD/WFLONG operations will be done automatically to record the ADC and optional pin data.

DDS/Goertzel

This mode is unique, in that it outputs and inputs on every clock in which the command is active. Its purpose is to perform direct digital synthesis (DDS) on up to four DAC channels and/or to perform simultaneous Goertzel analysis on up to four ADC bit streams summed together.

On each clock, the upper bits of the NCO are used as an index to read a long containing four signed bytes from lookup RAM. The four bytes are output to X3, X2, X1, and X0 with their MSBs inverted, so that they may drive the unsigned DACs. The top two bytes from lookup RAM are also used as sine and cosine inputs to the Goertzel analyzer, where they are each multiplied by the sum of up to four ADC bitstreams and then separately accumulated.

Goertzel analysis can be thought of as a single slice of a Fourier transform, in which energy of a single frequency is measured amid potential noise for some number of NCO cycles. Goertzel analysis returns sine and cosine accumulations which can be converted into polar coordinates using the QVECTOR instruction, yielding power and phase information.

By incorporating DDS output with simultaneous Goertzel input, many interactive real-world measurements can be made to determine things like time-of-flight and resonance.

The four-pin input block is selected by the %pppp bits in D/#[22:19], where %pppp*4 is the base pin. One to four of these pins should be configured for ADC mode, so that their IN signals are raw delta-sigma bit streams, with no smart pin mode selected. For IN bitstream summation, '0' values are treated as -1 and '1' values are treated as +1. For cases of two or four input channels summed together, the sum is always even, so it is shifted right by one bit to conserve multiplication and accumulator resources.

S[19:0] supplies a 20-bit value which is used to configure the DDS/Goertzel mode. S[19:16] selects which of the four input pins are to be inverted, allowing for both addition and subtraction of particular input channels, while S[15:12] selects which of the four pins are to be included in the summation:

S[19:12] Effect

%xxxx_xxx0 Base pin +0 is ignored

%xxx0_xxx1 Base pin +0 is summed (0 ⇢ -1, 1 ⇢ +1)

%xxx1_xxx1 Base pin +0 is inverted and summed (0 ⇢ +1, 1 ⇢ -1)

%xxxx_xx0x Base pin +1 is ignored

%xx0x_xx1x Base pin +1 is summed

%xx1x_xx1x Base pin +1 is inverted and summed

%xxxx_x0xx Base pin +2 is ignored

%x0xx_x1xx Base pin +2 is summed

%x1xx_x1xx Base pin +2 is inverted and summed

%xxxx_0xxx Base pin +3 is ignored

%0xxx_1xxx Base pin +3 is summed

%1xxx_1xxx Base pin +3 is inverted and summed
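The include/invert selection and the halving of even sums can be sketched in Python as follows. The function name and argument layout are hypothetical: `in_bits` holds the four IN states with base pin +0 in bit 0, and `s_19_12` is the 8-bit field from the table above.

```python
def bitstream_sum(in_bits: int, s_19_12: int) -> int:
    """Sum up to four ADC bits according to the S[19:12] field."""
    include = s_19_12 & 0xF             # S[15:12]: pins taking part in the sum
    invert = (s_19_12 >> 4) & 0xF       # S[19:16]: pins inverted before summing
    total = 0
    count = 0
    for pin in range(4):
        if not (include >> pin) & 1:
            continue                    # this pin is ignored
        bit = ((in_bits >> pin) ^ (invert >> pin)) & 1
        total += 1 if bit else -1       # '1' counts as +1, '0' as -1
        count += 1
    if count in (2, 4):
        total >>= 1                     # sums of 2 or 4 channels are always even; halve them
    return total


print(bitstream_sum(0b0111, 0b0000_0111))   # 3
```

Three included pins can therefore yield -3, -1, +1, or +3, while two or four included pins yield values in the range -2..+2 after the shift.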

S[11:0] selects how much and what part of the lookup RAM will be used, along with an offset:

S[11:0]	Loop Size	NCO Bits	LUT Range
%000_TTTTTTTTT	512	30..22	%000000000..%111111111
%001_ATTTTTTTT	256	30..23	%A00000000..%A11111111
%010_AATTTTTTT	128	30..24	%AA0000000..%AA1111111
%011_AAATTTTTT	64	30..25	%AAA000000..%AAA111111
%100_AAAATTTTT	32	30..26	%AAAA00000..%AAAA11111
%101_AAAAATTTT	16	30..27	%AAAAA0000..%AAAAA1111
%110_AAAAAATTT	8	30..28	%AAAAAA000..%AAAAAA111
%111_AAAAAAATT	4	30..29	%AAAAAAA00..%AAAAAAA11

On each clock, the lookup RAM is read at the 9-bit location bound by the %A bits, with the lower bits being the sum of the %T bits and the topmost NCO bits. This allows you to set bounded areas within the LUT and to shift or modulate the phase of playback.
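One way to picture the address formation is the Python sketch below. The function name is hypothetical, and the model assumes that the sum of the %T bits and the NCO bits wraps within the selected loop size, which is how "bound by the %A bits" is read here.

```python
def dds_lut_address(s_11_0: int, nco: int) -> int:
    """Form the 9-bit lookup RAM address for DDS/Goertzel mode.

    Assumption: the %T + NCO sum wraps within the loop size.
    """
    size_sel = (s_11_0 >> 9) & 0x7          # S[11:9]: 0 = 512 entries ... 7 = 4 entries
    n = 9 - size_sel                        # number of index bits inside the loop
    mask = (1 << n) - 1
    offset = s_11_0 & mask                  # %T bits
    base = s_11_0 & 0x1FF & ~mask           # %A bits, already in their address position
    nco_top = (nco >> (31 - n)) & mask      # NCO bits 30..(31-n)
    return base | ((offset + nco_top) & mask)


# 256-entry loop in the upper half of the LUT, with a phase offset of 1:
print(hex(dds_lut_address(0b001_1_00000001, 0x7F80_0000)))   # 0x100
```

In the usage line, the NCO index is $FF and the offset is 1, so the sum wraps to 0 and the read stays inside $100..$1FF.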

The 8-bit sine (byte 3) and cosine (byte 2) values from the lookup RAM will each be multiplied by the bitstream sum (an integer from -3 to +3) and then added into their respective 32-bit accumulators.

After some number of complete NCO cycles, both accumulators can be simultaneously captured into holding registers and cleared using the GETXACC instruction. GETXACC writes the captured cosine accumulation into D and places the captured sine accumulation into the next instruction's S value. Subsequent GETXACC instructions will return the same values until a new streamer command executes.

D[23] selects between SINC1 and SINC2 accumulation modes:

D[23] Mode Accumulations (SIN_ACC/COS_ACC are read and cleared by GETXACC)

%0 SINC1 SIN_MUL = bitstream_sum * lookup_sin

COS_MUL = bitstream_sum * lookup_cos

SIN_ACC += SIN_MUL

COS_ACC += COS_MUL

%1 SINC2 SIN_MUL += bitstream_sum * lookup_sin

COS_MUL += bitstream_sum * lookup_cos

SIN_ACC += SIN_MUL

COS_ACC += COS_MUL
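The difference between the two modes is easiest to see by stepping the equations for a few clocks. The Python sketch below follows the table above literally; the class and function names are hypothetical, and neither 32-bit accumulator wrap nor the GETXACC capture is modelled.

```python
from dataclasses import dataclass


@dataclass
class GoertzelAcc:
    sin_mul: int = 0
    cos_mul: int = 0
    sin_acc: int = 0
    cos_acc: int = 0


def goertzel_clock(acc: GoertzelAcc, bitstream_sum: int,
                   lookup_sin: int, lookup_cos: int, sinc2: bool) -> None:
    """Apply one clock of SINC1 (sinc2=False) or SINC2 (sinc2=True) accumulation."""
    sin_product = bitstream_sum * lookup_sin
    cos_product = bitstream_sum * lookup_cos
    if sinc2:
        acc.sin_mul += sin_product      # SINC2: first integration stage
        acc.cos_mul += cos_product
    else:
        acc.sin_mul = sin_product       # SINC1: plain product
        acc.cos_mul = cos_product
    acc.sin_acc += acc.sin_mul          # final accumulation in both modes
    acc.cos_acc += acc.cos_mul


sinc1, sinc2 = GoertzelAcc(), GoertzelAcc()
for _ in range(3):
    goertzel_clock(sinc1, 1, 10, -5, sinc2=False)
    goertzel_clock(sinc2, 1, 10, -5, sinc2=True)
print(sinc1.sin_acc, sinc2.sin_acc)     # 30 60
```

With constant inputs, the SINC1 accumulator grows linearly (10, 20, 30) while the SINC2 accumulator grows quadratically (10, 30, 60), which illustrates why SINC2 accumulators fill up much faster.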

The program below demonstrates both SINC1 and SINC2 modes in a looped Goertzel measurement of 100 cycles of 1MHz, taking 100us per measurement. The 4th line of the program must be changed to "sinc2 = 1" to select SINC2 mode:

' Goertzel input and display

con adcpin = 0

dacpin = 1

cycles = 100 'number of cycles to measure

sinc2 = 0 '0 for SINC1, 1 for SINC2

ampl = sinc2 ? 10 : 127 'small sin/cos amplitude for SINC2

shifts = sinc2 ? 23 : 12 'more right-shifts for SINC2 acc's

_clkfreq = 256_000_000

' Setup

dat org

wrpin adcmode,#adcpin 'init ADC pin

dirh #dacpin 'enable DAC pin

setxfrq freq 'set streamer NCO frequency

' Make sine and cosine tables in LUT bytes 3 and 2

mov z,#$1FF 'make 512-sample sin/cos table in LUT

sincos shl z,#32-9 'get angle into top 9 bits of z

qrotate #ampl,z 'rotate (ampl,0) by z

shr z,#32-9 'restore z

getqy y 'get y

getqx x 'get x

shl y,#24 'y into byte3

setbyte y,x,#2 'x into byte2

wrlut y,z 'write sin:cos:0:0 into LUT

djnf z,#sincos 'loop until 512 samples

' Input Goertzel measurements from adcpin and output power level to dacpin

loop xcont dds_d,dds_s 'issue Goertzel command

getxacc x 'get prior Goertzel acc's, cos first

mov y,0 '..then sin

modc sinc2 * %1111 wc 'if SINC2, get differences

if_c sub x,xdiff

if_c add xdiff,x

if_c sub y,ydiff

if_c add ydiff,y

qvector x,y 'convert (x,y) to (rho,theta)

getqx x 'get rho (power measurement)

shr x,#shifts 'shift power down to byte

setbyte dacmode,x,#1 'insert into dacmode

wrpin dacmode,#dacpin 'update DAC pin

jmp #loop 'loop

'Data

adcmode long %0000_0000_000_100011_0000000_00_00000_0 'ADC mode

dacmode long %0000_0000_000_10110_00000000_00_00000_0 'DAC mode

freq long round(1_000_000.0/256_000_000.0 * 65536.0 * 32768.0) '1.000000 MHz

dds_d long %1111_0000_0000_0111<<16 + sinc2<<23 + cycles 'Goertzel mode, pin 0..3 in

dds_s long %0000_0001_000_000000000 'input on pin +0, 512 table

x res 1

y res 1

z res 1

xdiff res 1

ydiff res 1

In the pictures that follow, you can see the program's DAC output pin while a function generator drives a 0-3.3V frequency-swept sine wave into the ADC input pin, going from 950-1050KHz over 12ms, while the program measures the energy level at 1MHz:

You can see that SINC2 mode has a higher Q than SINC1 mode. Due to rapid (X,Y) accumulator growth, SINC2 may require the sine/cosine table to be reduced in amplitude to avoid (X,Y) accumulator overflow. This was done in the example program above, where it was reduced from ±127 for SINC1 to ±10 for SINC2.

NOTE ABOUT GOERTZEL SINC2 MODE (2024.12.16)

It has just been discovered that the Goertzel SINC2 mode generates periodic problematic GETXACC readings when the number of iterations in a Goertzel cycle varies, due to SETXFRQ's D being a non-power-of-two value. The example code above was modified so that the clock frequency is now 256 MHz, instead of 250 MHz, so that the 1MHz being listened to will always take 256 clocks per Goertzel cycle. This causes the double-integrating accumulators in SINC2 mode to always have the same number of iterations before a GETXACC instruction executes and captures the double accumulations. Being off by a single clock cycle will corrupt the current and next samples.
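A quick way to check a planned clock frequency against this constraint is sketched below in Python. The helper names are hypothetical. The model assumes that one Goertzel cycle is one full wrap of a 31-bit NCO phase, which is inferred from the 65536.0 * 32768.0 scaling in the example's freq expression.

```python
NCO_SPAN = 65536 * 32768    # 2**31, the scaling used by the example's freq expression


def streamer_frequency(f_out_hz: float, f_clk_hz: float) -> int:
    """Mirror the example's freq computation for the streamer NCO."""
    return round(f_out_hz / f_clk_hz * NCO_SPAN)


def clocks_per_cycle_is_constant(freq: int) -> bool:
    """True when every NCO cycle takes the same whole number of clocks."""
    return NCO_SPAN % freq == 0


for f_clk in (256_000_000, 250_000_000):
    freq = streamer_frequency(1_000_000, f_clk)
    print(f_clk, hex(freq), clocks_per_cycle_is_constant(freq))
# 256000000 0x800000 True
# 250000000 0x83126f False
```

At 256 MHz the frequency value is the power of two $80_0000, so each cycle takes exactly 256 clocks. At 250 MHz the value is not a power of two, so the cycle length varies by a clock from time to time.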

Digital Video Output (DVI/HDMI)

The streamer can serialize its internal 32-pin output data P[31:0] into 8-pin/10-bit digital video format, where the 32-pin output becomes $000000xx with $xx being a reversible pattern of RED, GRN, BLU, and CLK differential pairs.

The SETCMOD instruction is used to write bits 8:7 of the CMOD register to set digital video mode:

CMOD[8:7]	Mode	Pin +31:8	Pin +7	Pin +6	Pin +5	Pin +4	Pin +3	Pin +2	Pin +1	Pin +0
%0x	Normal	P[31:8]	P[7]	P[6]	P[5]	P[4]	P[3]	P[2]	P[1]	P[0]
%10	DVI fwd	$000000	RED+	RED-	GRN+	GRN-	BLU+	BLU-	CLK+	CLK-
%11	DVI rev	$000000	CLK-	CLK+	BLU-	BLU+	GRN-	GRN+	RED-	RED+

Eight-bit red, green, and blue pixel data are encoded into 10-bit TMDS patterns for transmission, while control data, such as horizontal and vertical syncs, are transmitted literally. P[1] in the internal pin output data selects whether data will be TMDS-encoded or sent out literally:

P[31:0]	RED+/- serial	GRN+/- serial	BLU+/- serial
%RRRRRRRR_GGGGGGGG_BBBBBBBB_xxxxxx0x	%RRRRRRRR gets encoded	%GGGGGGGG gets encoded	%BBBBBBBB gets encoded
%rrrrrrrrrr_gggggggggg_bbbbbbbbbb_1x	%rrrrrrrrrr is sent literally	%gggggggggg is sent literally	%bbbbbbbbbb is sent literally
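The two P[31:0] layouts can be packed with a pair of small helpers, sketched here in Python. The function names are hypothetical, and the don't-care bits are simply left at zero.

```python
def dvi_pixel(red: int, green: int, blue: int) -> int:
    """P[31:0] for 8-bit colour data that gets TMDS-encoded (P[1] = 0)."""
    return (red & 0xFF) << 24 | (green & 0xFF) << 16 | (blue & 0xFF) << 8


def dvi_literal(red10: int, green10: int, blue10: int) -> int:
    """P[31:0] for three 10-bit patterns that are sent literally (P[1] = 1)."""
    return ((red10 & 0x3FF) << 22 | (green10 & 0x3FF) << 12
            | (blue10 & 0x3FF) << 2 | 0b10)


# Same pattern on all three channels:
word = dvi_literal(0b1101010100, 0b1101010100, 0b1101010100)
print(f"%{word:032b}")      # %11010101001101010100110101010010
```

Bits 31..2 of the literal form carry the three 10-bit patterns back to back, and bit 1 marks the long as literal data.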

Digital video output mode requires that the P2 clock frequency be 10x the pixel rate. For standard-compliant 640x480 digital video, which has a pixel rate of 25MHz, the P2 chip should be clocked at 250MHz.

The NCO frequency must be set to 1/10 of the main clock using the value $0CCCCCCC+1, where the +1 forces initial NCO rollover on the 10th clock.
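The effect of the +1 can be seen with a simple phase-accumulator model in Python. This is a sketch under stated assumptions: the function name is hypothetical, the phase is taken to start at zero (as after XZERO), and a rollover is taken to occur when the accumulated phase reaches 2^31.

```python
def first_rollover_clock(freq: int) -> int:
    """Count clocks until the first NCO rollover, starting from a cleared phase.

    Assumption: rollover occurs when the accumulated phase reaches 2**31.
    """
    if freq <= 0:
        raise ValueError("freq must be positive")
    phase = 0
    clock = 0
    while True:
        clock += 1
        phase += freq
        if phase >= 1 << 31:
            return clock


print(first_rollover_clock(0x0CCCCCCC))         # 11
print(first_rollover_clock(0x0CCCCCCC + 1))     # 10
```

Because $0CCCCCCC is slightly less than one tenth of 2^31, ten additions fall just short of a rollover; adding 1 pushes the tenth addition over the threshold.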

The following program displays a 16bpp image in 640x480 HDMI mode:

'********************************************

'* VGA 640 x 480 x 16bpp 5:6:5 RGB - HDMI *

'********************************************

CON hdmi_base = 16 'must be a multiple of 8

DAT org

' Setup

hubset ##%1_000001_0000011000_1111_10_00 'config PLL, 20MHz/2*25*1 = 250MHz

waitx ##20_000_000 / 200 'allow crystal+PLL 5ms to stabilize

hubset ##%1_000001_0000011000_1111_10_11 'switch to PLL

rdfast ##640*350*2/64,##$1000 'set rdfast to wrap on bitmap

setxfrq ##$0CCCCCCC+1 'set streamer freq to 1/10th clk

setcmod #$100 'enable HDMI mode

drvl #7<<6 + hdmi_base 'enable HDMI pins

wrpin ##%100100_00_00000_0,#7<<6 + hdmi_base 'set 1mA drive on HDMI pins

' Field loop

fieldloop mov hsync0,sync_000 'vsync off

mov hsync1,sync_001

callpa #90,#blank 'top blanks

mov x,#350 'set visible lines

line call #hsync 'do horizontal sync

xcont m_rf,#0 'do visible line

djnz x,#line 'another line?

callpa #83,#blank 'bottom blanks

mov hsync0,sync_222 'vsync on

mov hsync1,sync_223

callpa #2,#blank 'vertical sync blanks

jmp #fieldloop 'loop

' Subroutines

blank call #hsync 'blank lines

xcont m_vi,hsync0

_ret_ djnz pa,#blank

hsync xcont m_bs,hsync0 'horizontal sync

xzero m_sn,hsync1

_ret_ xcont m_bv,hsync0

' Initialized data

sync_000 long %1101010100_1101010100_1101010100_10 '

sync_001 long %1101010100_1101010100_0010101011_10 ' hsync

sync_222 long %0101010100_0101010100_0101010100_10 'vsync

sync_223 long %0101010100_0101010100_1010101011_10 'vsync + hsync

m_bs long $70810000 + hdmi_base<<17 + 16 'before sync

m_sn long $70810000 + hdmi_base<<17 + 96 'sync

m_bv long $70810000 + hdmi_base<<17 + 48 'before visible

m_vi long $70810000 + hdmi_base<<17 + 640 'visible

m_rf long $B0850000 + hdmi_base<<17 + 640 'visible rfword rgb16 (5:6:5)

' Uninitialized data

x res 1

hsync0 res 1

hsync1 res 1

' Bitmap

orgh $1000 - 70 'justify pixels at $1000

file "birds_16bpp.bmp" 'rayman's picture (640 x 350)
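The m_xx command longs in the program pack several fields into one value. The Python sketch below splits such a long back into the fields described earlier in this section; the function and key names are hypothetical. Note that Python, unlike Spin2, gives + higher precedence than <<, so the shift needs parentheses.

```python
def decode_streamer_command(d: int) -> dict[str, int]:
    """Split a streamer command long into its D[31:0] fields."""
    return {
        "mode": (d >> 28) & 0xF,        # D[31:28]
        "dddd": (d >> 24) & 0xF,        # D[27:24] DAC channel selection
        "e_or_w": (d >> 23) & 0x1,      # D[23] pin-output or hub-write enable
        "ppp": (d >> 20) & 0x7,         # D[22:20] pin group
        "low": (d >> 16) & 0xF,         # D[19:16] mode-specific bits
        "count": d & 0xFFFF,            # D[15:0] initial counter value
    }


hdmi_base = 16
m_rf = 0xB0850000 + (hdmi_base << 17) + 640     # same expression as m_rf above
fields = decode_streamer_command(m_rf)
print(fields)
# {'mode': 11, 'dddd': 0, 'e_or_w': 1, 'ppp': 2, 'low': 5, 'count': 640}
```

Shifting hdmi_base left by 17 places hdmi_base/8 into %ppp, which is why hdmi_base must be a multiple of 8.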

COLORSPACE CONVERTER

Each cog has a colorspace converter which can perform ongoing matrix transformations and modulation of the cog's 8-bit DAC channels. The colorspace converter is intended primarily for baseband video modulation, but it can also be used as a general-purpose RF modulator.

The colorspace converter is configured via the following instructions:

SETCY {#}D - Set colorspace converter CY parameter to D[31:0]

SETCI {#}D - Set colorspace converter CI parameter to D[31:0]

SETCQ {#}D - Set colorspace converter CQ parameter to D[31:0]

SETCFRQ {#}D - Set colorspace converter CFRQ parameter to D[31:0]

SETCMOD {#}D - Set colorspace converter CMOD parameter to D[8:0]

It is intended that DAC3/DAC2/DAC1 serve as R/G/B channels. On each clock, new matrix and modulation calculations are performed through a pipeline. There is a group delay of five clocks from DAC-channel inputs to outputs when the colorspace converter is in use.

For the following signed multiply-accumulate computations, CMOD[4] determines whether the CY/CI/CQ terms will be sign-extended (CMOD[4] = 1) or zero-extended (CMOD[4] = 0). If zero-extended, using 128 for a CY/CI/CQ term will result in no attenuation of the related DAC term:

Y[7:0] = (DAC3 * CY[31:24] + DAC2 * CY[23:16] + DAC1 * CY[15:8]) / 128

I[7:0] = (DAC3 * CI[31:24] + DAC2 * CI[23:16] + DAC1 * CI[15:8]) / 128

Q[7:0] = (DAC3 * CQ[31:24] + DAC2 * CQ[23:16] + DAC1 * CQ[15:8]) / 128
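The matrix terms can be explored with the following Python sketch. It makes several assumptions that the text does not confirm: the function name is hypothetical, DAC inputs are treated as unsigned 0..255 values, the divide by 128 is modelled as an arithmetic right shift, and the final reduction to 8 bits is not modelled.

```python
def matrix_term(dac3: int, dac2: int, dac1: int,
                coeffs: int, sign_extend: bool) -> int:
    """Compute one of Y/I/Q from a 32-bit CY/CI/CQ value.

    Assumptions: unsigned DAC inputs, /128 as an arithmetic shift,
    no 8-bit truncation of the result.
    """
    def coeff(shift: int) -> int:
        c = (coeffs >> shift) & 0xFF
        if sign_extend and c & 0x80:    # CMOD[4] = 1: treat the term as signed
            c -= 0x100
        return c

    total = dac3 * coeff(24) + dac2 * coeff(16) + dac1 * coeff(8)
    return total >> 7                   # divide by 128


# Zero-extended term of 128 passes DAC3 through without attenuation:
print(matrix_term(200, 50, 10, 0x80_00_00_00, sign_extend=False))   # 200
```

The usage line reproduces the statement above that a zero-extended term of 128 results in no attenuation of the related DAC term.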

The modulator works by subtracting CFRQ from PHS on each clock cycle, in order to get a clockwise angle rotation in the upper bits of PHS. PHS[31:24] is then used to rotate the coordinate pair (I, Q). The rotated Q coordinate becomes IQ. Because a 5-stage CORDIC rotator is used to perform the rotation, IQ gets scaled by 1.646. When using the modulator, this scaling will need to be taken into account when computing your CI/CQ terms, in order to avoid IQ overflow:

PHS[31:0] = PHS[31:0] - CFRQ[31:0]

IQ[7:0] = Q of (I,Q) after being rotated by PHS and multiplied by 1.646

The formula for computing CFRQ for a desired modulation frequency is: $1_0000_0000 * desired_frequency / clock_frequency. For example, if you wanted 3.579545 MHz and your clock frequency was 80 MHz, you would compute: $1_0000_0000 * 3_579_545 / 80_000_000 = $0B74_5CFE, which you would set using the SETCFRQ instruction.
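The same calculation in Python, as a minimal sketch with a hypothetical function name. Integer division is used, which truncates the fractional part and reproduces the value quoted above.

```python
def cfrq(desired_hz: int, clock_hz: int) -> int:
    """Compute the SETCFRQ value: $1_0000_0000 * desired / clock, truncated."""
    return (1 << 32) * desired_hz // clock_hz


print(f"${cfrq(3_579_545, 80_000_000):08X}")    # $0B745CFE
```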

The preliminary output terms are computed as follows:

FY[7:0] = CY[7:0] + (DAC0 & {8{CMOD[3]}}) + Y[7:0] (VGA R / HDTV Y)

FI[7:0] = CI[7:0] + (DAC0 & {8{CMOD[2]}}) + I[7:0] (VGA G / HDTV Pb)

FQ[7:0] = CQ[7:0] + (DAC0 & {8{CMOD[1]}}) + Q[7:0] (VGA B / HDTV Pr)

FS[7:0] = {8{DAC0[0] ^ CMOD[0]}} (VGA H-Sync)

FIQ[7:0] = CQ[7:0] + IQ[7:0] (Chroma)

FYS[7:0] = DAC0[1] ? 8'b0 (1x = Luma Sync)

: DAC0[0] ? CI[7:0] (01 = Luma Blank/Burst)

: CY[7:0] + Y[7:0] (00 = Luma Visible)

FYC[7:0] = FYS[7:0] + IQ[7:0] (Composite Luma+Chroma)

The final output terms are selected by CMOD[6:5]:

CMOD[6:5]	Mode	DAC3	DAC2	DAC1	DAC0
00	<off>	DAC3 (bypass)	DAC2 (bypass)	DAC1 (bypass)	DAC0 (bypass)
01	VGA (R-G-B) / HDTV (Y-Pb-Pr)	FY (R / Y)	FI (G / Pb)	FQ (B / Pr)	FS (H-Sync)
10	NTSC/PAL Composite + S-Video	FYC (Composite)	FYC (Composite)	FIQ (Chroma)	FYS (Luma)
11	NTSC/PAL Composite	FYC (Composite)	FYC (Composite)	FYC (Composite)	FYC (Composite)

I/O PIN TIMING

I/O pins are controlled by cogs via the following cog registers:

DIRA - output enable bits for P0..P31 (active high)

DIRB - output enable bits for P32..P63 (active high)

OUTA - output state bits for P0..P31 (corresponding DIRA bit must be high to enable output)

OUTB - output state bits for P32..P63 (corresponding DIRB bit must be high to enable output)

I/O pins are read by cogs via the following cog registers:

INA - input state bits for P0..P31

INB - input state bits for P32..P63

Aside from general-purpose instructions which may operate on DIRA/DIRB/OUTA/OUTB, there are special pin instructions which operate on singular bits within these registers:

DIRL/DIRH/DIRC/DIRNC/DIRZ/DIRNZ/DIRRND/DIRNOT {#}D - affect pin D bit in DIRx

OUTL/OUTH/OUTC/OUTNC/OUTZ/OUTNZ/OUTRND/OUTNOT {#}D - affect pin D bit in OUTx

FLTL/FLTH/FLTC/FLTNC/FLTZ/FLTNZ/FLTRND/FLTNOT {#}D - affect pin D bit in OUTx, clear bit in DIRx

DRVL/DRVH/DRVC/DRVNC/DRVZ/DRVNZ/DRVRND/DRVNOT {#}D - affect pin D bit in OUTx, set bit in DIRx

As well, aside from general-purpose instructions which may read INA/INB, there are special pin instructions which can read singular bits within these registers:

TESTP {#}D WC/WZ/ANDC/ANDZ/ORC/ORZ/XORC/XORZ - read pin D bit in INx and affect C or Z

TESTPN {#}D WC/WZ/ANDC/ANDZ/ORC/ORZ/XORC/XORZ - read pin D bit in !INx and affect C or Z

When a DIRx/OUTx bit is changed by any instruction, it takes THREE additional clocks after the instruction before the pin starts transitioning to the new state. Here this delay is demonstrated using DRVH:

____0 ____1 ____2 ____3 ____4 ____5

Clock: / \____/ \____/ \____/ \____/ \____/ \____/

DIRA: | | DIRA-->| REG-->| REG-->| REG-->| P0 DRIV |

OUTA: | | OUTA-->| REG-->| REG-->| REG-->| P0 HIGH |

| |

Instruction: | DRVH #0 |

When an INx register is read by an instruction, it will reflect the state of the pins registered THREE clocks before the start of the instruction. Here this delay is demonstrated using TESTB:

____0 ____1 ____2 ____3 ____4 ____5

Clock: / \____/ \____/ \____/ \____/ \____/ \____/

INA: | P0 IN-->| REG-->| REG-->| REG-->| ALU-->| C/Z-->|

| |

Instruction: | TESTB INA,#0 |

When a TESTP/TESTPN instruction is used to read a pin, the value read will reflect the state of the pin registered TWO clocks before the start of the instruction. So, TESTP/TESTPN get fresher INx data than is available via the INx registers:

____0 ____1 ____2 ____3 ____4

Clock: / \____/ \____/ \____/ \____/ \____/

INA: | P0 IN-->| REG-->| REG-->| REG-->| C/Z-->|

| |

Instruction: | TESTP #0 |

COG ATTENTION

Each cog can request the attention of other cogs by using the COGATN instruction:

COGATN D/# 'get attention of cog(s), 2 clocks

The D/# operand supplies a 16-bit value in which bits 0..15 represent cogs 0..15. For each set bit, the corresponding cog will be strobed, causing an 'attention' event for POLLATN/WAITATN and interrupt use. The 16 attention strobe outputs from all cogs are OR'd together to form a composite set of 16 strobes, from which each cog receives its particular strobe.

COGATN #%0000_0000_1111_0000 'request attention of cogs 4..7

POLLATN WC 'has attention been requested?

WAITATN 'wait for attention request

JATN S/# 'jump to S/# if attention requested

JNATN S/# 'jump to S/# if attention not requested

In cases where multiple cogs may be requesting the attention of a single cog, some messaging structure may need to be implemented in hub RAM, in order to differentiate requests. In the main intended use case, the cog that is receiving an attention request knows which other cog is strobing it and how it is to respond.
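Building the COGATN operand from a list of cog numbers is a one-bit-per-cog exercise, sketched here in Python with a hypothetical helper name:

```python
def cogatn_mask(cogs) -> int:
    """Return the COGATN operand that strobes each cog number in `cogs`."""
    mask = 0
    for cog in cogs:
        if not 0 <= cog <= 15:
            raise ValueError("cog numbers occupy bits 0..15")
        mask |= 1 << cog
    return mask


print(f"%{cogatn_mask(range(4, 8)):016b}")      # %0000000011110000
```

The printed value matches the operand used in the 'request attention of cogs 4..7' example above.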

EVENTS

Cogs monitor and track 16 different background events, numbered 0..15:

Event 0 = An interrupt occurred
Event 1 = CT passed CT1 (CT is the lower 32-bits of the free-running 64-bit global counter)
Event 2 = CT passed CT2
Event 3 = CT passed CT3
Event 4 = Selectable event 1 occurred
Event 5 = Selectable event 2 occurred
Event 6 = Selectable event 3 occurred
Event 7 = Selectable event 4 occurred
Event 8 = A pattern match or mismatch occurred on either INA or INB
Event 9 = Hub FIFO block-wrap occurred - a new start address and block count were loaded
Event 10 = Streamer command buffer is empty - it's ready to accept a new command
Event 11 = Streamer finished - it ran out of commands, now idle
Event 12 = Streamer NCO rollover occurred
Event 13 = Streamer read lookup RAM location $1FF
Event 14 = Attention was requested by another cog or other cogs
Event 15 = GETQX/GETQY executed without any CORDIC results available

Events are tracked and can be polled, waited for, and used as interrupt sources.

Before explaining the details, consider the event-related instructions.

First are the POLLxxx instructions which simultaneously return their event-occurred flag into C and/or Z, and clear their event-occurred flag (unless it's being set again by the event sensor):

Instruction	Description	Interrupt source (0=off)
POLLINT	Poll the interrupt-occurred event flag	-
POLLCT1	Poll the CT-passed-CT1 event flag	1
POLLCT2	Poll the CT-passed-CT2 event flag	2
POLLCT3	Poll the CT-passed-CT3 event flag	3
POLLSE1	Poll the selectable-event-1 event flag	4
POLLSE2	Poll the selectable-event-2 event flag	5
POLLSE3	Poll the selectable-event-3 event flag	6
POLLSE4	Poll the selectable-event-4 event flag	7
POLLPAT	Poll the pin-pattern-detected event flag	8
POLLFBW	Poll the hub-FIFO-interface-block-wrap event flag	9
POLLXMT	Poll the streamer-empty event flag	10
POLLXFI	Poll the streamer-finished event flag	11
POLLXRO	Poll the streamer-NCO-rollover event flag	12
POLLXRL	Poll the streamer-lookup-RAM-$1FF-read event flag	13
POLLATN	Poll the attention-requested event flag	14
POLLQMT	Poll the CORDIC-read-but-no-results event flag	15

Next are the WAITxxx instructions, which will wait for their event-occurred flag to be set (in case it's not, already) and then clear their event-occurred flag (unless it's being set again by the event sensor), before resuming.

By doing a SETQ right before one of these instructions, you can supply a future CT target value which will be used to end the wait prematurely, in case the event-occurred flag never went high before the CT target was reached. When using SETQ with 'WAITxxx WC', C will be set if the timeout occurred before the event; otherwise, C will be cleared.

WAITINT Wait for an interrupt to occur, stalls the cog to save power

WAITCT1 Wait for the CT-passed-CT1 event flag

WAITCT2 Wait for the CT-passed-CT2 event flag

WAITCT3 Wait for the CT-passed-CT3 event flag

WAITSE1 Wait for the selectable-event-1 event flag

WAITSE2 Wait for the selectable-event-2 event flag

WAITSE3 Wait for the selectable-event-3 event flag

WAITSE4 Wait for the selectable-event-4 event flag

WAITPAT Wait for the pin-pattern-detected event flag

WAITFBW Wait for the hub-FIFO-interface-block-wrap event flag

WAITXMT Wait for the streamer-empty event flag

WAITXFI Wait for the streamer-finished event flag

WAITXRO Wait for the streamer-NCO-rollover event flag

WAITXRL Wait for the streamer-lookup-RAM-$1FF-read event flag

WAITATN Wait for the attention-requested event flag

There's no 'WAITQMT' because the event could not happen while waiting.
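As a sketch of the SETQ timeout described above, the following waits for an attention request but gives up after a fixed number of clocks. The register name, the 1,000,000-clock timeout, and the pins driven to show the outcome are illustrative assumptions.

```
        GETCT   target                  'get current CT
        ADD     target,##1_000_000      'make a CT target 1,000,000 clocks in the future
        SETQ    target                  'supply the timeout target to the WAITxxx that follows
        WAITATN WC                      'wait for attention; C=1 if the timeout came first
  IF_C  DRVH    #1                      'timeout occurred: drive P1 high (illustrative)
  IF_NC DRVH    #0                      'attention arrived in time: drive P0 high (illustrative)

target  RES     1                       'reserve long for the CT target
```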

Last are the 'Jxxx/JNxxx S/#' instructions, which each jump to S/# if their event-occurred flag is set (Jxxx) or clear (JNxxx). Whether or not a branch occurs, the event-occurred flag will be cleared, unless it's being set again by the event sensor.

JINT/JNINT Jump to S/# if the interrupt-occurred event flag is set/clear

JCT1/JNCT1 Jump to S/# if the CT-passed-CT1 event flag is set/clear

JCT2/JNCT2 Jump to S/# if the CT-passed-CT2 event flag is set/clear

JCT3/JNCT3 Jump to S/# if the CT-passed-CT3 event flag is set/clear

JSE1/JNSE1 Jump to S/# if the selectable-event-1 event flag is set/clear

JSE2/JNSE2 Jump to S/# if the selectable-event-2 event flag is set/clear

JSE3/JNSE3 Jump to S/# if the selectable-event-3 event flag is set/clear

JSE4/JNSE4 Jump to S/# if the selectable-event-4 event flag is set/clear

JPAT/JNPAT Jump to S/# if the pin-pattern-detected event flag is set/clear

JFBW/JNFBW Jump to S/# if the hub-FIFO-interface-block-wrap event flag is set/clear

JXMT/JNXMT Jump to S/# if the streamer-empty event flag is set/clear

JXFI/JNXFI Jump to S/# if the streamer-finished event flag is set/clear

JXRO/JNXRO Jump to S/# if the streamer-NCO-rollover event flag is set/clear

JXRL/JNXRL Jump to S/# if the streamer-lookup-RAM-$1FF-read event flag is set/clear

JATN/JNATN Jump to S/# if the attention-requested event flag is set/clear

JQMT/JNQMT Jump to S/# if the CORDIC-read-but-no-results event flag is set/clear
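As a sketch of non-blocking use, JNCT1 can test the CT-passed-CT1 flag inside a loop that also does other work. The labels, the 500-clock period, and the pin toggles are illustrative assumptions; ADDCT1 is described in the sections that follow.

```
        GETCT   x                       'get initial CT
        ADDCT1  x,#500                  'make initial CT1 target
loop    JNCT1   #work                   'flag clear (CT1 not yet passed): skip the timed action
        ADDCT1  x,#500                  'flag was set (and is now cleared): update CT1 target
        DRVNOT  #0                      'timed action: toggle P0
work    DRVNOT  #1                      'illustrative background work: toggle P1
        JMP     #loop

x       RES     1                       'reserve long for the CT1 target
```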

Here are detailed descriptions of each event flag. Understand that the 'set' events can also be used as interrupt sources (except in the case of the first flag which is set when an interrupt occurs):

POLLINT/WAITINT event flag

Cleared on cog start.
Set whenever interrupt 1, 2, or 3 occurs (debug interrupts are ignored).
Also cleared on POLLINT/WAITINT/JINT/JNINT.

POLLCT1/WAITCT1 event flag

Cleared on ADDCT1.
Set whenever CT passes the result of the ADDCT1 (MSB of CT minus CT1 is 0).
Also cleared on POLLCT1/WAITCT1/JCT1/JNCT1.

POLLCT2/WAITCT2 event flag

Cleared on ADDCT2.
Set whenever CT passes the result of the ADDCT2 (MSB of CT minus CT2 is 0).
Also cleared on POLLCT2/WAITCT2/JCT2/JNCT2.

POLLCT3/WAITCT3 event flag

Cleared on ADDCT3.
Set whenever CT passes the result of the ADDCT3 (MSB of CT minus CT3 is 0).
Also cleared on POLLCT3/WAITCT3/JCT3/JNCT3.

POLLPAT/WAITPAT event flag

Cleared on SETPAT.
Set whenever (INA & D) != S after 'SETPAT D/#,S/#' with C=0 and Z=0.
Set whenever (INA & D) == S after 'SETPAT D/#,S/#' with C=0 and Z=1.
Set whenever (INB & D) != S after 'SETPAT D/#,S/#' with C=1 and Z=0.
Set whenever (INB & D) == S after 'SETPAT D/#,S/#' with C=1 and Z=1.
Also cleared on POLLPAT/WAITPAT/JPAT/JNPAT.

POLLFBW/WAITFBW event flag

Cleared on RDFAST/WRFAST/FBLOCK.
Set whenever the hub RAM FIFO interface exhausts its block count and reloads its 'block count' and 'start address'.
Also cleared on POLLFBW/WAITFBW/JFBW/JNFBW.

POLLXMT/WAITXMT event flag

Cleared on XINIT/XZERO/XCONT.
Set whenever the streamer is ready for a new command.
Also cleared on POLLXMT/WAITXMT/JXMT/JNXMT.

POLLXFI/WAITXFI event flag

Cleared on XINIT/XZERO/XCONT.
Set whenever the streamer runs out of commands.
Also cleared on POLLXFI/WAITXFI/JXFI/JNXFI.

POLLXRO/WAITXRO event flag

Cleared on XINIT/XZERO/XCONT.
Set whenever the streamer NCO rolls over.
Also cleared on POLLXRO/WAITXRO/JXRO/JNXRO.

POLLXRL/WAITXRL event flag

Cleared on cog start.
Set whenever location $1FF of the lookup RAM is read by the streamer.
Also cleared on POLLXRL/WAITXRL/JXRL/JNXRL.

POLLATN/WAITATN event flag

Cleared on cog start.
Set whenever any cogs request attention.
Also cleared on POLLATN/WAITATN/JATN/JNATN.

POLLQMT event flag

Cleared on cog start.
Set whenever GETQX/GETQY executes without any CORDIC results available or in progress.
Also cleared on POLLQMT/JQMT/JNQMT (there is no WAITQMT instruction).
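Example: SETPAT/WAITPAT

As a sketch of the pin-pattern rules above, the following waits until P0 is high and P1 is low. MODCZ is used here only as one convenient way to establish C=0 and Z=1 before SETPAT; the mask and match values are illustrative assumptions.

```
        MODCZ   _CLR,_SET WCZ           'C=0 selects INA, Z=1 selects the "equal" test
        SETPAT  #%11,#%01               'event when (INA & %11) == %01; also clears the flag
        WAITPAT                         'stall until P0 is high and P1 is low
```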

Example: ADDCT1/WAITCT1

'ADDCT1 D,S/#' must be used to establish a CT target. This is done by first using 'GETCT D' to get the current CT value into a register, and then using ADDCT1 to add into that register, thereby making a future CT target, which, when passed, will trigger the CT-passed-CT1 event and set the related event flag.

GETCT x 'get initial CT

ADDCT1 x,#500 'make initial CT1 target

.loop WAITCT1 'wait for CT to pass CT1 target

ADDCT1 x,#500 'update CT1 target

DRVNOT #0 'toggle P0

JMP #.loop 'loop to the WAITCT1

It doesn't matter what register is used to keep track of the CT1 target. Whenever ADDCT1 executes, S/# is added into D, and the result gets copied into a dedicated CT1 target register that is compared to CT on every clock. When CT passes the CT1 target, the event flag is set. ADDCT1 clears the CT-passed-CT1 event flag to help with initialization and cycling.

Selectable Events

Each cog can track up to four selectable pin, LUT, or hub lock events. This is accomplished by using the SETSEn instruction, where "n" is 1, 2, 3, or 4. In order for user code to detect the occurrence of the selected event, the following options are available:

The matched WAITSEn instruction will block until the event occurs
The matched POLLSEn instruction will check for the event without blocking
The matched JSEn and JNSEn branch instructions will branch according to the polled event state
As an interrupt (see INTERRUPTS)

Each selected event is set or cleared according to the following rules:

SEn is set whenever the configured event occurs.
SEn is cleared on matched POLLSEn / WAITSEn / JSEn / JNSEn.
SEn is cleared when matched 'SETSEn D/#' is called.

SETSEn D/# accepts the following configuration values:

%000_00_00AA = this cog reads LUT address %1111111AA

%000_00_01AA = this cog writes LUT address %1111111AA

%000_00_10AA = odd/even companion cog reads LUT address %1111111AA

%000_00_11AA = odd/even companion cog writes LUT address %1111111AA

%000_01_LLLL = hub lock %LLLL rises

%000_10_LLLL = hub lock %LLLL falls

%000_11_LLLL = hub lock %LLLL changes

%001_PPPPPP = INA/INB bit of pin %PPPPPP rises

%010_PPPPPP = INA/INB bit of pin %PPPPPP falls

%011_PPPPPP = INA/INB bit of pin %PPPPPP changes

%10x_PPPPPP = INA/INB bit of pin %PPPPPP is low

%11x_PPPPPP = INA/INB bit of pin %PPPPPP is high
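As a sketch of the configuration values above, the following sets selectable event 1 to a rising edge on P5 and then blocks on it. The pin number, label, and pin toggle are illustrative assumptions.

```
        SETSE1  #%001_000101            '%001_PPPPPP with %PPPPPP = 5: SE1 = P5 rises
wait    WAITSE1                         'stall until P5 rises (clears the SE1 flag)
        DRVNOT  #0                      'illustrative response: toggle P0
        JMP     #wait                   'wait for the next rising edge
```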

INTERRUPTS

Each cog has three interrupts: INT1, INT2, and INT3.

INT1 has the highest priority and can interrupt INT2 and INT3.

INT2 has the middle priority and can interrupt INT3.

INT3 has the lowest priority and can only interrupt non-interrupt code.

The STALLI instruction can be used to hold off INT1, INT2 and INT3 interrupt branches indefinitely, while the ALLOWI instruction allows those interrupt branches to occur. Critical blocks of code can, therefore, be protected from interruption by beginning with STALLI and ending with ALLOWI.

There are 16 interrupt event sources, selected by a 4-bit pattern:

0 <off>, default on cog start for INT1/INT2/INT3 event sources

1 CT-passed-CT1, established by ADDCT1

2 CT-passed-CT2, established by ADDCT2

3 CT-passed-CT3, established by ADDCT3

4 SE1 event occurred, established by SETSE1

5 SE2 event occurred, established by SETSE2

6 SE3 event occurred, established by SETSE3

7 SE4 event occurred, established by SETSE4

8 Pin pattern match or mismatch occurred, established by SETPAT

9 Hub RAM FIFO interface wrapped and reloaded, established by RDFAST/WRFAST/FBLOCK

10 Streamer is ready for another command, established by XINIT/XZERO/XCONT

11 Streamer ran out of commands, established by XINIT/XZERO/XCONT

12 Streamer NCO rolled over, established by XINIT/XZERO/XCONT

13 Streamer read location $1FF of lookup RAM

14 Attention requested by other cog(s)

15 GETQX/GETQY executed without any CORDIC results available or in progress

To set up an interrupt, you need to first point its IJMP register to your interrupt service routine (ISR). When the interrupt occurs, it will jump to where the IJMP register points and simultaneously store the C/Z flags and return address into the adjacent IRET register:

$1F0 RAM / IJMP3 interrupt call address for INT3

$1F1 RAM / IRET3 interrupt return address for INT3

$1F2 RAM / IJMP2 interrupt call address for INT2

$1F3 RAM / IRET2 interrupt return address for INT2

$1F4 RAM / IJMP1 interrupt call address for INT1

$1F5 RAM / IRET1 interrupt return address for INT1

When your ISR is done, it can do a RETIx instruction to return to the interrupted code. The RETIx instructions are actually CALLD instructions:

RETI1 = CALLD INB,IRET1 WCZ

RETI2 = CALLD INB,IRET2 WCZ

RETI3 = CALLD INB,IRET3 WCZ

The CALLD with D = <any register>, S = IRETx, and WCZ, signals the cog that the interrupt is complete. This causes the cog to clear its internal interrupt-busy flag for that interrupt, so that another interrupt can occur. INB (read-only) is used as D for RETIx instructions to effectively make the CALLD into a JMP back to the interrupted code.

Instead of using RETIx, though, you could use RESIx to have your ISR resume at the next instruction when the next interrupt occurs:

RESI1 = CALLD IJMP1,IRET1 WCZ

RESI2 = CALLD IJMP2,IRET2 WCZ

RESI3 = CALLD IJMP3,IRET3 WCZ
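As a sketch of how RESIx lets an ISR continue where it left off, the following ISR alternates between two actions on successive interrupts. The interrupt source (14, attention requested), the labels, and the pins are illustrative assumptions; SETINT1 is described further below.

```
        org
start   mov     ijmp1,#isr1             'int1 vector initially points to isr1
        setint1 #14                     'int1 source 14 = attention requested by other cog(s)
loop    jmp     #loop                   'main program, gets interrupted

isr1    drvnot  #1                      '1st, 3rd, 5th... interrupt: toggle p1
        resi1                           'return; ijmp1 now holds the address of the next instruction
        drvnot  #2                      '2nd, 4th, 6th... interrupt: toggle p2
        resi1                           'return; ijmp1 now points to the jmp below
        jmp     #isr1                   'the following interrupt enters here and continues at isr1
```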

Once you've got the IJMPx register configured to point to your ISR, you can enable the interrupt. This is done using the SETINTx instruction:

SETINT1 D/# Set INT1 event to 0..15 (see table above)

SETINT2 D/# Set INT2 event to 0..15 (see table above)

SETINT3 D/# Set INT3 event to 0..15 (see table above)

Interrupts may be forced in software by the TRGINTx instructions:

TRGINT1 Trigger INT1

TRGINT2 Trigger INT2

TRGINT3 Trigger INT3

Interrupts that have been triggered and are waiting to branch may be nixed in software by the NIXINTx instructions. These instructions are only useful in main code after STALLI executes or in an ISR which needs to stop a lower-level interrupt from executing after the current ISR exits:

NIXINT1 Nix INT1

NIXINT2 Nix INT2

NIXINT3 Nix INT3

Interrupts can be stalled or allowed using the following instructions:

ALLOWI Allow any stalled and future interrupt branches to occur indefinitely (default mode on cog start)

STALLI Stall interrupt branches indefinitely until ALLOWI executes
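As a sketch of protecting a critical block, the following brackets a two-instruction 64-bit increment so that an ISR never observes it half-done. The register names are illustrative assumptions.

```
        STALLI                          'hold off INT1/INT2/INT3 branches
        ADD     count_lo,#1 WC          'increment low long, carry out into C
        ADDX    count_hi,#0             'add carry into high long
        ALLOWI                          'allow any stalled interrupt branch to occur

count_lo RES    1                       'low long of the 64-bit count
count_hi RES    1                       'high long of the 64-bit count
```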

When an interrupt event occurs, certain conditions must be met during execution before the interrupt branch can happen:

ALTxx / CRCNIB / SCA / SCAS / GETCT+WC / GETXACC / SETQ / SETQ2 / XORO32 / XBYTE must not be executing
AUGS must not be executing or waiting for a S/# instruction
AUGD must not be executing or waiting for a D/# instruction
REP must not be executing or active
STALLI must not be executing or active
The cog must not be stalled in any WAITx instruction

Once these conditions are all met, any pending interrupt is allowed to branch, with priority given to INT1, then INT2, and then INT3.

Interrupt branches are realized, internally, by inserting a 'CALLD IRETx,IJMPx WCZ' into the instruction pipeline while holding the program counter at its current value, so that the interrupt later returns to the address saved in IRETx.

Interrupts loop through these three states:

1. Waiting for interrupt event
2. Waiting for interrupt branch
3. Executing interrupt service routine

During states 2 and 3, any intervening interrupt events at the same priority level are ignored. When state 1 is returned to, a new interrupt event will be waited for.

Example: Using INT1 as a CT1 interrupt

org

start mov ijmp1,#isr1 'set int1 vector

setint1 #1 'set int1 for ct-passed-ct1 event

getct ct1 'set initial ct1 target

addct1 ct1,#50

'main program, gets interrupted

loop drvnot #0 'toggle p0

jmp #loop 'loop

'int1 isr, runs once every 50 clocks

isr1 drvnot #1 'toggle p1

addct1 ct1,#50 'update ct1 target

reti1 'return to main program

ct1 res 'reserve long for ct1
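A second sketch, in the same style, uses a selectable event as the interrupt source: INT2 is triggered by a rising edge on P5 via SE1 (source 4 in the list above). The pin numbers and labels are illustrative assumptions.

```
        org

start   mov     ijmp2,#isr2             'set int2 vector
        setse1  #%001_000101            'se1 = p5 rises
        setint2 #4                      'set int2 for se1 event

loop    drvnot  #0                      'main program: toggle p0
        jmp     #loop                   'loop

isr2    drvnot  #1                      'int2 isr: toggle p1 on each rising edge of p5
        reti2                           'return to main program
```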

DEBUG INTERRUPT

In addition to the three visible interrupts, there is a fourth "hidden" interrupt that has priority over all the others. It is the debug interrupt, and it is inaccessible to normal cog programs.

Debug interrupts are enabled on a per-cog basis via HUBSET. Each debug-enabled cog will generate a debug interrupt on (re)start from each COGINIT exercised upon it. Within that initial debug ISR and within each subsequent debug ISR, multiple trigger conditions may be set for the next debug interrupt. If no trigger conditions are set before the debug ISR ends, no more debug interrupts will occur until the cog is restarted from another COGINIT.

The last 16KB of hub RAM, which is also mapped to $FC000..$FFFFF, gets partially used as a buffer area for saving and restoring cog registers during debug ISR's. The initial debug ISR routines are also stored in this upper RAM. Once initialized with debug ISR code, this upper hub RAM can be write-protected, in which case it is mapped only to $FC000..$FFFFF and it is only writable from within debug ISR's.

Each cog has an execute-only ROM in cog registers $1F8..$1FF which contains special debug-ISR-entry and -exit routines. These tiny routines perform seamless register-load and register-restore operations for your debugger program, which must be realized entirely within debug ISR's.

Execute-only ROM in cog registers $1F8..$1FF

(%cccc = !CogNumber)

Debug ISR Entry - IJMP0 is initialized to $1F8 on COGINIT

$1F8 - SETQ #$0F 'save registers $000..$00F

$1F9 - WRLONG 0,* '* = %1111_1111_1ccc_c000_0000

$1FA - SETQ #$0F 'load program into $000..$00F

$1FB - RDLONG 0,* '* = %1111_1111_1ccc_c100_0000

$1FC - JMP #0 'jump to loaded program

Debug ISR Exit - Jump here to exit your debug ISR

$1FD - SETQ #$0F 'restore registers $000..$00F

$1FE - RDLONG 0,* '* = %1111_1111_1ccc_c000_0000

$1FF - RETI0 'CALLD IRET0,IRET0 WCZ

During a debug ISR, INA and INB, normally read-only input-pin registers, become readable/writable RAM registers named IJMP0 and IRET0, and are used by the debug interrupt as jump and return addresses. On COGINIT, IJMP0 is initialized to $1F8 which is the debug-ISR-entry routine's address.

When a debug interrupt occurs with IJMP0 pointing to $1F8, the following sequence happens:

Cog registers $000 to $00F are saved to hub RAM starting at ($FF800 + !CogNumber << 7), or %1111_1111_1ccc_c000_0000, where %cccc = !CogNumber.

Cog registers $000 to $00F are loaded from hub RAM starting at ($FF840 + !CogNumber << 7), or %1111_1111_1ccc_c100_0000, where %cccc = !CogNumber.

A "JMP #$000" executes to run the 16-instruction debugger program that was just loaded into registers $000 to $00F.

Your 16-instruction debugger program will likely want to determine if this debug interrupt was due to a COGINIT, in which case the debugger will probably want to note that a new program is now running in this cog. Depending on what the debugger must do next, it is likely that it will need to save more registers to the upper hub RAM and then load in more code from the upper hub RAM to facilitate more complex operations than the initial 16-instruction ISR can achieve. The ISR may then need to perform some communication between itself and a host system which may be serving as the debugger's user interface. It may be necessary to employ a LOCK to time-share P2-to-host communication channels among cogs, likely on P63 (serial Rx) and P62 (serial Tx). This scenario is somewhat hypothetical, but illustrates the design intent behind the debug interrupt mechanism.

When your debug ISR is complete, you can do a 'JMP #$1FD' to execute the debug-ISR-exit routine which does the following:

Original cog registers $000 to $00F are restored from hub RAM starting at ($FF800 + !CogNumber << 7), or %1111_1111_1ccc_c000_0000, where %cccc = !CogNumber.

A "RETI0" executes to return to the interrupted cog program.

Here is a table of the hub RAM locations used by each cog for register save/restore and ISR images during the debug interrupt when the register ROM routines are used for ISR entry and exit:

Cog	Save/Restore in Hub RAM for Registers $000..$00F	ISR image in Hub RAM for Registers $000..$00F
7	$FFC00..$FFC3F	$FFC40..$FFC7F
6	$FFC80..$FFCBF	$FFCC0..$FFCFF
5	$FFD00..$FFD3F	$FFD40..$FFD7F
4	$FFD80..$FFDBF	$FFDC0..$FFDFF
3	$FFE00..$FFE3F	$FFE40..$FFE7F
2	$FFE80..$FFEBF	$FFEC0..$FFEFF
1	$FFF00..$FFF3F	$FFF40..$FFF7F
0	$FFF80..$FFFBF	$FFFC0..$FFFFF
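The table follows from the address expression ($FF800 + !CogNumber << 7), with !CogNumber taken as the 4-bit value %cccc. As a sketch of that arithmetic only (the ROM routines supply these addresses themselves), the save/restore base address for the current cog could be computed as shown; the register name is an illustrative assumption.

```
        COGID   addr                    'addr = this cog's number
        NOT     addr                    '!CogNumber
        AND     addr,#$F                'keep the 4 bits %cccc
        SHL     addr,#7                 '%cccc << 7 (128 bytes per cog)
        ADD     addr,##$FF800           'cog 0 gives $FFF80, cog 7 gives $FFC00

addr    RES     1                       'reserve long for the address
```

Adding $40 to the result gives the start of that cog's ISR image.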

Though the first debug interrupt upon cog (re)start will always use the debug-ISR-entry routine at $1F8, you may redirect IJMP0 during any debug ISR to point elsewhere for use by subsequent debug interrupts. This would mean that you would lose the initial register-saving function provided by the small ROM starting at $1F8, so you would have to use some cog registers for debugger-state storage that don't interfere with the cog program that is being debugged. If no register saving/restoring or host communications are required, your debug ISR may execute very quickly.

What terminates a debug interrupt is not only RETI0 (CALLD INB,INB WCZ), but any D-register variant (CALLD anyreg,INB WCZ). For example RESI0 (CALLD INA,INB WCZ) may be used to resume next time from where this debug ISR left off, but this would imply that you are not using the debug-ISR-entry and -exit routines in the cog-register ROM and have, instead, permanently located debugger code into some cog registers, so that your debugger program is already present at the start of the debug interrupt.

This debug interrupt scheme was designed to operate stealthily, without any cooperation from the cog program being debugged. All control has been placed within the debug ISR. This isolation from normal programming is intended to prevent, or at least discourage, programmers from making any aspect of the debug interrupt system part of their application, thereby rendering the debug interrupt compromised as a standard debugging mechanism. Also, by executing the ISR strictly in cog register space, this scheme does not interfere with the hub FIFO state, which would be impossible to reconstruct if disturbed by hub execution within the debug ISR.

Below are the instructions which are used in the debugging mechanism:

BRK D/#

During normal program execution, the BRK instruction is used to generate a debug interrupt with an 8-bit code which can be read within the debug ISR. The BRK instruction interrupt must be enabled from within a prior debug ISR for this to work. Regardless of the execution condition, the BRK instruction will trigger a debug interrupt, if enabled. The execution condition only gates the writing of the 8-bit code:

D/# = %BBBBBBBB: 8-bit BRK code

During a debug ISR, the BRK instruction operates differently and is used to establish the next debug interrupt condition(s). It is also used to select INA/INB, instead of the IJMP0/IRET0 registers exposed during the ISR, so that the pins' inputs states may be read:

D/# = %aaaaaaaaaaaaaaaaeeee_LKJIHGFEDCBA

%aaaaaaaaaaaaaaaaeeee: 20-bit breakpoint address or 4-bit event code (%eeee)

%L: 1 = map INA/INB normally, 0 = map IJMP0/IRET0 at INA/INB (default during ISR) *

%K: 1 = enable interrupt on breakpoint address match

%J: 1 = enable interrupt on event %eeee

%I: 1 = enable interrupt on asynchronous breakpoint (via COGBRK on another cog)

%H: 1 = enable interrupt on INT3 ISR entry

%G: 1 = enable interrupt on INT2 ISR entry

%F: 1 = enable interrupt on INT1 ISR entry

%E: 1 = enable interrupt on BRK instruction

%D: 1 = enable interrupt on INT3 ISR code (single step)

%C: 1 = enable interrupt on INT2 ISR code (single step)

%B: 1 = enable interrupt on INT1 ISR code (single step)

%A: 1 = enable interrupt on non-ISR code (single step)

* If set to 1 by the debug ISR, %L must be reset to 0 before exiting the debug ISR, so that the RETI0 instruction is able to see IJMP0 and IRET0.

On debug ISR entry, bits L to A are cleared to '0'. If a subsequent debug interrupt is desired, a BRK instruction must be executed before exiting the debug ISR, in order to establish the next breakpoint condition(s).
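As a sketch of the final steps of a debug ISR that uses the ROM entry and exit routines, the following arms the next debug interrupt to occur on a BRK instruction in the program being debugged and then exits. The choice of trigger condition is an illustrative assumption.

```
        BRK     #%0000_0001_0000        '%E = 1: enable debug interrupt on BRK instruction, %L stays 0
        JMP     #$1FD                   'leave through the debug-ISR-exit routine in ROM
```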

COGBRK D/#

The COGBRK instruction can trigger an asynchronous breakpoint in another cog. For this to work, the cog executing the COGBRK instruction must be in its own debug ISR and the other cog must have its asynchronous breakpoint interrupt enabled:

D/# = %CCCC: the cog in which to trigger an asynchronous breakpoint

GETBRK D WCZ

During normal program execution, GETBRK with WCZ returns various data about the cog's internal status:

C = 1 if STALLI mode or 0 if ALLOWI mode (established by STALLI/ALLOWI)

Z = 1 if cog started in hubexec or 0 if cog started in cogexec

D[31:23] = 0

D[22] = 1 if colorspace converter is active

D[21] = 1 if streamer is active

D[20] = 1 if WRFAST mode or 0 if RDFAST mode

D[19:16] = INT3 selector, established by SETINT3

D[15:12] = INT2 selector, established by SETINT2

D[11:08] = INT1 selector, established by SETINT1

D[07:06] = INT3 state: %0x = idle, %10 = interrupt pending, %11 = ISR executing

D[05:04] = INT2 state: %0x = idle, %10 = interrupt pending, %11 = ISR executing

D[03:02] = INT1 state: %0x = idle, %10 = interrupt pending, %11 = ISR executing

D[01] = 1 if STALLI mode or 0 if ALLOWI mode (established by STALLI/ALLOWI)

D[00] = 1 if cog started in hubexec or 0 if cog started in cogexec

During a debug ISR, GETBRK with WCZ returns additional data that is useful to a debugger:

C = 1 if debug interrupt was from a COGINIT, indicating that the cog was (re)started

D[31:24] = 8-bit break code from the last 'BRK D/#' during normal execution

D[23] = 1 if debug interrupt was from a COGINIT, indicating that the cog was (re)started

GETBRK D WC

GETBRK with WC always returns the following:

C = LSB of SKIP/SKIPF/EXECF/XBYTE pattern

D[31:28] = 4-bit CALL depth since SKIP/SKIPF/EXECF/XBYTE (skipping suspended if not %0000)

D[27] = 1 if SKIP mode or 0 if SKIPF/EXECF/XBYTE mode

D[26] = 1 if LUT sharing enabled (established by SETLUTS)

D[25] = 1 if top of stack = $001FF, indicating XBYTE will execute on next _RET_/RET

D[24:16] = 9-bit XBYTE mode, established by '_RET_ SETQ/SETQ2' when top of stack = $001FF

D[15:00] = 16 event-trap flags

D[15] = GETQX/GETQY executed without prior CORDIC command

D[14] = attention requested by cog(s)

D[13] = streamer read location $1FF of lookup RAM

D[12] = streamer NCO rolled over

D[11] = streamer finished, now idle

D[10] = streamer ready to accept new command

D[09] = hub RAM FIFO interface loaded block count and start address

D[08] = pin pattern match occurred

D[07] = SE4 event occurred

D[06] = SE3 event occurred

D[05] = SE2 event occurred

D[04] = SE1 event occurred

D[03] = CT-passed-CT3

D[02] = CT-passed-CT2

D[01] = CT-passed-CT1

D[00] = INT1, INT2, or INT3 occurred

GETBRK D WZ

GETBRK with WZ always returns the following:

Z = 1 if no SKIP/SKIPF/EXECF/XBYTE pattern queued (D = 0) or 0 if pattern queued (D <> 0)

D = 32-bit SKIP/SKIPF/EXECF/XBYTE pattern, used LSB-first to skip instructions in main code

HUB

Configuration

The hub contains several global circuits which are configured using the HUBSET instruction. HUBSET uses a single D operand to both select the circuit to be configured and to provide the configuration data:

HUBSET {#}D - Configure global circuit selected by MSBs

%0000_xxxE_DDDD_DDMM_MMMM_MMMM_PPPP_CCSS Set clock generator mode

%0001_xxxx_xxxx_xxxx_xxxx_xxxx_xxxx_xxxx Hard reset, reboots chip

%0010_xxxx_xxxx_xxLW_DDDD_DDDD_DDDD_DDDD Set write-protect and debug enables

%0100_xxxx_xxxx_xxxx_xxxx_xxxR_RLLT_TTTT Set filter R to length L and tap T

%1DDD_DDDD_DDDD_DDDD_DDDD_DDDD_DDDD_DDDD Seed Xoroshiro128** PRNG with D

Configuring the Clock Generator

The Prop2 can generate its system clock in several different ways.

There are two separate internal RC clock oscillators that can be used, a 20MHz+ (RCFAST) and a ~20kHz (RCSLOW). The 20MHz+ oscillator is designed to always run at least 20MHz, worst-case, in order to accommodate 2M baud serial loading during boot. The ~20kHz oscillator is intended for low-power operation.

The XI and XO pins can also be used for clocking, with XI being an input and XO being a crystal-feedback output for 10MHz-20MHz crystals. Internal loading caps can also be enabled on XI and XO for crystal impedance matching.

If the XI pin is used as a clock input or crystal oscillator input, its frequency can be modified through an internal phase-locked loop (PLL). The PLL divides the XI pin frequency from 1 to 64, then multiplies the resulting frequency from 1 to 1024 in the VCO. The VCO frequency can be used directly, or divided by 2, 4, 6, ...30, to get the final PLL clock frequency which can be used as the system clock.

The clock configuration setting consists of 25 bits. The four LSBs are all that are needed to switch among clock sources and select all but the PLL settings.

HUBSET ##%0000_000E_DDDD_DDMM_MMMM_MMMM_PPPP_CCSS 'set clock mode

The tables below explain the various bit fields within the HUBSET operand:

PLL Setting	Value	Effect	Notes
%E	0/1	PLL off/on	XI input must be enabled by %CC. Allow 10ms for crystal+PLL to stabilize before switching over to PLL clock source.
%DDDDDD	0..63	1..64 division of XI pin frequency	This divided XI frequency feeds into the phase-frequency comparator's 'reference' input.
%MMMMMMMMMM	0..1023	1..1024 division of VCO frequency	This divided VCO frequency feeds into the phase-frequency comparator's 'feedback' input. This frequency division has the effect of multiplying the divided XI frequency (per %DDDDDD) inside the VCO. The VCO frequency should be kept within 100 MHz to 200 MHz.
%PPPP	0..14, or 15	For 0..14: VCO / (2 * (%PPPP + 1)), giving VCO / 2, VCO / 4, ... VCO / 30. For 15: VCO / 1.	This divided VCO frequency is selectable as the system clock when SS = %11.

%CC	XI status	XO status	XI / XO impedance	XI / XO loading caps
%00	ignored	float	Hi-Z	OFF
%01	input	600-ohm drive	1M-ohm	OFF
%10	input	600-ohm drive	1M-ohm	15pF per pin
%11	input	600-ohm drive	1M-ohm	30pF per pin

%SS	Clock Source	Notes
%11	PLL	CC != %00 and E=1, allow 10ms for crystal+PLL to stabilize before switching to PLL
%10	XI	CC != %00, allow 5ms for crystal to stabilize before switching to XI pin
%01	RCSLOW	~20 kHz, can be switched to at any time, low-power
%00	RCFAST	20 MHz+, can be switched to at any time, used on boot-up.

WARNING: Incorrectly switching away from the PLL setting (%SS = %11 and %CC <> %00) with %PPPP = %1111 can cause a clock glitch which will hang the P2 chip until a reset occurs. In order to safely switch away, always start by switching to an internal RC oscillator (%SS = %00 or %01), while maintaining the %PPPP = %1111 and %CC settings.

PLL Example

The PLL's VCO is designed to run between 100 MHz and 200 MHz and should be kept within that range.

Let's say you have a 20 MHz crystal attached to XI and XO and you want to run the Prop2 at 148.5 MHz. You could divide the crystal by 40 (%DDDDDD = 39) to get a 500 kHz reference, then multiply that by 297 (%MMMMMMMMMM = 296) in the VCO to get 148.5 MHz. You would set %PPPP to %1111 to use the VCO output directly. The configuration value would be %1_100111_0100101000_1111_10_11. The last two 2-bit fields select 15pF crystal mode and the PLL. In order to realize this clock setting, though, it must be done over a few steps:

HUBSET #$F0 'set 20 MHz+ (RCFAST) mode

HUBSET ##%1_100111_0100101000_1111_10_00 'enable crystal+PLL, stay in RCFAST mode

WAITX ##20_000_000/100 'wait ~10ms for crystal+PLL to stabilize

HUBSET ##%1_100111_0100101000_1111_10_11 'now switch to PLL running at 148.5 MHz

The clock selector controlled by the %SS bits has a deglitching circuit which waits for a positive edge on the old clock source before disengaging, holding its output high, and then waiting for a positive edge on the new clock source before switching over to it. It is necessary to select mode %00 or %01 while waiting for the crystal and/or PLL to settle into operation, before switching over to either.
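As a sketch of changing PLL settings later on, following the WARNING above, the code first falls back to RCFAST with the existing PLL and %CC settings unchanged, then loads the new settings, waits, and switches back to the PLL. The new multiplier of 256 (500 kHz * 256 = 128 MHz) is an illustrative assumption.

```
        HUBSET  ##%1_100111_0100101000_1111_10_00  'switch to RCFAST, keeping current PLL and %CC settings
        HUBSET  ##%1_100111_0011111111_1111_10_00  'set multiplier to 256 (%MMMMMMMMMM = 255), stay in RCFAST
        WAITX   ##20_000_000/100                   'wait ~10ms for the PLL to stabilize
        HUBSET  ##%1_100111_0011111111_1111_10_11  'now switch to PLL running at 128 MHz
```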

Write-Protecting the Last 16KB of Hub RAM and Enabling Debug Interrupts

HUBSET {#}D 'set write-protect and enable debug interrupts

{#}D = %0010_xxxx_xxxx_xxLW_DDDD_DDDD_DDDD_DDDD

%L: Lock W and D bit settings until next reset

0 = establish W and D bit settings and allow subsequent modification

1 = establish W and D bit settings and disallow subsequent modification

%W: Write-protect last 16KB of hub RAM

0 = Last 16KB of hub RAM can be read and written at both its normal range and at $FC000..$FFFFF (default)

1 = Last 16KB of hub RAM disappears from its normal range and is write-protected at $FC000..$FFFFF, except from within debug ISR's

%D: Debug interrupt enables for cogs 15..0, respectively

0 = Debug interrupt is disabled for cog n (default)

1 = Debug interrupt is enabled for cog n

Examples:

HUBSET ##$2000_0001 'enable debug interrupt for cog 0

HUBSET ##$2001_FFFF 'enable debug interrupts for cogs 15..0

'..and write-protect the last 16KB of hub RAM

HUBSET ##$2003_00FF 'enable debug interrupts for cogs 7..0

'..and write-protect the last 16KB of hub RAM

'..and disallow subsequent changes to this scheme

See the DEBUG INTERRUPT section to learn how debug interrupts work.

Configuring the Digital Filters for Smart Pins

There are four global digital filter settings which can be used by each smart pin to low-pass filter its incoming pin states.

Each filter setting includes a filter length and a timing tap. The filter length is 2, 3, 5, or 8 flipflops, selected by values 0..3. The flipflops shift pin state data at the timing tap rate and must be unanimously high or low to change the filter output to high or low. The timing tap is one of the lower 32 bits of CT (the free-running 64-bit global counter), selected by values 0..31. Each time the selected tap transitions, the current pin state is shifted into the flipflops and if the flipflops are all in agreement, the filter output goes to that state. The filter output is reflected in the INA/INB bits if no smart pin mode is selected; otherwise, the smart pin mode uses the filter output as its input.

The D operand selects both the filter to configure and the data to configure it with:

HUBSET ##$4000_0000 + Length<<5 + Tap 'set filt0

HUBSET ##$4000_0080 + Length<<5 + Tap 'set filt1

HUBSET ##$4000_0100 + Length<<5 + Tap 'set filt2

HUBSET ##$4000_0180 + Length<<5 + Tap 'set filt3

"Length" is 0..3 for 2, 3, 5, or 8 flipflops.

"Tap" is 0..31 for every single clock, every 2nd clock, every 4th clock,... every 2,147,483,648th clock.

The filters are set to the following defaults on reset:

Filter	Tap (clocks per sample)	Length (flipflops)	Low-pass time (at 6.25ns/clock)
filt0	(1:1)	%00 (2 flipflops)	6.25ns * 1 * 2 = 12.5ns
filt1	(32:1)	%01 (3 flipflops)	6.25ns * 32 * 3 = 600ns
filt2	(512K:1)	%10 (5 flipflops)	6.25ns * 512K * 5 = 16.4ms
filt3	(4M:1)	%11 (8 flipflops)	6.25ns * 4M * 8 = 210ms
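
As a cross-check of the filter encoding and the default low-pass times, here is a minimal Python sketch. The function names are hypothetical. The tap numbers used in the usage lines (0, 5, 19, 22) are inferred from the clocks-per-sample ratios in the defaults table, on the assumption that tap n samples once every 2^n clocks, as described for "Tap" above.

```python
FILTER_LENGTHS = (2, 3, 5, 8)  # flipflops selected by Length = 0..3


def hubset_filter_value(filt, length, tap):
    """Operand for HUBSET: $4000_0000 + filt<<7 + Length<<5 + Tap."""
    if not (0 <= filt <= 3 and 0 <= length <= 3 and 0 <= tap <= 31):
        raise ValueError("filt and length are 0..3, tap is 0..31")
    return 0x4000_0000 | (filt << 7) | (length << 5) | tap


def low_pass_time(length, tap, clock_period=6.25e-9):
    """Seconds a pin must hold a state before the filter output follows."""
    return clock_period * (1 << tap) * FILTER_LENGTHS[length]


# Defaults from the table, with inferred tap numbers:
for name, length, tap in (("filt0", 0, 0), ("filt1", 1, 5),
                          ("filt2", 2, 19), ("filt3", 3, 22)):
    print(name, low_pass_time(length, tap))
```

The 6.25ns clock period is the one used by the table; pass a different `clock_period` for other system clock rates.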

Seeding the Xoroshiro128** PRNG

To seed 32 bits of state data into the 128-bit PRNG, use HUBSET with the MSB of D set. This will write {1'b1, D[30:0]} into 32 bits of the PRNG, affecting 1/4th of its total state. The 1'b1 bit ensures that the overall state will not go to zero. Because the PRNG's 128 state bits rotate, shift, and XOR against each other, they are thoroughly spread around within a few clocks, so seeding from a fixed set of 32 bits should not pose a limitation on seeding quality.

After reset, the boot ROM uses HUBSET to seed the Xoroshiro128** PRNG fifty times, each time with 31 bits of thermal noise gleaned from pin 63 while in ADC calibration mode. This establishes a very random seed which the PRNG iterates from, thereafter. There is no need to do this again, but here is how you would do it if 'x' contained a seed value:

BITH x,#31 'set the MSB of x to make a PRNG seed command

HUBSET x 'seed 32 bits of the Xoroshiro128** state

The Xoroshiro128** PRNG iterates on every clock, generating 64 fresh bits which get spread among all cogs and smart pins. Each cog receives a unique set of 32 different bits, in a scrambled arrangement with some bits inverted, from the 64-bit pool. Each smart pin receives a similarly-unique set of 8 different bits. Cogs can sample these bits using the GETRND instruction and directly apply them using the BITRND and DRVRND instructions. Smart pins utilize their 8 bits as noise sources for DAC dithering and noise output.

Rebooting the Chip

HUBSET can be used to reset and reboot the chip:

HUBSET ##$1000_0000 'generate an internal reset pulse to reboot

HUB RAM

The globally-accessible hub RAM can be read and written as bytes, words, and longs, in little-endian format. Hub addresses are always byte-oriented. There are no special alignment rules for words and longs in hub RAM. Cogs can read and write bytes, words, and longs at any hub address, as well as execute instruction longs from any hub address starting at $400 (see COGS > INSTRUCTION MODES > HUB EXECUTION).

On hub RAM implementations of less than the full 1MB, the last 16KB of hub RAM is normally addressable at both its normal address range, as well as at $FC000..$FFFFF. This provides a stable address space for the 16KB of internal ROM which gets cached into the last 16KB of hub RAM on startup. This upper 16KB mapping is also used by the cog debugging scheme.

The last 16KB of RAM can be hidden from its normal address range and made read-only at $FC000..$FFFFF. This is useful for making the last 16KB of RAM persistent, like ROM. It is also how debugging is realized, as the RAM mapped to $FC000..$FFFFF can still be written to from within debug interrupt service routines, permitting the otherwise-protected RAM to be used as debugger-application space and cog-register swap buffers for debug interrupts.

See the HUBSET instruction definition for setting up write-protection.

Here are the hub memory maps for the various FPGA boards currently being supported during development. The "W" column represents write-protection status, set by HUBSET, for the last 16KB of hub RAM:

FPGA Board	Hub RAM	Cogs/Slices	W	Lower RAM	Gap (reads $00)	Top 16KB RAM
DE0-Nano	32KB		0	$00000..$07FFF	$08000..$FBFFF	$FC000..$FFFFF, R/W
DE0-Nano	32KB		1	$00000..$03FFF	$04000..$FBFFF	$FC000..$FFFFF, Read
BeMicro-A2	128KB		0	$00000..$1FFFF	$20000..$FBFFF	$FC000..$FFFFF, R/W
BeMicro-A2	128KB		1	$00000..$1BFFF	$1C000..$FBFFF	$FC000..$FFFFF, Read
DE2-115	256KB		0	$00000..$3FFFF	$40000..$FBFFF	$FC000..$FFFFF, R/W
DE2-115	256KB		1	$00000..$3BFFF	$3C000..$FBFFF	$FC000..$FFFFF, Read
Prop123-A7	512KB		0	$00000..$7FFFF	$80000..$FBFFF	$FC000..$FFFFF, R/W
Prop123-A7	512KB		1	$00000..$7BFFF	$7C000..$FBFFF	$FC000..$FFFFF, Read
Prop123-A9, BeMicro-A9	512KB		0	$00000..$7FFFF	$80000..$FBFFF	$FC000..$FFFFF, R/W
Prop123-A9, BeMicro-A9	512KB		1	$00000..$7BFFF	$7C000..$FBFFF	$FC000..$FFFFF, Read
Prop123-A9, BeMicro-A9	1024KB		0	$00000..$FFFFF	none, full map	$FC000..$FFFFF, R/W
Prop123-A9, BeMicro-A9	1024KB		1	$00000..$FFFFF	none, full map	$FC000..$FFFFF, Read
P2X8C4M64PES	512KB		0	$00000..$7FFFF	$80000..$FBFFF	$FC000..$FFFFF, R/W
P2X8C4M64PES	512KB		1	$00000..$7BFFF	$7C000..$FBFFF	$FC000..$FFFFF, Read

THE COG -to- HUB RAM INTERFACE

Hub RAM is composed of 32-bit-wide single-port RAMs with byte-level write controls. There is one of these RAMs per cog, but each one is multiplexed among all cogs. Let's call these separate RAMs "slices". Each RAM slice holds every single/2nd/4th/8th/16th (depending on the number of cogs) set of 4 bytes in the composite hub RAM. At every clock, each cog can access the "next" RAM slice, allowing for continuously-ascending bidirectional streaming of 32 bits per clock between the composite hub RAM and each cog.

When a cog wants to read or write the hub RAM, it must wait up to #cogs-1 clocks to access the initial RAM slice of interest. Once that occurs, subsequent slices can be accessed on every clock, thereafter, for continuous reading or writing of 32-bit longs.
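
The slice arrangement can be pictured with a short Python model. It is a sketch only: the function names are hypothetical, and numbering the slices by long index modulo the number of cogs is an assumption that follows from "every Nth set of 4 bytes" rather than a statement about the silicon's internal numbering. It also considers only the long that contains the given byte address.

```python
def slice_of(address, cogs=8):
    """Slice holding the long that contains hub byte `address`."""
    return (address >> 2) % cogs


def slices_for_burst(address, longs, cogs=8):
    """Slices visited, one per clock, when streaming `longs` ascending longs."""
    first = slice_of(address, cogs)
    return [(first + n) % cogs for n in range(longs)]


print(slice_of(0x00004))           # second long of hub RAM
print(slices_for_burst(0x18, 4))   # burst that crosses the last slice
```

Because consecutive longs live in consecutive slices, a cog that has reached its first slice of interest finds the next long in the slice it can access on the following clock, which is why only the initial access incurs a wait.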

To smooth out data flow for less than 32-bits-per-clock between hub RAM and the cog, each cog has a hub FIFO interface which can be set for hub-RAM-read or hub-RAM-write operation. This FIFO interface allows hub RAM to be either sequentially read or sequentially written in any combination of bytes, words, or longs, at any rate, up to one long per clock. No matter the transfer frequency or the word size, the FIFO will ensure that the cog's reads or writes are all properly conducted from or to the composite hub RAM.

Cogs can access hub RAM either via the sequential FIFO interface, or by waiting for RAM slices of interest, while yielding to the FIFO. If the FIFO is not busy, which is soon the case if data is not being read from or written to it, random accesses will have full opportunity to access the composite hub RAM.

There are three ways the hub FIFO interface can be used, and it can only be used for one of these at a time:

Hub execution (when the PC is $00400..$FFFFF)
Streamer usage (background transfers from hub RAM → pins/DACs, or from pins/ADCs → hub RAM)
Software usage (fast sequential-reading or sequential-writing instructions)

For streamer or software usage, FIFO operation must be established by a RDFAST or WRFAST instruction executed from cog RAM (register/lookup, $00000..$003FF). After that, and while remaining in cog RAM, the streamer can be enabled to begin moving data in the background, or the two-clock RFxxxx/WFxxxx instructions can be used to manually read and write sequential data.

The FIFO contains (cogs+11) stages. When in read mode, the FIFO loads continuously whenever less than (cogs+7) stages are filled, after which point, up to 5 more longs may stream in, potentially filling all (cogs+11) stages. These metrics ensure that the FIFO never underflows, under all potential reading scenarios.

FAST SEQUENTIAL FIFO INTERFACE

To configure the hub FIFO interface for streamer or software usage, use the RDFAST and WRFAST instructions. These instructions establish read or write operation, the hub start address, and the block count. The block count determines how many 64-byte blocks will be read or written before wrapping to the original start address and reloading the original block count. If you intend to use wrapping, your hub start address must be long-aligned (address ends in %00), since there won't be an extra cycle in which to read/write a portion of a long in an extra hub RAM slice. In cases where you don't want wrapping, just use 0 for the block count, so that wrapping won't occur until the entire 1MB hub map is sequenced through.

The FBLOCK instruction provides a way to set a new start address and a new 64-byte block count for when the current blocks are fully read or written and the FIFO interface would have otherwise wrapped back to the prior start address and reloaded the prior block count. FBLOCK can be executed after RDFAST, WRFAST, or a FIFO block wrap event. Coordinating FBLOCK instructions with streamer-FIFO activity enables dynamic and seamless streaming between hub RAM and pins/DACs.

Here are the RDFAST, WRFAST, and FBLOCK instructions:

EEEE 1100011 1LI DDDDDDDDD SSSSSSSSS RDFAST D/#,S/#

EEEE 1100100 0LI DDDDDDDDD SSSSSSSSS WRFAST D/#,S/#

EEEE 1100100 1LI DDDDDDDDD SSSSSSSSS FBLOCK D/#,S/#

For these instructions, the D/# operand provides the block count, while the S/# operand provides the hub RAM start address:

D/# %xxxx_xxxx_xxxx_xxxx_xx00_0000_0000_0000 = block count for limited r/w

%xxxx_xxxx_xxxx_xxxx_xxBB_BBBB_BBBB_BBBB = block count for wrapping

S/# %xxxx_xxxx_xxxx_AAAA_AAAA_AAAA_AAAA_AAAA = start address for limited r/w

%xxxx_xxxx_xxxx_AAAA_AAAA_AAAA_AAAA_AA00 = start address for wrapping (long-aligned)

RDFAST and WRFAST each have two modes of operation.

If D[31] = 0, RDFAST/WRFAST will wait for any previous WRFAST to finish and then reconfigure the hub FIFO interface for reading or writing. In the case of RDFAST, it will additionally wait until the FIFO has begun receiving hub data, so that it can start being used in the next instruction.

If D[31] = 1, RDFAST/WRFAST will not wait for FIFO reconfiguration, taking only two clocks. In this case, your code must allow a sufficient number of clocks before any attempt is made to read or write FIFO data.

FBLOCK doesn't need to wait for anything, so it always takes two clocks.
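
The operand fields and the wrapping rule can be modeled in Python as follows. This is an illustrative sketch; the function names are hypothetical, and the "x" bits are assumed to be zero.

```python
def fast_operands(start, blocks=0, no_wait=False):
    """Return (D, S) values for RDFAST/WRFAST.

    blocks  -- 14-bit count of 64-byte blocks, 0 for no wrapping
    no_wait -- sets D[31] so the instruction does not wait for the FIFO
    """
    if not 0 <= blocks <= 0x3FFF:
        raise ValueError("block count is 14 bits")
    if blocks and start & 3:
        raise ValueError("wrapping requires a long-aligned start address")
    return (int(no_wait) << 31) | blocks, start & 0xFFFFF


def stream_address(start, blocks, offset):
    """Hub address of the byte `offset` bytes into the stream."""
    span = blocks * 64 if blocks else 0x100000  # 0 blocks: whole 1MB map
    return (start + offset % span) & 0xFFFFF


print(fast_operands(0x1000, blocks=2))
print(hex(stream_address(0x1000, 2, 128)))  # first byte after the wrap
```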

Once RDFAST has been used to configure the hub FIFO interface for reading, you can enable the streamer for any hub-reading modes or use the following instructions to manually read sequential data from the hub:

EEEE 1101011 CZ0 DDDDDDDDD 000010000 RFBYTE D {WC/WZ/WCZ}

EEEE 1101011 CZ0 DDDDDDDDD 000010001 RFWORD D {WC/WZ/WCZ}

EEEE 1101011 CZ0 DDDDDDDDD 000010010 RFLONG D {WC/WZ/WCZ}

EEEE 1101011 CZ0 DDDDDDDDD 000010011 RFVAR D {WC/WZ/WCZ}

EEEE 1101011 CZ0 DDDDDDDDD 000010100 RFVARS D {WC/WZ/WCZ}

These instructions all take 2 clocks and read bytes, words, longs, and variable-length data from the hub into D, via the hub FIFO interface.

If WC is expressed, the MSB of the byte, word, long, or variable-length data will be written to C.

If WZ is expressed, Z will be set if the data read from the hub equals zero, otherwise Z will be cleared.

RFVAR and RFVARS read 1..4 bytes of data, depending upon the MSB of the first byte, and then subsequent bytes, waiting in the FIFO. While RFVAR returns zero-extended data, RFVARS returns sign-extended data. This mechanism is intended to provide a fast and memory-efficient means for bytecode interpreters to read numerical constants and offset addresses that were assembled at compile-time for efficient reading during run-time.

This table shows the relationship between upcoming bytes in the FIFO and what RFVAR and RFVARS will return:

FIFO 1st Byte	FIFO 2nd Byte	FIFO 3rd Byte	FIFO 4th Byte	RFVAR Returns	RFVARS Returns
%0SAAAAAA	-	-	-	%00000000_00000000_00000000_0SAAAAAA	%SSSSSSSS_SSSSSSSS_SSSSSSSS_SSAAAAAA
%1AAAAAAA	%0SBBBBBB	-	-	%00000000_00000000_00SBBBBB_BAAAAAAA	%SSSSSSSS_SSSSSSSS_SSSBBBBB_BAAAAAAA
%1AAAAAAA	%1BBBBBBB	%0SCCCCCC	-	%00000000_000SCCCC_CCBBBBBB_BAAAAAAA	%SSSSSSSS_SSSSCCCC_CCBBBBBB_BAAAAAAA
%1AAAAAAA	%1BBBBBBB	%1CCCCCCC	%SDDDDDDD	%000SDDDD_DDDCCCCC_CCBBBBBB_BAAAAAAA	%SSSSDDDD_DDDCCCCC_CCBBBBBB_BAAAAAAA
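
For tools that need to produce or check such variable-length constants on a host, the table can be expressed as a small Python decoder. This is a sketch of the format as tabulated above, not of the hardware; the function name `rfvar` and its arguments are chosen for illustration.

```python
def rfvar(fifo, signed=False):
    """Model RFVAR (signed=False) or RFVARS (signed=True).

    fifo is a sequence of upcoming FIFO bytes. Returns (value, bytes_used),
    where value is the 32-bit result as an unsigned Python int.
    """
    value = 0
    bits = 0
    used = 0
    for position in range(4):
        byte = fifo[position]
        used += 1
        if position < 3:
            value |= (byte & 0x7F) << bits   # low 7 bits are data
            bits += 7
            if not byte & 0x80:              # MSB clear: this is the last byte
                break
        else:
            value |= byte << bits            # 4th byte supplies all 8 bits
            bits += 8
    if signed and value & (1 << (bits - 1)):
        value -= 1 << bits                   # sign-extend from the top data bit
    return value & 0xFFFFFFFF, used


print(rfvar([0x80, 0x01]))                        # two-byte value
print(rfvar([0x40], signed=True))                 # one-byte negative value
```

Values therefore carry 7, 14, 21, or 29 data bits, and the top data bit acts as the sign for RFVARS.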

Once WRFAST has been used to configure the hub FIFO interface for writing, you can enable the streamer for any hub-writing modes or use the following instructions to manually write sequential data:

EEEE 1101011 00L DDDDDDDDD 000010101 WFBYTE D/#

EEEE 1101011 00L DDDDDDDDD 000010110 WFWORD D/#

EEEE 1101011 00L DDDDDDDDD 000010111 WFLONG D/#

These instructions all take 2 clocks and write byte, word, or long data in D into the hub via the hub FIFO interface.

If a cog has been writing to the hub via WRFAST, and it wants to immediately COGSTOP itself, a 'WAITX #20' should be executed first, in order to allow time for any lingering FIFO data to be written to the hub.

RANDOM ACCESS INTERFACE

Here are the random-access hub RAM read instructions:

EEEE 1010110 CZI DDDDDDDDD SSSSSSSSS RDBYTE D,S/#/PTRx {WC/WZ/WCZ}

EEEE 1010111 CZI DDDDDDDDD SSSSSSSSS RDWORD D,S/#/PTRx {WC/WZ/WCZ}

EEEE 1011000 CZI DDDDDDDDD SSSSSSSSS RDLONG D,S/#/PTRx {WC/WZ/WCZ}

For these instructions, the D operand is the register which will receive the data read from the hub.

The S/#/PTRx operand supplies the hub address to read from.

If WC is expressed, the MSB of the byte, word, or long read from the hub will be written to C.

If WZ is expressed, Z will be set if the data read from the hub equaled zero, otherwise Z will be cleared.

Here are the random-access hub RAM write instructions:

EEEE 1100010 0LI DDDDDDDDD SSSSSSSSS WRBYTE D/#,S/#/PTRx

EEEE 1100010 1LI DDDDDDDDD SSSSSSSSS WRWORD D/#,S/#/PTRx

EEEE 1100011 0LI DDDDDDDDD SSSSSSSSS WRLONG D/#,S/#/PTRx

EEEE 1010011 11I DDDDDDDDD SSSSSSSSS WMLONG D,S/#/PTRx

For these instructions, the D/# operand supplies the data to be written to the hub.

The S/#/PTRx operand supplies the hub address to write to.

WMLONG writes longs, like WRLONG; however, it does not write any D byte fields whose data are $00. This is intended for things like sprite overlays, where $00 byte data represent transparent pixels.
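
The byte-masking behavior of WMLONG can be sketched in Python. The function name and the idea of passing the existing hub long as an argument are illustrative only; on the chip the skipped bytes are simply not written.

```python
def wmlong(existing, d):
    """Return the hub long after WMLONG writes `d` over `existing`.

    Any byte of d that equals $00 leaves the corresponding hub byte unchanged.
    """
    result = existing & 0xFFFFFFFF
    for shift in (0, 8, 16, 24):
        byte = (d >> shift) & 0xFF
        if byte:
            result = (result & ~(0xFF << shift)) | (byte << shift)
    return result & 0xFFFFFFFF


# Sprite long $AA00BB00 over background $11223344:
print(hex(wmlong(0x11223344, 0xAA00BB00)))
```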

In the case of the 'S/#/PTRx' operand used by RDBYTE, RDWORD, RDLONG, WRBYTE, WRWORD, WRLONG, and WMLONG, there are five ways to express a hub address:

$000..$1FF - register whose 20 LSBs will be used as the hub address

#$00..$FF - 8-bit immediate hub address

##$00000..$FFFFF - 20-bit immediate hub address (invokes AUGS)

PTRx {[index5]} - PTR expression with a 5-bit scaled index

PTRx {[##index20]} - PTR expression with a 20-bit unscaled index (invokes AUGS)

If AUGS is used to augment the #S value to 32 bits, the #S value will be interpreted differently:

#%0AAAAAAAA - No AUGS, 8-bit immediate address

#%1SUPNNNNN - No AUGS, PTR expression with a 5-bit scaled index

##%000000000000AAAAAAAAAAA_AAAAAAAAA - AUGS, 20-bit immediate address

##%000000001SUPNNNNNNNNNNN_NNNNNNNNN - AUGS, PTR expression with a 20-bit unscaled index

PTRx expressions without AUGS:

INDEX6 = -32..+31 for non-updating offsets

INDEX = 1..16 for ++'s and --'s

SCALE = 1 for RDBYTE/WRBYTE, 2 for RDWORD/WRWORD, 4 for RDLONG/WRLONG/WMLONG

S = 0 for PTRA, 1 for PTRB

U = 0 to keep PTRx same, 1 to update PTRx (PTRx += INDEX*SCALE)

P = 0 to use PTRx + INDEX*SCALE, 1 to use PTRx (post-modify)

IIIIII = INDEX6, uses %100000..%111111 for -32..-1 and %000000..%011111 for 0..31

NNNNN = INDEX, uses %00001..%01111 for 1..15 and %00000 for 16

nnnnn = -INDEX, uses %10000..%11111 for -16..-1

1SUPNNNNN PTR expression

------------------------------------------------------------------------------

100000000 PTRA 'use PTRA

110000000 PTRB 'use PTRB

100IIIIII PTRA[INDEX6] 'use PTRA + INDEX6*SCALE

110IIIIII PTRB[INDEX6] 'use PTRB + INDEX6*SCALE

101100001 PTRA++ 'use PTRA, PTRA += SCALE

111100001 PTRB++ 'use PTRB, PTRB += SCALE

101111111 PTRA-- 'use PTRA, PTRA -= SCALE

111111111 PTRB-- 'use PTRB, PTRB -= SCALE

101000001 ++PTRA 'use PTRA + SCALE, PTRA += SCALE

111000001 ++PTRB 'use PTRB + SCALE, PTRB += SCALE

101011111 --PTRA 'use PTRA - SCALE, PTRA -= SCALE

111011111 --PTRB 'use PTRB - SCALE, PTRB -= SCALE

1011NNNNN PTRA++[INDEX] 'use PTRA, PTRA += INDEX*SCALE

1111NNNNN PTRB++[INDEX] 'use PTRB, PTRB += INDEX*SCALE

1011nnnnn PTRA--[INDEX] 'use PTRA, PTRA -= INDEX*SCALE

1111nnnnn PTRB--[INDEX] 'use PTRB, PTRB -= INDEX*SCALE

1010NNNNN ++PTRA[INDEX] 'use PTRA + INDEX*SCALE, PTRA += INDEX*SCALE

1110NNNNN ++PTRB[INDEX] 'use PTRB + INDEX*SCALE, PTRB += INDEX*SCALE

1010nnnnn --PTRA[INDEX] 'use PTRA - INDEX*SCALE, PTRA -= INDEX*SCALE

1110nnnnn --PTRB[INDEX] 'use PTRB - INDEX*SCALE, PTRB -= INDEX*SCALE

Examples:

Read byte at PTRA into D

1111 1010110 001 DDDDDDDDD 100000000 RDBYTE D,PTRA

Write lower word in D to PTRB - 7*2

1111 1100010 101 DDDDDDDDD 110111001 WRWORD D,PTRB[-7]

Write long value 10 at PTRB, PTRB += 1*4

1111 1100011 011 000001010 111100001 WRLONG #10,PTRB++

Read word at PTRA into D, PTRA -= 1*2

1111 1010111 001 DDDDDDDDD 101111111 RDWORD D,PTRA--

Write lower byte in D at PTRA - 1*1, PTRA -= 1*1

1111 1100010 001 DDDDDDDDD 101011111 WRBYTE D,--PTRA

Read long at PTRB + 10*4 into D, PTRB += 10*4

1111 1011000 001 DDDDDDDDD 111001010 RDLONG D,++PTRB[10]

Write lower byte in D to PTRA, PTRA += 15*1

1111 1100010 001 DDDDDDDDD 101101111 WRBYTE D,PTRA++[15]

Read word at PTRB into D, PTRB += 16*2

1111 1010111 001 DDDDDDDDD 111100000 RDWORD D,PTRB++[16]
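
The 9-bit encodings in the table and examples above can be reproduced with a short Python sketch. The function name and parameters are hypothetical; the bit positions follow the %1SUPNNNNN layout given earlier.

```python
def encode_ptr(ptrb=False, index=0, update=False, post=False):
    """Encode a PTRx expression (no AUGS) into the 9-bit S field.

    ptrb   -- False for PTRA, True for PTRB (S bit)
    index  -- -32..+31 when not updating, -16..-1 or 1..16 when updating
    update -- True for ++/-- forms (U bit)
    post   -- True to use PTRx before modification, e.g. PTRA++ (P bit)
    """
    s = 0x100 | (int(ptrb) << 7)
    if not update:
        if not -32 <= index <= 31:
            raise ValueError("non-updating index must be -32..+31")
        return s | (index & 0x3F)            # IIIIII, two's complement
    if index == 0 or not -16 <= index <= 16:
        raise ValueError("updating index must be -16..-1 or 1..16")
    n = 0 if index == 16 else index & 0x1F   # %00000 stands for +16
    return s | (1 << 6) | (int(post) << 5) | n


print(format(encode_ptr(ptrb=True, index=-7), "09b"))                        # PTRB[-7]
print(format(encode_ptr(ptrb=True, index=10, update=True), "09b"))           # ++PTRB[10]
print(format(encode_ptr(index=15, update=True, post=True), "09b"))           # PTRA++[15]
```

The SCALE factor does not appear in the encoding; it is implied by the instruction (byte, word, or long) that carries the expression.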

PTRx expressions with AUGS:

If "##" is used before the index value in a PTRx expression, the assembler will automatically insert an AUGS instruction and assemble the 20-bit index instruction pair:

RDBYTE D,++PTRB[##$12345]

...becomes...

1111 1111000 000 000111000 010010001 AUGS #$00E12345

1111 1010110 001 DDDDDDDDD 101000101 RDBYTE D,#$00E12345 & $1FF
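
The split performed by the assembler in this example can be checked with a few lines of Python. This is a sketch with hypothetical function names; it only covers non-negative 20-bit index values, since the example above does not show how other values are encoded.

```python
def ptr_augs_value(ptrb, update, post, index20):
    """32-bit value %000000001SUP followed by a 20-bit index."""
    if not 0 <= index20 <= 0xFFFFF:
        raise ValueError("index must fit in 20 bits")
    return ((1 << 23) | (int(ptrb) << 22) | (int(update) << 21)
            | (int(post) << 20) | index20)


def augs_split(value32):
    """Return (upper 23 bits carried by AUGS, lower 9 bits left in S)."""
    return value32 >> 9, value32 & 0x1FF


value = ptr_augs_value(ptrb=True, update=True, post=False, index20=0x12345)
upper, lower = augs_split(value)
print(hex(value), format(upper, "023b"), format(lower, "09b"))
```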

FAST BLOCK MOVES

By preceding RDLONG with either SETQ or SETQ2, multiple hub RAM longs can be read into either cog register RAM or cog lookup RAM. This transfer happens at the rate of one long per clock, assuming the hub FIFO interface is not accessing the same hub RAM slice as RDLONG, on the same cycle, in which case the FIFO gets priority access and the block move must wait for the hub RAM slice to come around again. If WC/WZ/WCZ are used with RDLONG, the flags will be set according to the last long read in the sequence.

Use SETQ+RDLONG to read multiple hub longs into cog register RAM:

SETQ #x 'x = number of longs, minus 1, to read

RDLONG first_reg,S/#/PTRx 'read x+1 longs starting at first_reg

Use SETQ2+RDLONG to read multiple hub longs into cog lookup RAM:

SETQ2 #x 'x = number of longs, minus 1, to read

RDLONG first_lut,S/#/PTRx 'read x+1 longs starting at first_lut

Similarly, WRLONG and WMLONG can be preceded by either SETQ or SETQ2 to write either multiple register RAM longs or lookup RAM longs into hub RAM. When WRLONG/WMLONG's D field is an immediate, it instead writes that immediate value to RAM, functioning as a memory filler.

Use SETQ+WRLONG/WMLONG to write multiple register RAM longs into hub RAM:

SETQ #x 'x = number of longs, minus 1, to write

WRLONG first_reg,S/#/PTRx 'write x+1 longs starting at first_reg

RAM registers $1F8..$1FF are special-purpose registers which cannot be transferred to hub RAM via SETQ+WRLONG/WMLONG.

Use SETQ2+WRLONG/WMLONG to write multiple lookup RAM longs into hub RAM:

SETQ2 #x 'x = number of longs, minus 1, to write

WRLONG first_lut,S/#/PTRx 'write x+1 longs starting at first_lut

For fast block moves, PTRx expressions cannot have arbitrary index values, since the index will be overridden with the number of longs, with bit 4 of the encoded index value serving as the ++/-- indicator. In plain PTRA/PTRB cases, the index will be overridden with zero:

SETQ #x 'x = number of longs, minus 1

RDLONG first_reg,PTRA 'read x+1 longs from PTRA

SETQ #x 'x = number of longs, minus 1

RDLONG first_reg,PTRA++ 'read x+1 longs from PTRA, PTRA += (x+1)*4

SETQ #x 'x = number of longs, minus 1

RDLONG first_reg,PTRA-- 'read x+1 longs from PTRA, PTRA -= (x+1)*4

SETQ #x 'x = number of longs, minus 1

RDLONG first_reg,++PTRA 'read x+1 longs from PTRA+(x+1)*4, PTRA += (x+1)*4

SETQ #x 'x = number of longs, minus 1

RDLONG first_reg,--PTRA 'read x+1 longs from PTRA-(x+1)*4, PTRA -= (x+1)*4

Because these fast block moves yield to the hub FIFO interface, they can be used during hub execution.

CORDIC Solver

In the hub, there is a 54-stage pipelined CORDIC solver that can compute the following functions for all cogs:

32 x 32 unsigned multiply with 64-bit product
64 / 32 unsigned divide with 32-bit quotient and 32-bit remainder
Square root of 64-bit unsigned value with 32-bit result
32-bit signed (X,Y) rotation around (0,0) by a 32-bit angle with 32-bit signed (X,Y) results
32-bit signed (X,Y) to 32-bit (length,angle) - cartesian to polar
32-bit (length,angle) to 32-bit signed (X,Y) - polar to cartesian
32-bit unsigned integer to 5:27-bit logarithm
5:27-bit logarithm to 32-bit unsigned integer

When a cog issues a CORDIC instruction, it must wait for its hub slot, which is zero to (cogs-1) clocks away, in order to hand off the command to the CORDIC solver. Fifty-five clocks later, results will be available via the GETQX and GETQY instructions, which will wait for the results, in case they haven't arrived yet.

MULTIPLY

To multiply two unsigned 32-bit numbers together, use the QMUL instruction (CORDIC instructions wait for the hub slot):

QMUL D/#,S/# - Multiply D by S

To get the results (these instructions wait for the CORDIC results):

GETQX lower_long

GETQY upper_long
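
To make the split of the product across GETQX and GETQY concrete, here is a Python model of the arithmetic. It is a sketch of the result format only, with a hypothetical function name, and says nothing about timing.

```python
MASK32 = 0xFFFFFFFF


def qmul(d, s):
    """Return (GETQX lower long, GETQY upper long) of the unsigned product."""
    product = (d & MASK32) * (s & MASK32)
    return product & MASK32, product >> 32


lower, upper = qmul(0xFFFFFFFF, 0xFFFFFFFF)
print(hex(lower), hex(upper))
```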

DIVIDE

For convenience, two different divide instructions exist, each with an optional SETQ prefix instruction which establishes a non-0 value for one 32-bit part of the 64-bit numerator:

QDIV D/#,S/# - Divide {$00000000:D} by S

...or...

SETQ Q/# - Set top part of numerator

QDIV D/#,S/# - Divide {Q:D} by S

...or...

QFRAC D/#,S/# - Divide {D:$00000000} by S

...or...

SETQ Q/# - Set bottom part of numerator

QFRAC D/#,S/# - Divide {D:Q} by S

To get the results:

GETQX quotient

GETQY remainder
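
The difference between QDIV and QFRAC is only in how the 64-bit numerator is assembled, which the following Python sketch illustrates. The function names are hypothetical. Truncating the quotient to 32 bits is an assumption for cases where it would not fit, and division by zero is not modeled (Python raises ZeroDivisionError).

```python
MASK32 = 0xFFFFFFFF


def qdiv(d, s, q=0):
    """Divide {Q:D} by S; returns (GETQX quotient, GETQY remainder)."""
    numerator = ((q & MASK32) << 32) | (d & MASK32)
    return (numerator // s) & MASK32, numerator % s


def qfrac(d, s, q=0):
    """Divide {D:Q} by S; returns (GETQX quotient, GETQY remainder)."""
    numerator = ((d & MASK32) << 32) | (q & MASK32)
    return (numerator // s) & MASK32, numerator % s


print(qdiv(100, 7))          # plain 32-bit divide
print(hex(qfrac(1, 4)[0]))   # 1/4 as a 32-bit binary fraction
```

QFRAC is convenient for computing ratios as 0.32 fixed-point fractions, such as a frequency divided by the system clock frequency.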

SQUARE ROOT

To get the square root of a 64-bit integer:

QSQRT D/#,S/# - Compute square root of {S:D}

To get the result:

GETQX root
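
A Python model of the operand arrangement, for reference. The function name is hypothetical, and rounding the root down to an integer is an assumption; the text above does not state how the hardware rounds.

```python
import math

MASK32 = 0xFFFFFFFF


def qsqrt(d, s=0):
    """Integer square root of the 64-bit value {S:D} (rounded down)."""
    return math.isqrt(((s & MASK32) << 32) | (d & MASK32))


print(qsqrt(144))       # S = 0, D = 144
print(qsqrt(0, s=1))    # {S:D} = $1_0000_0000
```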

(X,Y) ROTATION

The rotation function inputs three terms: 32-bit signed X and Y values, and an unsigned 32-bit angle, where $00000000..$FFFFFFFF = 0..359.9999999 degrees. The Y term, if non-zero, is supplied via an optional SETQ prefix instruction:

SETQ Q/# - Set Y

QROTATE D/#,S/# - Rotate (D,Q) by S

...or...

QROTATE D/#,S/# - Rotate (D,$00000000) by S

Notice that in the second example, a polar-to-cartesian conversion is taking place.

To get the results:

GETQX X

GETQY Y
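
The angle format and the rotation itself can be approximated in Python with floating-point math. This is only a sketch: the function names are hypothetical, the rotation direction is assumed to be the one for which rotating (length,0) yields (length*cos, length*sin), consistent with the polar-to-cartesian remark above, and the results are rounded to integers, so they may differ from the CORDIC output in the least-significant bits.

```python
import math


def angle_from_degrees(degrees):
    """Map 0..360 degrees onto the unsigned 32-bit angle $00000000..$FFFFFFFF."""
    return round(degrees / 360 * (1 << 32)) & 0xFFFFFFFF


def qrotate_model(x, y, angle):
    """Rotate signed (x, y) around (0,0) by a 32-bit angle; returns (X, Y)."""
    theta = angle / (1 << 32) * 2 * math.pi
    c, s = math.cos(theta), math.sin(theta)
    return round(x * c - y * s), round(x * s + y * c)


print(hex(angle_from_degrees(90)))
print(qrotate_model(1000, 0, angle_from_degrees(90)))  # polar (1000, 90 deg)
```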

(X,Y) VECTORING

The vectoring function converts (X,Y) cartesian coordinates into (length,angle) polar coordinates:

QVECTOR D/#,S/# - (X=D,Y=S) cartesian into (length,angle) polar

To get the results:

GETQX length

GETQY angle

LOGARITHM

To convert an unsigned 32-bit integer into a 5:27-bit logarithm, where the top 5 bits hold the whole part of the power-of-2 exponent and the bottom 27 bits hold the fractional part:

QLOG D/# - Compute log base 2 of D

To get the result:

GETQX logarithm

EXPONENT

To convert a 5:27-bit logarithm into a 32-bit unsigned integer:

QEXP D/# - Compute 2 to the power of D

To get the result:

GETQX integer
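
The 5:27-bit format used by QLOG and QEXP can be illustrated with a floating-point Python model. The function names are hypothetical, rounding to the nearest integer is an assumption, and an input of zero to the logarithm is not modeled; hardware results may differ in the low bits.

```python
import math


def qlog_model(d):
    """5:27 fixed-point log base 2 of a non-zero unsigned 32-bit integer."""
    if not 0 < d <= 0xFFFFFFFF:
        raise ValueError("d must be 1..$FFFFFFFF in this model")
    return round(math.log2(d) * (1 << 27)) & 0xFFFFFFFF


def qexp_model(log527):
    """Integer nearest to 2 raised to a 5:27 fixed-point power."""
    return round(2 ** (log527 / (1 << 27)))


print(hex(qlog_model(1024)))        # whole part 10, fraction 0
print(qexp_model(10 << 27))
```

The top 5 bits therefore give the position of the highest set bit, and the lower 27 bits refine the value between powers of two.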

OVERLAPPING CORDIC COMMANDS

Because each cog's hub slot comes around every 1/2/4/8/16 clocks (8 clocks for the current P2X8C4M64P, since it has 8 cogs) and the pipeline is 54 clocks long, it is possible to overlap CORDIC commands, where several commands are initially given to the CORDIC solver, and then results are read and another command is given, indefinitely, until, at the end, the trailing results are read. You must not have interrupts enabled during such a juggle, or enough clocks could be stolen by the interrupt service routine that one or more of your results could be overwritten before you can read them. If you ever attempt to read results when none are available and none are in progress, GETQX/GETQY will only take two clocks and the QMT (CORDIC empty) event flag will be set.

' CORDIC overlapping command demo

' - outputs 32 sine waves of increasing frequency on P0..P31 using 990-ohm DACs

' - uses SETQ+QROTATE+GETQY+GETQX, the most input/output-intensive CORDIC command

con _clkfreq = 256_000_000 'clock frequency

clks = 3*256 'clocks per frame, 3 complete DAC cycles

f = 100 frac (_clkfreq / clks) '100 Hz, gets multiplied by 100, 101, 102..

dacmode = %10100_00000000_01_00011_0 '990-ohm DAC + pwm-dithered 16-bit DAC mode

dat org

wrpin ##dacmode,pins32 'set 16-bit pwm-dither DAC mode for P0..P31

wxpin ##clks,pins32 'set period for three pwm-dithered DAC cycles

dirh pins32 'enable smart pins

' Rotate 32 sets of (x,y) coordinates at different rates

' by overlapping CORDIC commands and result fetches

' 'clk sum

' 'w=wait !=cordic tick

loop setq y+00 '2 ? begin first 8 commands

qrotate x+00,a+00 '?w+2 2!

setq y+01 '2 4

qrotate x+01,a+01 '4w+2 10!

setq y+02 '2 12

qrotate x+02,a+02 '4w+2 18!

setq y+03 '2 20

qrotate x+03,a+03 '4w+2 26!

setq y+04 '2 28

qrotate x+04,a+04 '4w+2 34!

setq y+05 '2 36

qrotate x+05,a+05 '4w+2 42!

setq y+06 '2 44

qrotate x+06,a+06 '4w+2 50!

setq y+07 '2 52

qrotate x+07,a+07 '4w+2 58! result 00 is ready at 54!!!

getqy y+00 '2 60 get result 00, no waiting!!!

getqx x+00 '2 62

setq y+08 '2 64 begin overlapping commands and results

qrotate x+08,a+08 '2 66!

getqy y+01 '2 68

getqx x+01 '2 70

setq y+09 '2 72

qrotate x+09,a+09 '2 74!

getqy y+02 '2 76

getqx x+02 '2 78

setq y+10 '2 80

qrotate x+10,a+10 '2 82!

getqy y+03 '2 84

getqx x+03 '2 86

setq y+11 '2 88

qrotate x+11,a+11 '2 90!

getqy y+04 '2 92

getqx x+04 '2 94

setq y+12 '2 96

qrotate x+12,a+12 '2 98!

getqy y+05 '2 100

getqx x+05 '2 102

setq y+13 '2 104

qrotate x+13,a+13 '2 106!

getqy y+06 '2 108

getqx x+06 '2 110

setq y+14 '2 112

qrotate x+14,a+14 '2 114!

getqy y+07 '2 116

getqx x+07 '2 118

setq y+15 '2 120

qrotate x+15,a+15 '2 122!

getqy y+08 '2 124

getqx x+08 '2 126

setq y+16 '2 128

qrotate x+16,a+16 '2 130!

getqy y+09 '2 132

getqx x+09 '2 134

setq y+17 '2 136

qrotate x+17,a+17 '2 138!

getqy y+10 '2 140

getqx x+10 '2 142

setq y+18 '2 144

qrotate x+18,a+18 '2 146!

getqy y+11 '2 148

getqx x+11 '2 150

setq y+19 '2 152

qrotate x+19,a+19 '2 154!

getqy y+12 '2 156

getqx x+12 '2 158

setq y+20 '2 160

qrotate x+20,a+20 '2 162!

getqy y+13 '2 164

getqx x+13 '2 166

setq y+21 '2 168

qrotate x+21,a+21 '2 170!

getqy y+14 '2 172

getqx x+14 '2 174

setq y+22 '2 176

qrotate x+22,a+22 '2 178!

getqy y+15 '2 180

getqx x+15 '2 182

setq y+23 '2 184

qrotate x+23,a+23 '2 186!

getqy y+16 '2 188

getqx x+16 '2 190

setq y+24 '2 192

qrotate x+24,a+24 '2 194!

getqy y+17 '2 196

getqx x+17 '2 198

setq y+25 '2 200

qrotate x+25,a+25 '2 202!

getqy y+18 '2 204

getqx x+18 '2 206

setq y+26 '2 208

qrotate x+26,a+26 '2 210!

getqy y+19 '2 212

getqx x+19 '2 214

setq y+27 '2 216

qrotate x+27,a+27 '2 218!

getqy y+20 '2 220

getqx x+20 '2 222

setq y+28 '2 224

qrotate x+28,a+28 '2 226!

getqy y+21 '2 228

getqx x+21 '2 230

setq y+29 '2 232

qrotate x+29,a+29 '2 234!

getqy y+22 '2 236

getqx x+22 '2 238

setq y+30 '2 240

qrotate x+30,a+30 '2 242!

getqy y+23 '2 244

getqx x+23 '2 246

setq y+31 '2 248

qrotate x+31,a+31 '2 250!

getqy y+24 '2 252 get 8 trailing results

getqx x+24 '2 254

getqy y+25 '4w+2 260

getqx x+25 '2 262

getqy y+26 '4w+2 268

getqx x+26 '2 270

getqy y+27 '4w+2 276

getqx x+27 '2 278

getqy y+28 '4w+2 284

getqx x+28 '2 286

getqy y+29 '4w+2 292

getqx x+29 '2 294

getqy y+30 '4w+2 300

getqx x+30 '2 302

getqy y+31 '4w+2 308

getqx x+31 '2 310

' Wait for next DAC frame

.wait testp #0 wc 'check ina[0]

if_nc jmp #.wait

' Output y[00..31] (sines) to P0..P31 DACs

rep @.r,#32 'ready to update 32 DACs

alts i,#y 'get y[00..31] into next s and inc i

getword j,0-0,#1 'get upper word of y

bitnot j,#15 'convert signed word to unsigned word for DAC output

wypin j,i 'update DAC output value

incmod i,#31 'inc index, wrap to 0

.r drvnot #32 'toggle P32 on each iteration

jmp #loop 'loop for another sample set

' Data

pins32 long 0 addpins 31 'pin range for P0..P31

i long 0 'index

j long 0 'misc

x long $7F000000[32] 'initial (x,y) coordinates

y long $00000000[32]

a long 100*f,101*f,102*f,103*f,104*f,105*f,106*f,107*f 'ascending frequencies

long 108*f,109*f,110*f,111*f,112*f,113*f,114*f,115*f

long 116*f,117*f,118*f,119*f,120*f,121*f,122*f,123*f

long 124*f,125*f,126*f,127*f,128*f,129*f,130*f,131*f

LOCKS

The hub contains a pool of 16 semaphore bits, called locks. Locks can be used by cogs to coordinate exclusive access of a shared resource. In order to use a lock, one cog must first allocate a lock with LOCKNEW. Once allocated, cooperative cogs use LOCKTRY and LOCKREL to respectively take or release the allocated lock. When the lock is no longer needed, it may be returned to the unallocated lock pool by executing LOCKRET.

The LOCK instructions are:

LOCKNEW D {WC}
LOCKRET {#}D
LOCKTRY {#}D {WC}
LOCKREL {#}D {WC}

What a lock represents is completely up to the application using it. Locks are just a means of allowing one cog at a time the exclusive status of 'owner'. All participant cogs must agree on a lock's number and its purpose for a lock to be useful.

Allocating Locks

LOCKNEW is used to allocate a lock from the hub lock pool. If an unallocated lock is available, that lock's number will be stored in the D register. If WC is used with the instruction, the C flag will indicate whether a lock was allocated. Zero (0) indicates success, while one (1) indicates that all locks are already allocated. A cog may allocate more than one lock. Once a lock has been allocated, the lock number may be shared with other cogs so that they can use LOCKTRY/LOCKREL.

LOCKRET is used to return an allocated lock to the lock pool. Any cog can return an allocated lock, even if it wasn't the cog that allocated it with LOCKNEW.

Using Locks

A cog may attempt to take an allocated lock by executing LOCKTRY with the lock number. If WC is used with the instruction, the C flag will indicate afterwards whether the lock was successfully taken. Zero (0) indicates that the lock was not taken because either another cog is holding it or the lock is not allocated, while one (1) indicates that the lock was successfully taken (or is now "held" by this cog). While the lock is held, no other cog can take the lock until the cog that's holding the lock either executes LOCKREL with the lock number or it is stopped via COGSTOP or restarted via COGINIT.

Because lock arbitration is performed by the hub in a round-robin fashion, any cog waiting in a loop to capture a lock will get its fair turn:

'Keep trying to capture lock until successful
.try LOCKTRY write_lock WC
IF_NC JMP #.try

When a cog is done with a held lock, it must execute LOCKREL to release it for other cogs to take. Only the cog that has taken the lock can release it.

NOTE: A lock will also be implicitly released if the cog that's holding the lock is stopped (COGSTOP) or restarted (COGINIT), or if LOCKRET is executed for that lock.

LOCKREL can also be used to query the current lock status. When LOCKREL is executed with WC, the C flag will indicate whether the lock is currently taken. Additionally, if the D field references a register (not an immediate value), the register will be written with the cog ID of the current owner (if held) or last owner (if released). If the cog executing LOCKREL is also the cog that is holding the lock, the normal LOCKREL behavior will still be performed (i.e. the lock will be released).
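
The allocation and ownership rules described in this section can be summarized with a small Python model. It is a behavioral sketch under stated assumptions, not a description of the hub logic: the class and method names are hypothetical, LOCKNEW is assumed to hand out the lowest-numbered free lock, a repeated LOCKTRY by the cog that already holds the lock is assumed to report failure, and the LOCKREL status query (C flag and owner ID) is not modeled.

```python
class LockPool:
    """Model of the 16 hub locks: allocation state plus current owner."""

    def __init__(self, count=16):
        self.allocated = [False] * count
        self.owner = [None] * count          # cog ID holding each lock, or None

    def locknew(self):
        """Return (lock number, C). C=0 on success, C=1 if none are free."""
        for number, used in enumerate(self.allocated):
            if not used:                      # assumption: lowest free lock first
                self.allocated[number] = True
                return number, 0
        return None, 1

    def lockret(self, number):
        """Return a lock to the pool; this also releases it."""
        self.allocated[number] = False
        self.owner[number] = None

    def locktry(self, number, cog):
        """Return C: 1 if `cog` took the lock, 0 if held or not allocated."""
        if self.allocated[number] and self.owner[number] is None:
            self.owner[number] = cog
            return 1
        return 0

    def lockrel(self, number, cog):
        """Release the lock, but only if `cog` is the one holding it."""
        if self.owner[number] == cog:
            self.owner[number] = None

    def cog_stopped(self, cog):
        """COGSTOP/COGINIT implicitly release every lock the cog holds."""
        self.owner = [None if held_by == cog else held_by
                      for held_by in self.owner]


pool = LockPool()
write_lock, c = pool.locknew()
print(pool.locktry(write_lock, cog=0))   # cog 0 takes the lock
print(pool.locktry(write_lock, cog=1))   # cog 1 must keep trying
```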

SMART PINS

Each I/O pin has a 'smart pin' circuit which, when enabled, performs some autonomous function on the pin. Smart pins free the cogs from needing to micro-manage many I/O operations by providing high-bandwidth concurrent hardware functions which cogs could not perform as well on their own by manipulating I/O pins via instructions.

Normally, an I/O pin's output enable is controlled by its DIR bit and its output state is controlled by its OUT bit, while the IN bit returns the pin's read state. In smart pin modes, the DIR bit is used as an active-low reset signal to the smart pin circuitry, while the output enable state is controlled by a configuration bit. In some modes, the smart pin takes over driving the output state, in which case the OUT bit gets ignored. The IN bit serves as a flag to indicate to the cog(s) that the smart pin has completed some function or that an event has occurred which may need acknowledgment.

Smart pins have four 32-bit registers inside of them:

mode - smart pin mode, as well as low-level I/O pin mode (write-only)

X - mode-specific parameter (write-only)

Y - mode-specific parameter (write-only)

Z - mode-specific result (read-only)

These four registers are written and read via the following 2-clock instructions, in which S/# is used to select the pin number (0..63) and D/# is the 32-bit data conduit:

WRPIN D/#,S/# - Set smart pin S/# mode to D/#, ack pin

WXPIN D/#,S/# - Set smart pin S/# parameter X to D/#, ack pin

WYPIN D/#,S/# - Set smart pin S/# parameter Y to D/#, ack pin

RDPIN D,S/# {WC} - Get smart pin S/# result Z into D, flag into C, ack pin

RQPIN D,S/# {WC} - Get smart pin S/# result Z into D, flag into C, don't ack pin

AKPIN S/# - Acknowledge pin S/#

Each cog has a 34-bit bus to each smart pin for write data and acknowledgment signaling. Each smart pin OR's all incoming 34-bit buses from the cogs in the same way DIR and OUT bits are OR'd before going to the pins. Therefore, if you intend to have multiple cogs execute WRPIN / WXPIN / WYPIN / RDPIN / AKPIN instructions on the same smart pin, you must be sure that they do so at different times, in order to avoid clobbering each other's bus data. Any number of cogs can read a smart pin simultaneously, without bus conflict, though, by using RQPIN ('read quiet'), since it does not utilize the 34-bit cog-to-smart-pin bus for acknowledgement signaling, like RDPIN does.

Each smart pin has an outgoing 33-bit bus which conveys its Z result and a special flag. RDPIN and RQPIN are used to multiplex and read these buses, so that a pin's Z result is read into D and its special flag can be read into C. C will be either a mode-related flag or the MSB of the Z result.

For the WRPIN instruction, which establishes both the low-level and smart-pin configuration for each I/O pin, the D operand is composed as:

D/# = %AAAA_BBBB_FFF_MMMMMMMMMMMMM_TT_SSSSS_0

%AAAA: 'A' input selector

0xxx = true (default)

1xxx = inverted

x000 = this pin's read state (default)

x001 = relative +1 pin's read state

x010 = relative +2 pin's read state

x011 = relative +3 pin's read state

x100 = this pin's OUT bit from cogs

x101 = relative -3 pin's read state

x110 = relative -2 pin's read state

x111 = relative -1 pin's read state

%BBBB: 'B' input selector

0xxx = true (default)

1xxx = inverted

x000 = this pin's read state (default)

x001 = relative +1 pin's read state

x010 = relative +2 pin's read state

x011 = relative +3 pin's read state

x100 = this pin's OUT bit from cogs

x101 = relative -3 pin's read state

x110 = relative -2 pin's read state

x111 = relative -1 pin's read state

%FFF: 'A' and 'B' input logic/filtering (after 'A' and 'B' input selectors)

000 = A, B (default)

001 = A AND B, B

010 = A OR B, B

011 = A XOR B, B

100 = A, B, both filtered using global filt0 settings

101 = A, B, both filtered using global filt1 settings

110 = A, B, both filtered using global filt2 settings

111 = A, B, both filtered using global filt3 settings

The resultant 'A' will drive the IN signal in non-smart-pin modes.

%M..M: low-level pin control

The Spin2 documentation lists many predefined labels which cover these pin configurations, as well as the smart pin modes.

%TT: pin DIR/OUT control (default = %00)

for odd pins, 'OTHER' = even pin's NOT output state (diff source)

for even pins, 'OTHER' = unique pseudo-random bit (noise source)

for all pins, 'SMART' = smart pin output which overrides OUT/OTHER

'DAC_MODE' is enabled when M[12:10] = %101

'BIT_DAC' outputs {2{M[7:4]}} for 'high' or {2{M[3:0]}} for 'low' in DAC_MODE

for smart pin mode off (%SSSSS = %00000):

DIR enables output

for non-DAC_MODE:

0x = OUT drives output

1x = OTHER drives output

for DAC_MODE:

00 = OUT enables ADC, M.[7..0] sets DAC level

01 = OUT enables ADC, M.[3..0] selects cog DAC channel

10 = OUT drives BIT_DAC

11 = OTHER drives BIT_DAC

for all smart pin modes (%SSSSS > %00000):

x0 = output disabled, regardless of DIR

x1 = output enabled, regardless of DIR

for DAC smart pin modes (%SSSSS = %00001..%00011):

0x = OUT enables ADC in DAC_MODE, M.[7..0] overridden

1x = OTHER enables ADC in DAC_MODE, M.[7..0] overridden

for non-DAC smart pin modes (%SSSSS = %00100..%11111):

0x = SMART/OUT drives output or BIT_DAC if DAC_MODE

1x = SMART/OTHER drives output or BIT_DAC if DAC_MODE

%SSSSS: 00000 = smart pin off (default)

00001 = long repository (M.[12..10] != %101)

00010 = long repository (M.[12..10] != %101)

00011 = long repository (M.[12..10] != %101)

00001 = DAC noise (M.[12..10] = %101)

00010 = DAC 16-bit dither, noise (M.[12..10] = %101)

00011 = DAC 16-bit dither, PWM (M.[12..10] = %101)

00100* = pulse/cycle output

00101* = transition output

00110* = NCO frequency

00111* = NCO duty

01000* = PWM triangle

01001* = PWM sawtooth

01010* = PWM switch-mode power supply, V and I feedback

01011 = periodic/continuous: A-B quadrature encoder

01100 = periodic/continuous: inc on A-rise & B-high

01101 = periodic/continuous: inc on A-rise & B-high / dec on A-rise & B-low

01110 = periodic/continuous: inc on A-rise {/ dec on B-rise}

01111 = periodic/continuous: inc on A-high {/ dec on B-high}

10000 = time A-states

10001 = time A-highs

10010 = time X A-highs/rises/edges -or- timeout on X A-high/rise/edge

10011 = for X periods, count time

10100 = for X periods, count states

10101 = for periods in X+ clocks, count time

10110 = for periods in X+ clocks, count states

10111 = for periods in X+ clocks, count periods

11000 = ADC sample/filter/capture, internally clocked

11001 = ADC sample/filter/capture, externally clocked

11010 = ADC scope with trigger

11011* = USB host/device (even/odd pin pair = DM/DP)

11100* = sync serial transmit (A-data, B-clock)

11101 = sync serial receive (A-data, B-clock)

11110* = async serial transmit (baudrate)

11111 = async serial receive (baudrate)

* OUT signal overridden
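The field layout above can be packed and unpacked mechanically. The following host-side Python sketch shows one way to compose a WRPIN D value from its fields; the function and parameter names are hypothetical and exist only for this illustration. The bit positions follow from the %AAAA_BBBB_FFF_MMMMMMMMMMMMM_TT_SSSSS_0 layout.

```python
def wrpin_mode(a=0, b=0, f=0, m=0, tt=0, s=0):
    """Pack the fields into %AAAA_BBBB_FFF_MMMMMMMMMMMMM_TT_SSSSS_0."""
    # (value, width in bits, position of the field's least significant bit)
    fields = ((a, 4, 28), (b, 4, 24), (f, 3, 21), (m, 13, 8), (tt, 2, 6), (s, 5, 1))
    word = 0
    for value, width, shift in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"field value {value:#b} does not fit in {width} bits")
        word |= value << shift
    return word  # bit 0 is left at 0


def wrpin_fields(word):
    """Unpack a WRPIN D value back into its named fields."""
    return {
        "a": (word >> 28) & 0xF,
        "b": (word >> 24) & 0xF,
        "f": (word >> 21) & 0x7,
        "m": (word >> 8) & 0x1FFF,
        "tt": (word >> 6) & 0x3,
        "s": (word >> 1) & 0x1F,
    }


# The ADC configuration value used later in this document:
adc_mode = wrpin_mode(m=0b100011_0000000, tt=0b00, s=0b11000)
print(f"%{adc_mode:032b}")
```

Range-checking each field before packing is a design choice of the sketch; it catches a value that would otherwise spill into a neighboring field.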

When a mode-related event occurs in a smart pin, it raises its IN signal to alert the cog(s) that new data is ready, new data can be loaded, or some process has finished. A cog acknowledges a smart pin whenever it does a WRPIN, WXPIN, WYPIN, RDPIN or AKPIN on it. This causes the smart pin to lower its IN signal so that it can be raised again on the next event. Note that since the RQPIN instruction (read quiet) does not do an acknowledge, it can be used by any number of cogs, concurrently, to read a pin without bus conflict.

After WRPIN/WXPIN/WYPIN/RDPIN/AKPIN, it will take two clocks for IN to drop, before it can be polled again:

WRPIN/WXPIN/WYPIN/RDPIN/AKPIN 'acknowledge smart pin, releases IN from high

NOP 'elapse 2 clocks (or more)

TESTP pin WC 'IN can now be polled again

A smart pin should be configured while its DIR bit is low, holding it in reset. During that time, WRPIN/WXPIN/WYPIN can be used to establish the mode and related parameters. Once configured, DIR can be raised high and the smart pin will begin operating. After that, depending on the mode, you may feed it new data via WXPIN/WYPIN or retrieve results using RDPIN/RQPIN. These activities are usually coordinated with the IN signal going high.

Note that while a smart pin is configured, the %TT bits, explained above, will govern the pin's output enable, regardless of the DIR state.

A smart pin can be reset at any time, without the need to reconfigure it, by clearing and then setting its DIR bit.

To return a pin to normal mode, do a 'WRPIN #0,pin'.

PIN CONFIGURATION MODES

Each I/O pin has 13 configuration bits which determine the operation of its 3.3V circuit. The M.[12..0] bits within the WRPIN instruction's D.[20..8] operand go directly to these bits. In some smart pin modes, these bits are partially overwritten to set things like DAC values.

Below is a diagram of a single I/O pin circuit. It is powered from its local 3.3V supply pin. It connects to its own pin, as well as its odd/even adjacent pin. Pins P0 and P1 see each other's pins as adjacent pins, as do P2 and P3, etc.

Equivalent Schematics for Each Unique I/O Pin Configuration

SMART PIN MODES

Below is a list of all smart pin modes. These are set by the %SSSSS bits within the D.[5..1] operand of the WRPIN instruction.

%00000 = normal mode

This mode is for normal operation, without any smart pin functionality.

%00001..%00011 and not DAC_MODE = long repository

This mode turns the smart pin into a long repository, where WXPIN writes the long and RDPIN/RQPIN can read the long.

When active (DIR=1), WXPIN updates the long and raises IN.

During reset (DIR=0), WXPIN instructions are ignored and IN is low.

%00001 and DAC_MODE = DAC noise

This mode overrides M.[7..0] to feed the pin's 8-bit DAC with pseudo-random data on every clock. M.[12..10] must be set to %101 to configure the low-level pin for DAC output. Each pin in this mode receives a unique data pattern.

X.[15..0] can be set to a sample period, in clock cycles, in case you want to mark time with IN raising at each period completion. If a sample period is not wanted, set X.[15..0] to zero (65,536 clocks), in order to maximize the unused sample period, thereby reducing switching power.

RDPIN/RQPIN can be used to retrieve the 16-bit ADC accumulation from the last sample period.

During reset (DIR=0), IN is low.

%00010 and DAC_MODE = DAC 16-bit with pseudo-random dither

This mode overrides M.[7..0] to feed the pin's 8-bit DAC with pseudo-randomly-dithered data on every clock. M.[12..10] must be set to %101 to configure the low-level pin for DAC output.

X.[15..0] establishes the sample period in clock cycles.

Y.[15..0] establishes the DAC output value which gets captured at each sample period and used for its duration.

On completion of each sample period, Y.[15..0] is captured for the next output value and IN is raised. Therefore, you would coordinate updating Y.[15..0] with IN going high.

Pseudo-random dithering does not require any kind of fixed period, as it randomly dithers the 8-bit DAC between adjacent levels, in order to achieve 16-bit DAC output, averaged over time. So, if you would like to be able to update the output value at any time and have it take immediate effect, set X.[15..0] to one (IN will stay high).

If OUT is high, the ADC will be enabled and RDPIN/RQPIN can be used to retrieve the 16-bit ADC accumulation from the last sample period. This can be used to measure loading on the DAC pin.

During reset (DIR=0), IN is low and Y.[15..0] is captured.

%00011 and DAC_MODE = DAC 16-bit with PWM dither

This mode overrides M.[7..0] to feed the pin's 8-bit DAC with PWM-dithered data on every clock. M.[12..10] must be set to %101 to configure the low-level pin for DAC output.

X.[15..0] establishes the sample period in clock cycles. The sample period must be a multiple of 256 (X.[7..0]=0), so that an integral number of 256 steps are afforded the PWM, which dithers the DAC between adjacent 8-bit levels.

Y.[15..0] establishes the DAC output value which gets captured at each sample period and used for its duration.

On completion of each sample period, Y.[15..0] is captured for the next output value and IN is raised. Therefore, you would coordinate updating Y.[15..0] with IN going high.

PWM dithering will give better dynamic range than pseudo-random dithering, since a maximum of only two transitions occur for every 256 clocks. This means, though, that a frequency of Fclock/256 will be present in the output at -48dB.

If OUT is high, the ADC will be enabled and RDPIN/RQPIN can be used to retrieve the 16-bit ADC accumulation from the last sample period. This can be used to measure loading on the DAC pin.

During reset (DIR=0), IN is low and Y.[15..0] is captured.

%00100 = pulse/cycle output

This mode overrides OUT to control the pin output state.

X.[15..0] establishes a base period in clock cycles which forms the empirical high-time and low-time units.

X.[31..16] establishes a value to which the base period counter is compared on each clock cycle. The counter counts from X.[15..0] down to 1, then starts over at X.[15..0] if the decremented Y is still > 0. On each clock, if the base period counter > X.[31..16] and Y > 0, the output will be high (else low).

Whenever Y.[31..0] is written with a non-zero value, the pin will begin outputting a high pulse or cycles, starting at the next base period. After each pulse, Y is decremented by one, until it reaches zero, at which point the output will remain low.

Some examples:

If X.[31..16] is set to 0, the output will be high for the duration of Y > 0.

If X.[15..0] is set to 3 and X.[31..16] is set to 2, the output will be 1-0-0 (repeat) for the duration of Y > 0, since the counter runs 3-2-1 and only 3 is greater than 2.

IN will be raised and the pin will revert to low output when the pulse or cycles complete, meaning Y has been decremented to zero.

During reset (DIR=0), IN is low, the output is low, and Y is set to zero.
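The comparison rule for this mode can be checked with a short simulation. The Python sketch below is a simplified model, not the hardware: the function name is hypothetical, it ignores the alignment delay to the next base period, and it does not model a zero base period. It applies the rule stated above (output high while the counter is greater than X.[31..16]) as the counter runs from X.[15..0] down to 1 for each of Y pulses.

```python
def pulse_output(x, y):
    """Return the per-clock output levels for Y pulses, following the rule above."""
    period = x & 0xFFFF            # X.[15..0]: base period in clocks
    compare = (x >> 16) & 0xFFFF   # X.[31..16]: compare value
    if period == 0:
        raise ValueError("a zero base period is not modeled in this sketch")
    levels = []
    for _ in range(y):                         # Y is decremented after each pulse
        for counter in range(period, 0, -1):   # counts from X.[15..0] down to 1
            levels.append(1 if counter > compare else 0)
    return levels


# Base period 3, compare value 2, two pulses
print(pulse_output((2 << 16) | 3, 2))
```

With a compare value of 0 the counter is always greater, so the same function returns a continuous high for the duration of the Y pulses.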

%00101 = transition output

This mode overrides OUT to control the pin output state.

X.[15..0] establishes a base period in clock cycles which forms the empirical high-time and low-time units. The base-period counter begins decrementing and periodically reloading as soon as the smart pin is out of reset. All transition outputs will be synchronized to this free-running base period.

Whenever Y.[31..0] is written with a non-zero value, the pin will toggle once per base period for Y transitions, starting at the next base period.

IN will be raised when the transitions complete, with the pin remaining in its current output state.

During reset (DIR=0), IN is low, the output is low, and Y is set to zero.

%00110 = NCO frequency

This mode overrides OUT to control the pin output state.

X.[15..0] establishes a base period in clock cycles which forms the empirical high-time and low-time units.

Upon WXPIN, X.[31..16] is written to Z.[31..16] to allow phase setting, even during reset.

Y.[31..0] will be added into Z.[31..0] at each base period.

The pin output will reflect Z.[31].

IN will be raised whenever Z overflows.

During reset (DIR=0), IN is low, the output is low, and Z[15:0] is set to zero.
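Because Y is added into the 32-bit Z once per base period and the pin follows Z.[31], the output completes one cycle every 2^32/Y additions on average. The Python sketch below derives Y from a target frequency under that reasoning. The function names are hypothetical, and limiting Y to 2^31 (half the addition rate) is an assumption of the sketch rather than a documented restriction.

```python
def nco_frequency_y(f_clk_hz, f_out_hz, base_period=1):
    """Y value for which Z.[31] cycles at approximately f_out_hz."""
    adds_per_second = f_clk_hz / base_period   # Y is added once per base period
    y = round(f_out_hz / adds_per_second * (1 << 32))
    if not 0 <= y <= (1 << 31):
        raise ValueError("requested frequency is out of range for this sketch")
    return y


def nco_actual_frequency(f_clk_hz, y, base_period=1):
    """Average output frequency produced by a given Y value."""
    return f_clk_hz / base_period * y / (1 << 32)


y = nco_frequency_y(200_000_000, 50_000_000)
print(hex(y), nco_actual_frequency(200_000_000, y))
```

Since Y is rounded to an integer, the second function is useful for checking how far the achievable frequency is from the requested one.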

%00111 = NCO duty

This mode overrides OUT to control the pin output state.

X.[15..0] establishes a base period in clock cycles which forms the empirical high-time and low-time units.

Upon WXPIN, X.[31..16] is written to Z.[31..16] to allow phase setting.

Y.[31..0] will be added into Z.[31..0] at each base period.

The pin output will reflect Z overflow.

IN will be raised whenever Z overflows.

During reset (DIR=0), IN is low, the output is low, and Z is set to zero.
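In this mode the output follows the overflow of the Z accumulator, so the average duty is Y/2^32. The following Python sketch models that accumulation per base period. It is a simplified illustration with a hypothetical function name, and it assumes Z starts at zero, as it does after reset.

```python
def nco_duty_output(y, base_periods, z=0):
    """Per-base-period output levels: 1 whenever adding Y overflows the 32-bit Z."""
    levels = []
    for _ in range(base_periods):
        z += y & 0xFFFFFFFF
        levels.append(z >> 32)     # carry out of bit 31 is the overflow
        z &= 0xFFFFFFFF            # keep Z at 32 bits
    return levels


# Y = $4000_0000 overflows on every fourth addition (25% duty)
print(nco_duty_output(0x4000_0000, 8))
```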

%01000 = PWM triangle

This mode overrides OUT to control the pin output state.

X.[15..0] establishes a base period in clock cycles which forms the empirical high-time and low-time units.

X.[31..16] establishes a PWM frame period in terms of base periods.

Y.[15..0] establishes the PWM output value which gets captured at each frame start and used for its duration. It should range from zero to the frame period (value specified in X.[31..16]).

A counter, updating at each base period, counts from the frame period down to one, then from one back up to the frame period. Then, Y.[15..0] is captured, IN is raised, and the process repeats.

Note that the overall update time is TWO frame periods times the base period.

At each base period, the captured output value is compared to the counter. If it is equal or greater, a high is output. If it is less, a low is output. Therefore, a zero will always output a low and the frame period value will always output a high.

During reset (DIR=0), IN is low, the output is low, and Y.[15..0] is captured.
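The down-then-up counter and the comparison rule described above produce a pulse centered in each update interval. The Python sketch below reproduces one such interval per base period. It is a simplified model with a hypothetical function name, and it treats the captured value as constant for the whole sweep.

```python
def pwm_triangle_frame(frame_period, value):
    """Output level at each base period of one down-then-up counter sweep."""
    down = range(frame_period, 0, -1)    # frame period down to one
    up = range(1, frame_period + 1)      # one back up to the frame period
    # High whenever the captured value is equal to or greater than the counter
    return [1 if value >= counter else 0 for counter in (*down, *up)]


# Frame period 4, output value 1: high only around the middle of the sweep
print(pwm_triangle_frame(4, 1))
```

The returned list always has two frame periods' worth of entries, which matches the note above about the overall update time.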

%01001 = PWM sawtooth

This mode overrides OUT to control the pin output state.

X.[15..0] establishes a base period in clock cycles which forms the empirical high-time and low-time units.

X.[31..16] establishes a PWM frame period in terms of base periods.

Y.[15..0] establishes the PWM output value which gets captured at each frame start and used for its duration. It should range from zero to the frame period.

A counter, updating at each base period, counts from one up to the frame period. Then, Y.[15..0] is captured, IN is raised, and the process repeats.

During reset (DIR=0), IN is low, the output is low, and Y.[15..0] is captured.

%01010 = PWM switch-mode power supply with voltage and current feedback

This mode overrides OUT to control the pin output state.

X.[15..0] establishes a base period in clock cycles which forms the empirical high-time and low-time units.

X.[31..16] establishes a PWM frame period in terms of base periods.

Y.[15..0] establishes the PWM output value which gets captured at each frame start and used for its duration. It should range from zero to the frame period.

A counter, updating at each base period, counts from one up to the frame period. Then, the 'A' input is sampled at each base period until it reads low. After 'A' reads low, Y.[15..0] is captured, IN is raised, and the process repeats.

At each base period, the captured output value is compared to the counter. If it is equal or greater, a high is output. If it is less, a low is output. If, at any time during the cycle, the 'B' input goes high, the output will be low for the rest of that cycle.

Due to the nature of switch-mode power supplies, it may be appropriate to just set Y.[15..0] once and let it repeat indefinitely.

During reset (DIR=0), IN is low, the output is low, and Y.[15..0] is captured.

WXPIN is used to set the base period (X.[15..0]) and the PWM frame count (X.[31..16]). The base period is the number of clocks which makes a base unit of time. The frame count is the number of base units that make up a PWM cycle.

WYPIN is used to set the output value (Y.[15..0]), which is internally captured at the start of every PWM frame and compared to the frame counter upon completion of each base unit of time. If the output value is greater than or equal to the frame counter, the pin outputs a high, else a low. This is intended to drive the gate of the switcher FET.

The "A" input is the voltage detector for the SMPS output. This could be an adjacent pin using the internal-DAC-comparison mode to observe the center tap of a voltage divider which is fed by the final SMPS output. When "A" is low, a PWM cycle is performed because the final output voltage has sagged below the requirement and it's time to do another pulse.

The "B" input is the over-current detector which, if ever high during the PWM cycle, immediately forces the output low for the rest of that PWM cycle. This could be an adjacent pin using the internal-DAC-comparison mode to observe a shunt resistor between GND and the FET source. When the shunt voltage gets too high, too much current is flowing (or the desired amount of current is flowing), so the output goes low to turn off the FET and allow the inductor connected to its drain to shoot high, creating a power pulse to be captured by a diode and dumped into a cap, which is the SMPS final output.

%01011 = A/B-input quadrature encoder

X.[31..0] establishes a measurement period in clock cycles.

If zero is used for the period, the measurement operation will not be periodic, but continuous, like a totalizer, and the current 32-bit quadrature step count can always be read via RDPIN/RQPIN.

If a non-zero value is used for the period, quadrature steps will be counted for that many clock cycles and then the result will be placed in Z while the accumulator will be set to the 0/1/-1 value that would have otherwise been added into it. This way, all quadrature steps get counted across measurements. At the end of each period, IN will be raised and RDPIN/RQPIN can be used to retrieve the last 32-bit measurement.

It may be useful to configure both 'A' and 'B' smart pins to quadrature mode, with one being continuous (X=0) for absolute position tracking and the other being periodic (X<>0) for velocity measurement.

The quadrature encoder can be "zeroed" by pulsing DIR low at any time. There is no need to do another WXPIN.

During reset (DIR=0), IN is low and Z is set to the adder value (0/1/-1).

%01100 = Count A-input positive edges when B-input is high

X.[31..0] establishes a measurement period in clock cycles.

If zero is used for the period, the measurement operation will not be periodic, but continuous, like a totalizer, and the current 32-bit high count can always be read via RDPIN/RQPIN.

If a non-zero value is used for the period, events will be counted for that many clock cycles and then the result will be placed in Z, while the accumulator will be set to the 0/1 value that would have otherwise been added into it, beginning a new measurement. This way, all events get counted across measurements. At the end of each period, IN will be raised and RDPIN/RQPIN can be used to retrieve the 32-bit measurement.

During reset (DIR=0), IN is low and Z is set to the adder value (0/1).

%01101 = Accumulate A-input positive edges with B-input supplying increment (B=1) or decrement (B=0)

X.[31..0] establishes a measurement period in clock cycles.

If zero is used for the period, the measurement operation will not be periodic, but continuous, like a totalizer, and the current 32-bit high count can always be read via RDPIN/RQPIN.

If a non-zero value is used for the period, events will be counted for that many clock cycles and then the result will be placed in Z, while the accumulator will be set to the 0/1/-1 value that would have otherwise been added into it, beginning a new measurement. This way, all events get counted across measurements. At the end of each period, IN will be raised and RDPIN/RQPIN can be used to retrieve the 32-bit measurement.

During reset (DIR=0), IN is low and Z is set to the adder value (0/1/-1).

%01110 AND !Y.[0] = Count A-input positive edges

%01110 AND Y.[0] = Increment on A-input positive edge and decrement on B-input positive edge

X.[31..0] establishes a measurement period in clock cycles. Y.[0] establishes whether to just count A-input positive edges (=0), or to increment on A-input positive edge and decrement on B-input positive edge (=1).

If zero is used for the period, the measurement operation will not be periodic, but continuous, like a totalizer, and the current 32-bit high count can always be read via RDPIN/RQPIN.

During reset (DIR=0), IN is low and Z is set to the adder value (0/1/-1).

%01111 AND !Y.[0] = Count A-input highs

%01111 AND Y.[0] = Increment on A-input high and decrement on B-input high

X.[31..0] establishes a measurement period in clock cycles. Y.[0] establishes whether to just count A-input highs (Y.[0]=0), or to increment on A-input high and decrement on B-input high (Y.[0]=1).

If zero is used for the period, the measurement operation will not be periodic, but continuous, like a totalizer, and the current 32-bit high count can always be read via RDPIN/RQPIN.

During reset (DIR=0), IN is low and Z is set to the adder value (0/1/-1).

%10000 = Time A-input states

Continuous states are counted in clock cycles.

Upon each state change, the prior state is placed in the C-flag buffer, the prior state's duration count is placed in Z, and IN is raised. RDPIN/RQPIN can then be used to retrieve the measurement. Z will be limited to $80000000.

If states change faster than the cog is able to retrieve measurements, the measurements will effectively be lost, as old ones will be overwritten with new ones. This can be worked around by using two smart pins to time highs, with one pin inverting its 'A' input. You could then capture both states, as long as the sum of the states' durations didn't exceed the cog's ability to retrieve both results. This would help in cases where one of the states was very short in duration, but the other wasn't.

During reset (DIR=0), IN is low and Z is set to $00000001.

%10001 = Time A-input high states

Continuous high states are counted in clock cycles.

Upon each high-to-low transition, the previous high duration count is placed in Z, and IN is raised. RDPIN/RQPIN can then be used to retrieve the measurement. Z will be limited to $80000000.

During reset (DIR=0), IN is low and Z is set to $00000001.

%10010 AND !Y.[2] = Time X A-input highs/rises/edges

Time is measured until X A-input highs/rises/edges are accumulated.

X.[31..0] establishes how many A-input highs/rises/edges are to be accumulated.

Y.[1..0] establishes A-input high/rise/edge sensitivity:

%00 = A-input high

%01 = A-input rise

%1x = A-input edge

Time is measured in clock cycles until X highs/rises/edges are accumulated from the A-input. The measurement is then placed in Z, and IN is raised. RDPIN/RQPIN can then be used to retrieve the measurement. Z will be limited to $80000000.

During reset (DIR=0), IN is low and Z is set to $00000001.

%10010 AND Y.[2] = Timeout on X clocks of missing A-input high/rise/edge

If no A-input high/rise/edge occurs within X clocks, IN is raised, a new timeout period of X clocks begins, and Z maintains a running count of how many clocks have elapsed since the last A-input high/rise/edge. Z will be limited to $80000000 and can be read any time via RDPIN/RQPIN.

If an A-input high/rise/edge does occur within X clocks, a new timeout period of X clocks begins and Z is reset to $00000001.

X.[31..0] establishes how many clocks before a timeout due to no A-input high/rise/edge occurring.

Y.[1..0] establishes A-input high/rise/edge sensitivity:

%00 = A-input high

%01 = A-input rise

%1x = A-input edge

During reset (DIR=0), IN is low and Z is set to $00000001.

%10011 = For X periods, count time

%10100 = For X periods, count states

X.[31..0] establishes how many A-input rise/edge to B-input rise/edge periods are to be measured.

Y.[1..0] establishes A-input and B-input rise/edge sensitivity:

%00 = A-input rise to B-input rise

%01 = A-input rise to B-input edge

%10 = A-input edge to B-input rise

%11 = A-input edge to B-input edge

Note: The B-input can be set to the same pin as the A-input for single-pin cycle measurement.

Clock cycles or A-input trigger states are counted from each A-input rise/edge to each B-input rise/edge for X periods. If the A-input rise/edge is ever coincident with the B-input rise/edge at the end of the period, the start of the next period is registered. Upon completion of X periods, the measurement is placed in Z, IN is raised, and a new measurement begins. RDPIN/RQPIN can then be used to retrieve the completed measurement. Z will be limited to $80000000.

The first mode (%10011, count time) is intended to be used as an oversampling period measurement, while the second mode (%10100, count states) is a complementary duty measurement.

During reset (DIR=0), IN is low and Z is set to $00000000.

%10101 = For periods in X+ clock cycles, count time

%10110 = For periods in X+ clock cycles, count states

%10111 = For periods in X+ clock cycles, count periods

X.[31..0] establishes the minimum number of clock cycles to track periods for. Periods are A-input rise/edge to B-input rise/edge.

Y.[1..0] establishes A-input and B-input rise/edge sensitivity:

%00 = A-input rise to B-input rise

%01 = A-input rise to B-input edge

%10 = A-input edge to B-input rise

%11 = A-input edge to B-input edge

Note: The B-input can be set to the same pin as the A-input for single-pin cycle measurement.

A measurement is taken across some number of A-input rise/edge to B-input rise/edge periods, until X clock cycles elapse and then any period in progress completes. If the A-input rise/edge is ever coincident with the B-input rise/edge at the end of the period, the start of the next period is registered. Upon completion, the measurement is placed in Z, IN is raised, and a new measurement begins. RDPIN/RQPIN can then be used to retrieve the completed measurement. Z will be limited to $80000000.

The first mode accumulates time within each period, for an oversampled period measurement.

The second mode accumulates A-input trigger states within each period, for an oversampled duty measurement.

The third mode counts the periods.

Knowing how many clock cycles some number of complete periods took, and what the duty was, affords a very time-efficient and precise means of determining frequency and duty cycle. At least two of these measurements must be made concurrently to get useful results.

During reset (DIR=0), IN is low and Z is set to $00000000.
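The three results can be combined on the host or in software to recover frequency and duty. The Python sketch below shows the arithmetic under stated assumptions: the three values come from smart pins measuring the same signal over the same window, and the 'count states' result is taken to be the number of clocks for which the A-input trigger state was active. The function and parameter names are hypothetical.

```python
def frequency_and_duty(time_clocks, state_clocks, periods, f_clk_hz):
    """Combine concurrent 'count time', 'count states' and 'count periods' results."""
    if time_clocks <= 0 or periods <= 0:
        raise ValueError("need at least one complete period")
    frequency_hz = periods * f_clk_hz / time_clocks   # periods per second
    duty = state_clocks / time_clocks                 # fraction of time active
    return frequency_hz, duty


# 1,000 periods took 1,000,000 clocks at 200 MHz, active for 250,000 of them
print(frequency_and_duty(1_000_000, 250_000, 1_000, 200_000_000))
```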

%11000 = ADC sample/filter/capture, internally clocked

%11001 = ADC sample/filter/capture, externally clocked

These modes facilitate sampling, SINC filtering, and raw capturing of ADC bitstream data.

For the internally-clocked mode, the A-input will be sampled on every clock and should be a pin configured for ADC operation (M.[12..10] = %100). In the externally-clocked mode, the A-input will be sampled on each B-input rise, so that an external delta-sigma ADC may be employed.

WXPIN sets the mode to X.[5..4] and the sample period to POWER(2, X.[3..0]). Not all mode and period combinations are useful, or even functional:

	X.[5..4] → Mode →	%00 SINC2 Sampling	%01 SINC2 Filtering	%10 SINC3 Filtering	%11 Bitstream capturing
X.[3..0]	Sample Period	Sample Resolution	Post-diff ENOB*	Post-diff ENOB*	(LSB = oldest bit)
%0000	1 clock	impractical	impractical	impractical	1 new bit
%0001	2 clocks	2 bits	impractical	impractical	2 new bits
%0010	4 clocks	3 bits	impractical	impractical	4 new bits
%0011	8 clocks	4 bits	4	impractical	8 new bits
%0100	16 clocks	5 bits	5	8	16 new bits
%0101	32 clocks	6 bits	6	10	32 new bits
%0110	64 clocks	7 bits	7	12	overflow
%0111	128 clocks	8 bits	8	14	overflow
%1000	256 clocks	9 bits	9	16	overflow
%1001	512 clocks	10 bits	10	18	overflow
%1010	1,024 clocks	11 bits	11	overflow	overflow
%1011	2,048 clocks	12 bits	12	overflow	overflow
%1100	4,096 clocks	13 bits	13	overflow	overflow
%1101	8,192 clocks	14 bits	14	overflow	overflow
%1110	16,384 clocks	overflow	overflow	overflow	overflow
%1111	32,768 clocks	overflow	overflow	overflow	overflow

* ENOB = Effective Number of Bits, or the sample resolution

For modes other than SINC2 Sampling (X.[5..4] > %00), WYPIN may be used after WXPIN to override the initial period established by X.[3..0] and replace it with the arbitrary value in Y.[13..0]. For example, if you'd like to do SINC3 filtering with a period of 320 clocks, you could follow the WXPIN with a 'WYPIN #320,adcpin'. The smart pin accumulators are 27 bits wide. This allows up to 2^(27/3), or 512, clocks per decimation in SINC3 filtering mode and up to 2^(27/2), or 11,585, clocks in SINC2 filtering mode.
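The period limits quoted above follow from the 27-bit accumulator width. The Python sketch below recomputes them with integer arithmetic and also packs the WXPIN value from the mode and period fields. The names are hypothetical, and reading the limit as "the largest period p for which p raised to the filter order still fits in 2^27" is an interpretation of the 2^(27/3) and 2^(27/2) figures in the text.

```python
ACCUMULATOR_BITS = 27   # accumulator width stated in the text


def max_decimation_period(order):
    """Largest period p (in clocks) with p**order <= 2**27."""
    limit = 1 << ACCUMULATOR_BITS
    p = 1
    while (p + 1) ** order <= limit:
        p += 1
    return p


def adc_x_value(mode, period_exp):
    """Pack X.[5..4] (mode) and X.[3..0] (period = 2**period_exp clocks)."""
    if not 0 <= mode <= 3 or not 0 <= period_exp <= 15:
        raise ValueError("mode is 2 bits, period exponent is 4 bits")
    return (mode << 4) | period_exp


print(max_decimation_period(3), max_decimation_period(2))
print(bin(adc_x_value(0b10, 0b1000)))   # SINC3 filtering, 256 clocks
```

A WYPIN period such as the 320 clocks in the example above can be checked against max_decimation_period(3) before it is written.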

Upon completion of each sample period, the measurement is placed in Z, IN is raised, and a new measurement begins. RDPIN/RQPIN can then be used to retrieve the completed measurement.

About SINC2 and SINC3 filtering

SINC2 filtering works by summing the input bit into an accumulator on each clock which, in turn, is summed into another accumulator, to create a double integration. At the end of each sampling period, the difference between the new and previous second accumulator's value is the conversion sample, and the 'previous' value is updated. This process has the pleasant effect of returning an extra bit of resolution over simple bit-summing, as well as filtering away rectangular-sampling-window effects. SINC2 filtering is best for DC measurements, where precision is important. Practical measurements of 14-bit resolution can be made every 8,192 clocks using SINC2 filtering. After starting SINC2 filtering, the filter will become accurate starting on the third sample.

SINC3 filtering is like SINC2, but employs an additional level of accumulation to increase sensitivity to dynamics in the input signal. SINC3 doubles the ENOB (effective number of bits) over simple bit-summing for fast signals, but it is only slightly better at DC measurements than SINC2 filtering at the same sample period. Because SINC3 takes more resources within the smart pin, it is limited to 512 clocks per period, making it less practical than SINC2 for precision DC measurements, but well suited to tracking fast, dynamic signals. After starting SINC3 filtering, the filter will become accurate starting on the fifth sample.

Because the accumulators are 27 bits wide, 32-bit integer adds and subtracts in software will roll over incorrectly. There are two ways to handle this:

You can either prescale the 27-bit values to 32-bit values:

RDPIN x,#adcpin 'get SINC2 accumulator

SHL x,#5 'prescale 27-bit to 32-bit

SUB x,diff 'compute sample

ADD diff,x 'update diff value

Or you can post-trim them to 27-bit values:

RDPIN x,#adcpin 'get SINC2 accumulator

SUB x,diff 'compute sample

ADD diff,x 'update diff value

ZEROX x,#26 'trim to 27-bit
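Both approaches depend on modular arithmetic. The following Python sketch is not P2 code; it only models 32-bit register arithmetic to check that each instruction sequence recovers the true difference when the 27-bit accumulator wraps. The function names and test values are illustrative assumptions.

```python
MASK32 = 0xFFFFFFFF          # width of a cog register
MASK27 = (1 << 27) - 1       # width of the smart pin accumulators

def prescale_delta(reading, diff):
    """Model RDPIN / SHL #5 / SUB / ADD. Returns (x, new_diff); x is the delta << 5."""
    x = (reading << 5) & MASK32      # SHL x,#5
    x = (x - diff) & MASK32          # SUB x,diff
    diff = (diff + x) & MASK32       # ADD diff,x
    return x, diff

def posttrim_delta(reading, diff):
    """Model RDPIN / SUB / ADD / ZEROX #26. Returns (x, new_diff); x is the delta."""
    x = (reading - diff) & MASK32    # SUB x,diff
    diff = (diff + x) & MASK32       # ADD diff,x
    x &= MASK27                      # ZEROX x,#26
    return x, diff

# Two consecutive readings where the accumulator wraps past zero in between.
before = MASK27 - 100
true_delta = 5000
after = (before + true_delta) & MASK27

_, d = prescale_delta(before, 0)
x, _ = prescale_delta(after, d)
print(x >> 5)                        # delta recovered by the prescale approach

_, d = posttrim_delta(before, 0)
x, _ = posttrim_delta(after, d)
print(x)                             # delta recovered by the post-trim approach
```

In the prescale approach the result is left-justified by 5 bits, so any later right-shift to justify the sample must be 5 bits larger.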

SINC2 Sampling Mode (%00)

This mode performs complete SINC2 conversions, updating the ADC output sample at the end of each period. Once this mode is enabled, it is only necessary to do a RDPIN/RQPIN to acquire the latest ADC sample. The limitation of this mode is that it only works at power-of-2 sample periods, since that stricture afforded efficient implementation within the smart pin, making complete conversions possible without software. There is an additional SINC2 filtering mode (%01) which allows non-power-of-2 sample periods, but you must perform the difference computation in software.

To begin SINC2 sampling:

WRPIN ##%100011_0000000_00_11000_0,adcpin 'configure ADC+sample pin(s)

WXPIN #%00_0111,adcpin 'SINC2 sampling at 8 bits

DIRH adcpin 'enable smart pin(s)

NOTE: The variable 'adcpin' could enable multiple pins by having the additional number of pins in bits 10..6. For example, if 'adcpin' held %00111_010000, pins 16 through 23 would have been simultaneously configured by the above code.

To read the latest ADC sample, just do a RDPIN/RQPIN:

RDPIN sample,adcpin 'read sample at any time

SINC2 Filtering Mode (%01)

This mode performs SINC2 filtering, which requires some software interaction in order to realize ADC samples.

To begin SINC2 filtering:

WRPIN ##%100011_0000000_00_11000_0,#adcpin 'configure ADC+filter pin(s)

WXPIN #%01_0111,#adcpin 'SINC2 filtering at 128 clocks

DIRH #adcpin 'enable smart pin(s)

Pin interaction must occur after each sample period, so it may be good to set up an event to detect the pin's IN going high:

SETSE1 #%001<<6 + adcpin 'SE1 triggers on pin high

.loop WAITSE1 'wait for sample period done

RDPIN x,#adcpin 'get SINC2 accumulator

SUB x,diff 'compute sample

ADD diff,x 'update diff value

SHR x,#6 'justify 8-bit sample

ZEROX x,#7 'trim 8-bit sample

'use x here 'use sample somehow

JMP #.loop 'loop for next period

x RES 1 'sample value

diff RES 1 'diff value

Note that it is necessary to shift the computed sample right by some number of bits to leave the ENOBs intact. For SINC2 filtering, you must shift right by LOG2(clocks per period)-1, which in this case is LOG2(128)-1 = 6.

SINC3 Filtering Mode (%10)

This mode performs SINC3 filtering, which requires some software interaction in order to realize ADC samples.

To begin SINC3 filtering:

WRPIN ##%100011_0000000_00_11000_0,#adcpin 'configure ADC+filter pin(s)

WXPIN #%10_0111,#adcpin 'SINC3 filtering at 128 clocks

DIRH #adcpin 'enable smart pin(s)

Pin interaction must occur after each sample period, so it may be good to set up an event to detect the pin's IN going high:

SETSE1 #%001<<6 + adcpin 'SE1 triggers on pin high

.loop WAITSE1 'wait for sample period done

RDPIN x,#adcpin 'get SINC3 accumulator

SUB x,diff1 'compute sample

ADD diff1,x 'update diff1 value

SUB x,diff2 'compute sample

ADD diff2,x 'update diff2 value

SHR x,#7 'justify 14-bit sample

ZEROX x,#13 'trim 14-bit sample

'use x here 'use sample somehow

JMP #.loop 'loop for next period

x RES 1 'sample value

diff1 RES 1 'diff1 value

diff2 RES 1 'diff2 value

Note that it is necessary to shift the computed sample right by some number of bits to leave the ENOBs intact. For SINC3 filtering, you must shift right by LOG2(clocks per period), which in this case is LOG2(128) = 7.
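The shift rules for the two filtering modes can be tabulated with a small host-side helper. This Python sketch assumes power-of-2 periods and assumes that the full-scale difference is about (clocks per period) raised to the filter order, which is consistent with the accumulator limits given earlier. The helper name is hypothetical.

```python
def sinc_shift_and_bits(order, clocks):
    """Return (right_shift, sample_bits) for SINC2 (order=2) or SINC3 (order=3)."""
    if order not in (2, 3):
        raise ValueError("order must be 2 or 3")
    if clocks < 1 or clocks & (clocks - 1):
        raise ValueError("this sketch only handles power-of-2 periods")
    log2 = clocks.bit_length() - 1
    # SINC2: shift by LOG2(clocks)-1, SINC3: shift by LOG2(clocks)
    shift = log2 - 1 if order == 2 else log2
    # Assumed full-scale of clocks**order spans order*log2 bits before shifting
    bits = order * log2 - shift
    return shift, bits

print(sinc_shift_and_bits(2, 128))   # shift and sample width for SINC2 at 128 clocks
print(sinc_shift_and_bits(3, 128))   # shift and sample width for SINC3 at 128 clocks
```

The returned sample width is the value to use, minus 1, as the ZEROX operand in the loops above.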

Bitstream Capturing Mode (%11)

This mode captures the raw bitstream coming from the ADC. It buffers 32 bits and is meant to be read once every 32 clocks, in order to get contiguous snapshots of the ADC bitstream. RDPIN/RQPIN is used to read the snapshots. Bit 31 of the data will be the most recent ADC bit, while bit 0 will be from 31 clocks earlier.

To begin raw bitstream capturing:

WRPIN ##%100011_0000000_00_11000_0,adcpin 'configure ADC+sample pin(s)

WXPIN #%11_0101,adcpin 'raw sampling every 32 clocks

DIRH adcpin 'enable smart pin(s)

To get a snapshot of the latest 32 bits of the ADC bitstream, just do a RDPIN/RQPIN:

RDPIN bitstream,adcpin 'get snapshot of ADC bitstream

This mode can be used for purposes other than capturing ADC bitstreams. It's really just capturing the A-input without regard to pin configuration.

%11010 = ADC Scope with Trigger

This mode calculates an 8-bit ADC sample and checks for hysteretic triggering on every clock, providing the basis of oscilloscope functionality. Samples from blocks of up to four pins can be grouped into a 32-bit data pipe for recording by the streamer or reading by the GETSCP instruction (see 'SCOPE Data Pipe' below).

There are three different windowed filter functions from which ADC samples can be computed. On each clock, the incoming ADC bit is shifted into a tap string and the weighted tap bits are summed together to produce the sample. The samples are normalized to 8 bits in size, but the DC dynamic range is ~5 to ~6 bits, depending on the filter length. These are plots of the actual filter shapes and sizes:

The scope trigger function is set by two 6-bit parameters, A and B, which MSB-justify to the 8-bit samples for comparison. Triggering is a two-step process of arming and then triggering, which raises the IN signal and waits for a new arming event. The relationship between A and B determines the triggering pattern:

A and B relationship	Arming Event (initial / after trigger)	Trigger Event (after arming)
A > B	sample.[7..2] >= A	sample.[7..2] < B
A <= B	sample.[7..2] < A	sample.[7..2] >= B

WXPIN is used to configure this mode.

X.[15..10] sets the B trigger value.

X.[7..2] sets the A trigger value.

X.[1..0] selects the filter:

%00 = 68-tap Tukey filter

%01 = 45-tap Tukey filter

%1x = 28-tap Hann filter

RDPIN/RQPIN always returns the 8-bit sample, along with the 'armed' state in the C flag.

When 'armed' and then 'triggered', IN is raised and the 'armed' state is canceled.
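The arm-then-trigger behavior can be modeled on the host to choose A and B values before committing them to WXPIN. This Python sketch is an illustrative model only; it assumes the comparison is evaluated once per sample, and the function name is hypothetical.

```python
def scope_triggers(samples, a, b):
    """Return the indices of 8-bit samples at which IN would be raised."""
    armed = False
    hits = []
    for i, s in enumerate(samples):
        top6 = (s >> 2) & 0x3F                 # sample.[7..2]
        if a > b:
            arm, trig = top6 >= a, top6 < b    # arm high, trigger low
        else:
            arm, trig = top6 < a, top6 >= b    # arm low, trigger high
        if armed and trig:
            hits.append(i)                     # IN raised, 'armed' canceled
            armed = False
        elif not armed and arm:
            armed = True
    return hits

samples = [100, 200, 120, 60, 40, 220, 10]
print(scope_triggers(samples, 40, 20))   # A > B: triggers on falling signal
print(scope_triggers(samples, 20, 40))   # A <= B: triggers on rising signal
```

The gap between A and B acts as the hysteresis band: a sample must cross one level to arm and the other to trigger.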

SCOPE Data Pipe

Each cog has a 32-bit SCOPE data pipe which is intended to be used with smart pins configured to the 'scope' mode. The SCOPE data pipe continuously aggregates the lower bytes of RDPIN values from a 4-pin block, so that the streamer can record up to four time-aligned 8-bit ADC samples per clock. They can also be read at once via the GETSCP instruction.

The SETSCP instruction enables the SCOPE data pipe and selects the 4-pin block whose lower bytes of RDPIN values it will continuously carry:

SETSCP {#}D 'D.[6] enables the SCOPE data pipe, D.[5..2] selects the 4-pin block

The GETSCP instruction gets the SCOPE data pipe's current four bytes:

GETSCP D 'Get the lower-byte RDPIN values of four pins into the bytes of D

If the SCOPE data pipe didn't exist, the closest you could come to the GETSCP instruction would be this sequence, which would not have time-aligned samples:

RQPIN x,#pinblock | 3 'read pin3 long into x

ROLBYTE y,x 'rotate pin3 byte into y

RQPIN x,#pinblock | 2 'read pin2 long into x

ROLBYTE y,x 'rotate pin2 byte into y

RQPIN x,#pinblock | 1 'read pin1 long into x

ROLBYTE y,x 'rotate pin1 byte into y

RQPIN x,#pinblock | 0 'read pin0 long into x

ROLBYTE y,x 'rotate pin0 byte into y

The SCOPE data pipe is generic in function and may find other uses than carrying just 'scope' data.

%11011 = USB host or device, full-speed (12Mbps) or low-speed (1.5Mbps)

This mode requires that two adjacent pins be configured together to form a USB pair, whose OUTs and %HHH_LLL drive modes will be overridden to control their output states. These pins must be an even/odd pair, having only the LSB of their pin numbers different. For example: pins 0 and 1, pins 2 and 3, and pins 4 and 5 can form USB pairs. The lower pin in the pair is DM, while the upper pin is DP, per USB naming convention. They can both be configured via a single WRPIN with D data of %1_11011_0. Using D data of %0_11011_0 will disable the output drive and effectively create a USB 'sniffer'. NOTE: In Propeller 2 emulation on an FPGA, there are no built-in 1.5k and 15k resistors, like the ASIC smart pins have, so it is up to you to install these yourself on the DP and DM lines.

WXPIN is used on the lower pin to establish the specific USB mode and set the baud rate. Once established, these settings can be changed on-the-fly without resetting the USB smart pins. This is necessary when talking to both 'full-speed' and 'low-speed' devices over a USB hub.

D.[15] must be 1 for 'host' mode or 0 for 'device' mode. This bit only affects the IDLE drive states. In 'host' mode, both pins will be pulled low via 15k resistors during IDLE. In 'device' mode, one pin will be pulled high via a 1.5k resistor, while the other pin will be floated during IDLE. In 'device' mode, D.[14] controls which pin gets pulled high and which pin gets floated.

D.[14] must be 1 for 'full-speed' mode or 0 for 'low-speed' mode. In 'full-speed' mode, the IDLE state is when DM is low and DP is high, with DP getting pulled high when in 'device' mode. In 'low-speed' mode, the IDLE state is when DP is low and DM is high, with DM getting pulled high when in 'device' mode (exact opposite of 'full speed' mode). The DP/DM electrical designations can actually be switched by swapping 'low-speed' and 'full-speed' modes, due to USB's complementary line signaling.

D.[13..0] sets the baud rate, which is a 16-bit fraction of the system clock (i.e. (12_000_000 FRAC clkfreq) >> 16), whose two MSBs must be 0, necessitating that the baud rate be less than 1/4th of the system clock frequency. For example, if the main clock is 80MHz and you want a 12Mbps baud rate (full-speed), use 12,000,000 / 80,000,000 * $10000 = 9830, or $2666. To use this baud rate and select 'host' mode and 'full-speed', you could do 'WXPIN ##$C000 | $2666,lowerpin'.
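The WXPIN value can be computed on the host for any system clock. This Python sketch follows the bit layout described above; the function name and the error check are assumptions made for illustration.

```python
def usb_wxpin_value(clkfreq, full_speed=True, host=True):
    """Build the WXPIN D value: D.[15]=host, D.[14]=full-speed, D.[13..0]=baud fraction."""
    baud = 12_000_000 if full_speed else 1_500_000
    nco = (baud << 16) // clkfreq          # 16-bit fraction of the system clock
    if nco > 0x3FFF:
        raise ValueError("baud rate must be less than 1/4 of the system clock")
    return (int(host) << 15) | (int(full_speed) << 14) | nco

print(hex(usb_wxpin_value(80_000_000)))    # host, full-speed at 80MHz
```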

The upper (odd) pin is the DP pin. No WXPIN/WYPIN instructions are used by this pin, but if executed, they will acknowledge the pin, as if an AKPIN was executed. This pin's IN goes high whenever the two-level FIFO output buffer in the USB smart pin has room for another byte, signaling that a new output byte can be written via WYPIN to the lower (even) pin. You must do an AKPIN on the upper pin to return its IN pin to a low state, in order to detect the next FIFO-not-full signal.

The lower (even) pin is the DM pin. This pin's IN is raised whenever a change of status occurs in the receiver. At any point, a RDPIN/RQPIN can be used on this pin to read the 16-bit status word. WXPIN is used on this pin to set the NCO baud rate and WYPIN is used to write to the output buffer.

To start USB, clear the DIR bits of the intended two pins and configure them both using 'WRPIN #%1_11011_0,bothpins'. Use WXPIN to set the mode and baud rate. Then, set the pins' DIR bits to enable them. You are now ready to read the receiver status via RDPIN/RQPIN and set output states and send packets via WYPIN.

To affect the line states or send a packet, wait for the upper pin's IN to be high, indicating that the two-level FIFO buffer has room for another byte. Then, use 'WYPIN bytevalue,bothpins' to write a byte value to the lower pin and incidentally acknowledge the upper pin to lower its IN signal.

Here are the D values for WYPIN:

0 = output IDLE - default state, two 15k pull-downs for 'host' or a 1.5k pull-up and a float for 'device'

1 = output SE0 - drive both DP and DM low

2 = output K - drive K state onto DP and DM (opposite)

3 = output J - drive J state onto DP and DM (opposite), like IDLE, but driven

4 = output EOP - output end-of-packet: SE0, SE0, J, then IDLE

$80 = SOP - output start-of-packet: KJKJKJKK, then bytes, automatic EOP when buffers empty

$00..$FF = data - after $80 (SOP), contiguous data bytes can be sent

To send a packet, first do a 'WYPIN #$80,bothpins'. Then, do a 'WYPIN byte,bothpins' to buffer each next byte. The transmitter will automatically send an EOP when you stop giving it bytes. Remember to wait for the upper pin's IN bit to be high before doing each WYPIN.

All output activity is synchronized to the NCO baud generator, so even if you output simple states, like J, K, or IDLE, they won't take effect until the next bit period and will each be one bit period in duration, if immediately followed by another state. Otherwise, the last-set state will remain.

It is necessary to know when a transmitted packet completes, so that you can start another packet or begin waiting for a response. This is done by repeating 'RDPIN status,lowerpin', waiting for bit 2 (SE0 in) of status to go high. Once that bit is high, the transmitter FIFO has run out of data and is now signaling the EOP (end of packet) sequence. You can then start another packet without the transmitter interpreting the next WYPIN as a data byte of the prior packet.

There are separate state machines for transmitting and receiving. Only the baud generator is common between them. The transmitter was just described above. Below, the receiver is detailed. Note that the receiver receives not just input from another host/device, but all local output, as well.

At any time, a RDPIN/RQPIN can be executed on the lower pin to read the current 16-bit status of the receiver, with the error flag (also bit 6) going into C. The lower pin's IN will be raised whenever a change occurs in the receiver's status, but this feature is probably only practical for detecting initial device plug-in, since during normal operation, there is activity every millisecond on the USB bus.

The receiver's status bits are as follows:

[31..16] <unused> - $0000

[15..8] byte - last byte received

[7] byte toggle - cleared on SOP, toggled on each byte received

[6] error - cleared on SOP, set on bit-unstuff error or EOP SE0 > 2 bit periods or SE1

[5] EOP in - cleared on SOP or 7 bit periods of J or K, set on EOP

[4] SOP in - cleared on EOP or 7 bit periods of J or K, set on SOP

[3] steady state - cleared on DP/DM state change, set on 7 bit periods of no change

[2] SE0 in - high when DP/DM state is SE0

[1] K in - high when DP/DM state is K

[0] J in - high when DP/DM state is J
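When logging or debugging captured status words on a host, it can help to unpack them into named fields. This Python sketch mirrors the bit list above; the field names are hypothetical labels chosen for readability.

```python
def decode_usb_status(status):
    """Split a 16-bit receiver status word into named fields."""
    return {
        "byte":        (status >> 8) & 0xFF,   # [15..8] last byte received
        "byte_toggle": bool(status & 0x80),    # [7]
        "error":       bool(status & 0x40),    # [6]
        "eop_in":      bool(status & 0x20),    # [5]
        "sop_in":      bool(status & 0x10),    # [4]
        "steady":      bool(status & 0x08),    # [3]
        "se0_in":      bool(status & 0x04),    # [2]
        "k_in":        bool(status & 0x02),    # [1]
        "j_in":        bool(status & 0x01),    # [0]
    }

print(decode_usb_status(0xA591))
```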

The result of a RDPIN/RQPIN can be bit-tested for events of interest. It can also be shifted right by 8 bits to LSB-justify the last byte received and get the byte toggle bit into C, in order to determine if you have a new byte. Assume that 'flags' is initially zero:

SHR D,#8 WC 'get byte into D, get toggle bit into C

RCL flags,#1 'rotate toggle bit into buffer

TEST flags,#%11 WC 'if new and old toggle bits differ, C = 1

IF_C <use byte in D> 'if new byte, do something with it

%11100 = synchronous serial transmit

This mode overrides OUT to control the pin output state.

Words of 1 to 32 bits are shifted out on the pin, LSB first, with each new bit being output two internal clock cycles after registering a positive edge on the B input. For negative-edge clocking, the B input may be inverted by setting B.[3] in WRPIN's D value.

WXPIN is used to configure the update mode and word length.

X.[5] selects the update mode:

X.[5] = 0 sets continuous mode, where a first word is written via WYPIN during reset (DIR=0) to prime the shifter. Then, after reset (DIR=1), the second word is buffered via WYPIN and continuous clocking is started. Upon shifting each word, the buffered data written via WYPIN is advanced into the shifter and IN is raised, indicating that a new output word can be buffered via WYPIN. This mode allows steady data transmission with a continuous clock, as long as the WYPIN after each IN-rise occurs before the current word transmission is complete.

X.[5] = 1 sets start-stop mode, where the current output word can always be updated via WYPIN before the first clock, flowing right through the buffer into the shifter. Any WYPIN issued after the first clock will be buffered and loaded into the shifter after the last clock of the current output word, at which time it could be changed again via WYPIN. This mode is useful for setting up the output word before a stream of clocks is issued to shift it out.

X.[4..0] sets the number of bits, minus 1. For example, a value of 7 will set the word size to 8 bits.

WYPIN is used to load the output words. The words first go into a single-stage buffer before being advanced to the shifter for output. Each time the buffer is advanced into the shifter, IN is raised, indicating that a new output word can be written via WYPIN. During reset, the buffer flows straight into the shifter.

If you intend to send MSB-first data, you must first shift and then reverse it. For example, if you had a byte in D that you wanted to send MSB-first, you would do a 'SHL D,#32-8' and then a 'REV D'.
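The shift-then-reverse step can be checked on the host. This Python sketch models 'SHL D,#32-8' followed by 'REV D' on a 32-bit value and then lists the bits in the LSB-first order in which the smart pin shifts them out. The helper names are hypothetical.

```python
def rev32(value):
    """Reverse the order of all 32 bits, as REV does."""
    return int(f"{value & 0xFFFFFFFF:032b}"[::-1], 2)

def prepare_msb_first(word, bits):
    """Model 'SHL D,#32-bits' followed by 'REV D'."""
    return rev32((word << (32 - bits)) & 0xFFFFFFFF)

def shift_out_lsb_first(value, bits):
    """List the bits in the order the transmitter outputs them."""
    return [(value >> i) & 1 for i in range(bits)]

d = prepare_msb_first(0xB1, 8)            # %10110001
print(hex(d), shift_out_lsb_first(d, 8))  # bits appear MSB-first on the wire
```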

During reset (DIR=0) the output is held low. Upon release of reset, the output will reflect the LSB of the output word written by any WYPIN during reset.

%11101 = synchronous serial receive

Words of 1 to 32 bits are shifted in by sampling the A input around the positive edge of the B input. For negative-edge clocking, the B input may be inverted by setting B.[3] in WRPIN's D value.

WXPIN is used to configure the sampling and word length.

X.[5] selects the A input sample position relative to the B input edge:

X.[5] = 0 selects the A input sample just before the B input edge was registered. This requires no hold time on the part of the sender.

X.[5] = 1 selects the sample coincident with the B edge being registered. This is useful where transmitted data remains steady after the B edge for a brief time. In the synchronous serial transmit mode, the data is steady for two internal clocks after the B edge was registered, so employing this complementary feature would enable the fastest data transmission when receiving from another smart pin in synchronous serial transmit mode.

X.[4..0] sets the number of bits, minus 1. For example, a value of 7 will set the word size to 8 bits.

When a word is received, IN is raised and the data can then be read via RDPIN/RQPIN. The data read will be MSB-justified.

If you received LSB-first data, it will require right-shifting, unless the word size was 32 bits. For a word size of 8 bits, you would need to do a 'SHR D,#32-8' to get the data LSB-justified.

If you received MSB-first data, it will need to be reversed and possibly masked, unless the word size was 32 bits. For example, if you received a 9-bit word, you would do 'REV D' + 'ZEROX D,#8' to get the data LSB-justified.
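Both corrections can be verified with a host-side model. This Python sketch assumes that each received bit enters at bit 31 while earlier bits move right, which is consistent with the MSB-justified result described above. The helper names are hypothetical.

```python
def rev32(value):
    """Reverse the order of all 32 bits, as REV does."""
    return int(f"{value & 0xFFFFFFFF:032b}"[::-1], 2)

def receive(bits_in_order):
    """Assumed shifter model: new bit enters at bit 31, older bits shift right."""
    d = 0
    for b in bits_in_order:
        d = (d >> 1) | (b << 31)
    return d

def fix_lsb_first(d, n):
    """Model 'SHR D,#32-n'."""
    return d >> (32 - n)

def fix_msb_first(d, n):
    """Model 'REV D' followed by 'ZEROX D,#n-1'."""
    return rev32(d) & ((1 << n) - 1)

# 8-bit word $B1 sent LSB-first, and 9-bit word $1B1 sent MSB-first
print(hex(fix_lsb_first(receive([1, 0, 0, 0, 1, 1, 0, 1]), 8)))
print(hex(fix_msb_first(receive([1, 1, 0, 1, 1, 0, 0, 0, 1]), 9)))
```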

%11110 = asynchronous serial transmit

This mode overrides OUT to control the pin output state.

Words from 1 to 32 bits are serially transmitted on the pin at a programmable baud rate, beginning with a low "start" bit and ending with a high "stop" bit.

WXPIN is used to configure the baud rate and word length.

X.[31..16] establishes the number of clocks in a bit period, and in case X.[31..26] is zero, X.[15..10] establishes the number of fractional clocks in a bit period. The X bit period value can be simply computed as: (clocks * $1_0000) & $FFFFFC00. For example, 7.5 clocks would be $00078000, and 33.33 clocks would be $00215400.

X.[4..0] sets the number of bits, minus 1. For example, a value of 7 will set the word size to 8 bits.
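The complete X value combines the bit period and the word length. This Python sketch applies the formula given above; the function names and the range check are illustrative assumptions.

```python
def bit_period_field(clocks):
    """(clocks * $1_0000) & $FFFFFC00, for a possibly fractional clock count."""
    if not 1 <= clocks < 0x1_0000:
        raise ValueError("bit period must fit in X.[31..16]")
    return int(clocks * 0x1_0000) & 0xFFFFFC00

def async_wxpin_value(clocks, bits=8):
    """Combine the bit period with the word length (bits - 1) in X.[4..0]."""
    if not 1 <= bits <= 32:
        raise ValueError("word length must be 1 to 32 bits")
    return bit_period_field(clocks) | (bits - 1)

print(hex(bit_period_field(7.5)))
print(hex(bit_period_field(33.33)))
print(hex(async_wxpin_value(80_000_000 / 115_200)))   # example clock and baud rate
```

Note that the fractional bits are only honored by the hardware when X.[31..26] is zero, as described above.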

WYPIN is used to load the output words. The words first go into a single-stage buffer before being advanced to a shifter for output. This buffering mechanism makes it possible to keep the shifter constantly busy, so that gapless transmissions can be achieved. Any time a word is advanced from the buffer to the shifter, IN is raised, indicating that a new word can be loaded.

Here is the internal state sequence:

1) Wait for an output word to be buffered via WYPIN, then set the 'buffer-full' and 'busy' flags.
2) Move the word into the shifter, clear the 'buffer-full' flag, and raise IN.
3) Output a low for one bit period (the START bit).
4) Output the LSB of the shifter for one bit period, shift right, and repeat until all data bits are sent.
5) Output a high for one bit period (the STOP bit).
6) If the 'buffer-full' flag is set due to an intervening WYPIN, loop to (2). Otherwise, clear the 'busy' flag and loop to (1).
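The line levels produced for one word by the START, data, and STOP steps of the sequence above can be listed with a short model. This Python sketch returns one level per bit period; the function name is hypothetical.

```python
def async_frame(word, bits=8):
    """Return the line level for each bit period of one transmitted word."""
    frame = [0]                      # START bit (low)
    for _ in range(bits):
        frame.append(word & 1)       # output the LSB of the shifter
        word >>= 1                   # shift right
    frame.append(1)                  # STOP bit (high)
    return frame

print(async_frame(0x41))             # frame for the character "A"
```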

RDPIN/RQPIN with WC always returns the 'busy' flag into C. This is useful for knowing when a transmission has completed. The busy flag can be polled starting three clocks after the WYPIN, which loads the output words:

WYPIN x,#txpin 'load output word

WAITX #1 'wait 2+1 clocks before polling busy

wait RDPIN x,#txpin WC 'get busy flag into C

IF_C JMP #wait 'loop until C = 0

During reset (DIR=0) the output is held high.

%11111 = asynchronous serial receive

Words from 1 to 32 bits are serially received on the A input at a programmable baud rate.

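WXPIN is used to configure the baud rate and word length. The bit period is specified in X.[31..16] and X.[15..10], in the same format as in the asynchronous serial transmit mode.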

X.[4..0] sets the number of bits, minus 1. For example, a value of 7 will set the word size to 8 bits.

Here is the internal state sequence:

1) Wait for the A input to go high (idle state).
2) Wait for the A input to go low (START bit edge).
3) Delay for half of a bit period.
4) If the A input is no longer low, loop to (2).
5) Delay for one bit period.
6) Right-shift the A input into the shifter and delay for one bit period, repeat until all data bits are received.
7) Capture the shifter into the Z register and raise IN.
8) Loop to (1).

RDPIN/RQPIN is used to read the received word. The word must be shifted right by 32 minus the word size. For example, to LSB-justify an 8-bit word received, you would do a 'SHR D,#32-8'.
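The reason for the final shift can be seen in a simplified model of the receiver. This Python sketch takes one ideal line level per bit period, beginning with the START bit, and assumes each data bit enters the shifter at bit 31. It does not model the half-bit delay or edge detection, and the function name is hypothetical.

```python
def async_receive(frame, bits=8):
    """Return the MSB-justified value that RDPIN would read for one frame."""
    if frame[0] != 0:
        raise ValueError("frame must begin with a low START bit")
    z = 0
    for level in frame[1:1 + bits]:
        z = (z >> 1) | (level << 31)   # right-shift the A input into the shifter
    return z

raw = async_receive([0, 1, 0, 0, 0, 0, 0, 1, 0, 1])
print(hex(raw), hex(raw >> (32 - 8)))  # raw value, then after 'SHR D,#32-8'
```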

BOOT PROCESS (needs more editing)

Boot Pattern Set By Resistors	P61	P60	P59
Serial window of 60s, default.	none	none	none
Serial window of 60s, overrides SPI and SD.	ignored	ignored	pull-up
Serial window of 100ms, then SPI flash. If SPI flash fails then serial window of 60s.	pull-up	ignored	none
SPI flash only (fast boot), no serial window. If SPI flash fails then shutdown.	pull-up	ignored	pull-down
SD card with serial window on failure. If SD card fails then serial window of 60s.	no pull-up	pull-up (built into SD card)	none
SD card only, no serial window. If SD card fails then shutdown.	no pull-up	pull-up (built into SD card)	pull-down

Boot Serial	P63 (input)	P62 (output)
Serial	RX	TX

Boot Memory	P61 (output)	P60 (output)	P59 (output)	P58 (input)
SPI flash	CSn (input)	CLK (input)	DI (input)	DO (output)
SD card	CLK (input)	CSn (input)	DI (input)	DO (output)

After a hardware reset, cog 0 loads and executes a booter program from an internal ROM. The booter program (ROM_Booter.spin2) performs the following steps:

1) If an external pull-up resistor is sensed on P61 (SPI_CS), then attempt to boot from SPI:
    a) Load the first 1024 bytes (256 longs) from SPI into the hub starting at $00000.
    b) Compute the 32-bit sum of the 256 longs.
    c) If the sum is "Prop" ($706F7250):
        i) Copy the first 256 longs from hub into cog registers $000..$0FF.
        ii) If an external pull-up resistor is sensed on P60 (SPI_CK):
            Execute 'JMP #$000' to run the SPI program. Done.
        iii) Begin waiting for serial command(s) on P63 (RX_PIN).
        iv) If 100ms elapsed and no command begun:
            Execute 'JMP #$000' to run the SPI program. Done.
        v) If a program successfully loads serially within 60 seconds:
            Execute 'COGINIT #0,#0' to relaunch cog 0 from $00000. Done.
        vi) Execute 'JMP #$000' to run the SPI program. Done.
2) Wait for serial command(s) on P63 (RX_PIN):
    a) If a program successfully loads serially within 60 seconds:
        Execute 'COGINIT #0,#0' to relaunch cog 0 from $00000. Done.
3) Slow clock to 20kHz and stop cog 0. Done.

SERIAL LOADING PROTOCOL

The built-in serial loader allows Propeller 2 chips to be loaded via 8-N-1 asynchronous serial into P63, where START=low and STOP=high, at any rate the sender uses, between 9,600 baud and 2,000,000 baud.

The loader automatically adapts to the sender's baud rate from every ">" character ($3E) it receives. It is necessary to initially send "> " ($3E, $20) before the first command, and then use ">" characters periodically throughout your data to keep the baud rate tightly calibrated to the internal RC oscillator that the loader uses during boot ROM execution. Received ">" characters are not passed to the command parser, so they can be placed anywhere.

The loader's response messages are sent back serially over P62 at the same baud rate that the sender is using. P62 is normally driven continuously during the serial protocol, but will go into open-drain mode when either the INA or INB mask of a command is non-0 (masking is explained below).

Unless preempted by a program in a SPI memory chip with a pull-up resistor on P60 (SPI_CK), the serial loader becomes active within 15ms of reset being released.

Between command keywords and data, whitespace is required. The following characters, in any contiguous combination, constitute a single whitespace:

$09 TAB

$0A LF

$0D CR

$20 SP

$3D "=" (may be present in Base64 data)

There are four commands which the sender can issue:

1) Request Propeller type:

Prop_Chk <INAmask> <INAdata> <INBmask> <INBdata>

2) Change clock setting:

Prop_Clk <INAmask> <INAdata> <INBmask> <INBdata> <HUBSETclocksetting>

3) Load and execute hex data, with and without sum checking:

Prop_Hex <INAmask> <INAdata> <INBmask> <INBdata> <hexdatabytes> ?

Prop_Hex <INAmask> <INAdata> <INBmask> <INBdata> <hexdatabytes> ~

4) Load and execute Base64 data, with and without sum checking:

Prop_Txt <INAmask> <INAdata> <INBmask> <INBdata> <base64chrs> ?

Prop_Txt <INAmask> <INAdata> <INBmask> <INBdata> <base64chrs> ~

Each command keyword is followed by four 32-bit hex values which allow selection of certain chips by their INA and INB states. If you wanted to talk to any and all chips that are connected, you would use zeroes for these values. In case multiple chips are being loaded from the same serial line, you would probably want to differentiate each download by unique INA and INB mask and data values. When the serial loader receives data and mask values which do not match its own INA and INB ports, it waits for another command. Note that you cannot use INA[1:0] for this purpose, since they are configured as smart pins used for automatic baud detection by the loader. Because the command keywords all contain an underscore ("_"), intervening data belonging to a command destined for another chip cannot be mistaken for a keyword while a new command is being waited for.

If, at any time, a character is received which does not comport with expectations (e.g. an "x" is received when hex digits are expected), the loader aborts the current command and waits for a new command.

Prop_Chk

The Prop_Chk command returns CR+LF+"Prop_Ver"+SP+VerChr+CR+LF. VerChr is "A".."Z" and indicates the version of Propeller chip. The Rev B/C silicon responds with "G":

Sender: "> Prop_Chk 0 0 0 0"+CR

Loader: CR+LF+"Prop_Ver G"+CR+LF

Prop_Clk

The Prop_Clk command is used to update the chip's clock source, as if a HUBSET ##$0xxxxxxx instruction were being executed. For details (and caveats), see Configuring the Clock Generator. Upon receiving a valid Prop_Clk command, the loader immediately echoes a "." character and then performs the following steps:

Switches to the internal 20MHz source.
Sets the desired configuration (except mode).
Waits ~5ms for the clock hardware to settle to the new configuration.
Enables the desired clock mode.

NOTE: After the command is sent, the sender should wait ~10ms, then send the "> " ($3E, $20) auto-baud sequence to adjust for the new clock configuration.

NOTE: If an image is loaded (see Prop_Hex/Prop_Txt) after switching to a PLL clock mode that is different than the mode used by that image, the uploaded image may need to issue a "HUBSET #$F0" before switching to the desired clock mode. See the warning in Configuring the Clock Generator for more details. An alternative approach is to use the same clock configuration as used by the image. This means that the image's call to HUBSET will effectively be a NOP, but always safe to perform.

NOTE TO FPGA USERS: The only supported clock-setting values are $00 for 20MHz and $FF for 80MHz. This value would be used instead of the 25-bit value for the regular instruction. Wait ~10ms before sending "> ".

PLL Example

To update the clock source per PLL Example:

Sender: "> Prop_Clk 0 0 0 0 19D28F8"+CR

Loader: "."

Sender: (wait ~10ms)

Sender: "> Prop_Clk 0 0 0 0 19D28FB"+CR

Loader: "."

NOTE: An initial "Prop_Clk 0 0 0 0 F0" is not required since the clock circuit starts up in this mode.

Reset to Boot Clock Configuration

To return to the clock configuration on bootup:

Sender: "> Prop_Clk 0 0 0 0 F0"+CR

Loader: "."

Prop_Hex

The Prop_Hex command is used to load byte data into the hub, starting at $00000, and then execute it. Hex bytes must be separated by whitespace. Only the bottom 8 bits of each hex value are used as data.

If the command is terminated with a "~" character, the loader will do a 'COGINIT #0,#0' to relaunch cog 0 (currently running the booter program) with the new program starting at $00000.

If the command is terminated with a "?" character, the loader verifies the embedded checksum. If the checksum is correct, it sends a "." character and runs the program as "~" would have. If the checksum is incorrect, it sends a "!" character and waits for a new command.

To demonstrate hex loading, consider this small program:

DAT ORG

not dirb 'all outputs

.lp not outb 'toggle states (blinks leds on Prop123 & P2 Eval boards)

waitx ##20_000_000/4 'wait ¼ second

jmp #.lp 'loop

It assembles to:

00000- FB F7 23 F6 FD FB 23 F6 25 26 80 FF 1F 80 66 FD F0 FF 9F FD

Here is how you would run this program from the serial loader:

Sender: "> Prop_Hex 0 0 0 0 FB F7 23 F6 FD FB 23 F6 25 26 80 FF 1F 80 66 FD F0 FF 9F FD ~"

In the case of our assembled program, there are 5 little-endian longs which sum to $E6CE9A2C. To generate an embedded checksum long, you would compute $706F7250 ("Prop") minus the sum $E6CE9A2C, which results in $89A0D824. Those four bytes could be appended to the data as follows. Note that it doesn't matter where your embedded checksum long is placed, only that it be long-aligned within your data:

Sender: "> Prop_Hex 0 0 0 0 FB F7 23 F6 FD FB 23 F6 25 26 80 FF 1F 80 66 FD F0 FF 9F FD 24 D8 A0 89 ?"

Loader: "."
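A host-side tool could generate the checksum long and the command line as follows. This Python sketch reproduces the arithmetic described above for the blinker image; the function names are hypothetical, and the sketch appends the checksum at the end of the data, although any long-aligned position is permitted.

```python
import struct

PROP = 0x706F7250   # "Prop" read as a little-endian long

def checksum_long(image):
    """Return the 4 bytes that make the sum of all longs equal "Prop"."""
    if len(image) % 4:
        raise ValueError("pad the image to a whole number of longs")
    longs = struct.unpack(f"<{len(image) // 4}I", image)
    total = sum(longs) & 0xFFFFFFFF
    return struct.pack("<I", (PROP - total) & 0xFFFFFFFF)

def prop_hex_line(image, checked=True):
    """Build a Prop_Hex command addressed to all chips (zero masks)."""
    data = image + checksum_long(image) if checked else image
    body = " ".join(f"{b:02X}" for b in data)
    return f"> Prop_Hex 0 0 0 0 {body} {'?' if checked else '~'}"

BLINK = bytes.fromhex("FBF723F6 FDFB23F6 252680FF 1F8066FD F0FF9FFD")
print(checksum_long(BLINK).hex(" ").upper())
print(prop_hex_line(BLINK))
```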

It's a good idea to start each hex data line with a ">" character, to keep the baud rate tightly calibrated.

Prop_Txt

The Prop_Txt command is like Prop_Hex, but with one difference: Instead of hex bytes separated by whitespace, it takes in Base64 data, which are text characters that convey six bits each and get assembled into bytes as they are received. This format is 2.25x denser than hex, and so minimizes transmission size and time.

These are the characters that make up the Base64 alphabet:

"A".."Z" = $00..$19

"a".."z" = $1A..$33

"0".."9" = $34..$3D

"+" = $3E

"/" = $3F

Whitespace is ignored among Base64 characters.

To load and run the program used in the Prop_Hex example:

Sender: "> Prop_Txt 0 0 0 0 +/cj9v37I/YlJoD/H4Bm/fD/n/0 ~"

To add the embedded checksum:

Sender: "> Prop_Txt 0 0 0 0 +/cj9v37I/YlJoD/H4Bm/fD/n/0k2KCJ ?"

Loader: "."
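The Base64 strings in these examples can be generated with Python's standard `base64` module, whose default alphabet is the same as the one listed above. This is a sketch only: `prop_txt_line` is a hypothetical helper, and stripping the trailing "=" padding is an assumption based on the examples above, which carry none.

```python
import base64


def prop_txt_line(data: bytes, terminator: str = "~") -> str:
    """Build a one-line Prop_Txt command from raw program bytes."""
    text = base64.b64encode(data).decode("ascii")
    # Assumption: the examples in the text show no "=" padding, so drop it.
    text = text.rstrip("=")
    return "> Prop_Txt 0 0 0 0 %s %s" % (text, terminator)


blinker = bytes.fromhex(
    "FB F7 23 F6 FD FB 23 F6 25 26 80 FF 1F 80 66 FD F0 FF 9F FD"
)
checksum = bytes.fromhex("24 D8 A0 89")  # embedded checksum long from the text

print(prop_txt_line(blinker))                  # run without verification
print(prop_txt_line(blinker + checksum, "?"))  # run only if the checksum passes
```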

It's a good idea to start each^[v] Base64 data line with a ">" character, to keep the baud rate tightly calibrated.

SUMMARY

It is possible to uniquely load many Propeller chips from the same serial signal by giving them each a different INA/INB signature and not connecting SPI memory chips or SD cards to P61..P58.

To try out the serial loader, just open a terminal program on your PC with the Propeller 2 connected and type: "> Prop_Chk 0 0 0 0"+CR. You can also cut and paste those Prop_Hex and Prop_Txt example lines to load the blinker program. A simple Propeller 2 development tool needs no special serial signalling, just simple text output that needn't worry about PC/Mac/Unix new-line differences, whitespace conventions, or generating non-standard characters.

Assembly Language

For a detailed list of assembly-language instructions, see this document:

https://drive.google.com/open?id=1_vJk-Ad569UMwgXTKTdfJkHYHpc1rZwxB-DcIiAZNdk

Below are the contents of the instructions.txt file, which includes the assembly instructions and assembler directives.

------------------

instruction timing

------------------

clk

_________------------____________------------____________------------____________------------____________------------____________------------____________-
         |                       |                       |                       |                       |                       |                       |
rdRAM Ib |-------+               |       rdRAM Ic        |-------+               |       rdRAM Id        |-------+               |       rdRAM Ie        |
         |       |               |                       |       |               |                       |       |               |                       |
         |       |               |                       |       |               |                       |       |               |                       |
         |       +---------------ALU----------> wrRAM Ra |       +---------------ALU----------> wrRAM Rb |       +---------------ALU----------> wrRAM Rc |
         |                       |                       |                       |                       |                       |                       |
         |         'get'         |      done = 'go'      |         'get'         |      done = 'go'      |         'get'         |      done = 'go'      |

------------

instructions

------------

EEEE 0000000 CZI DDDDDDDDD SSSSSSSSS ROR D,S/# {WC/WZ/WCZ}

EEEE 0000001 CZI DDDDDDDDD SSSSSSSSS ROL D,S/# {WC/WZ/WCZ}

EEEE 0000010 CZI DDDDDDDDD SSSSSSSSS SHR D,S/# {WC/WZ/WCZ}

EEEE 0000011 CZI DDDDDDDDD SSSSSSSSS SHL D,S/# {WC/WZ/WCZ}

EEEE 0000100 CZI DDDDDDDDD SSSSSSSSS RCR D,S/# {WC/WZ/WCZ}

EEEE 0000101 CZI DDDDDDDDD SSSSSSSSS RCL D,S/# {WC/WZ/WCZ}

EEEE 0000110 CZI DDDDDDDDD SSSSSSSSS SAR D,S/# {WC/WZ/WCZ}

EEEE 0000111 CZI DDDDDDDDD SSSSSSSSS SAL D,S/# {WC/WZ/WCZ}

EEEE 0001000 CZI DDDDDDDDD SSSSSSSSS ADD D,S/# {WC/WZ/WCZ}

EEEE 0001001 CZI DDDDDDDDD SSSSSSSSS ADDX D,S/# {WC/WZ/WCZ}

EEEE 0001010 CZI DDDDDDDDD SSSSSSSSS ADDS D,S/# {WC/WZ/WCZ}

EEEE 0001011 CZI DDDDDDDDD SSSSSSSSS ADDSX D,S/# {WC/WZ/WCZ}

EEEE 0001100 CZI DDDDDDDDD SSSSSSSSS SUB D,S/# {WC/WZ/WCZ}

EEEE 0001101 CZI DDDDDDDDD SSSSSSSSS SUBX D,S/# {WC/WZ/WCZ}

EEEE 0001110 CZI DDDDDDDDD SSSSSSSSS SUBS D,S/# {WC/WZ/WCZ}

EEEE 0001111 CZI DDDDDDDDD SSSSSSSSS SUBSX D,S/# {WC/WZ/WCZ}

EEEE 0010000 CZI DDDDDDDDD SSSSSSSSS CMP D,S/# {WC/WZ/WCZ}

EEEE 0010001 CZI DDDDDDDDD SSSSSSSSS CMPX D,S/# {WC/WZ/WCZ}

EEEE 0010010 CZI DDDDDDDDD SSSSSSSSS CMPS D,S/# {WC/WZ/WCZ}

EEEE 0010011 CZI DDDDDDDDD SSSSSSSSS CMPSX D,S/# {WC/WZ/WCZ}

EEEE 0010100 CZI DDDDDDDDD SSSSSSSSS CMPR D,S/# {WC/WZ/WCZ}

EEEE 0010101 CZI DDDDDDDDD SSSSSSSSS CMPM D,S/# {WC/WZ/WCZ}

EEEE 0010110 CZI DDDDDDDDD SSSSSSSSS SUBR D,S/# {WC/WZ/WCZ}

EEEE 0010111 CZI DDDDDDDDD SSSSSSSSS CMPSUB D,S/# {WC/WZ/WCZ}

EEEE 0011000 CZI DDDDDDDDD SSSSSSSSS FGE D,S/# {WC/WZ/WCZ}

EEEE 0011001 CZI DDDDDDDDD SSSSSSSSS FLE D,S/# {WC/WZ/WCZ}

EEEE 0011010 CZI DDDDDDDDD SSSSSSSSS FGES D,S/# {WC/WZ/WCZ}

EEEE 0011011 CZI DDDDDDDDD SSSSSSSSS FLES D,S/# {WC/WZ/WCZ}

EEEE 0011100 CZI DDDDDDDDD SSSSSSSSS SUMC D,S/# {WC/WZ/WCZ}

EEEE 0011101 CZI DDDDDDDDD SSSSSSSSS SUMNC D,S/# {WC/WZ/WCZ}

EEEE 0011110 CZI DDDDDDDDD SSSSSSSSS SUMZ D,S/# {WC/WZ/WCZ}

EEEE 0011111 CZI DDDDDDDDD SSSSSSSSS SUMNZ D,S/# {WC/WZ/WCZ}

EEEE 0100000 CZI DDDDDDDDD SSSSSSSSS TESTB D,S/# WC/WZ

EEEE 0100001 CZI DDDDDDDDD SSSSSSSSS TESTBN D,S/# WC/WZ

EEEE 0100010 CZI DDDDDDDDD SSSSSSSSS TESTB D,S/# ANDC/ANDZ

EEEE 0100011 CZI DDDDDDDDD SSSSSSSSS TESTBN D,S/# ANDC/ANDZ

EEEE 0100100 CZI DDDDDDDDD SSSSSSSSS TESTB D,S/# ORC/ORZ

EEEE 0100101 CZI DDDDDDDDD SSSSSSSSS TESTBN D,S/# ORC/ORZ

EEEE 0100110 CZI DDDDDDDDD SSSSSSSSS TESTB D,S/# XORC/XORZ

EEEE 0100111 CZI DDDDDDDDD SSSSSSSSS TESTBN D,S/# XORC/XORZ

EEEE 0100000 CZI DDDDDDDDD SSSSSSSSS BITL D,S/# {WCZ}

EEEE 0100001 CZI DDDDDDDDD SSSSSSSSS BITH D,S/# {WCZ}

EEEE 0100010 CZI DDDDDDDDD SSSSSSSSS BITC D,S/# {WCZ}

EEEE 0100011 CZI DDDDDDDDD SSSSSSSSS BITNC D,S/# {WCZ}

EEEE 0100100 CZI DDDDDDDDD SSSSSSSSS BITZ D,S/# {WCZ}

EEEE 0100101 CZI DDDDDDDDD SSSSSSSSS BITNZ D,S/# {WCZ}

EEEE 0100110 CZI DDDDDDDDD SSSSSSSSS BITRND D,S/# {WCZ}

EEEE 0100111 CZI DDDDDDDDD SSSSSSSSS BITNOT D,S/# {WCZ}

EEEE 0101000 CZI DDDDDDDDD SSSSSSSSS AND D,S/# {WC/WZ/WCZ}

EEEE 0101001 CZI DDDDDDDDD SSSSSSSSS ANDN D,S/# {WC/WZ/WCZ}

EEEE 0101010 CZI DDDDDDDDD SSSSSSSSS OR D,S/# {WC/WZ/WCZ}

EEEE 0101011 CZI DDDDDDDDD SSSSSSSSS XOR D,S/# {WC/WZ/WCZ}

EEEE 0101100 CZI DDDDDDDDD SSSSSSSSS MUXC D,S/# {WC/WZ/WCZ}

EEEE 0101101 CZI DDDDDDDDD SSSSSSSSS MUXNC D,S/# {WC/WZ/WCZ}

EEEE 0101110 CZI DDDDDDDDD SSSSSSSSS MUXZ D,S/# {WC/WZ/WCZ}

EEEE 0101111 CZI DDDDDDDDD SSSSSSSSS MUXNZ D,S/# {WC/WZ/WCZ}

EEEE 0110000 CZI DDDDDDDDD SSSSSSSSS MOV D,S/# {WC/WZ/WCZ}

EEEE 0110001 CZI DDDDDDDDD SSSSSSSSS NOT D,S/# {WC/WZ/WCZ}

EEEE 0110010 CZI DDDDDDDDD SSSSSSSSS ABS D,S/# {WC/WZ/WCZ}

EEEE 0110011 CZI DDDDDDDDD SSSSSSSSS NEG D,S/# {WC/WZ/WCZ}

EEEE 0110100 CZI DDDDDDDDD SSSSSSSSS NEGC D,S/# {WC/WZ/WCZ}

EEEE 0110101 CZI DDDDDDDDD SSSSSSSSS NEGNC D,S/# {WC/WZ/WCZ}

EEEE 0110110 CZI DDDDDDDDD SSSSSSSSS NEGZ D,S/# {WC/WZ/WCZ}

EEEE 0110111 CZI DDDDDDDDD SSSSSSSSS NEGNZ D,S/# {WC/WZ/WCZ}

EEEE 0111000 CZI DDDDDDDDD SSSSSSSSS INCMOD D,S/# {WC/WZ/WCZ}

EEEE 0111001 CZI DDDDDDDDD SSSSSSSSS DECMOD D,S/# {WC/WZ/WCZ}

EEEE 0111010 CZI DDDDDDDDD SSSSSSSSS ZEROX D,S/# {WC/WZ/WCZ}

EEEE 0111011 CZI DDDDDDDDD SSSSSSSSS SIGNX D,S/# {WC/WZ/WCZ}

EEEE 0111100 CZI DDDDDDDDD SSSSSSSSS ENCOD D,S/# {WC/WZ/WCZ}

EEEE 0111101 CZI DDDDDDDDD SSSSSSSSS ONES D,S/# {WC/WZ/WCZ}

EEEE 0111110 CZI DDDDDDDDD SSSSSSSSS TEST D,S/# {WC/WZ/WCZ}

EEEE 0111111 CZI DDDDDDDDD SSSSSSSSS TESTN D,S/# {WC/WZ/WCZ}

EEEE 100000N NNI DDDDDDDDD SSSSSSSSS SETNIB D,S/#,#N

EEEE 100001N NNI DDDDDDDDD SSSSSSSSS GETNIB D,S/#,#N

EEEE 100010N NNI DDDDDDDDD SSSSSSSSS ROLNIB D,S/#,#N

EEEE 1000110 NNI DDDDDDDDD SSSSSSSSS SETBYTE D,S/#,#N

EEEE 1000111 NNI DDDDDDDDD SSSSSSSSS GETBYTE D,S/#,#N

EEEE 1001000 NNI DDDDDDDDD SSSSSSSSS ROLBYTE D,S/#,#N

EEEE 1001001 0NI DDDDDDDDD SSSSSSSSS SETWORD D,S/#,#N

EEEE 1001001 1NI DDDDDDDDD SSSSSSSSS GETWORD D,S/#,#N

EEEE 1001010 0NI DDDDDDDDD SSSSSSSSS ROLWORD D,S/#,#N

EEEE 1001010 10I DDDDDDDDD SSSSSSSSS ALTSN D,S/#

EEEE 1001010 11I DDDDDDDDD SSSSSSSSS ALTGN D,S/#

EEEE 1001011 00I DDDDDDDDD SSSSSSSSS ALTSB D,S/#

EEEE 1001011 01I DDDDDDDDD SSSSSSSSS ALTGB D,S/#

EEEE 1001011 10I DDDDDDDDD SSSSSSSSS ALTSW D,S/#

EEEE 1001011 11I DDDDDDDDD SSSSSSSSS ALTGW D,S/#

EEEE 1001100 00I DDDDDDDDD SSSSSSSSS ALTR D,S/#

EEEE 1001100 01I DDDDDDDDD SSSSSSSSS ALTD D,S/#

EEEE 1001100 10I DDDDDDDDD SSSSSSSSS ALTS D,S/#

EEEE 1001100 11I DDDDDDDDD SSSSSSSSS ALTB D,S/#

EEEE 1001101 00I DDDDDDDDD SSSSSSSSS ALTI D,S/#

EEEE 1001101 01I DDDDDDDDD SSSSSSSSS SETR D,S/#

EEEE 1001101 10I DDDDDDDDD SSSSSSSSS SETD D,S/#

EEEE 1001101 11I DDDDDDDDD SSSSSSSSS SETS D,S/#

EEEE 1001110 00I DDDDDDDDD SSSSSSSSS DECOD D,S/#

EEEE 1001110 01I DDDDDDDDD SSSSSSSSS BMASK D,S/#

EEEE 1001110 10I DDDDDDDDD SSSSSSSSS CRCBIT D,S/#

EEEE 1001110 11I DDDDDDDDD SSSSSSSSS CRCNIB D,S/#

EEEE 1001111 00I DDDDDDDDD SSSSSSSSS MUXNITS D,S/#

EEEE 1001111 01I DDDDDDDDD SSSSSSSSS MUXNIBS D,S/#

EEEE 1001111 10I DDDDDDDDD SSSSSSSSS MUXQ D,S/#

EEEE 1001111 11I DDDDDDDDD SSSSSSSSS MOVBYTS D,S/#

EEEE 1010000 0ZI DDDDDDDDD SSSSSSSSS MUL D,S/# {WZ}

EEEE 1010000 1ZI DDDDDDDDD SSSSSSSSS MULS D,S/# {WZ}

EEEE 1010001 0ZI DDDDDDDDD SSSSSSSSS SCA D,S/# {WZ}

EEEE 1010001 1ZI DDDDDDDDD SSSSSSSSS SCAS D,S/# {WZ}

EEEE 1010010 00I DDDDDDDDD SSSSSSSSS ADDPIX D,S/#

EEEE 1010010 01I DDDDDDDDD SSSSSSSSS MULPIX D,S/#

EEEE 1010010 10I DDDDDDDDD SSSSSSSSS BLNPIX D,S/#

EEEE 1010010 11I DDDDDDDDD SSSSSSSSS MIXPIX D,S/#

EEEE 1010011 00I DDDDDDDDD SSSSSSSSS ADDCT1 D,S/#

EEEE 1010011 01I DDDDDDDDD SSSSSSSSS ADDCT2 D,S/#

EEEE 1010011 10I DDDDDDDDD SSSSSSSSS ADDCT3 D,S/#

EEEE 1010011 11I DDDDDDDDD SSSSSSSSS WMLONG D,S/#/PTRx

EEEE 1010100 C0I DDDDDDDDD SSSSSSSSS RQPIN D,S/# {WC}

EEEE 1010100 C1I DDDDDDDDD SSSSSSSSS RDPIN D,S/# {WC}

EEEE 1010101 CZI DDDDDDDDD SSSSSSSSS RDLUT D,S/#/PTRx {WC/WZ/WCZ}

EEEE 1010110 CZI DDDDDDDDD SSSSSSSSS RDBYTE D,S/#/PTRx {WC/WZ/WCZ}

EEEE 1010111 CZI DDDDDDDDD SSSSSSSSS RDWORD D,S/#/PTRx {WC/WZ/WCZ}

EEEE 1011000 CZI DDDDDDDDD SSSSSSSSS RDLONG D,S/#/PTRx {WC/WZ/WCZ}

EEEE 1011001 CZI DDDDDDDDD SSSSSSSSS CALLD D,S/#rel9 {WC/WZ/WCZ}

EEEE 1011010 0LI DDDDDDDDD SSSSSSSSS CALLPA D/#,S/#rel9

EEEE 1011010 1LI DDDDDDDDD SSSSSSSSS CALLPB D/#,S/#rel9

EEEE 1011011 00I DDDDDDDDD SSSSSSSSS DJZ D,S/#rel9

EEEE 1011011 01I DDDDDDDDD SSSSSSSSS DJNZ D,S/#rel9

EEEE 1011011 10I DDDDDDDDD SSSSSSSSS DJF D,S/#rel9

EEEE 1011011 11I DDDDDDDDD SSSSSSSSS DJNF D,S/#rel9

EEEE 1011100 00I DDDDDDDDD SSSSSSSSS IJZ D,S/#rel9

EEEE 1011100 01I DDDDDDDDD SSSSSSSSS IJNZ D,S/#rel9

EEEE 1011100 10I DDDDDDDDD SSSSSSSSS TJZ D,S/#rel9

EEEE 1011100 11I DDDDDDDDD SSSSSSSSS TJNZ D,S/#rel9

EEEE 1011101 00I DDDDDDDDD SSSSSSSSS TJF D,S/#rel9

EEEE 1011101 01I DDDDDDDDD SSSSSSSSS TJNF D,S/#rel9

EEEE 1011101 10I DDDDDDDDD SSSSSSSSS TJS D,S/#rel9

EEEE 1011101 11I DDDDDDDDD SSSSSSSSS TJNS D,S/#rel9

EEEE 1011110 00I DDDDDDDDD SSSSSSSSS TJV D,S/#rel9

EEEE 1011110 01I 000000000 SSSSSSSSS JINT S/#rel9

EEEE 1011110 01I 000000001 SSSSSSSSS JCT1 S/#rel9

EEEE 1011110 01I 000000010 SSSSSSSSS JCT2 S/#rel9

EEEE 1011110 01I 000000011 SSSSSSSSS JCT3 S/#rel9

EEEE 1011110 01I 000000100 SSSSSSSSS JSE1 S/#rel9

EEEE 1011110 01I 000000101 SSSSSSSSS JSE2 S/#rel9

EEEE 1011110 01I 000000110 SSSSSSSSS JSE3 S/#rel9

EEEE 1011110 01I 000000111 SSSSSSSSS JSE4 S/#rel9

EEEE 1011110 01I 000001000 SSSSSSSSS JPAT S/#rel9

EEEE 1011110 01I 000001001 SSSSSSSSS JFBW S/#rel9

EEEE 1011110 01I 000001010 SSSSSSSSS JXMT S/#rel9

EEEE 1011110 01I 000001011 SSSSSSSSS JXFI S/#rel9

EEEE 1011110 01I 000001100 SSSSSSSSS JXRO S/#rel9

EEEE 1011110 01I 000001101 SSSSSSSSS JXRL S/#rel9

EEEE 1011110 01I 000001110 SSSSSSSSS JATN S/#rel9

EEEE 1011110 01I 000001111 SSSSSSSSS JQMT S/#rel9

EEEE 1011110 01I 000010000 SSSSSSSSS JNINT S/#rel9

EEEE 1011110 01I 000010001 SSSSSSSSS JNCT1 S/#rel9

EEEE 1011110 01I 000010010 SSSSSSSSS JNCT2 S/#rel9

EEEE 1011110 01I 000010011 SSSSSSSSS JNCT3 S/#rel9

EEEE 1011110 01I 000010100 SSSSSSSSS JNSE1 S/#rel9

EEEE 1011110 01I 000010101 SSSSSSSSS JNSE2 S/#rel9

EEEE 1011110 01I 000010110 SSSSSSSSS JNSE3 S/#rel9

EEEE 1011110 01I 000010111 SSSSSSSSS JNSE4 S/#rel9

EEEE 1011110 01I 000011000 SSSSSSSSS JNPAT S/#rel9

EEEE 1011110 01I 000011001 SSSSSSSSS JNFBW S/#rel9

EEEE 1011110 01I 000011010 SSSSSSSSS JNXMT S/#rel9

EEEE 1011110 01I 000011011 SSSSSSSSS JNXFI S/#rel9

EEEE 1011110 01I 000011100 SSSSSSSSS JNXRO S/#rel9

EEEE 1011110 01I 000011101 SSSSSSSSS JNXRL S/#rel9

EEEE 1011110 01I 000011110 SSSSSSSSS JNATN S/#rel9

EEEE 1011110 01I 000011111 SSSSSSSSS JNQMT S/#rel9

EEEE 1011110 1LI DDDDDDDDD SSSSSSSSS <empty> D/#,S/#

EEEE 1011111 0LI DDDDDDDDD SSSSSSSSS <empty> D/#,S/#

EEEE 1011111 1LI DDDDDDDDD SSSSSSSSS SETPAT D/#,S/#

EEEE 1100000 0LI DDDDDDDDD SSSSSSSSS WRPIN D/#,S/#

EEEE 1100000 1LI DDDDDDDDD SSSSSSSSS WXPIN D/#,S/#

EEEE 1100001 0LI DDDDDDDDD SSSSSSSSS WYPIN D/#,S/#

EEEE 1100001 1LI DDDDDDDDD SSSSSSSSS WRLUT D/#,S/#/PTRx

EEEE 1100010 0LI DDDDDDDDD SSSSSSSSS WRBYTE D/#,S/#/PTRx

EEEE 1100010 1LI DDDDDDDDD SSSSSSSSS WRWORD D/#,S/#/PTRx

EEEE 1100011 0LI DDDDDDDDD SSSSSSSSS WRLONG D/#,S/#/PTRx

EEEE 1100011 1LI DDDDDDDDD SSSSSSSSS RDFAST D/#,S/#

EEEE 1100100 0LI DDDDDDDDD SSSSSSSSS WRFAST D/#,S/#

EEEE 1100100 1LI DDDDDDDDD SSSSSSSSS FBLOCK D/#,S/#

EEEE 1100101 0LI DDDDDDDDD SSSSSSSSS XINIT D/#,S/#

EEEE 1100101 1LI DDDDDDDDD SSSSSSSSS XZERO D/#,S/#

EEEE 1100110 0LI DDDDDDDDD SSSSSSSSS XCONT D/#,S/#

EEEE 1100110 1LI DDDDDDDDD SSSSSSSSS REP D/#,S/#

EEEE 1100111 CLI DDDDDDDDD SSSSSSSSS COGINIT D/#,S/# {WC}

EEEE 1101000 0LI DDDDDDDDD SSSSSSSSS QMUL D/#,S/#

EEEE 1101000 1LI DDDDDDDDD SSSSSSSSS QDIV D/#,S/#

EEEE 1101001 0LI DDDDDDDDD SSSSSSSSS QFRAC D/#,S/#

EEEE 1101001 1LI DDDDDDDDD SSSSSSSSS QSQRT D/#,S/#

EEEE 1101010 0LI DDDDDDDDD SSSSSSSSS QROTATE D/#,S/#

EEEE 1101010 1LI DDDDDDDDD SSSSSSSSS QVECTOR D/#,S/#

EEEE 1101011 00L DDDDDDDDD 000000000 HUBSET D/#

EEEE 1101011 C0L DDDDDDDDD 000000001 COGID D/# {WC}

EEEE 1101011 00L DDDDDDDDD 000000011 COGSTOP D/#

EEEE 1101011 C00 DDDDDDDDD 000000100 LOCKNEW D {WC}

EEEE 1101011 00L DDDDDDDDD 000000101 LOCKRET D/#

EEEE 1101011 C0L DDDDDDDDD 000000110 LOCKTRY D/# {WC}

EEEE 1101011 C0L DDDDDDDDD 000000111 LOCKREL D/# {WC}

EEEE 1101011 00L DDDDDDDDD 000001110 QLOG D/#

EEEE 1101011 00L DDDDDDDDD 000001111 QEXP D/#

EEEE 1101011 CZ0 DDDDDDDDD 000010000 RFBYTE D {WC/WZ/WCZ}

EEEE 1101011 CZ0 DDDDDDDDD 000010001 RFWORD D {WC/WZ/WCZ}

EEEE 1101011 CZ0 DDDDDDDDD 000010010 RFLONG D {WC/WZ/WCZ}

EEEE 1101011 CZ0 DDDDDDDDD 000010011 RFVAR D {WC/WZ/WCZ}

EEEE 1101011 CZ0 DDDDDDDDD 000010100 RFVARS D {WC/WZ/WCZ}

EEEE 1101011 00L DDDDDDDDD 000010101 WFBYTE D/#

EEEE 1101011 00L DDDDDDDDD 000010110 WFWORD D/#

EEEE 1101011 00L DDDDDDDDD 000010111 WFLONG D/#

EEEE 1101011 CZ0 DDDDDDDDD 000011000 GETQX D {WC/WZ/WCZ}

EEEE 1101011 CZ0 DDDDDDDDD 000011001 GETQY D {WC/WZ/WCZ}

EEEE 1101011 C00 DDDDDDDDD 000011010 GETCT D {WC}

EEEE 1101011 CZL DDDDDDDDD 000011011 GETRND {D} {WC/WZ/WCZ}

EEEE 1101011 00L DDDDDDDDD 000011100 SETDACS D/#

EEEE 1101011 00L DDDDDDDDD 000011101 SETXFRQ D/#

EEEE 1101011 000 DDDDDDDDD 000011110 GETXACC D

EEEE 1101011 CZL DDDDDDDDD 000011111 WAITX D/# {WC/WZ/WCZ}

EEEE 1101011 00L DDDDDDDDD 000100000 SETSE1 D/#

EEEE 1101011 00L DDDDDDDDD 000100001 SETSE2 D/#

EEEE 1101011 00L DDDDDDDDD 000100010 SETSE3 D/#

EEEE 1101011 00L DDDDDDDDD 000100011 SETSE4 D/#

EEEE 1101011 CZ0 000000000 000100100 POLLINT {WC/WZ/WCZ}

EEEE 1101011 CZ0 000000001 000100100 POLLCT1 {WC/WZ/WCZ}

EEEE 1101011 CZ0 000000010 000100100 POLLCT2 {WC/WZ/WCZ}

EEEE 1101011 CZ0 000000011 000100100 POLLCT3 {WC/WZ/WCZ}

EEEE 1101011 CZ0 000000100 000100100 POLLSE1 {WC/WZ/WCZ}

EEEE 1101011 CZ0 000000101 000100100 POLLSE2 {WC/WZ/WCZ}

EEEE 1101011 CZ0 000000110 000100100 POLLSE3 {WC/WZ/WCZ}

EEEE 1101011 CZ0 000000111 000100100 POLLSE4 {WC/WZ/WCZ}

EEEE 1101011 CZ0 000001000 000100100 POLLPAT {WC/WZ/WCZ}

EEEE 1101011 CZ0 000001001 000100100 POLLFBW {WC/WZ/WCZ}

EEEE 1101011 CZ0 000001010 000100100 POLLXMT {WC/WZ/WCZ}

EEEE 1101011 CZ0 000001011 000100100 POLLXFI {WC/WZ/WCZ}

EEEE 1101011 CZ0 000001100 000100100 POLLXRO {WC/WZ/WCZ}

EEEE 1101011 CZ0 000001101 000100100 POLLXRL {WC/WZ/WCZ}

EEEE 1101011 CZ0 000001110 000100100 POLLATN {WC/WZ/WCZ}

EEEE 1101011 CZ0 000001111 000100100 POLLQMT {WC/WZ/WCZ}

EEEE 1101011 CZ0 000010000 000100100 WAITINT {WC/WZ/WCZ}

EEEE 1101011 CZ0 000010001 000100100 WAITCT1 {WC/WZ/WCZ}

EEEE 1101011 CZ0 000010010 000100100 WAITCT2 {WC/WZ/WCZ}

EEEE 1101011 CZ0 000010011 000100100 WAITCT3 {WC/WZ/WCZ}

EEEE 1101011 CZ0 000010100 000100100 WAITSE1 {WC/WZ/WCZ}

EEEE 1101011 CZ0 000010101 000100100 WAITSE2 {WC/WZ/WCZ}

EEEE 1101011 CZ0 000010110 000100100 WAITSE3 {WC/WZ/WCZ}

EEEE 1101011 CZ0 000010111 000100100 WAITSE4 {WC/WZ/WCZ}

EEEE 1101011 CZ0 000011000 000100100 WAITPAT {WC/WZ/WCZ}

EEEE 1101011 CZ0 000011001 000100100 WAITFBW {WC/WZ/WCZ}

EEEE 1101011 CZ0 000011010 000100100 WAITXMT {WC/WZ/WCZ}

EEEE 1101011 CZ0 000011011 000100100 WAITXFI {WC/WZ/WCZ}

EEEE 1101011 CZ0 000011100 000100100 WAITXRO {WC/WZ/WCZ}

EEEE 1101011 CZ0 000011101 000100100 WAITXRL {WC/WZ/WCZ}

EEEE 1101011 CZ0 000011110 000100100 WAITATN {WC/WZ/WCZ}

EEEE 1101011 000 000100000 000100100 ALLOWI

EEEE 1101011 000 000100001 000100100 STALLI

EEEE 1101011 000 000100010 000100100 TRGINT1

EEEE 1101011 000 000100011 000100100 TRGINT2

EEEE 1101011 000 000100100 000100100 TRGINT3

EEEE 1101011 000 000100101 000100100 NIXINT1

EEEE 1101011 000 000100110 000100100 NIXINT2

EEEE 1101011 000 000100111 000100100 NIXINT3

EEEE 1101011 00L DDDDDDDDD 000100101 SETINT1 D/#

EEEE 1101011 00L DDDDDDDDD 000100110 SETINT2 D/#

EEEE 1101011 00L DDDDDDDDD 000100111 SETINT3 D/#

EEEE 1101011 00L DDDDDDDDD 000101000 SETQ D/#

EEEE 1101011 00L DDDDDDDDD 000101001 SETQ2 D/#

EEEE 1101011 00L DDDDDDDDD 000101010 PUSH D/#

EEEE 1101011 CZ0 DDDDDDDDD 000101011 POP D {WC/WZ/WCZ}

EEEE 1101011 CZ0 DDDDDDDDD 000101100 JMP D {WC/WZ/WCZ}

EEEE 1101011 CZ0 DDDDDDDDD 000101101 CALL D {WC/WZ/WCZ}

EEEE 1101011 CZ1 000000000 000101101 RET {WC/WZ/WCZ}

EEEE 1101011 CZ0 DDDDDDDDD 000101110 CALLA D {WC/WZ/WCZ}

EEEE 1101011 CZ1 000000000 000101110 RETA {WC/WZ/WCZ}

EEEE 1101011 CZ0 DDDDDDDDD 000101111 CALLB D {WC/WZ/WCZ}

EEEE 1101011 CZ1 000000000 000101111 RETB {WC/WZ/WCZ}

EEEE 1101011 00L DDDDDDDDD 000110000 JMPREL D/#

EEEE 1101011 00L DDDDDDDDD 000110001 SKIP D/#

EEEE 1101011 00L DDDDDDDDD 000110010 SKIPF D/#

EEEE 1101011 00L DDDDDDDDD 000110011 EXECF D/#

EEEE 1101011 000 DDDDDDDDD 000110100 GETPTR D

EEEE 1101011 CZ0 DDDDDDDDD 000110101 GETBRK D WC/WZ/WCZ

EEEE 1101011 00L DDDDDDDDD 000110101 COGBRK D/#

EEEE 1101011 00L DDDDDDDDD 000110110 BRK D/#

EEEE 1101011 00L DDDDDDDDD 000110111 SETLUTS D/#

EEEE 1101011 00L DDDDDDDDD 000111000 SETCY D/#

EEEE 1101011 00L DDDDDDDDD 000111001 SETCI D/#

EEEE 1101011 00L DDDDDDDDD 000111010 SETCQ D/#

EEEE 1101011 00L DDDDDDDDD 000111011 SETCFRQ D/#

EEEE 1101011 00L DDDDDDDDD 000111100 SETCMOD D/#

EEEE 1101011 00L DDDDDDDDD 000111101 SETPIV D/#

EEEE 1101011 00L DDDDDDDDD 000111110 SETPIX D/#

EEEE 1101011 00L DDDDDDDDD 000111111 COGATN D/#

EEEE 1101011 CZL DDDDDDDDD 001000000 TESTP D/# WC/WZ

EEEE 1101011 CZL DDDDDDDDD 001000001 TESTPN D/# WC/WZ

EEEE 1101011 CZL DDDDDDDDD 001000010 TESTP D/# ANDC/ANDZ

EEEE 1101011 CZL DDDDDDDDD 001000011 TESTPN D/# ANDC/ANDZ

EEEE 1101011 CZL DDDDDDDDD 001000100 TESTP D/# ORC/ORZ

EEEE 1101011 CZL DDDDDDDDD 001000101 TESTPN D/# ORC/ORZ

EEEE 1101011 CZL DDDDDDDDD 001000110 TESTP D/# XORC/XORZ

EEEE 1101011 CZL DDDDDDDDD 001000111 TESTPN D/# XORC/XORZ

EEEE 1101011 CZL DDDDDDDDD 001000000 DIRL D/# {WCZ}

EEEE 1101011 CZL DDDDDDDDD 001000001 DIRH D/# {WCZ}

EEEE 1101011 CZL DDDDDDDDD 001000010 DIRC D/# {WCZ}

EEEE 1101011 CZL DDDDDDDDD 001000011 DIRNC D/# {WCZ}

EEEE 1101011 CZL DDDDDDDDD 001000100 DIRZ D/# {WCZ}

EEEE 1101011 CZL DDDDDDDDD 001000101 DIRNZ D/# {WCZ}

EEEE 1101011 CZL DDDDDDDDD 001000110 DIRRND D/# {WCZ}

EEEE 1101011 CZL DDDDDDDDD 001000111 DIRNOT D/# {WCZ}

EEEE 1101011 CZL DDDDDDDDD 001001000 OUTL D/# {WCZ}

EEEE 1101011 CZL DDDDDDDDD 001001001 OUTH D/# {WCZ}

EEEE 1101011 CZL DDDDDDDDD 001001010 OUTC D/# {WCZ}

EEEE 1101011 CZL DDDDDDDDD 001001011 OUTNC D/# {WCZ}

EEEE 1101011 CZL DDDDDDDDD 001001100 OUTZ D/# {WCZ}

EEEE 1101011 CZL DDDDDDDDD 001001101 OUTNZ D/# {WCZ}

EEEE 1101011 CZL DDDDDDDDD 001001110 OUTRND D/# {WCZ}

EEEE 1101011 CZL DDDDDDDDD 001001111 OUTNOT D/# {WCZ}

EEEE 1101011 CZL DDDDDDDDD 001010000 FLTL D/# {WCZ}

EEEE 1101011 CZL DDDDDDDDD 001010001 FLTH D/# {WCZ}

EEEE 1101011 CZL DDDDDDDDD 001010010 FLTC D/# {WCZ}

EEEE 1101011 CZL DDDDDDDDD 001010011 FLTNC D/# {WCZ}

EEEE 1101011 CZL DDDDDDDDD 001010100 FLTZ D/# {WCZ}

EEEE 1101011 CZL DDDDDDDDD 001010101 FLTNZ D/# {WCZ}

EEEE 1101011 CZL DDDDDDDDD 001010110 FLTRND D/# {WCZ}

EEEE 1101011 CZL DDDDDDDDD 001010111 FLTNOT D/# {WCZ}

EEEE 1101011 CZL DDDDDDDDD 001011000 DRVL D/# {WCZ}

EEEE 1101011 CZL DDDDDDDDD 001011001 DRVH D/# {WCZ}

EEEE 1101011 CZL DDDDDDDDD 001011010 DRVC D/# {WCZ}

EEEE 1101011 CZL DDDDDDDDD 001011011 DRVNC D/# {WCZ}

EEEE 1101011 CZL DDDDDDDDD 001011100 DRVZ D/# {WCZ}

EEEE 1101011 CZL DDDDDDDDD 001011101 DRVNZ D/# {WCZ}

EEEE 1101011 CZL DDDDDDDDD 001011110 DRVRND D/# {WCZ}

EEEE 1101011 CZL DDDDDDDDD 001011111 DRVNOT D/# {WCZ}

EEEE 1101011 000 DDDDDDDDD 001100000 SPLITB D

EEEE 1101011 000 DDDDDDDDD 001100001 MERGEB D

EEEE 1101011 000 DDDDDDDDD 001100010 SPLITW D

EEEE 1101011 000 DDDDDDDDD 001100011 MERGEW D

EEEE 1101011 000 DDDDDDDDD 001100100 SEUSSF D

EEEE 1101011 000 DDDDDDDDD 001100101 SEUSSR D

EEEE 1101011 000 DDDDDDDDD 001100110 RGBSQZ D

EEEE 1101011 000 DDDDDDDDD 001100111 RGBEXP D

EEEE 1101011 000 DDDDDDDDD 001101000 XORO32 D

EEEE 1101011 000 DDDDDDDDD 001101001 REV D

EEEE 1101011 CZ0 DDDDDDDDD 001101010 RCZR D {WC/WZ/WCZ}

EEEE 1101011 CZ0 DDDDDDDDD 001101011 RCZL D {WC/WZ/WCZ}

EEEE 1101011 000 DDDDDDDDD 001101100 WRC D

EEEE 1101011 000 DDDDDDDDD 001101101 WRNC D

EEEE 1101011 000 DDDDDDDDD 001101110 WRZ D

EEEE 1101011 000 DDDDDDDDD 001101111 WRNZ D

EEEE 1101011 CZ1 0cccczzzz 001101111 MODCZ c,z {WC/WZ/WCZ}

EEEE 1101011 00L DDDDDDDDD 001110000 SETSCP D/#

EEEE 1101011 000 DDDDDDDDD 001110001 GETSCP D

EEEE 1101100 RAA AAAAAAAAA AAAAAAAAA JMP #{\}A

EEEE 1101101 RAA AAAAAAAAA AAAAAAAAA CALL #{\}A

EEEE 1101110 RAA AAAAAAAAA AAAAAAAAA CALLA #{\}A

EEEE 1101111 RAA AAAAAAAAA AAAAAAAAA CALLB #{\}A

EEEE 11100WW RAA AAAAAAAAA AAAAAAAAA CALLD register,#{\}A

EEEE 11101WW RAA AAAAAAAAA AAAAAAAAA LOC register,#{\}A

EEEE 11110NN NNN NNNNNNNNN NNNNNNNNN AUGS #N

EEEE 11111NN NNN NNNNNNNNN NNNNNNNNN AUGD #N
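To read an assembled long against the table above, it can help to split it into the five column fields. The Python sketch below does this for the common "EEEE ooooooo CZI DDDDDDDDD SSSSSSSSS" layout. The names `Fields` and `decode` are hypothetical, and rows that reuse these bit positions differently (for example the 20-bit address forms) would need their own handling.

```python
from typing import NamedTuple


class Fields(NamedTuple):
    cond: int    # EEEE, bits 31..28
    opcode: int  # 7-bit opcode, bits 27..21
    czi: int     # C, Z, I bits, bits 20..18
    d: int       # D field, bits 17..9
    s: int       # S field, bits 8..0


def decode(instr: int) -> Fields:
    """Split a 32-bit instruction long into the table's column fields."""
    return Fields(
        cond=(instr >> 28) & 0xF,
        opcode=(instr >> 21) & 0x7F,
        czi=(instr >> 18) & 0x7,
        d=(instr >> 9) & 0x1FF,
        s=instr & 0x1FF,
    )


# First long of the blinker example (bytes FB F7 23 F6, little-endian).
f = decode(0xF623F7FB)
print("{:04b} {:07b} {:03b} ${:03X} ${:03X}".format(*f))
```

The opcode field comes out as 0110001, which is the NOT row in the table, with D and S both $1FB. That is consistent with the source line "not dirb", since the single-operand NOT alias expands to NOT register,register.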

-------------------

instruction aliases

-------------------

NOP = $00000000

NOT register = NOT register,register

ABS register = ABS register,register

NEG register = NEG register,register

NEGC register = NEGC register,register

NEGNC register = NEGNC register,register

NEGZ register = NEGZ register,register

NEGNZ register = NEGNZ register,register

ENCOD register = ENCOD register,register

ONES register = ONES register,register

TEST register = TEST register,register

SETNIB register/# = SETNIB 0,register/#,#0 (use after ALTSN)

GETNIB register = GETNIB register,0,#0 (use after ALTGN)

ROLNIB register = ROLNIB register,0,#0 (use after ALTGN)

SETBYTE register/# = SETBYTE 0,register/#,#0 (use after ALTSB)

GETBYTE register = GETBYTE register,0,#0 (use after ALTGB)

ROLBYTE register = ROLBYTE register,0,#0 (use after ALTGB)

SETWORD register/# = SETWORD 0,register/#,#0 (use after ALTSW)

GETWORD register = GETWORD register,0,#0 (use after ALTGW)

ROLWORD register = ROLWORD register,0,#0 (use after ALTGW)

ALTSN register = ALTSN register,#0

ALTGN register = ALTGN register,#0

ALTSB register = ALTSB register,#0

ALTGB register = ALTGB register,#0

ALTSW register = ALTSW register,#0

ALTGW register = ALTGW register,#0

ALTR register = ALTR register,#0

ALTD register = ALTD register,#0

ALTS register = ALTS register,#0

ALTB register = ALTB register,#0

ALTI register = ALTI register,#%101_100_100 (substitute register for next instruction)

DECOD register = DECOD register,register

BMASK register = BMASK register,register

POPA register = RDLONG register,--PTRA

POPB register = RDLONG register,--PTRB

RESI3 = CALLD $1F0,$1F1 WCZ

RESI2 = CALLD $1F2,$1F3 WCZ

RESI1 = CALLD $1F4,$1F5 WCZ

RESI0 = CALLD INA,INB WCZ

RETI3 = CALLD INB,$1F1 WCZ

RETI2 = CALLD INB,$1F3 WCZ

RETI1 = CALLD INB,$1F5 WCZ

RETI0 = CALLD INB,INB WCZ

AKPIN register/# = WRPIN #1,register/#

PUSHA register/# = WRLONG register/#,PTRA++

PUSHB register/# = WRLONG register/#,PTRB++

XSTOP = XINIT #0,#0

MODC c = MODCZ c,0 {WC}

MODZ z = MODCZ 0,z {WZ}

---------------

MODCZ constants

---------------

_CLR = %0000

_NC_AND_NZ = %0001

_NZ_AND_NC = %0001

_GT = %0001

_NC_AND_Z = %0010

_Z_AND_NC = %0010

_NC = %0011

_GE = %0011

_C_AND_NZ = %0100

_NZ_AND_C = %0100

_NZ = %0101

_NE = %0101

_C_NE_Z = %0110

_Z_NE_C = %0110

_NC_OR_NZ = %0111

_NZ_OR_NC = %0111

_C_AND_Z = %1000

_Z_AND_C = %1000

_C_EQ_Z = %1001

_Z_EQ_C = %1001

_Z = %1010

_E = %1010

_NC_OR_Z = %1011

_Z_OR_NC = %1011

_C = %1100

_LT = %1100

_C_OR_NZ = %1101

_NZ_OR_C = %1101

_C_OR_Z = %1110

_Z_OR_C = %1110

_LE = %1110

_SET = %1111

Examples:

MODCZ _CLR, _Z_OR_C WCZ 'C = 0, Z |= C

MODCZ _NZ,0 WC 'C = !Z

MODCZ 0,_SET WZ 'Z = 1

MODC _NZ_AND_C WC 'C = !Z & C

MODZ _Z_NE_C WZ 'Z = Z ^ C

-----

notes

-----

A symbol declared under ORGH will return its hub address when referenced.

A symbol declared under ORG will return its cog address when referenced,

but can return its hub address, instead, if preceded by '@':

COGINIT #0,#@newcode

For immediate-branch and LOC address operands, "#" is used before the

address. In cases where there is an option between absolute and relative

addressing, the assembler will choose absolute addressing when the branch

crosses between cog and hub domains, or relative addressing when the

branch stays in the same domain. Absolute addressing can be forced by

following "#" with "\".

CALLPA/CALLPB/DJZ..JNXRL/JNATN/JNQMT - rel_imm9/ind_reg20

JMP/CALL/CALLA/CALLB/CALLD - abs_imm20/rel_imm20/ind_reg20

LOC - abs_imm20/rel_imm20

If a constant larger than 9 bits is desired in an instruction, use "##"

instead of "#" to invoke AUGS/AUGD:

AND address,##$FFFFF

DJNZ register,##far_away
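As a rough illustration of what "##" produces, the Python sketch below splits a 32-bit constant into the 23 upper bits carried by the AUGS/AUGD prefix and the 9 lower bits left in the instruction's own field. The encodings are taken from the AUGS/AUGD rows of the table above, with the condition field set to %1111 as in the blinker example. The function names are hypothetical.

```python
def split_augmented(value: int):
    """Return (upper 23 bits for AUGS/AUGD, lower 9 bits for the S or D field)."""
    value &= 0xFFFFFFFF
    return value >> 9, value & 0x1FF


def augd_long(value: int, cond: int = 0xF) -> int:
    """Encode the AUGD prefix (EEEE 11111 + 23 bits) for a ## D operand."""
    upper, _ = split_augmented(value)
    return (cond << 28) | (0b11111 << 23) | upper


def augs_long(value: int, cond: int = 0xF) -> int:
    """Encode the AUGS prefix (EEEE 11110 + 23 bits) for a ## S operand."""
    upper, _ = split_augmented(value)
    return (cond << 28) | (0b11110 << 23) | upper


# WAITX ##20_000_000/4 from the blinker example takes a D operand, so AUGD.
wait = 20_000_000 // 4
print("$%08X" % augd_long(wait), "lower 9 bits: $%03X" % split_augmented(wait)[1])
```

The prefix long for the blinker's WAITX comes out as $FF802625, which matches the bytes 25 26 80 FF in the Prop_Hex example.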

The following assembler directives exist:

ORGH {hub_address}

Set hub mode and, optionally, an address to fill up to with $00 bytes.

ORG {cog_address {,cog_address_limit}}

Set cog mode with optional cog address and limit. Defaults to $000,$200.

If $200..$3FF used for cog address, LUT range selected. Doesn't generate

any data.

ORGF cog_address

Fill to cog_address with $00 bytes. Must be in cog mode.

RES cog_registers

Reserve cog registers. Doesn't generate any data. Must be in cog mode.

FIT cog_or_hub_address

Make sure code fits within the given cog or hub address.

ALIGNW/ALIGNL

Align to next word/long in hub.

BYTE data{[count]}{,data{[count]}...}

WORD data{[count]}{,data{[count]}...}

LONG data{[count]}{,data{[count]}...}

Generate byte/word/long data with optional repeat count.

Boot ROM / Debug ROM

Packaging

[a]Should add the RDFAST corruption bug here

[b]Yes, but I can't explain it well. Would you mind writing something here and I'll approve it when you're done?

[c]I mean, me neither. I just noticed that it happens. It's probably easier to figure out looking at the actual RTL logic

[d]Or just allow enough clock cycles before using the FIFO, like the instruction mode requires.

[e]It's still a bug though

[f]S and D also refer to operands as well as fields. The difference is a field is the encoded bits of the instruction while an operand is the associated fetched data.

[g]Add tab between index,' and #table same for ALTD and ALTR explanation.

[h]I think ozpropdev has just shown that this is broken if bitindex is >31, due to the addition of "addpins". In the BITC instruction, one must now use a version of bitindex that has been ANDed with $1F.

[i]Maybe mention the nasty hazard that happens when there's only 1 NOP: https://forums.parallax.com/discussion/176204/hardware-oddity-dual-port-hazard

[j]the regular behavior of __RET__ with regular instructions is not described here anywhere. This needs to be added.

[k]rdfast #0,#bytecodes ?!

[l]This is at least highly misleading as for Goertzel there are 4-bit groups, which do not overlap or wrap around.

[m]This is not clear.

[n]This needs clarification. Example: In 4-pin mode the nibbles of each byte are swapped. If an immediate S operand of #$87654321 is given then in normal mode the output sequence is 1, 2, 3... etc. and in alternate mode it's 2, 1, 4, 3... and not 8,7,6... or 4, 8, 2, C... (bits reversed) as one might expect.

[o]fix this

[p]The fact that this restriction is only mentioned for the wrapping mode somewhat obscurely implies that non-aligned addresses are allowed when wrapping is not used. I would explicitly emphasize this! It is a great advantage of the P2 architecture which might be obvious to Propeller enthusiasts. But somebody who is used to other processor architectures might not even think that this is possible.

[q]Also, if it is allowed to freely mix these instructions even if that requires longword access to addresses that are not divisible by 4, tell people that this is possible (I think it is). It is a powerful feature that is not self-evident.

[r]This list should include RDLUT and WRLUT

[s](or LUT address for RDLUT/WRLUT)

[t]I think IN is raised when smartA is high, not low.

[u]The check for pulldown at P60 (SD card) is missing in this description. It would be also very helpful to add a note that booting from SD card is possible by simply renaming any compiled program (*.binary file) to "_BOOT_P2.BIX" and putting it to an empty SD card formatted with FAT32.

See also: https://forums.parallax.com/discussion/comment/1526680/#Comment_1526680

[v]Add Explanation: '.' response is checksum valid, '!' response is check failed / NOT valid, and code will not be loaded and run when checksum NOT valid.
