Updated: Sept 2015 - starting to update this document to reflect everything we know about P2-2015. New information is appended first and will then be reintegrated into the document.
As the document is updated the text will be changed from red to black to indicate its status.
Webpage URL: This is the webpage version which is automatically generated from the document so refresh your browser often as updates are frequent
IMPORTANT NOTE: This is an unofficial document maintained by various forum/community members.
TABLE OF CONTENTS
The Propeller 2 is a general-purpose 32-bit microcontroller with 8 symmetric processors called “cogs.” Each cog has 512 longs (2 KB) of memory from which it executes instructions. Most instructions execute in a single clock cycle, with certain math-intensive operations taking up to 31 clock cycles to complete. The design also includes a 4-stage pipeline, interrupt support, and smart I/O pins that operate in a variety of modes. The hub gives each cog round-robin access to the main hub RAM; depending on where the hub’s access window is relative to the cog, access to hub RAM can take up to 7 clocks (if the access window was just missed) or as little as 0 clocks (if the cog is next in line for the access window). The developer can also set a one-time settable encryption key in the chip to protect code downloaded to it. On system startup the chip uses this protected key to decrypt the encrypted program that is stored externally in non-volatile EEPROM/FLASH. The encryption key is not accessible by any user code. If no encryption key has been set, the Propeller 2 will try to boot from Serial, then from SPI Flash, and finally present its monitor on pins 90 (rx) and 91 (tx).
General
Clock Speed
Performance Metrics
Memory
Power Specification
I/O
Counter Modules
Video Generation
Code Protection and Encryption
Supported Languages
Diagram: Pinout | Diagram: Schematic Symbol (pbj) |
Note: All CPU and I/O GNDs must be connected to power common.
Note: Relative size
There are two primary types of memory, a shared HUB memory and individual COG memory.
128K bytes of main memory shared by all cogs
Diagram 1: Hub Memory and Registers
Each of the eight cogs contains 512 longs of register RAM and 256 longs of stack RAM.
The 512 longs of register RAM is comprised of:
Special function registers such as PTRx, SPAx, etc. are accessed via special instructions and are not part of the memory map.
The 256 longs of stack RAM for data and video usage features:
// P2 MEMORY MAP 24SEP2015
//
// addr read write name
// ---------------------------------------------------------------
// COG REGISTERS (9-bit addressable)
//
// 000 INA - INA / IJMP0
// 001 INB - INB / IRET0
// 002 RAM RAM+OUTA OUTA
// 003 RAM RAM+OUTB OUTB
// 004 RAM RAM+DIRA DIRA
// 005 RAM RAM+DIRB DIRB
// 006 PTRA PTRA PTRA
// 007 PTRB PTRB PTRB
//
// 008 RAM RAM user / ADRA
// 009 RAM RAM user / ADRB
// 00A RAM RAM user / IJMP1
// 00B RAM RAM user / IRET1
// 00C RAM RAM user / IJMP2
// 00D RAM RAM user / IRET2
// 00E RAM RAM user / IJMP3
// 00F RAM RAM user / IRET3
//
// 010-1FF RAM RAM user
// ---------------------------------------------------------------
// LUT
// 200-3FF RAM RAM user / cog-exec
//
// LUT (possible expansion)
// 400-5FF RAM RAM user / cog-exec
// ---------------------------------------------------------------
// HUB
// 00000-7FFFF RAM RAM user / hub-exec
//
// HUB (future expansion)
// 80000-FFFFF RAM RAM user / hub-exec
// ---------------------------------------------------------------
// HUB ROM
// 00000-03FFF (not accessible) boot
// ---------------------------------------------------------------
Each cog now features two 17-bit pointer registers called PTRA and PTRB and a 16-byte/8-word/4-long read cache. The pointer registers can be used for any hub memory read or write operation. They support auto-incrementing and auto-decrementing, applied either before (pre) or after (post) the access.
These instructions read and write hub memory.
All instructions use D as the data conduit, except WRQUAD/RDQUAD/RDQUADC, which use the four QUAD registers. The QUADs can be mapped into cog register space using the SETQUAD instruction or kept hidden, in which case they are still useful as a data conduit and as a read cache. If mapped, the QUADs overlay four contiguous cog registers which can begin at any double-even address (%xxxxxxx00). These overlaid registers can be read and written like any other registers, and they can also be executed. Any write via D to the QUAD registers, when mapped, will also affect the underlying cog registers. A RDQUAD/RDQUADC will affect the QUAD registers, but not the underlying cog registers.
The cached reads RDBYTEC/RDWORDC/RDLONGC/RDQUADC will do a RDQUAD if the current read address is outside of the 4-long window of the prior RDQUAD. Otherwise, they will immediately return cached data. The CACHEX instruction invalidates the cache, forcing a fresh RDQUAD next time a cached read executes.
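The hit/miss rule for the cached reads can be sketched as a small Python model. This is only an illustration of the behaviour described above: the class and method names are hypothetical, the model tracks only which 16-byte window is cached, and it ignores timing and the data itself.

```python
class QuadReadCache:
    """Toy model of the 4-long read cache used by RDBYTEC/RDWORDC/RDLONGC/RDQUADC."""

    QUAD_BYTES = 16  # one quad = 4 longs = 16 bytes

    def __init__(self):
        # None means the cache is invalid (assumed state before any RDQUAD).
        self.window_base = None

    def cached_read(self, address):
        """Return True if a hub RDQUAD was needed, False on a cache hit."""
        base = address & ~(self.QUAD_BYTES - 1)  # start of the 4-long window
        if self.window_base == base:
            return False          # inside the window of the prior RDQUAD
        self.window_base = base   # miss: a fresh RDQUAD reloads the window
        return True

    def cachex(self):
        """Model of CACHEX: invalidate the cache."""
        self.window_base = None


cache = QuadReadCache()
trace = [cache.cached_read(a) for a in (0x100, 0x104, 0x10F, 0x110)]
cache.cachex()
trace.append(cache.cached_read(0x110))
print(trace)  # [True, False, False, True, True]
```

The last entry shows the effect of CACHEX: the same address that would otherwise hit is forced to go back to the hub.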
Hub memory instructions must wait for their cog's hub cycle, which comes once every 8 clocks. The timing relationship between a cog's instruction stream and its hub cycle is generally indeterminate, causing these instructions to take varying numbers of clocks. Timing can be made deterministic, though, by intentionally spacing these instructions apart so that after the first in a series executes, the subsequent hub memory instructions fall on hub cycles, making them take the minimum number of clocks. The trick is to write useful code to go in between them.
After a RDQUAD, the QUAD registers are accessible via D and S on the 3rd clock and executable on the 5th clock.
INSTRUCTION | DESCRIPTION |
WRBYTE D,S | Write lower byte of D to hub memory at S |
RDBYTE D,S | Read byte from hub memory at S into D |
RDBYTEC D,S | Read cached byte at S into D |
WRWORD D,S | Write lower word of D to hub memory at S |
RDWORD D,S | Read word from hub memory at S into D |
RDWORDC D,S | Read cached word at S into D |
WRLONG D,S | Write D to hub memory at S |
RDLONG D,S | Read long from hub memory at S into D |
RDLONGC D,S | Read cached long at S into D |
WRQUAD D | Write QUADs to hub memory at D |
RDQUAD D | Read into QUADs from hub memory at D |
RDQUADC D | Conditionally read into QUADs from hub memory at D |
INDEX | -32..+31 | Simple offset |
INDEX | 0..31 | ++ Auto-increments range |
INDEX | 0..32 | -- Auto-decrement range |
SCALE | 1 | BYTE |
SCALE | 2 | WORD |
SCALE | 4 | LONG |
SCALE | 16 | QUAD |
SUPNNNNNN    PTR expression
000000000    PTRA             'use PTRA
100000000    PTRB             'use PTRB
011000001    PTRA++           'use PTRA, PTRA += SCALE
111000001    PTRB++           'use PTRB, PTRB += SCALE
011111111    PTRA--           'use PTRA, PTRA -= SCALE
111111111    PTRB--           'use PTRB, PTRB -= SCALE
010000001    ++PTRA           'use PTRA + SCALE, PTRA += SCALE
110000001    ++PTRB           'use PTRB + SCALE, PTRB += SCALE
010111111    --PTRA           'use PTRA - SCALE, PTRA -= SCALE
110111111    --PTRB           'use PTRB - SCALE, PTRB -= SCALE
000NNNNNN    PTRA[INDEX]      'use PTRA + INDEX*SCALE
100NNNNNN    PTRB[INDEX]      'use PTRB + INDEX*SCALE
011NNNNNN    PTRA++[INDEX]    'use PTRA, PTRA += INDEX*SCALE
111NNNNNN    PTRB++[INDEX]    'use PTRB, PTRB += INDEX*SCALE
011nnnnnn    PTRA--[INDEX]    'use PTRA, PTRA -= INDEX*SCALE
111nnnnnn    PTRB--[INDEX]    'use PTRB, PTRB -= INDEX*SCALE
010NNNNNN    ++PTRA[INDEX]    'use PTRA + INDEX*SCALE, PTRA += INDEX*SCALE
110NNNNNN    ++PTRB[INDEX]    'use PTRB + INDEX*SCALE, PTRB += INDEX*SCALE
010nnnnnn    --PTRA[INDEX]    'use PTRA - INDEX*SCALE, PTRA -= INDEX*SCALE
110nnnnnn    --PTRB[INDEX]    'use PTRB - INDEX*SCALE, PTRB -= INDEX*SCALE
S = 0 for PTRA, 1 for PTRB
U = 0 to keep PTRx same, 1 to update PTRx
P = 0 to use PTRx + INDEX*SCALE, 1 to use PTRx (post-modify)
NNNNNN = INDEX
nnnnnn = -INDEX
000000 Z01 1 CCCC DDDDDDDDD 000000000    RDBYTE  D,PTRA          'read byte at PTRA into D
000001 000 1 CCCC DDDDDDDDD 111000001    WRWORD  D,PTRB++        'write lower word in D at PTRB, PTRB += 2
000010 Z01 1 CCCC DDDDDDDDD 011111111    RDLONG  D,PTRA--        'read long at PTRA into D, PTRA -= 4
000011 001 1 CCCC 110000001 010110001    RDQUAD  ++PTRB          'read quad at PTRB+16 into QUADs, PTRB += 16
000000 000 1 CCCC DDDDDDDDD 010111111    WRBYTE  D,--PTRA        'write lower byte in D at PTRA-1, PTRA -= 1
000001 000 1 CCCC DDDDDDDDD 100000111    WRWORD  D,PTRB[7]       'write lower word in D to PTRB+7*2
000010 Z11 1 CCCC DDDDDDDDD 011001111    RDLONGC D,PTRA++[15]    'read cached long at PTRA into D, PTRA += 15*4
000011 001 1 CCCC 111111101 010110000    WRQUAD  PTRB--[3]       'write QUADs at PTRB, PTRB -= 3*16
000000 000 1 CCCC DDDDDDDDD 010000110    WRBYTE  D,++PTRA[6]     'write lower byte in D to PTRA+6*1, PTRA += 6*1
000001 Z01 1 CCCC DDDDDDDDD 110110110    RDWORD  D,--PTRB[10]    'read word at PTRB-10*2 into D, PTRB -= 10*2
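The S/U/P/INDEX rules above can be checked with a short Python sketch that decodes a 9-bit SUPNNNNNN field. The function name and the tuple it returns are hypothetical, and the 17-bit wrap of the result is an assumption based on the pointer width; the bit layout itself is taken from the table.

```python
PTR_MASK = (1 << 17) - 1  # PTRA and PTRB are 17-bit pointers


def decode_ptr_field(field, ptra, ptrb, scale):
    """Decode a 9-bit SUPNNNNNN field.

    scale is 1, 2, 4 or 16 for byte, word, long or quad accesses.
    Returns (hub_address, new_ptra, new_ptrb).
    """
    use_ptrb = (field >> 8) & 1   # S: 0 = PTRA, 1 = PTRB
    update = (field >> 7) & 1     # U: 1 = write the modified pointer back
    post = (field >> 6) & 1       # P: 1 = access at the unmodified pointer
    index = field & 0x3F
    if index >= 32:               # NNNNNN is a signed 6-bit value (-32..+31)
        index -= 64

    ptr = ptrb if use_ptrb else ptra
    moved = (ptr + index * scale) & PTR_MASK
    address = ptr if post else moved
    if update:
        ptr = moved
    return (address, ptra, ptr) if use_ptrb else (address, ptr, ptrb)


# RDLONG D,PTRA-- : access at PTRA, then PTRA -= 4
print(decode_ptr_field(0b011111111, 0x100, 0x200, 4))  # (256, 252, 512)
# RDWORD D,--PTRB[10] : access at PTRB-20, and PTRB -= 20
print(decode_ptr_field(0b110110110, 0x100, 0x200, 2))  # (492, 256, 492)
```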
Bytes, words, longs, and quads are addressed as follows:
for WRBYTE/RDBYTE/RDBYTEC, address = %XXXXXXXXXXXXXXXXX (bits 16..0 are used)
for WRWORD/RDWORD/RDWORDC, address = %XXXXXXXXXXXXXXXX- (bits 16..1 are used)
for WRLONG/RDLONG/RDLONGC, address = %XXXXXXXXXXXXXXX-- (bits 16..2 are used)
for WRQUAD/RDQUAD/RDQUADC, address = %XXXXXXXXXXXXX---- (bits 16..4 are used)
address  byte   word    long        quad
00000-   50    *7250   *706F7250   *0C7CCC030C7C200020302E32706F7250
00001-   72     7250    706F7250    0C7CCC030C7C200020302E32706F7250
00002-   6F    *706F    706F7250    0C7CCC030C7C200020302E32706F7250
00003-   70     706F    706F7250    0C7CCC030C7C200020302E32706F7250
00004-   32    *2E32   *20302E32    0C7CCC030C7C200020302E32706F7250
00005-   2E     2E32    20302E32    0C7CCC030C7C200020302E32706F7250
00006-   30    *2030    20302E32    0C7CCC030C7C200020302E32706F7250
00007-   20     2030    20302E32    0C7CCC030C7C200020302E32706F7250
00008-   00    *2000   *0C7C2000    0C7CCC030C7C200020302E32706F7250
00009-   20     2000    0C7C2000    0C7CCC030C7C200020302E32706F7250
0000A-   7C    *0C7C    0C7C2000    0C7CCC030C7C200020302E32706F7250
0000B-   0C     0C7C    0C7C2000    0C7CCC030C7C200020302E32706F7250
0000C-   03    *CC03   *0C7CCC03    0C7CCC030C7C200020302E32706F7250
0000D-   CC     CC03    0C7CCC03    0C7CCC030C7C200020302E32706F7250
0000E-   7C    *0C7C    0C7CCC03    0C7CCC030C7C200020302E32706F7250
0000F-   0C     0C7C    0C7CCC03    0C7CCC030C7C200020302E32706F7250
00010-   45    *FE45   *0DC1FE45   *0D7CC6010C7CC6010CFCB6E30DC1FE45
00011-   FE     FE45    0DC1FE45    0D7CC6010C7CC6010CFCB6E30DC1FE45
00012-   C1    *0DC1    0DC1FE45    0D7CC6010C7CC6010CFCB6E30DC1FE45
00013-   0D     0DC1    0DC1FE45    0D7CC6010C7CC6010CFCB6E30DC1FE45
00014-   E3    *B6E3   *0CFCB6E3    0D7CC6010C7CC6010CFCB6E30DC1FE45
00015-   B6     B6E3    0CFCB6E3    0D7CC6010C7CC6010CFCB6E30DC1FE45
00016-   FC    *0CFC    0CFCB6E3    0D7CC6010C7CC6010CFCB6E30DC1FE45
00017-   0C     0CFC    0CFCB6E3    0D7CC6010C7CC6010CFCB6E30DC1FE45
00018-   01    *C601   *0C7CC601    0D7CC6010C7CC6010CFCB6E30DC1FE45
00019-   C6     C601    0C7CC601    0D7CC6010C7CC6010CFCB6E30DC1FE45
0001A-   7C    *0C7C    0C7CC601    0D7CC6010C7CC6010CFCB6E30DC1FE45
0001B-   0C     0C7C    0C7CC601    0D7CC6010C7CC6010CFCB6E30DC1FE45
0001C-   01    *C601   *0D7CC601    0D7CC6010C7CC6010CFCB6E30DC1FE45
0001D-   C6     C601    0D7CC601    0D7CC6010C7CC6010CFCB6E30DC1FE45
0001E-   7C    *0D7C    0D7CC601    0D7CC6010C7CC6010CFCB6E30DC1FE45
0001F-   0D     0D7C    0D7CC601    0D7CC6010C7CC6010CFCB6E30DC1FE45
* new word/long/quad
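The alignment rule and the little-endian layout shown in the table can be reproduced with a few lines of Python. This is a sketch only: `HUB_IMAGE` holds the first 16 bytes from the table and `hub_read` is a hypothetical helper name, not an instruction.

```python
# First 16 hub bytes from the table above, lowest address first.
HUB_IMAGE = bytes.fromhex("50726F70322E302000207C0C03CC7C0C")


def hub_read(memory, address, size):
    """Read a byte/word/long/quad (size 1/2/4/16) the way the table shows.

    The low address bits are ignored according to the access size, and the
    bytes are combined least-significant byte first.
    """
    base = address & ~(size - 1)
    return int.from_bytes(memory[base:base + size], "little")


print(hex(hub_read(HUB_IMAGE, 0x00003, 4)))  # 0x706f7250
print(hex(hub_read(HUB_IMAGE, 0x00007, 2)))  # 0x2030
```

Any address inside the same word, long, or quad returns the same value, which is why the table repeats entries between the starred rows.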
Each cog has two 17-bit pointers, PTRA and PTRB, which can be read, written, modified, and used to access hub memory.
At cog startup, the PTRA and PTRB registers are initialized as follows:
PTRA = %X_XXXXXXXX_XXXXXXXX, data from launching cog, usually a pointer
PTRB = %X_XXXXXXXX_XXXXXX00, long address in hub where cog code was loaded from
INSTRUCTION | DESCRIPTION | CLOCK |
GETPTRA D | get PTRA into D, C = PTRA[16] | 1 |
GETPTRB D | get PTRB into D, C = PTRB[16] | 1 |
SETPTRA D | set PTRA to D | 1 |
SETPTRA #n | set PTRA to 0..511 | 1 |
SETPTRB D | set PTRB to D | 1 |
SETPTRB #n | set PTRB to 0..511 | 1 |
ADDPTRA D | add D into PTRA | 1 |
ADDPTRA #n | add 0..511 into PTRA | 1 |
ADDPTRB D | add D into PTRB | 1 |
ADDPTRB #n | add 0..511 into PTRB | 1 |
SUBPTRA D | subtract D from PTRA | 1 |
SUBPTRA #n | subtract 0..511 from PTRA | 1 |
SUBPTRB D | subtract D from PTRB | 1 |
SUBPTRB #n | subtract 0..511 from PTRB | 1 |
Each cog has four QUAD registers which form a 128-bit conduit between the hub memory and the cog. This conduit can transfer four longs every 8 clocks via the WRQUAD/RDQUAD instructions.
It can also be used as a 4-long/8-word/16-byte read cache, utilized by RDBYTEC/RDWORDC/RDLONGC/RDQUADC.
Initially hidden, these QUAD registers are mappable into COG register space by using the SETQUAD instruction to set an address where the base register is to appear, with the other three registers following.
SETQUAZ works just like SETQUAD, but also clears the four QUAD registers.
To hide the QUAD registers, use SETQUAD to set an address which is $1F8, or higher.
INSTRUCTION | DESCRIPTION | CLOCK |
CACHEX | Invalidate QUAD cache | 1 |
GETTOPS D | Get top bytes of QUADs into D | 1 |
SETQUAD D | Set QUAD base address to D | 1 |
SETQUAD #n | Set QUAD base address to 0..511 | 1 |
SETQUAZ D | Set QUAD base address to D and clear the QUAD registers | 1 |
SETQUAZ #n | Set QUAD base address to 0..511 and clear the QUAD registers | 1 |
These instructions are used to control hub circuits and cogs.
Hub instructions must wait for their cog's hub cycle, which comes once every 8 clocks. In cases where there is no result to wait for (ZCR = %000), these instructions complete on the hub cycle, making them take 1..8 clocks, depending on where the hub cycle is in relation to the instruction. In cases where a result is anticipated (ZCR <> %000), these instructions complete on the 1st clock after the hub cycle, making them take 2..9 clocks.
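The 1..8 and 2..9 clock ranges follow directly from where the instruction starts relative to the cog's hub cycle. A minimal Python sketch of that arithmetic, assuming clocks are numbered so that the cog's hub cycle falls on clocks where `clock % 8 == hub_slot` (the function and parameter names are hypothetical):

```python
HUB_PERIOD = 8  # each cog's hub cycle comes once every 8 clocks


def hub_instruction_clocks(start_clock, hub_slot, has_result):
    """Clocks taken by a hub instruction that starts at start_clock.

    has_result is True when ZCR <> %000, i.e. a Z, C or D result is expected.
    """
    wait = (hub_slot - start_clock) % HUB_PERIOD  # 0..7 clocks until the hub cycle
    clocks = wait + 1                             # completes on the hub cycle
    if has_result:
        clocks += 1                               # result arrives one clock later
    return clocks


print([hub_instruction_clocks(s, 3, False) for s in range(8)])  # [4, 3, 2, 1, 8, 7, 6, 5]
print([hub_instruction_clocks(s, 3, True) for s in range(8)])   # [5, 4, 3, 2, 9, 8, 7, 6]
```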
COGINIT is used to start cogs. Any cog can be (re)started, whether it is idle or running. A cog can even execute a COGINIT to restart itself with a new program.
COGINIT uses D to specify a long address in hub memory that is the start of the program that is to be loaded into a cog, while S is a 17-bit parameter (usually an address) that will be conveyed to PTRA of the started cog. PTRB of the started cog will be set to the start address of its program that was loaded from hub memory.
SETCOG must be executed before COGINIT to set the number of the cog to be started (0..7). If SETCOG sets a value with bit 3 set (%1xxx), this will cause the next idle cog to be started when COGINIT is executed, with the number of the cog started being returned in D, and the C flag returning 0 if okay, or 1 if no idle cog was available. At cog startup, SETCOG is initialized to %0000.
When a cog is started, $1F8 contiguous longs are read from hub memory (internally using RDLONGC) and written to cog registers $000..$1F7. The cog will then begin execution at $000. This process takes 1,016 clocks. (That's only 6.35us at 160MHz).
        COGID   COGNUM            'what cog am I?
        SETCOG  COGNUM            'set my cog number
        COGINIT COGPGM,COGPTR     'restart me with the ROM Monitor

COGPGM  LONG    $0070C            'address of the ROM Monitor
COGPTR  LONG    90<<9 + 91        'tx = P90, rx = P91
COGNUM  RES     1
CLKSET writes the lower 9 bits of D to the hub clock register:
Bit 8 | Bits 7..4 | Bits 3..2 | Bits 1..0 |
RESET | PLL MULTIPLIER FOR XI PIN INPUT* | XI / XO PIN MODE | CLOCK SELECTOR |
0: continued operation | 0000: PLL disabled | 00: XI reads low, XO floats | 00: RCFAST (~20MHz) |
1: hardware reset | 0001: 2x multiplier | 01: XI input, XO floats | 01: RCSLOW (~20kHz) |
| 0010: 3x multiplier | 10: XI/XO crystal oscillator with 15pF internal loading and 1M-ohm feedback | 10: XTAL (10MHz-20MHz) |
| ... | 11: XI/XO crystal oscillator with 30pF internal loading and 1M-ohm feedback | 11: PLL |
| 1110: 15x multiplier | | |
| 1111: 16x multiplier | | |
* XI/XO Pin Mode must be set for XI input or XI/XO crystal oscillator to use PLL.
Because the clock register is cleared to %0_0000_00_00 on reset, the chip starts up in RCFAST mode with both the crystal oscillator and the PLL disabled. Before switching to XTAL or PLL mode from RCFAST or RCSLOW, the crystal oscillator must be enabled and given 10ms to stabilize. The PLL stabilizes within 10us, so it can be enabled at the same time as the crystal oscillator. Once the crystal is stabilized, you can switch between XTAL and RCFAST/RCSLOW without any stability concerns. If the PLL is also enabled, you can switch freely among PLL, XTAL, and RCFAST/RCSLOW modes. You can change the PLL multiplier while in PLL mode, but beware that some frequency overshoot and undershoot will occur as the PLL settles to its new frequency. This only poses a hardware problem if you are switching upwards and the resulting overshoot might exceed the speed limit of the chip.
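As an illustration of the register layout, the Python sketch below packs the four fields into the 9-bit value that CLKSET writes. The helper name `pack_clkset` and its arguments are hypothetical; the field positions and the "multiplier minus one" encoding are read off the table above.

```python
def pack_clkset(reset, pll_multiplier, xi_mode, clock_select):
    """Build the 9-bit CLKSET value from its four fields.

    pll_multiplier: None to disable the PLL, or 2..16 for the multiplier.
    xi_mode:        0..3, the XI/XO pin mode (bits 3..2).
    clock_select:   0 = RCFAST, 1 = RCSLOW, 2 = XTAL, 3 = PLL (bits 1..0).
    """
    if pll_multiplier is None:
        pll_field = 0b0000
    elif 2 <= pll_multiplier <= 16:
        pll_field = pll_multiplier - 1   # %0001 = 2x ... %1111 = 16x
    else:
        raise ValueError("PLL multiplier must be None or 2..16")
    if not 0 <= xi_mode <= 3 or not 0 <= clock_select <= 3:
        raise ValueError("xi_mode and clock_select are 2-bit fields")
    return ((reset & 1) << 8) | (pll_field << 4) | (xi_mode << 2) | clock_select


# Step 1: enable the crystal (15pF loading) and 16x PLL, but stay on RCFAST.
warm_up = pack_clkset(0, 16, 0b10, 0b00)
# Step 2: after the 10ms crystal settling time, switch the selector to PLL.
run_pll = pack_clkset(0, 16, 0b10, 0b11)
print(f"{warm_up:09b} {run_pll:09b}")  # 011111000 011111011
```

The two-step sequence mirrors the settling requirement described above; how the 10ms delay is produced is left to the calling code.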
COGID returns the number of the cog (0..7) into D.
COGSTOP stops the cog specified in D (0..7).
LOCKNEW D
LOCKRET D
LOCKSET D
LOCKCLR D
There are eight semaphore locks available in the chip which can be borrowed with LOCKNEW, returned with LOCKRET, set with LOCKSET, and cleared with LOCKCLR.
While any cog can set or clear any lock without using LOCKNEW or LOCKRET, LOCKNEW and LOCKRET are provided so that cog programs have a dynamic and simple means of acquiring and relinquishing the locks at run-time.
When a lock is set with LOCKSET, its state is set to 1 and its prior state is returned in C. LOCKCLR works the same way, but clears the lock's state to 0. By having the hub perform the atomic operation of setting/clearing and reporting the prior state, cogs can utilize locks to ensure that only one cog has permission to do something at once. If a lock starts out cleared and multiple cogs vie for the lock by doing a 'LOCKSET locknum wc', the cog that gets C=0 back 'wins' and can have exclusive access to some shared resource, while the other cogs get C=1 back. When the winning cog is done, it can do a 'LOCKCLR locknum' to clear the lock and give another cog the opportunity to get C=0 back.
LOCKNEW returns the next available lock into D, with C=1 if no lock was free.
LOCKRET frees the lock in D so that it can be checked out again by LOCKNEW.
LOCKSET sets the lock in D and returns its prior state in C.
LOCKCLR clears the lock in D and returns its prior state in C.
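The lock semantics can be modelled in a few lines of Python. This is a behavioural sketch, not the hardware: the class name is hypothetical, and handing out the lowest-numbered free lock from LOCKNEW is an assumption (the text only says "next available").

```python
class HubLocks:
    """Toy model of the eight hub semaphore locks."""

    def __init__(self, count=8):
        self.state = [0] * count            # current set/clear state of each lock
        self.checked_out = [False] * count  # tracked by LOCKNEW / LOCKRET

    def locknew(self):
        """Return (lock_number, C). C = 1 and lock_number None if none is free."""
        for number, used in enumerate(self.checked_out):
            if not used:
                self.checked_out[number] = True
                return number, 0
        return None, 1

    def lockret(self, number):
        self.checked_out[number] = False

    def lockset(self, number):
        """Set the lock and return its prior state (the C flag)."""
        prior, self.state[number] = self.state[number], 1
        return prior

    def lockclr(self, number):
        """Clear the lock and return its prior state (the C flag)."""
        prior, self.state[number] = self.state[number], 0
        return prior


locks = HubLocks()
lock, c = locks.locknew()
print(locks.lockset(lock))  # 0 -> this cog wins the lock
print(locks.lockset(lock))  # 1 -> a second attempt finds it already set
print(locks.lockclr(lock))  # 1 -> cleared, prior state was set
```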
CLKSET, COGID, COGINIT, COGSTOP, and the LOCKxxx instructions will take 1..8 clocks if their Z/C/R bits are all 0, meaning they don't have to wait for anything back from the hub (no Z, C, or D result). If they are going to receive some result back, they must wait for the next cycle to receive it. Hence, those instructions which get results back take 2..9 clocks.
INSTRUCTION | DESCRIPTION | CLOCK |
SETCOG D/#n | Set cog to be used by COGINIT, b3 = use next available | |
COGINIT D,S | launch cog at D, cog PTRA = S | 1..9 |
CLKSET D | set clock to D | 1..8 |
COGID D | get cog number into D | 2..9 |
COGSTOP D | stop cog in D | 1..8 |
LOCKNEW D | get new lock into D, C = busy | 2..9 |
LOCKRET D | return lock in D | 1..8 |
LOCKSET D | set lock in D, C = prev state | 2..9 |
LOCKCLR D | clear lock in D, C = prev state | 2..9 |
Each cog has two indirect “registers”: INDA and INDB. INDA and INDB each consist of three hidden 9-bit registers: the pointer, the bottom limit, and the top limit. The bottom and top limits are inclusive values which set automatic wrapping boundaries for the pointer. This way, circular buffers can be established within cog RAM and accessed using simple INDA/INDB references.
INDA shares address $1F6 and INDB shares address $1F7. When either of these addresses is encountered in the D or S field, the value of the associated INDx register is used for the register address in place of the $1F6 or $1F7.
NOTE: It is still possible to access the actual registers at $1F6 and $1F7 (as opposed to the INDA and INDB registers) via the D or S field. To accomplish this, set INDA to $1F6 and set INDB to $1F7. These will still be considered indirect instructions. Operations on these registers do not affect the hidden pointer registers.
NOTE: The registers at $1F6 and $1F7 are treated the same as all other registers when interpreted as an instruction (i.e. executed).
SETINDA/SETINDB/SETINDS is used to set or adjust the pointer value(s) while forcing the associated bottom and top limits to $000 and $1FF, respectively.
FIXINDA/FIXINDB/FIXINDS sets the pointer(s) to an initial value, while setting the bottom limit(s) to the lower of the initial and terminal values and the top limit(s) to the higher.
At cog startup, INDA and INDB are configured as if these instructions had been executed:
FIXINDA $1F6,$1F6    // Set pointer to $1F6, bottom to $1F6, top to $1F6
FIXINDB $1F7,$1F7    // Set pointer to $1F7, bottom to $1F7, top to $1F7
Because indirect addressing occurs very early in the pipeline and indirect pointers are affected earlier than the final stage where the conditional bit field (CCCC) normally comes into use, the CCCC field is repurposed for indirect operations. The top two bits of CCCC are used for indirect D and the bottom two bits are used for indirect S.
Unconditional Execution
All instructions which use indirect registers will execute unconditionally, regardless of the CCCC bits.
Here is the INDA/INDB usage scheme which repurposes the CCCC field:
OOOOOO ZCR I CCCC DDDDDDDDD SSSSSSSSS
xxxxxx xxx x 00xx 111110110 xxxxxxxxx    D = INDA      'use INDA
xxxxxx xxx x 00xx 111110111 xxxxxxxxx    D = INDB      'use INDB
xxxxxx xxx x 01xx 111110110 xxxxxxxxx    D = INDA++    'use INDA, INDA += 1
xxxxxx xxx x 01xx 111110111 xxxxxxxxx    D = INDB++    'use INDB, INDB += 1
xxxxxx xxx x 10xx 111110110 xxxxxxxxx    D = INDA--    'use INDA, INDA -= 1
xxxxxx xxx x 10xx 111110111 xxxxxxxxx    D = INDB--    'use INDB, INDB -= 1
xxxxxx xxx x 11xx 111110110 xxxxxxxxx    D = ++INDA    'use INDA+1, INDA += 1
xxxxxx xxx x 11xx 111110111 xxxxxxxxx    D = ++INDB    'use INDB+1, INDB += 1
xxxxxx xxx 0 xx00 xxxxxxxxx 111110110    S = INDA      'use INDA
xxxxxx xxx 0 xx00 xxxxxxxxx 111110111    S = INDB      'use INDB
xxxxxx xxx 0 xx01 xxxxxxxxx 111110110    S = INDA++    'use INDA, INDA += 1
xxxxxx xxx 0 xx01 xxxxxxxxx 111110111    S = INDB++    'use INDB, INDB += 1
xxxxxx xxx 0 xx10 xxxxxxxxx 111110110    S = INDA--    'use INDA, INDA -= 1
xxxxxx xxx 0 xx10 xxxxxxxxx 111110111    S = INDB--    'use INDB, INDB -= 1
xxxxxx xxx 0 xx11 xxxxxxxxx 111110110    S = ++INDA    'use INDA+1, INDA += 1
xxxxxx xxx 0 xx11 xxxxxxxxx 111110111    S = ++INDB    'use INDB+1, INDB += 1
If both D and S are the same indirect register, the two 2-bit fields in CCCC are OR'd together to get the post-modifier effect:
101000 001 0 0011 111110110 111110110 MOV INDA,++INDA 'Move @INDA+1 into @INDA, INDA += 1
100000 001 0 1100 111110111 111110111 ADD ++INDB,INDB 'Add @INDB into @INDB+1, INDB += 1
Note that only '++INDx,INDx' or 'INDx,++INDx' combinations can address different registers from the same INDx.
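The repurposed CCCC field can be decoded mechanically. The Python sketch below (hypothetical function names) splits CCCC into its D and S halves and, for the case where D and S name the same indirect register, ORs the halves together as described above.

```python
# Meaning of each 2-bit half of CCCC when the field addresses INDA/INDB.
MODES = {0b00: "INDx", 0b01: "INDx++", 0b10: "INDx--", 0b11: "++INDx"}


def decode_indirect_cccc(cccc):
    """Return (D mode, S mode) for a 4-bit CCCC value."""
    d_bits = (cccc >> 2) & 0b11   # top two bits apply to indirect D
    s_bits = cccc & 0b11          # bottom two bits apply to indirect S
    return MODES[d_bits], MODES[s_bits]


def combined_modifier(cccc):
    """Modifier applied when D and S are the same INDx register (halves OR'd)."""
    return MODES[((cccc >> 2) | cccc) & 0b11]


print(decode_indirect_cccc(0b0011), combined_modifier(0b0011))
# ('INDx', '++INDx') ++INDx   -> MOV INDA,++INDA
print(decode_indirect_cccc(0b1100), combined_modifier(0b1100))
# ('++INDx', 'INDx') ++INDx   -> ADD ++INDB,INDB
```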
Here are the instructions which are used to set the pointer and limit values for INDA and INDB:
ENCODING | INSTRUCTION | DESCRIPTION | CLOCK |
111000 000 0 0001 000000000 AAAAAAAAA | SETINDA #A | Sets INDA pointer to 0..511* | 1 |
111000 000 0 0011 000000000 aaaaaaaaa | SETINDA a | Increments/decrements INDA pointer by -256..+255* | 1 |
111000 000 0 0100 BBBBBBBBB 000000000 | SETINDB #B | Sets INDB pointer to 0..511* | 1 |
111000 000 0 1100 bbbbbbbbb 000000000 | SETINDB b | Increments/decrements INDB pointer by -256..+255* | 1 |
111000 000 0 0101 BBBBBBBBB AAAAAAAAA | SETINDS #B,#A | Sets INDB pointer to 0..511 and sets INDA pointer to 0..511* | 1 |
111000 000 0 0111 BBBBBBBBB aaaaaaaaa | SETINDS #B,a | Sets INDB pointer to 0..511 and increments/decrements INDA pointer by -256..+255* | 1 |
111000 000 0 1101 bbbbbbbbb AAAAAAAAA | SETINDS b,#A | Increments/decrements INDB pointer by -256..+255 and sets INDA pointer to 0..511* | 1 |
111000 000 0 1111 bbbbbbbbb aaaaaaaaa | SETINDS b,a | Increments/decrements INDB pointer by -256..+255 and increments/decrements INDA pointer by -256..+255* | 1 |
111001 000 0 0001 TTTTTTTTT IIIIIIIII | FIXINDA #T,#I | Sets the INDA pointer to an initial value, while setting the bottom limit to the lower of the initial and terminal values and the top limit to the higher. | 1 |
111001 000 0 0100 TTTTTTTTT IIIIIIIII | FIXINDB #T,#I | Sets the INDB pointer to an initial value, while setting the bottom limit to the lower of the initial and terminal values and the top limit to the higher. | 1 |
111001 000 0 0101 TTTTTTTTT IIIIIIIII | FIXINDS #T,#I | Sets the INDA and INDB pointers to an initial value, while setting the bottom limits to the lower of the initial and terminal values and the top limits to the higher. | 1 |
* All SETINDx operations reset the associated bottom and top limit to $000 and $1FF, respectively
111000 000 0 0001 000000000 000000101    SETINDA #5        'INDA = 5, bottom = 0, top = 511
111000 000 0 0011 000000000 000000011    SETINDA ++3       'INDA += 3, bottom = 0, top = 511
111000 000 0 1100 111111100 000000000    SETINDB --4       'INDB -= 4, bottom = 0, top = 511
111000 000 0 0111 000000111 000001000    SETINDS #7,++8    'INDB = 7, INDA += 8, bottoms = 0, tops = 511
111001 000 0 0001 000001111 000001000    FIXINDA #15,#8    'INDA = 8, bottom = 8, top = 15
111001 000 0 0100 000010000 000011111    FIXINDB #16,#31   'INDB = 31, bottom = 16, top = 31
111001 000 0 0101 001100011 000110010    FIXINDS #99,#50   'INDA/INDB = 50, bottoms = 50, tops = 99
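The pointer-plus-limits behaviour can be modelled as follows. This Python sketch uses hypothetical names, and it assumes that stepping past the top limit wraps to the bottom limit (and vice versa); the text describes the limits as inclusive wrapping boundaries but does not spell out the exact wrap rule.

```python
class IndirectPointer:
    """Toy model of one INDx register: pointer, bottom limit, top limit."""

    def __init__(self, register_address=0x1F6):
        # Startup state, as if FIXINDx had been executed on the register's own address.
        self.fixind(register_address, register_address)

    def setind(self, value):
        """SETINDx #value: set the pointer and open the limits to $000..$1FF."""
        self.pointer, self.bottom, self.top = value & 0x1FF, 0x000, 0x1FF

    def fixind(self, terminal, initial):
        """FIXINDx #terminal,#initial: pointer = initial, limits span both values."""
        self.pointer = initial
        self.bottom = min(initial, terminal)
        self.top = max(initial, terminal)

    def step(self, delta):
        """Apply a ++ (delta=+1) or -- (delta=-1) with assumed wrap-around."""
        pointer = self.pointer + delta
        if pointer > self.top:
            pointer = self.bottom
        elif pointer < self.bottom:
            pointer = self.top
        self.pointer = pointer


inda = IndirectPointer()
inda.fixind(15, 8)            # FIXINDA #15,#8 -> circular buffer over $008..$00F
visited = []
for _ in range(10):
    visited.append(inda.pointer)
    inda.step(+1)
print(visited)  # [8, 9, 10, 11, 12, 13, 14, 15, 8, 9]
```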
Each cog has a 256-long stack RAM that is accessible via push and pop operations. Its contents are not initialized at either reset or cog startup. So, at cog startup, it will contain whatever it happened to power up with, or whatever was last written.
There are two stack pointers called SPA and SPB which are used to address the stack memory. Aside from automatically incrementing and decrementing via pushes and pops, SPA and SPB can be set, modified, read back, and checked:
SETSPA D/#n set SPA
SETSPB D/#n set SPB
ADDSPA D/#n add to SPA
ADDSPB D/#n add to SPB
SUBSPA D/#n subtract from SPA
SUBSPB D/#n subtract from SPB
GETSPA D get SPA, SPA==0 into Z, SPA.7 into C
GETSPB D get SPB, SPB==0 into Z, SPB.7 into C
GETSPD D get SPA minus SPB, SPA==SPB into Z, SPA<SPB into C
Data can be pushed and popped in both normal and reverse directions:
PUSHA D/#n push using SPA
PUSHB D/#n push using SPB
PUSHAR D/#n push using SPA, use pop addressing
PUSHBR D/#n push using SPB, use pop addressing
POPA D pop using SPA
POPB D pop using SPB
POPAR D pop using SPA, use push addressing
POPBR D pop using SPB, use push addressing
Aside from data, the program counter and flags can be pushed and popped using calls and returns:
CALLA D/#n call using SPA
CALLB D/#n call using SPB
CALLAD D/#n call using SPA, delay branch until three trailing instructions executed
CALLBD D/#n call using SPB, delay branch until three trailing instructions executed
RETA return using SPA
RETB return using SPB
RETAD return using SPA, delay branch until three trailing instructions executed
RETBD return using SPB, delay branch until three trailing instructions executed
instructions (stack RAM access is shown as [SPx++] and [--SPx]) clocks adj |
000011 ZC1 1 CCCC DDDDDDDDD 000010101    GETSPD D     'SPA-SPB into D, Z/C as CHKSPD        1
000011 ZC1 1 CCCC DDDDDDDDD 000010110    GETSPA D     'SPA into D, Z/C as CHKSPA            1
000011 ZC1 1 CCCC DDDDDDDDD 000010111    GETSPB D     'SPB into D, Z/C as CHKSPB            1
000011 ZC1 1 CCCC DDDDDDDDD 000011000    POPAR D      'read [SPA++] into D, MSB into C      1
000011 ZC1 1 CCCC DDDDDDDDD 000011001    POPBR D      'read [SPB++] into D, MSB into C      1
000011 ZC1 1 CCCC DDDDDDDDD 000011010    POPA D       'read [--SPA] into D, MSB into C      1
000011 ZC1 1 CCCC DDDDDDDDD 000011011    POPB D       'read [--SPB] into D, MSB into C      1
000011 ZC0 1 CCCC 000000000 000011100    RETA         'read [--SPA] into Z/C/PC*            4
000011 ZC0 1 CCCC 000000000 000011101    RETB         'read [--SPB] into Z/C/PC*            4
000011 ZC0 1 CCCC 000000000 000011110    RETAD        'read [--SPA] into Z/C/PC*            1
000011 ZC0 1 CCCC 000000000 000011111    RETBD        'read [--SPB] into Z/C/PC*            1
000011 000 1 CCCC DDDDDDDDD 010100010    SETSPA D     'set SPA to D                         1
000011 001 1 CCCC 0nnnnnnnn 010100010    SETSPA #n    'set SPA to n                         1
000011 000 1 CCCC DDDDDDDDD 010100011    SETSPB D     'set SPB to D                         1
000011 001 1 CCCC 0nnnnnnnn 010100011    SETSPB #n    'set SPB to n                         1
000011 000 1 CCCC DDDDDDDDD 010100100    ADDSPA D     'add D into SPA                       1
000011 001 1 CCCC 0nnnnnnnn 010100100    ADDSPA #n    'add n into SPA                       1
000011 000 1 CCCC DDDDDDDDD 010100101    ADDSPB D     'add D into SPB                       1
000011 001 1 CCCC 0nnnnnnnn 010100101    ADDSPB #n    'add n into SPB                       1
000011 000 1 CCCC DDDDDDDDD 010100110    SUBSPA D     'subtract D from SPA                  1
000011 001 1 CCCC 0nnnnnnnn 010100110    SUBSPA #n    'subtract n from SPA                  1
000011 000 1 CCCC DDDDDDDDD 010100111    SUBSPB D     'subtract D from SPB                  1
000011 001 1 CCCC 0nnnnnnnn 010100111    SUBSPB #n    'subtract n from SPB                  1
000011 000 1 CCCC DDDDDDDDD 010101000    PUSHAR D     'write D into [--SPA]                 1    ** +1
000011 001 1 CCCC nnnnnnnnn 010101000    PUSHAR #n    'write n into [--SPA]                 1    ** +1
000011 000 1 CCCC DDDDDDDDD 010101001    PUSHBR D     'write D into [--SPB]                 1    ** +1
000011 001 1 CCCC nnnnnnnnn 010101001    PUSHBR #n    'write n into [--SPB]                 1    ** +1
000011 000 1 CCCC DDDDDDDDD 010101010    PUSHA D      'write D into [SPA++]                 1    ** +1
000011 001 1 CCCC nnnnnnnnn 010101010    PUSHA #n     'write n into [SPA++]                 1    ** +1
000011 000 1 CCCC DDDDDDDDD 010101011    PUSHB D      'write D into [SPB++]                 1    ** +1
000011 001 1 CCCC nnnnnnnnn 010101011    PUSHB #n     'write n into [SPB++]                 1    ** +1
000011 000 1 CCCC DDDDDDDDD 010101100    CALLA D      'write Z/C/PC* into [SPA++], PC=D     4    ** +1
000011 001 1 CCCC nnnnnnnnn 010101100    CALLA #n     'write Z/C/PC* into [SPA++], PC=n     4    ** +1
000011 000 1 CCCC DDDDDDDDD 010101101    CALLB D      'write Z/C/PC* into [SPB++], PC=D     4    ** +1
000011 001 1 CCCC nnnnnnnnn 010101101    CALLB #n     'write Z/C/PC* into [SPB++], PC=n     4    ** +1
000011 000 1 CCCC DDDDDDDDD 010101110    CALLAD D     'write Z/C/PC* into [SPA++], PC=D     1    ** +1
000011 001 1 CCCC nnnnnnnnn 010101110    CALLAD #n    'write Z/C/PC* into [SPA++], PC=n     1    ** +1
000011 000 1 CCCC DDDDDDDDD 010101111    CALLBD D     'write Z/C/PC* into [SPB++], PC=D     1    ** +1
000011 001 1 CCCC nnnnnnnnn 010101111    CALLBD #n    'write Z/C/PC* into [SPB++], PC=n     1    ** +1
* bit 10 is Z, bit 9 is C, bits 8..0 are PC, upper bits are ignored or cleared
** if a stack RAM write is immediately followed by a stack RAM read, add one clock
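The push/pop addressing and the Z/C/PC packing can be illustrated with a small Python model. The names are hypothetical, and two details are assumptions rather than documented facts: the stack pointer is treated as 8 bits wide and wrapping modulo 256, and the RAM starts zeroed here although the real stack RAM is uninitialized.

```python
class StackRam:
    """Toy model of the 256-long stack RAM with one stack pointer."""

    SIZE = 256

    def __init__(self):
        self.ram = [0] * self.SIZE
        self.sp = 0

    def push(self, value):            # PUSHA/PUSHB: write [SPx++]
        self.ram[self.sp] = value & 0xFFFFFFFF
        self.sp = (self.sp + 1) % self.SIZE

    def pop(self):                    # POPA/POPB: read [--SPx]
        self.sp = (self.sp - 1) % self.SIZE
        return self.ram[self.sp]

    def push_reverse(self, value):    # PUSHAR/PUSHBR: write [--SPx]
        self.sp = (self.sp - 1) % self.SIZE
        self.ram[self.sp] = value & 0xFFFFFFFF

    def pop_reverse(self):            # POPAR/POPBR: read [SPx++]
        value = self.ram[self.sp]
        self.sp = (self.sp + 1) % self.SIZE
        return value


def pack_return(z, c, pc):
    """Long written by CALLx: bit 10 = Z, bit 9 = C, bits 8..0 = PC."""
    return ((z & 1) << 10) | ((c & 1) << 9) | (pc & 0x1FF)


def unpack_return(value):
    """Fields read back by RETx; upper bits are ignored."""
    return (value >> 10) & 1, (value >> 9) & 1, value & 0x1FF


stack = StackRam()
stack.push(pack_return(1, 0, 0x123))   # what a CALLA would store
print(unpack_return(stack.pop()))      # (1, 0, 291)
```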
Each cog has a 4-stage pipeline which all instructions progress through, in order to execute:
1st stage - Read instruction from cog register RAM
2nd stage - Determine any indirect or remapped D and S addresses, update INDA and INDB
3rd stage - Read D and S from cog register RAM
4th stage - Execute instruction, write D to cog register RAM, update Z/C/PC and any other results
On every clock cycle, the instruction data in each stage advances to the next stage, unless the instruction executing in the 4th stage is stalling the pipeline because it is waiting for something (for example, WRBYTE waiting for the hub).
To keep D and S data current within the pipeline, the resultant D from the 4th stage is passed back to the 3rd stage to substitute for any obsoleted D or S data currently being read from the cog register RAM. The same is done for instruction data currently being read in the 1st stage, but this still leaves a two-stage gap between when a register is modified and when it can be executed:
        MOVD    :inst,top9    '(initially 4th stage) modify instruction
        NOP                   '(initially 3rd stage) 1...
        NOP                   '(initially 2nd stage) 2... at least two instructions in-between
:inst   ADD     A,B           '(initially 1st stage) modified instruction executes
Tasks that execute no more frequently than every 3rd time slot don't need to observe this 2-instruction spacer rule when executing self-modifying code, because their instructions will always be sufficiently spread apart in the pipeline by other tasks' instructions, enabling a just-modified instruction to be properly read and executed in that task's next time slot. If fewer than two spacers are afforded to a modify-execute sequence, the old instruction will be read and executed, instead of the new one. This can be used to advantage for efficient overlapped modify-execute sequences.
When a branch instruction executes, that task's program counter is abruptly changed from what had been a steadily incrementing course, requiring that the pipeline be reloaded, beginning at the new program counter address. This can leave up to three instructions in the pipeline which were trailing the branch instruction and belong to the same task as the branch.
Normally, these trailing instructions are incidental data which are not intended for execution, and therefore must be cancelled within the pipeline, so that they pass through without doing anything. However, in some cases, it may be desirable to allow those instructions to execute, without cancellation, to increase pipeline efficiency.
To accommodate both cancelling and non-cancelling branches, branch instructions have two versions. The ones that end in the letter 'D' for 'delayed' are non-cancelling and take only one clock, but will execute any trailing pipelined instructions which belong to the branch's same task.
In a single-task program, three trailing instructions are executed before the delayed branch seems to take effect:
        JMPD    #somewhere      '(initially 4th stage) do a delayed jmp, then toggle P0 and cycle P1
        NOTP    #0              '(initially 3rd stage)
        NOTP    #1              '(initially 2nd stage)
        NOTP    #1              '(initially 1st stage)
                                'next instruction is loaded from 'somewhere'
In a two-task program with simple time slot allocation, only one trailing instruction is executed before the delayed branch seems to take effect:
        JMPD    #somewhere      '(initially 4th stage) do a delayed jmp to 'somewhere', then toggle P0
        NOTP    #0              '(initially 2nd stage)
                                'next instruction is loaded from 'somewhere'
The branch instructions that don't end in the letter 'D' are what would be considered 'normal' branches, where the next instruction to execute after the branch would be the instruction which was branched to.
Normal cancelling | Delayed non-cancelling | Normal cancelling | Delayed non-cancelling |
JMP | JMPD | IJNZ | IJNZD |
CALL | CALLD | DJZ | DJZD |
RET | RETD | DJNZ | DJNZD |
JMPRET | JMPRETD | TJZ | TJZD |
CALLA | CALLAD | TJNZ | TJNZD |
CALLB | CALLBD | JP | JPD |
RETA | RETAD | JNP | JNPD |
RETB | RETBD | PASSCNT | - |
IJZ | IJZD | JMPTASK | - |
Each cog has an instruction-block repeater that can variably repeat up to 64 instructions without any clock-cycle overhead.
REPD and REPS are used to initiate block repeats. These instructions specify how many times the trailing instruction block will be executed and how many instructions are in the block:
REPD #i - execute 1..64 instructions infinitely, requires 3 spacer instructions *
REPD D,#i - execute 1..64 instructions D+1 times, requires 3 spacer instructions *
REPD #n,#i - execute 1..64 instructions 1..512 times, requires 3 spacer instructions *
REPS #n,#i - execute 1..64 instructions 1..16384 times, requires 1 spacer instruction *
REPS differs from REPD by executing at the 2nd stage of the pipeline, instead of the 4th. By executing two stages early, it needs only one spacer instruction *.
Because of its earliness, no conditional execution is possible, so it always executes, allowing the CCCC bits to be repurposed, along with Z, to provide a 14-bit constant for the repeat count.
The instruction-block repeater will quit repeating the block if a branch instruction executes within the block. This rule does not currently apply to a JMPTASK which affects the task using the repeater - this will be fixed at the earliest opportunity.
There is only one REPS/REPD circuit, so REPS/REPD's cannot be nested.
* Spacer instructions are required in 1-task applications to allow the pipeline to prime before repeating can commence. If REPD is used by a task that uses no more than every 4th time slot, no spacers are needed, as three intervening instructions will be provided by the other task(s). If REPS is used by a task that uses no more than every 2nd time slot, no spacers are needed.
Example (1-task): |
Mnemonic | Operand | Operation (iiiiii = #i-1, nnnnnnnnn/n___nnnn_nnnnnnnnn = #n-1) | Clocks |
REPD | #i | execute 1..64 inst's infinitely | 1 |
REPD | D,#i | execute 1..64 inst's D+1 times | 1 |
REPD | #n,#i | execute 1..64 inst's 1x..512x | 1 |
REPS | #n,#i | execute 1..64 inst's 1x..16384x | 1 |
Note that the %iiiiii field represents 1..64 instructions, not the encoded 0..63. The %nnnnnnnnn/%n___nnnn_nnnnnnnnn fields are +1-based, too.
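The minus-one field convention described above can be sketched in Python. This is only an illustrative model of the count and block-length fields; the function name `encode_rep_fields` is hypothetical, and the 9-bit and 14-bit field widths are inferred from the 1..512 and 1..16384 ranges given in the table.

```python
def encode_rep_fields(kind, count, block_len):
    """Return (n_field, i_field) for a REPD/REPS with an immediate count.

    kind      -- "REPD" (assumed 9-bit count field) or "REPS" (assumed 14-bit)
    count     -- number of times the block executes, 1-based
    block_len -- number of instructions in the block, 1..64
    """
    count_bits = {"REPD": 9, "REPS": 14}[kind]
    max_count = 1 << count_bits          # 512 for REPD, 16384 for REPS
    if not 1 <= block_len <= 64:
        raise ValueError("block length must be 1..64")
    if not 1 <= count <= max_count:
        raise ValueError(f"{kind} count must be 1..{max_count}")
    # Both fields hold the written value minus one.
    return count - 1, block_len - 1


print(encode_rep_fields("REPD", 512, 64))
print(encode_rep_fields("REPS", 1, 1))
```

Writing `REPS #16384,#64` would therefore place 16383 in the count field and 63 in the %iiiiii field under this model.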
Each cog has four sets of flags and program counters (Z/C/PC), constituting four unique tasks that can execute and switch on each instruction cycle.
At cog startup, the tasks are initialized as follows:
task | Z | C | PC |
0 | 0 | 0 | $000 |
1 | 0 | 0 | $001 |
2 | 0 | 0 | $002 |
3 | 0 | 0 | $003 |
There are 16 rotating time slots in the TASK register that determine task sequence. Initially, all time slots are set to 0, causing task 0 to execute exclusively, starting at address $000:
16 TIME SLOTS | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
TASK REGISTER b31..b00 | 00 | 00 | 00 | 00 | 00 | 00 | 00 | 00 | 00 | 00 | 00 | 00 | 00 | 00 | 00 | 00 |
The two LSB's of TASK always determine which task will next be queued in the pipeline for execution. After each instruction cycle, the TASK register is rotated right by two bits, recycling slot 0 to slot 15 and getting the next task into the 2 LSB's.
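The rotation described above can be modelled in a few lines of Python. This is a behavioral sketch only; the helper names `next_task` and `task_sequence` are made up for illustration and pipeline stalls are ignored.

```python
MASK32 = 0xFFFFFFFF


def next_task(task_reg):
    """Return (task_id, rotated_register) for one instruction cycle."""
    task_id = task_reg & 0b11                                 # two LSBs pick the task
    rotated = ((task_reg >> 2) | (task_reg << 30)) & MASK32   # rotate right by 2 bits
    return task_id, rotated


def task_sequence(task_reg, cycles):
    """List the task ids queued over the given number of cycles."""
    seq = []
    for _ in range(cycles):
        task_id, task_reg = next_task(task_reg)
        seq.append(task_id)
    return seq


print(task_sequence(0x00000000, 4))   # all slots 0: task 0 runs exclusively
print(task_sequence(0xE4E4E4E4, 8))   # slots cycle through tasks 0, 1, 2, 3
```

After 16 cycles the register has rotated through all 32 bits and the slot pattern starts over.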
To enable other tasks, SETTASK is used to set the TASK register:
SETTASK D write D to the TASK register
SETTASK #n write {n[7:0], n[7:0], n[7:0], n[7:0]} to the TASK register
If a task is given no time slot, it doesn't execute and its flags and PC stay at initial values. If a task is given a time slot, it will execute and its flags and PC will be updated at every instruction, or time slot. If an active task's time slots are all taken away, that task's flags and PC remain in the state where they left off, until it is given another time slot.
When SETTASK issues a new time slot pattern, there are already three instructions in the pipeline, so the 4th instruction after SETTASK will be from the task specified in the two LSB's of the SETTASK operand.
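As a rough illustration of the immediate form, the Python sketch below expands an 8-bit value into the 32-bit TASK pattern and lists the resulting 16 slots. The function names are hypothetical; only the {n[7:0], n[7:0], n[7:0], n[7:0]} replication comes from the text above.

```python
def settask_immediate(n):
    """Expand the low 8 bits of n into a 32-bit TASK value (four copies)."""
    byte = n & 0xFF
    return byte * 0x01010101


def slots(task_reg):
    """List the task assigned to each of the 16 time slots, slot 0 first."""
    return [(task_reg >> (2 * s)) & 0b11 for s in range(16)]


task = settask_immediate(0b11_10_01_00)   # the four 2-bit slots 3, 2, 1, 0
print(hex(task))
print(slots(task))
```

An 8-bit immediate can only describe four slots, so the pattern always repeats four times across the 16 slots; patterns with a longer period need the register form.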
To immediately force any of the four PC's to a new address, JMPTASK can be used. JMPTASK uses a 4-bit mask to select which PC's are going to be written. Mask bits 0..3 represent PC's 0..3. The mask value %1010 would write PC 3 and PC 1, while %0100 would write PC 2, only.
JMPTASK D,#mask force PC's in mask to D
JMPTASK #addr,#mask force PC's in mask to #addr
For every PC/task affected by a JMPTASK instruction, all affected-task instructions currently in the pipeline are cancelled. This ensures that once JMPTASK executes, the next instruction from each affected task will be from the new address.
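The mask behavior can be sketched as follows. This Python model covers only the PC update, not the pipeline cancellation, and the function name `jmptask` and the list representation of the four PCs are illustrative assumptions.

```python
def jmptask(pcs, addr, mask):
    """Return a new list of four PCs with those selected by mask set to addr.

    pcs  -- list of four program counters, index = task number
    addr -- new address (cog addresses are 9 bits, 0..511)
    mask -- 4-bit mask, bit t selects PC t
    """
    addr &= 0x1FF
    return [addr if (mask >> t) & 1 else pc for t, pc in enumerate(pcs)]


print(jmptask([0, 1, 2, 3], 0x100, 0b1010))   # writes PC 3 and PC 1
print(jmptask([0, 1, 2, 3], 0x100, 0b0100))   # writes PC 2 only
```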
Here is an example in which all four tasks are started and each task toggles an I/O pin at a different rate:
        ORG
        JMP     #task0          'task 0 begins here when the cog starts (this JMP takes 4 clocks)
        JMP     #task1          'task 1 begins here after task 0 executes SETTASK (this JMP takes 1 clock)
        JMP     #task2          'task 2 begins here after task 0 executes SETTASK (this JMP takes 1 clock)
        JMP     #task3          'task 3 begins here after task 0 executes SETTASK (this JMP takes 1 clock)
task0   SETTASK #%%3210         'enable all tasks (TASK = %11_10_01_00_11_10_01_00_11_10_01_00_11_10_01_00)
:loop   NOTP    #0              'task 0, toggle pin 0 (loops every 8 clocks)
        JMP     #:loop          '(this JMP takes 1 clock)
task1   NOTP    #1              'task 1, toggle pin 1 (loops every 12 clocks)
        NOP
        JMP     #task1          '(this JMP takes 1 clock)
task2   NOTP    #2              'task 2, toggle pin 2 (loops every 16 clocks)
        NOP
        NOP
        JMP     #task2          '(this JMP takes 1 clock)
task3   NOTP    #3              'task 3, toggle pin 3 (loops every 20 clocks)
        NOP
        NOP
        NOP
        JMP     #task3          '(this JMP takes 1 clock)
Note: When a normal branch instruction (JMP, CALL, RET, etc.) executes in the fourth and final stage of the pipeline, all instructions progressing through the lower three stages, which belong to the same task as the branch instruction, are cancelled. This inhibits execution of incidental data that was trailing the branch instruction.
The delayed branch instructions (JMPD, CALLD, RETD, etc.) don't do any pipeline instruction cancellation and exist to provide 1-clock branches to single-task programs, where the three instructions following the branch are allowed to execute before the new instruction stream begins to execute.
For single-task programs, normal branches take 4 clocks: 1 clock for the branch and 3 clocks for the cancelled instructions to come through the pipeline before the new instruction stream begins to execute.
For multi-tasking programs that use all four tasks in sequence (i.e., SETTASK #%%3210), there are never any same-task instructions in the pipeline that would require cancellation due to branching, so all branches take just 1 clock.
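The relationship between time-slot allocation and branch cost can be approximated with a small Python model. It assumes, as a simplification, that the three instructions behind a branch come from the next three time slots in order and that nothing stalls; the function names are hypothetical.

```python
def same_task_trailing(slot_pattern, branch_slot=0):
    """Count pipeline instructions behind a branch that share its task.

    slot_pattern -- repeating list of task ids, in execution order
    branch_slot  -- index into slot_pattern where the branch issues
    """
    n = len(slot_pattern)
    task = slot_pattern[branch_slot % n]
    trailing = [slot_pattern[(branch_slot + k) % n] for k in (1, 2, 3)]
    return sum(1 for t in trailing if t == task)


def normal_branch_clocks(slot_pattern, branch_slot=0):
    """One clock for the branch plus one per cancelled same-task instruction."""
    return 1 + same_task_trailing(slot_pattern, branch_slot)


print(same_task_trailing([0]))            # single task: 3 trailing instructions
print(same_task_trailing([0, 1]))         # two alternating tasks: 1
print(normal_branch_clocks([0]))          # single task: 4 clocks
print(normal_branch_clocks([0, 1, 2, 3])) # four tasks in sequence: 1 clock
```

The same count is the number of trailing instructions a delayed branch would allow to execute for that slot pattern.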
ENCODING | INSTRUCTION | DESCRIPTION | CLOCK |
000011 000 1 CCCC DDDDDDDDD 01001mmmm | JMPTASK D,#mask | Set PC's in mask to D | 1 |
000011 001 1 CCCC nnnnnnnnn 01001mmmm | JMPTASK #n,#mask | Set PC's in mask to 0..511 | 1 |
000011 000 1 CCCC DDDDDDDDD 011001011 | SETTASK D | Set TASK to D | 1 |
000011 001 1 CCCC nnnnnnnnn 011001011 | SETTASK #n | Set TASK to n[7:0] copied 4x | 1 |
Here's a little program that kicks off four tasks running the same code, but with different variable sets.
Register remapping is set up to remap 4 sets of 4 registers, according to the task executing. For tasks 0..3, hard addresses 0..3 remap to 0..3, 4..7, 8..11, or 12..15.
dat |
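The per-task mapping described above (hard addresses 0..3 landing in 0..3, 4..7, 8..11, or 12..15) can be written out as a small Python sketch. The function name is hypothetical, and leaving addresses outside the remapped window unchanged is an assumption, not something the text states.

```python
def remap_register(task, addr, set_size=4):
    """Map a hard register address to the register actually accessed.

    task     -- executing task, 0..3
    addr     -- register address written in the instruction, 0..511
    set_size -- registers per task set (4 in the example above)
    """
    if not 0 <= task <= 3:
        raise ValueError("task must be 0..3")
    if 0 <= addr < set_size:
        return task * set_size + addr   # inside the remapped window
    return addr                         # assumed: other addresses untouched


for task in range(4):
    print(task, [remap_register(task, a) for a in range(4)])
```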
While all tasks in a multi-tasking program can execute atomic instructions without any inter-task conflict, remember that there's only one of each of the following cog resources and only one task can use it at a time:
When writing multi-task programs, be aware that instructions that take multiple clocks will stall the pipeline and have a ripple effect on the tasks' timing. This may be impossible to avoid, as some task might need to access hub memory, and those instructions are not single-clock.
The WAITCNT/WAITPEQ/WAITPNE instructions should be coded discretely using 1-clock instructions, to avoid stalling the pipeline for excessive amounts of time.
The following instructions (WC versions) will take 1 clock, instead of potentially many, and return 1 in C if they were successful:
SNDSER D WC | attempt to send serial |
RCVSER D WC | attempt to receive serial |
GETMULL D WC | attempt to get lower multiplier result |
GETMULH D WC | attempt to get upper multiplier result |
GETDIVQ D WC | attempt to get divider quotient result |
GETDIVR D WC | attempt to get divider remainder result |
GETSQRT D WC | attempt to get square root result |
GETQX D WC | attempt to get CORDIC X result |
GETQY D WC | attempt to get CORDIC Y result |
GETQZ D WC | attempt to get CORDIC Z result |
POLCTRA WC | returns 1 in C if CTRA rolled over, use instead of SYNCTRA |
POLCTRB WC | returns 1 in C if CTRB rolled over, use instead of SYNCTRB |
POLVID WC | returns 1 in C if WAITVID is ready, use to execute WAITVID without stalling |
PASSCNT D | jumps to itself if some amount of time has not passed, use instead of WAITCNT |
JP/JNP D,S | jumps based on pin states, use instead of WAITPEQ/WAITPNE |
DJNZ D,#$ | loops until done, use instead of NOP D/#n |
The following instructions will not work in a multi-tasking program:
REPS/REPD | operate by subtracting a value from the PC every n clocks - single-task only |
GETPIX | needs steady pipeline delays for perspective divider time - single-task only |
There are now 4 I/O ports built into the system – 3 are physical 32-bit I/O ports and 1 is an internal 32-bit I/O port. The I/O pins connected to each port can be configured separately.
Mnemonic | Operand | Operation |
SETPORTA | D/#n | Assign PORTA to physical I/O ports (0-2) or internal I/O port 3 given register “D (0-511)” or number “n (0-3)”. |
SETPORTB | D/#n | Assign PORTB to physical I/O ports (0-2) or internal I/O port 3 given register “D (0-511)” or number “n (0-3)”. |
SETPORTC | D/#n | Assign PORTC to physical I/O ports (0-2) or internal I/O port 3 given register “D (0-511)” or number “n (0-3)”. |
SETPORTD | D/#n | Assign PORTD to physical I/O ports (0-2) or internal I/O port 3 given register “D (0-511)” or number “n (0-3)”. |
Mnemonic | Operand | Operation |
GETP | D/#n | Get pin number given by register “D (0-511)” or “n (0-127)” into !Z or C flags. |
GETPN | D/#n | Get pin number given by register “D (0-511)” or “n (0-127)” into Z or !C flags. |
OFFP | D/#n | Toggle pin number given by register “D (0-511)” or “n (0-127)” off or on. DIR |
NOTP | D/#n | Invert pin number given by the value in register “D (0-511)” or “n (0-127)”. OUT |
CLRP | D/#n | Clear pin number given by the value in register “D (0-511)” or “n (0-127)”. OUT |
SETP | D/#n | Set pin number given by the value in register “D (0-511)” or “n (0-127)”. OUT |
SETPC | D/#n | Set pin number given by the value in register “D (0-511)” or “n (0-127)” to C |
SETPNC | D/#n | Set pin number given by the value in register “D (0-511)” or “n (0-127)” to !C |
SETPZ | D/#n | Set pin number given by the value in register “D (0-511)” or “n (0-127)” to Z |
SETPNZ | D/#n | Set pin number given by the value in register “D (0-511)” or “n (0-127)” to !Z |
Each cog now features the ability, with the help of the I/O pins, to quickly stream parallel data in or out of the I/O pins aligned to a clock source. Data is streamed to/from the CLUT or WRQUAD overlay. From there it can be quickly fed to the video generator or to the internal HUB RAM. XFR feeds data 16 bits or 32 bits at a time at the system clock speed.
Mnemonic | Operand | Operation |
SETXFR | D/#n | Setup the direction of the data stream, the source and destination of the data stream, and the size of the data stream given D or “n (0-63)”. |
Each cog now also features high-speed serial transfer and receive hardware for interchip communication. The hardware requires three I/O pins (SO, SI, CLK).
Mnemonic | Operand | Operation |
SNDSER | D | Sends a long (D) out of the special chip-to-chip serial port. Blocks until the long is sent. Use C flag to avoid blocking. |
RCVSER | D | Receives a long (D) in from the special chip-to-chip serial port. Blocks until the long is received. Use C flag to avoid blocking. |
SETSER | D/#n | Sets up the serial port I/O pins to use for SO, SI, and CLK given D or “n (0-63)”. |
Cogs now have the ability to remap their internal memory to help facilitate context switching between register banks. Instead of having to save a set of internal registers to switch between running programs, all references to a set of registers can be changed instantaneously.
Mnemonic | Operand | Operation |
SETMAP | D/#n | Remap one cog register space to another cog register space given D or n. |
Cogs now have the ability to communicate directly to each other using the internal I/O Port D, which connects each cog to every other cog.
Mnemonic | Operand | Operation |
SETXCH | D/#n | Reconfigure Port D I/O masks given D or n to select which cogs to listen to. |
Each I/O pin is now capable of setting itself into many different modes to more easily interface with the analog world. By default, each I/O starts up in the basic robust digital I/O state. However, once configured, the I/O pin can be used for external RAM memory transfer, as an ADC, as a DAC, a Schmitt trigger, or a comparator, etc. See Figure 2 for a table of pin modes and their associated properties.
Mnemonic | Operand | Operation |
SETPORT | D/#n | Assign which port the CFGPINS instruction will configure given register “D (0-511)” or number “n (0-3)”. |
CFGPINS | D,S | Setup pins masked by register “D (0-511)” to register “S (0-511)”. The pin configuration modes are below. |
NOTE: PinA is the pin being set. PinB is its neighbor (all I/O pins have a cross-coupled neighbor). Input is the Boolean statement for what the pin returns when read. Output is the statement for what the pin outputs when it is an output (some modes output their input to make feedback relaxation oscillators, etc.). Each pin’s high and low drivers can be configured to work in many different modes. Pins can also re-clock data sent to them locally to remove jitter in data. Every pin is set up by a 13-bit configuration value.
Code | Mode | Input | PinA Output | PinB | Compare | HHH LLL | DRIVE | |
0000_CIOHHHLLL | General I/O | PinA Logic | OUT | - | - | 000 | FAST | |
0001_CIOHHHLLL | General I/O | PinA Logic | INPUT | - | - | 001 | SLOW | |
0010_CIOHHHLLL | General I/O | PinB Logic | INPUT | - | - | 010 | 1500Ω | |
0011_CIOHHHLLL | General I/O | PinB Logic | INPUT | 1MΩ PinA | - | 011 | 10kΩ | |
0100_CIOHHHLLL | General I/O | PinA Schmitt | OUT | - | - | 100 | 100kΩ | |
0101_CIOHHHLLL | General I/O | PinA Schmitt | INPUT | - | - | 101 | 100μA | |
0110_CIOHHHLLL | General I/O | PinB Schmitt | INPUT | - | - | 110 | 10μA | |
0111_CIOHHHLLL | General I/O | PinB Schmitt | INPUT | 1MΩ PinA | - | 111 | FLOAT | |
1000_CIOHHHLLL | General I/O | PinA > VIO/2 | OUT | - | FAST | C | OUT/IN | |
1001_CIOHHHLLL | General I/O | PinA > VIO/2 | INPUT | - | FAST | 0 | LIVE | |
1010_CIOHHHLLL | General I/O | PinB > VIO/2 | INPUT | - | FAST | 1 | CLOCKED | |
1011_CIOHHHLLL | General I/O | PinB > VIO/2 | INPUT | 1MΩ PinA | FAST | I/O | IN/OUT | |
1100_CIOHHHLLL | General I/O | PinA > PinB | OUT | - | PRECISE | 0 | TRUE | |
1101_CIOHHHLLL | General I/O | PinA > PinB | INPUT | - | PRECISE | 1 | INVERTED | |
1110_CIOHHHLLL | General I/O | PinA > PinB | INPUT | 1MΩ PinA | PRECISE | |||
1111_0LLLLLLLL | Compare Level | PinA > VIO/256*L | - | - | PRECISE | |||
1111_1000xxxxx | ADC Diff, 100kΩ | PinA > VIO/2 10kΩ | 100kΩ, !IN | 10kΩ VIO/2 | FAST | |||
1111_10010xxxx | ADC Precise, DIR/OUT = Cal | ADC | 7MΩ | - | FAST | |||
1111_10011xxxx | ADC FAST, DIR/OUT = Cal | ADC | 400kΩ | - | FAST | |||
1111_101VxxCCC | DAC 75Ω, V=Video, C=Cog | 1 | 75Ω | - | - | |||
1111_110HHHLLL | SDRAM DATA I/O | PinA Logic | FAST, OUT | - | - | |||
1111_111HHHLLL | SDRAM Clock Out | 1 | FAST, OUT=1 | - | - |
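To make the 13-bit layout concrete, here is a hedged Python sketch that packs a general I/O configuration value. It assumes the code strings in the table are written MSB first (four mode bits, then C, I, O, HHH, LLL) and that the I and O bits select true or inverted input and output, as the legend columns suggest. The function name and the drive-strength keys are made up for illustration.

```python
# Drive codes taken from the HHH/LLL legend in the table above.
DRIVE = {"FAST": 0b000, "SLOW": 0b001, "1500R": 0b010, "10K": 0b011,
         "100K": 0b100, "100UA": 0b101, "10UA": 0b110, "FLOAT": 0b111}


def general_io_config(mode, clocked, invert_in, invert_out, high_drive, low_drive):
    """Pack a 13-bit value laid out as MMMM_C_I_O_HHH_LLL (assumed MSB first)."""
    if not 0 <= mode <= 0b1110:
        raise ValueError("general I/O modes are %0000..%1110")
    return ((mode << 9)
            | (int(clocked) << 8)        # C: 0 = live, 1 = clocked
            | (int(invert_in) << 7)      # I: 0 = true, 1 = inverted (assumed input)
            | (int(invert_out) << 6)     # O: 0 = true, 1 = inverted (assumed output)
            | (DRIVE[high_drive] << 3)   # HHH: high-side drive
            | DRIVE[low_drive])          # LLL: low-side drive


# Schmitt input on PinA, clocked, inverted output, slow high drive, floating low.
value = general_io_config(0b0100, True, False, True, "SLOW", "FLOAT")
print(f"{value:013b}")
```

How such a value is then applied with CFGPINS is not modelled here.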
Each cog has a video generator capable of generating composite, component, s-video, and VGA video. The video generator is fed pixel data through the waitvid instruction and uses the pixel data to look up colors to output from the CLUT. The video generator understands R.G.B.A.X color grouping and can handle RGB565/555/444/etc formatted data.
Mnemonic | Operand | Operation |
SETVID | D/#n | Setup the video generator according to D or n to output video from the CLUT. |
SETVIDY | D/#n | Setup the video generator color matrix transform term Y according to D or n. |
SETVIDI | D/#n | Setup the video generator color matrix transform term I according to D or n. |
SETVIDQ | D/#n | Setup the video generator color matrix transform term Q according to D or n. |
Each cog has four DACs capable of SIN/COS wave output, saw tooth wave output, triangle wave output, and square wave output. Additionally, the video generator, when operational, will use the four DACs to produce video output. Please refer to the information below.
o DAC0 = CTRASIN, DAC1 = CTRACOS, DAC2 = CTRBSIN, DAC3 = CTRBCOS
o DAC0/2 = CTRASIN + CTRBSIN, DAC1/3 = CTRACOS + CTRBCOS
o DAC0 = SYNC, DAC1 = Q/B, DAC2 = I/G, DAC3 = Y/R
Mnemonic | Operand | Operation |
CFGDAC0 | D/#n | Configure DAC0 to D or n. See above. |
CFGDAC1 | D/#n | Configure DAC1 to D or n. See above. |
CFGDAC2 | D/#n | Configure DAC2 to D or n. See above. |
CFGDAC3 | D/#n | Configure DAC3 to D or n. See above. |
SETDAC0 | D/#n | Set DAC0 to top 18 bits of D/n. |
SETDAC1 | D/#n | Set DAC1 to top 18 bits of D/n. |
SETDAC2 | D/#n | Set DAC2 to top 18 bits of D/n. |
SETDAC3 | D/#n | Set DAC3 to top 18 bits of D/n. |
CFGDACS | D/#n | Configure DACs to D or n. See above |
SETDACS | D/#n | Set DACs to top 18 bits of D/n |
Each cog has texture mapping hardware to assist the video generator with displaying textures and performing color blending on screen.
Mnemonic | Operand | Operation |
GETPIX | D | Store texture pointer address in D. |
SETPIX | D/#n | Set texture size and address to D/n |
SETPIXU | D/#n | Set texture pointer x address to D/n. |
SETPIXV | D/#n | Set texture pointer y address to D/n. |
SETPIXZ | D/#n | Set texture pointer z address to D/n. |
SETPIXR | D/#n | Set texture pointer R blending to D/n |
SETPIXG | D/#n | Set texture pointer G blending to D/n |
SETPIXB | D/#n | Set texture pointer B blending to D/n |
SETPIXA | D/#n | Set texture pointer A blending to D/n |
Each cog now features a 256 Long Color Look Up Table (CLUT) designed for use with the video generator in each cog. While the video generator is in use each long in the CLUT holds R.G.B.A.Z information for the video generator to display video with. When the video generator is not in use the CLUT may be used as a general-purpose memory scratch space, or as a 256 Long FIFO buffer, or as a call stack and evaluation stack (at the same time). The CLUT has two pointers used to index it called SPA and SPB.
Mnemonic | OPR | CLK | Operation |
GETSPD | D | 1 | SPA-SPB into D, Z/C as CHKSPD |
GETSPA | D | 1 | SPA into D, Z/C as CHKSPA |
GETSPB | D | 1 | SPB into D, Z/C as CHKSPB |
POPAR | D | 1 | Store CLUT[SPA] in register “D (0-511)” and then increment SPA |
POPBR | D | 1 | Store CLUT[SPB] in register “D (0-511)” and then increment SPB |
POPA | D | 1 | Decrement SPA and then store CLUT[SPA] in register “D (0-511)”. |
POPB | D | 1 | Decrement SPB and then store CLUT[SPB] in register “D (0-511)”. |
RETA | Decrement SPA and then jump to instruction (CLUT[SPA] & 0x1FF). Flush pipeline before jump – results in a two-cycle loss. | ||
RETB | Decrement SPB and then jump to instruction (CLUT[SPB] & 0x1FF). Flush pipeline before jump – results in a two-cycle loss. | ||
RETAD | Decrement SPA and then jump to instruction (CLUT[SPA] & 0x1FF). Do not flush pipeline before jump – must be executed two instructions before intended jump space. | ||
RETBD | Decrement SPB and then jump to instruction (CLUT[SPB] & 0x1FF). Do not flush pipeline before jump – must be executed two instructions before intended jump space. | ||
SETSPA | D/#n | 1 | Set SPA to register “D (0-511)” or “n (0-511)” |
SETSPB | D/#n | 1 | Set SPB to register “D (0-511)” or “n (0-511)” |
ADDSPA | D/#n | 1 | Add to SPA register “D (0-511)” or “n (0-511)”. |
ADDSPB | D/#n | 1 | Add to SPB register “D (0-511)” or “n (0-511)”. |
SUBSPA | D/#n | 1 | Subtract from SPA register “D (0-511)” or “n (0-511)”. |
SUBSPB | D/#n | 1 | Subtract from SPB register “D (0-511)” or “n (0-511)”. |
PUSHAR | D/#n | 1..2 | Decrement SPA and then store register “D (0-511)” in CLUT[SPA]. |
PUSHBR | D/#n | 1..2 | Decrement SPB and then store register “D (0-511)” in CLUT[SPB]. |
PUSHA | D/#n | 1..2 | Store register “D (0-511)” in CLUT[SPA] and then increment SPA. |
PUSHB | D/#n | 1..2 | Store register “D (0-511)” in CLUT[SPB] and then increment SPB. |
CALLA | D/#n | 4..5 | Store Z/C/PC* in CLUT[SPA] and then increment SPA and then jump to the address in register “D (0-511)” or address “n (0-511)”. Flush pipeline before jump – results in a two-cycle loss. D version doesn’t flush. |
CALLB | D/#n | 4..5 | Store Z/C/PC* in CLUT[SPB] and then increment SPB and then jump to the address in register “D (0-511)” or address “n (0-511)”. Flush pipeline before jump – results in a two-cycle loss. D version doesn’t flush. |
CALLAD | D/#n | 4..5 | Store Z/C/PC* in CLUT[SPA] and then increment SPA and then jump to the address in register “D (0-511)” or address “n (0-511)”... |
CALLBD | D/#n | 4..5 | Store Z/C/PC* in CLUT[SPB] and then increment SPB and then jump to the address in register “D (0-511)” or address “n (0-511)”... |
GETSPD | D | 4..5 | Stores ((SPA - SPB) & 0x7F) in register “D (0-511)”. FOR FIFO MODE. |
GETSPA | D | 4..5 | Stores SPA in register “D (0-511)”. |
GETSPB | D | 4..5 | Stores SPB in register “D (0-511)”. |
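The push/pop ordering in the table (PUSHA stores then increments, POPA decrements then loads) can be illustrated with a small Python model of the SPA side. The class name is hypothetical, wrapping the pointer modulo 256 is an assumption based on the 256-long CLUT size, and only the PC part of the Z/C/PC word is modelled for calls.

```python
class ClutStack:
    """Behavioral sketch of the CLUT used as a stack through SPA."""

    SIZE = 256  # the CLUT holds 256 longs

    def __init__(self):
        self.clut = [0] * self.SIZE
        self.spa = 0

    def pusha(self, value):
        """PUSHA: store at CLUT[SPA], then increment SPA."""
        self.clut[self.spa] = value & 0xFFFFFFFF
        self.spa = (self.spa + 1) % self.SIZE   # assumed wrap-around

    def popa(self):
        """POPA: decrement SPA, then load CLUT[SPA]."""
        self.spa = (self.spa - 1) % self.SIZE   # assumed wrap-around
        return self.clut[self.spa]

    def calla(self, return_pc, target):
        """CALLA: push the return address, yield the new PC."""
        self.pusha(return_pc)
        return target & 0x1FF

    def reta(self):
        """RETA: pop and mask to a 9-bit cog address."""
        return self.popa() & 0x1FF


stack = ClutStack()
pc = stack.calla(return_pc=0x021, target=0x100)
print(hex(pc), hex(stack.reta()))
```

The SPB instructions would follow the same pattern with a second pointer into the same memory.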
Each cog now features the ability to perform 32-bit multi-cycle multiplies, 32-bit multi-cycle divides, 32-bit multi-cycle square roots, and 32-bit CORDIC transcendental operations. All of the advanced multi-cycle math operations use separate state machines that run concurrently with processor execution.
Note: The CORDIC algorithm rotates a point in the XY plane by a given angle. Look at X/Y/A results for SIN/COS/TAN/ARCSIN/ARCOS/ARCTAN values of X/Y/A.
Mnemonic | Operand | Operation |
GETMULL | D | Store the bottom 32 bits of the 32x32 bit multiply in register “D (0-511)”, waits for multiply FSM if not done yet. |
GETMULH | D | Store the top 32 bits of the 32x32 bit multiply in register “D (0-511)”, waits for multiply FSM if not done yet. |
GETDIVQ | D | Store the quotient of the divide in register “D (0-511)”, waits for divide FSM if not done yet. |
GETDIVR | D | Store the remainder of the divide in register “D (0-511)”, waits for divide FSM if not done yet. |
GETSQRT | D | Store the result of the square root in register “D (0-511)”, waits for square root FSM if not done yet. |
GETQX | D | Store the result of the CORDIC X part in register “D (0-511)”, waits for CORDIC FSM if not done yet. |
GETQY | D | Store the result of the CORDIC Y part in register “D (0-511)”, waits for CORDIC FSM if not done yet. |
GETQZ | D | Store the result of the CORDIC A part in register “D (0-511)”, waits for CORDIC FSM if not done yet. |
SETMULA | D/#n | Setup long A to be multiplied by long B given the value in register “D (0-511)” or number “n (0-511)”. Will take 16 cycles. |
SETMULB | D/#n | Setup long B to be multiplied by long A given the value in register “D (0-511)” or number “n (0-511)”. Starts multiply. |
SETDIVA | D/#n | Setup the dividend long given the value in register “D (0-511)” or number “n (0-511)”. Will take 16 cycles. |
SETDIVB | D/#n | Setup the divisor long given the value in register “D (0-511)” or number “n (0-511)”. Starts divide. |
SETQI | D/#n | Set iteration override to 0..31 (otherwise, iteration counts are load-dependent) |
SETQZ | D/#n | Setup the CORDIC state machine with the angle given by the value in register “D (0-511)” or number “n (0-511)”. |
QROTATE | D,S | Start the CORDIC rotation operation given the value in register “D (0-511)” or “S (0-31)” iterations. |
QARCTAN | D,S | Start the CORDIC arc tangent operation given the value in register “D (0-511)” or “S (0-31)” iterations. |
QEXP | D/#n | Start the CORDIC exponential operation given the value in register “D (0-511)” or “n (0-31)” iterations. |
QLOG | D/#n | Start the CORDIC logarithmic operation given the value in register “D (0-511)” or “n (0-31)” iterations. |
QSINCOS | D,S | Get sine and cosine of angle D with magnitude S (use GETQX D & GETQY D after) |
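For readers unfamiliar with CORDIC, the rotation idea mentioned in the note above can be sketched in Python. This is a floating-point illustration of the general algorithm, not the chip's fixed-point state machine: the angle is in radians (the hardware angle format is not described here), the function name is made up, and this basic form only converges for angles within roughly ±1.74 radians.

```python
import math


def cordic_rotate(x, y, angle, iterations=32):
    """Rotate the point (x, y) by angle using CORDIC micro-rotations."""
    # Each micro-rotation stretches the vector; accumulate the total gain.
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))

    z = angle                              # remaining angle to rotate through
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0        # rotate toward zero remaining angle
        step = 2.0 ** -i                   # a right shift by i in hardware
        x, y = x - d * y * step, y + d * x * step
        z -= d * math.atan(step)
    return x / gain, y / gain              # undo the accumulated gain


x, y = cordic_rotate(1.0, 0.0, math.pi / 6)
print(x, y)   # close to cos(30 deg), sin(30 deg)
```

Rotating the unit vector (1, 0) by an angle yields its cosine and sine, which is the idea behind reading the X and Y results after a sine/cosine request.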
Each cog has a free-running LFSR (Linear Feedback Shift Register) and System Counter that change every clock cycle. Each access of the LFSR taps into a 32-bit-wide sequence of numbers that is traversed in pseudo-random order, giving a sequence of about 2^32 values.
The system counter counts the number of clock ticks since power-up. It is a 64-bit counter; the LFSR is 32 bits wide.
Mnemonic | Operand | Operation |
GETCNT | D | Store the bottom 32 bits of the System Counter (CNT) in register “D (0-511)”. If executed again (with no instruction in between), store the top 32 bits of the System Counter in register “D (0-511)”. If a rollover occurs between accesses, TOP-1 is stored. |
SUBCNT | D | Subtracts the system count value when the GETCNT instruction was last executed from the current system count value. Results are stored in the register referenced by “D (0-511)”. |
GETLFSR | D | Store the LFSR value in register “D (0-511)”. |
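A rough Python model of reading the 64-bit counter as two longs, and of the elapsed-time subtraction, is given below. The function names are hypothetical, and treating the SUBCNT result as a 32-bit wrapped difference is an assumption.

```python
MASK32 = 0xFFFFFFFF


def split_counter(cnt64):
    """Return (low, high) longs, as two back-to-back GETCNTs would read them."""
    return cnt64 & MASK32, (cnt64 >> 32) & MASK32


def elapsed_ticks(start_low, now_low):
    """Ticks between two low-long readings, tolerant of 32-bit wrap-around."""
    return (now_low - start_low) & MASK32


print(split_counter(0x0000_0001_0000_0005))
print(hex(elapsed_ticks(0xFFFFFFF0, 0x00000010)))
```

Masking the difference keeps the elapsed count correct even when the low long rolls over between the two readings, as long as fewer than 2^32 ticks have passed.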
Each cog additionally has a single cycle 24-bit hardware multiplier capable of unsigned and signed multiplications. The multiplication also adds into a 64-bit register ACCx for MAC ops.
Mnemonic | Operand | Operation |
MACA | D,S | Multiply unsigned register “D (0-511)” and unsigned register “S (0-511)” or an immediate value (0-511) and add to the 64-bit accumulator A. |
MACB | D,S | Multiply unsigned register “D (0-511)” and unsigned register “S (0-511)” or an immediate value (0-511) and add to the 64-bit accumulator B. |
MUL | D,S | Multiply unsigned register “D (0-511)” and unsigned register “S (0-511)” or an immediate value (0-511) and store in register D. |
SCL | D,S | Scale the result of the multiplication of two 24 bit numbers (D,S) to fit into the 32 bit destination register specified by “D (0-511)”. |
CLRACCA | Zero Multiply Accumulator A (ACCA). | |
CLRACCB | Zero Multiply Accumulator B (ACCB). | |
CLRACCS | Zero both multiply accumulators (accumulator A and B). | |
GETACCA | D | Store the bottom 32 Bits of the A accumulator in register “D (0-511)”. If executed again (no instruction in between previous execution) store the top 32 Bits of the A accumulator in register “D (0-511)”. |
GETACCB | D | Store the bottom 32 Bits of the B accumulator in register “D (0-511)”. If executed again (no instruction in between previous execution) store the top 32 Bits of the B accumulator in register “D (0-511)”. |
SETACCA | D,S | Sets the high and low values of the 64 bit accumulator A. The value contained in register “D (0-511)” sets the low long while the value contained in “S (0-511)” sets the high long. |
SETACCB | D,S | Sets the high and low values of the 64 bit accumulator B. The value contained in register “D (0-511)” sets the low long while the value contained in “S (0-511)” sets the high long. |
FITACCA | Shifts accumulator A’s high long right into the low long so that the high long is MSB justified (discarding the low bits). Accumulator A’s high long is then replaced with the number of bit places required to MSB justify Accumulator A’s original value. | |
FITACCB | Shifts accumulator B’s high long right into the low long so that the high long is MSB justified (discarding the low bits). Accumulator B’s high long is then replaced with the number of bit places required to MSB justify Accumulator B’s original value. | |
FITACCS | Similar operation to FITACCA/FITACCB. Examines both accumulator A and B and right shifts both accumulators so that the greater value of the two accumulators is MSB justified. The number of bits shifted is written to both accumulator’s high long. This has the effect of scaling both accumulators equally. |
Each cog additionally features a number of new instructions to make many common operations much easier to perform than before. Most of the new instructions are in the extended instruction set while a few of the new instructions are in the original set.
Mnemonic | Operand | Operation |
DECOD5 | D | Overwrite register “D (0-511)” with decoded D[4:0] repeated 1 time. (e.g. $00000001 << D[4:0]) |
DECOD4 | D | Overwrite register “D (0-511)” with decoded D[3:0] repeated 2 times. (e.g. $00010001 << D[3:0]) |
DECOD3 | D | Overwrite register “D (0-511)” with decoded D[2:0] repeated 4 times. (e.g. $01010101 << D[2:0]) |
DECOD2 | D | Overwrite register “D (0-511)” with decoded D[1:0] repeated 8 times. (e.g. $11111111 << D[1:0]) |
BLMASK | D | Overwrite register “D (0-511)” with a bit length mask specified by D[5:0]. |
NOT | D | Overwrite register “D (0-511)” with the bitwise inverted register “D (0-511)” |
ONECNT | D | Overwrite register “D (0-511)” with the count of ones in register D. |
ZERCNT | D | Overwrite register “D (0-511)” with the count of zeros in register D. |
INCPAT | D | Overwrite register “D (0-511)” with the next bit pattern that keeps the number of ones and zeros the same in register D. |
DECPAT | D | Overwrite register “D (0-511)” with the previous bit pattern that keeps the number of ones and zeros the same in register D. |
BINGRY | D | Overwrite the binary pattern in register “D (0-511)” with its Gray code pattern. |
GRYBIN | D | Overwrite the Gray code pattern in register “D (0-511)” with its binary pattern. |
MERGEW | D | Merge the high word and the low word of register “D (0-511)” into each other and overwrite register D with the new value. Bits of the low word occupy bit spaces 0, 2, 4, etc. Bits of the high word occupy bit spaces 1, 3, 5, etc. (Interleave) |
SPLITW | D | Split the bits of register “D (0-511)” into a high word and low word and overwrite register D with the new value. Bits of the low word come from bit spaces 0, 2, 4, etc. Bits of the high word come from bit spaces 1, 3, 5, etc. (De-interleave) |
SEUSSF | D | Overwrite register “D (0-511)” with a pseudo random bit pattern seeded from the value in register D. After 32 forward iterations, the original bit pattern is returned. |
SEUSSR | D | Overwrite register “D (0-511)” with a pseudo random bit pattern seeded from the value in register D. After 32 reversed iterations, the original bit pattern is returned. |
ISOB | D.b | Isolate bit “b (0-31)” of register “D (0-511).” |
NOTB | D.b | Invert bit “b (0-31)” of register “D (0-511).” |
CLRB | D.b | Clear bit “b (0-31)” of register “D (0-511).” |
SETB | D.b | Set bit “b (0-31)” of register “D (0-511).” |
SETBC | D.b | Set bit “b (0-31)” of register “D (0-511)” to C. |
SETBNC | D.b | Set bit “b (0-31)” of register “D (0-511)” to NC. |
SETBZ | D.b | Set bit “b (0-31)” of register “D (0-511)” to Z. |
SETBNZ | D.b | Set bit “b (0-31)” of register “D (0-511)” to NZ. |
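The bit-manipulation rows above can be checked against a small behavioural model. The following is a minimal Python sketch, not Propeller code. The function names simply mirror the mnemonics, and it assumes that BINGRY/GRYBIN use the standard reflected Gray code, which the table does not state explicitly.

```python
MASK32 = 0xFFFFFFFF

def bingry(d):
    """Binary to reflected Gray code (assumed encoding for BINGRY)."""
    d &= MASK32
    return d ^ (d >> 1)

def grybin(d):
    """Gray code back to binary (inverse of bingry)."""
    d &= MASK32
    shift = 1
    while shift < 32:
        d ^= d >> shift      # prefix XOR, doubling the span each pass
        shift <<= 1
    return d

def mergew(d):
    """Interleave: low-word bit i -> bit 2i, high-word bit i -> bit 2i+1."""
    low, high = d & 0xFFFF, (d >> 16) & 0xFFFF
    out = 0
    for i in range(16):
        out |= ((low >> i) & 1) << (2 * i)
        out |= ((high >> i) & 1) << (2 * i + 1)
    return out

def splitw(d):
    """De-interleave: even bits -> low word, odd bits -> high word."""
    low = high = 0
    for i in range(16):
        low |= ((d >> (2 * i)) & 1) << i
        high |= ((d >> (2 * i + 1)) & 1) << i
    return (high << 16) | low

def onecnt(d):
    """Number of set bits in the 32-bit value."""
    return bin(d & MASK32).count("1")

def zercnt(d):
    """Number of clear bits in the 32-bit value."""
    return 32 - onecnt(d)
```

In this model SPLITW undoes MERGEW and GRYBIN undoes BINGRY, which is a convenient sanity check when reading the table.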
Mnemonic | Operand | Operation |
PUSHZC | D | Push the Z and C flags into D[1:0] and pop D[31:30] into Z and C through WZ and WC. |
POPZC | D | Pop D[1:0] into the Z and C flags and push D[31:30] into Z and C through WZ and WC. |
SETZC | D/#n, #i | Set the Z and C flags with D[1:0] through WZ and WC effects. |
Mnemonic | Operand | Operation |
REPD | D/#n | Delayed repeat of the following “i (0-31)” instructions, executed the number of times given by the value in register “D(0-511)” or literal “n(0-511)”. The pipeline causes a delay of three instructions before the repeated set of instructions begins to execute. |
NOPX | D/#n | Repeat the NOP instruction the value in register “D(0-511)” or “n(0-511)” times. |
SETSKIP | D/#n | Executes up to the next 32 instructions as NOPs described by the set bit pattern of a register “D(0-511)” or literal “N(0-63)”. |
Mnemonic | Operand | Operation |
ENC | D,S | Store encoded S in D. |
JMPRET | D,S | See P8X32A – No instruction change. |
ROR | D,S | See P8X32A – No instruction change. |
ROL | D,S | See P8X32A – No instruction change. |
SHR | D,S | See P8X32A – No instruction change. |
SHL | D,S | See P8X32A – No instruction change. |
RCR | D,S | See P8X32A – No instruction change. |
RCL | D,S | See P8X32A – No instruction change. |
SAR | D,S | See P8X32A – No instruction change. |
REV | D,S | See P8X32A – No instruction change. |
MINS | D,S | See P8X32A – No instruction change. |
MAXS | D,S | See P8X32A – No instruction change. |
MIN | D,S | See P8X32A – No instruction change. |
MAX | D,S | See P8X32A – No instruction change. |
MOVS | D,S | See P8X32A – No instruction change. |
MOVD | D,S | See P8X32A – No instruction change. |
MOVI | D,S | See P8X32A – No instruction change. |
JMPRETD | D,S | See P8X32A – No instruction change. Do not flush pipeline before jump – must be executed two instructions before intended jump space. |
AND | D,S | See P8X32A – No instruction change. |
ANDN | D,S | See P8X32A – No instruction change. |
OR | D,S | See P8X32A – No instruction change. |
XOR | D,S | See P8X32A – No instruction change. |
MUXC | D,S | See P8X32A – No instruction change. |
MUXNC | D,S | See P8X32A – No instruction change. |
MUXZ | D,S | See P8X32A – No instruction change. |
MUXNZ | D,S | See P8X32A – No instruction change. |
ADD | D,S | See P8X32A – No instruction change. |
SUB | D,S | See P8X32A – No instruction change. |
ADDABS | D,S | See P8X32A – No instruction change. |
SUBABS | D,S | See P8X32A – No instruction change. |
SUMC | D,S | See P8X32A – No instruction change. |
SUMNC | D,S | See P8X32A – No instruction change. |
SUMZ | D,S | See P8X32A – No instruction change. |
SUMNZ | D,S | See P8X32A – No instruction change. |
MOV | D,S | See P8X32A – No instruction change. |
NEG | D,S | See P8X32A – No instruction change. |
ABS | D,S | See P8X32A – No instruction change. |
ABSNEG | D,S | See P8X32A – No instruction change. |
NEGC | D,S | See P8X32A – No instruction change. |
NEGNC | D,S | See P8X32A – No instruction change. |
NEGZ | D,S | See P8X32A – No instruction change. |
NEGNZ | D,S | See P8X32A – No instruction change. |
CMPS | D,S | See P8X32A – No instruction change. |
CMPSX | D,S | See P8X32A – No instruction change. |
ADDX | D,S | See P8X32A – No instruction change. |
SUBX | D,S | See P8X32A – No instruction change. |
ADDS | D,S | See P8X32A – No instruction change. |
SUBS | D,S | See P8X32A – No instruction change. |
ADDSX | D,S | See P8X32A – No instruction change. |
SUBSX | D,S | See P8X32A – No instruction change. |
SUBR | D,S | Subtract D from S and store the result in D. |
CMPSUB | D,S | See P8X32A – No instruction change. |
INCMOD | D,S | Increment D between 0 and S. Wraps around to 0 when above S. |
DECMOD | D,S | Decrement D between S and 0. Wraps around to S when below 0. |
IJZ | D,S | Increment D and jump to S if D is zero |
IJZD | D,S | Increment D and jump to S if D is zero. Do not flush pipeline before jump – must be executed two instructions before intended jump space. |
IJNZ | D,S | Increment D and jump to S if D is not zero |
IJNZD | D,S | Increment D and jump to S if D is not zero. Do not flush pipeline before jump – must be executed two instructions before intended jump space. |
DJZ | D,S | Decrement D and jump to S if D is zero |
DJZD | D,S | Decrement D and jump to S if D is zero. Do not flush pipeline before jump – must be executed two instructions before intended jump space. |
DJNZ | D,S | Decrement D and jump to S if D is not zero. |
DJNZD | D,S | Decrement D and jump to S if D is not zero. Do not flush pipeline before jump – must be executed two instructions before intended jump space. |
TJZ | D,S | See P8X32A – No instruction change. |
TJZD | D,S | See P8X32A – No instruction change. Do not flush pipeline before jump – must be executed two instructions before intended jump space. |
TJNZ | D,S | See P8X32A – No instruction change. |
TJNZD | D,S | See P8X32A – No instruction change. Do not flush pipeline before jump – must be executed two instructions before intended jump space. |
SETINDA | D,S | Set up the address range for indirection register A, where D is the top of the range and S is the bottom of the range. The indirection register will allow access to cog registers in this range. |
SETINDB | D,S | Set up the address range for indirection register B, where D is the top of the range and S is the bottom of the range. The indirection register will allow access to cog registers in this range. |
WAITVID | D,S | Wait to pass pixels to the video generator. |
WAITCNT | D,S | Wait for the CNT[31:0] register to equal D and then add S to D and store in D. If WC is specified then wait for CNT[63:32] to equal D. |
WAITPEQ | D,S | See P8X32A – No instruction change. |
WAITPNE | D,S | See P8X32A – No instruction change. |
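The modulo and reverse-subtract rows in the table above (SUBR, INCMOD, DECMOD) are easy to misread, so here is a minimal Python model of the described behaviour. It is a sketch, not Propeller code. The function names mirror the mnemonics, and the exact wrap condition (wrapping when D has reached S) is an assumption drawn from the wording "wraps around to 0 when above S".

```python
MASK32 = 0xFFFFFFFF

def subr(d, s):
    """SUBR: reverse subtract, D = S - D, with 32-bit wraparound."""
    return (s - d) & MASK32

def incmod(d, s):
    """INCMOD: step D upward through 0..S, wrapping to 0 after S."""
    # Assumption: the wrap happens once D has reached (or passed) S.
    return 0 if d >= s else d + 1

def decmod(d, s):
    """DECMOD: step D downward through S..0, wrapping to S after 0."""
    return s if d == 0 else d - 1

def run(step, start, limit, count):
    """Apply a stepping function repeatedly and collect the values."""
    values, d = [], start
    for _ in range(count):
        d = step(d, limit)
        values.append(d)
    return values
```

For example, `run(incmod, 0, 3, 5)` walks through 1, 2, 3, 0, 1, which is the usual pattern for a ring-buffer index.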
Each cog has 10 memory mapped registers that allow control over I/O pins and indirection. The OUTx and INx registers have now been combined to form the PIN registers. The IND registers allow indirect register access to avoid self-modifying code. All other REGs are free.
Register | Location | Operation |
INDA | $1F6 | When read or written, accesses the cog memory address set by SETINDA. Auto-increments after being accessed. Condition codes are not allowed to be used with INDA register access. |
INDB | $1F7 | When read or written, accesses the cog memory address set by SETINDB. Auto-increments after being accessed. Condition codes are not allowed to be used with INDB register access. |
PINA | $1F8 | When written, changes the state of the I/O pins attached to port A. When read, returns the state of the I/O port attached to PINA. |
PINB | $1F9 | When written, changes the state of the I/O pins attached to port B. When read, returns the state of the I/O port attached to PINB. |
PINC | $1FA | When written, changes the state of the I/O pins attached to port C. When read, returns the state of the I/O port attached to PINC. |
PIND | $1FB | When written, changes the state of the I/O pins attached to port D. When read, returns the state of the I/O port attached to PIND. |
DIRA | $1FC | Enables or disables the output functionality of PORTA. Input reading is never disabled. |
DIRB | $1FD | Enables or disables the output functionality of PORTB. Input reading is never disabled. |
DIRC | $1FE | Enables or disables the output functionality of PORTC. Input reading is never disabled. |
DIRD | $1FF | Enables or disables the output functionality of PORTD. Input reading is never disabled. |
Each cog has two counter modules – CTRA and CTRB. Each counter module has a FRQ, PHS, SIN, and COS register. The counter modules control the SIN and COS registers to track the phase and power of a signal. The FRQ and PHS registers work the same. Each counter module also has logic modes, which allow it to accumulate given different logic equations involving a selected pin A and pin B – see P8X32A. The counter modes now also feature quadrature encoder accumulation and automatic PWM generation.
Mnemonic | Operand | Operation |
GETPHSA | D | Store PHSA in D |
GETPHZA | D | Store PHSA in D and zero PHSA. |
GETCOSA | D | Store COSA in D |
GETSINA | D | Store SINA in D |
GETPHSB | D | Store PHSB in D |
GETPHZB | D | Store PHSB in D and zero PHSB |
GETCOSB | D | Store COSB in D |
GETSINB | D | Store SINB in D |
SETCTRA | D/#n | Set CTRA mode to D/n. |
SETWAVA | D/#n | Set CTRA wave mode to D/n. |
SETFRQA | D/#n | Set FRQA to D/n |
SETPHSA | D/#n | Set PHSA to D/n. |
ADDPHSA | D/#n | Add D/n to PHSA. |
SUBPHSA | D/#n | Subtract D/n from PHSA. |
SYNCTRA | | Wait for PHSA to overflow. |
CAPCTRA | | Remove current sum from PHSA. |
SETCTRB | D/#n | Set CTRB mode to D/n |
SETWAVB | D/#n | Set CTRB wave mode to D/n. |
SETFRQB | D/#n | Set FRQB to D/n |
SETPHSB | D/#n | Set PHSB to D/n. |
ADDPHSB | D/#n | Add D/n to PHSB. |
SUBPHSB | D/#n | Subtract D/n from PHSB. |
SYNCTRB | | Wait for PHSB to overflow. |
CAPCTRB | | Remove current sum from PHSB. |
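As a rough illustration of the FRQ/PHS relationship, the sketch below models a 32-bit phase accumulator in Python. It assumes the P8X32A behaviour, in which FRQ is added to PHS on every clock, carries over unchanged. The helper names and the 80 MHz clock in the usage note are illustrative only and are not taken from this document.

```python
MASK32 = 0xFFFFFFFF

def clocks_until_overflow(phs, frq):
    """Count clocks until PHS + FRQ carries out of 32 bits.

    Returns (clocks, new_phs). Models what SYNCTRA/SYNCTRB would wait for,
    under the assumption that PHS accumulates FRQ once per clock.
    """
    if frq <= 0:
        raise ValueError("FRQ must be positive for PHS to overflow")
    clocks = 0
    while True:
        clocks += 1
        phs += frq
        if phs > MASK32:
            return clocks, phs & MASK32

def overflow_rate_hz(frq, clock_hz):
    """Average PHS overflow rate: FRQ * clock / 2^32."""
    return frq * clock_hz / 2**32
```

With `frq = 0x40000000`, PHS overflows every fourth clock, so at a hypothetical 80 MHz system clock the overflow rate would be 20 MHz.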
Each cog has a field mover that can move a byte or word from any field in S into any field in D. To use the field mover, you must first configure it using SETF. Then, you can use MOVF to perform the moves.
SETF uses a 9-bit value %W_DDdd_SSss to configure the field mover:
W | word/byte | DD | D field mode | dd | D field pointer | SS | S field mode | ss | S field pointer |
0 | byte mode | %00 | D field pointer stays same after MOVF | %00 | byte 0 / word 0 | %00 | S field pointer stays same after MOVF | %00 | byte 0 / word 0 |
1 | word mode | %01 | D field pointer stays same after MOVF, D rotates left by byte/word | %01 | byte 1 / word 0 | %01 | S field pointer stays same after MOVF | %01 | byte 1 / word 0 |
 | | %10 | D field pointer increments after MOVF | %10 | byte 2 / word 1 | %10 | S field pointer increments after MOVF | %10 | byte 2 / word 1 |
 | | %11 | D field pointer decrements after MOVF | %11 | byte 3 / word 1 | %11 | S field pointer decrements after MOVF | %11 | byte 3 / word 1 |
On cog startup, SETF is initialized to %0_0100_0000, so that MOVF will rotate D left by 8 bits and then fill the bottom byte with the lower byte in S.
Mnemonic | Operand | Operation | Clocks |
SETF | D | Configure field mover with D | 1 |
SETF | #n | Configure field mover with 0..511 | 1 |
MOVF | D,S | Move field from S into D | 1 |
MOVF | D,#n | Move field from 0..511 into D | 1 |
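The SETF configuration word and the effect of MOVF can be modelled in a few lines of Python. This is a minimal sketch covering byte mode only, not Propeller code. The class name `FieldMover` is hypothetical, and the 2-bit pointer wraparound (3 to 0 and 0 to 3) is an assumption, since the tables above do not say what happens at the ends. Word mode is left out because the pointer step size in that mode is not documented here.

```python
MASK32 = 0xFFFFFFFF

class FieldMover:
    """Byte-mode model of the field mover configured by %W_DDdd_SSss."""

    def __init__(self, config=0b0_0100_0000):   # cog startup value
        self.setf(config)

    def setf(self, config):
        """Decode the 9-bit SETF value into its five fields."""
        self.word_mode = (config >> 8) & 1
        self.d_mode = (config >> 6) & 3
        self.d_ptr = (config >> 4) & 3
        self.s_mode = (config >> 2) & 3
        self.s_ptr = config & 3

    @staticmethod
    def _step(ptr, mode):
        """Advance a field pointer according to its mode."""
        if mode == 0b10:
            return (ptr + 1) & 3    # increment (wraparound is an assumption)
        if mode == 0b11:
            return (ptr - 1) & 3    # decrement (wraparound is an assumption)
        return ptr                  # %00 and %01: pointer stays the same

    def movf(self, d, s):
        """Return the new D after moving one byte field from S."""
        if self.word_mode:
            raise NotImplementedError("word mode is not modelled in this sketch")
        field = (s >> (8 * self.s_ptr)) & 0xFF
        if self.d_mode == 0b01:     # rotate D left by one byte first
            d = ((d << 8) | (d >> 24)) & MASK32
        shift = 8 * self.d_ptr
        d = (d & ~(0xFF << shift) & MASK32) | (field << shift)
        self.d_ptr = self._step(self.d_ptr, self.d_mode)
        self.s_ptr = self._step(self.s_ptr, self.s_mode)
        return d
```

With the startup configuration, `FieldMover().movf(0x11223344, 0xAABBCCDD)` rotates D left by 8 bits and fills the bottom byte from S, giving `0x223344DD`. Configuring `%0_1000_1111` (D pointer incrementing from byte 0, S pointer decrementing from byte 3) and calling `movf` four times reverses the byte order of S into D.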
The hub contains a 64-bit counter called CNT that increments on each clock cycle. Each cog can use CNT to mark time in various ways. On chip reset, the ROM Booter initializes CNT to $00000000_00000000. For the purpose of describing the cog instructions which relate to CNT, the lower long of CNT is alternately called CNTL and the upper long, delayed by one clock cycle, is called CNTH. The one-clock delay of CNTH enables proper reading of the entire CNT value when two instructions must be used in sequence to access its bottom and top longs.
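To see why the one-clock delay on CNTH matters, consider two back-to-back reads that straddle a rollover of the lower long. The Python sketch below models both cases. The function name and the `delayed_high` switch are illustrative only and are not part of the chip's interface.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def read_cnt_pair(cnt_at_first_read, delayed_high=True):
    """Model two consecutive GETCNTs and return the assembled 64-bit value.

    The first read returns CNTL at clock t. The second read happens at
    clock t+1. With the delay, it sees the upper long as it was at clock t;
    without it, it would see the upper long at clock t+1.
    """
    low = cnt_at_first_read & MASK32
    cnt_next = (cnt_at_first_read + 1) & MASK64
    source = cnt_at_first_read if delayed_high else cnt_next
    high = (source >> 32) & MASK32
    return (high << 32) | low
```

Starting from `0x00000000_FFFFFFFF`, the delayed read returns the same value, while an undelayed upper long would yield `0x00000001_FFFFFFFF`, which is off by 2^32 clocks.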
Mnemonic | Operand | Operation (iiiiii = #i-1, nnnnnnnnn/n___nnnn_nnnnnnnnn = #n-1) | Clocks |
SUBCNT | D | Subtracts D from CNTL, then CNTH. Gets CNTL minus D into D. If another SUBCNT is executed in the next clock cycle by the same task, it gets CNTH minus D minus the carry from the previous SUBCNT into D. In either case, the logical NOT of the MSB of the D result (not the carry) goes into C, so C=1 indicates that CNTL (or CNT) has exceeded the original D value(s). | 1 |
CMPCNT | D | Compares D to CNTL, then CNTH. Same as SUBCNT, but doesn't store the D result(s). Useful for periodically checking whether a time target has been reached yet. | 1 |
PASSCNT | D | Loops until CNTL passes D. Jumps to self if the MSB of CNTL minus D is 1. In other words, loops until CNTL exceeds D. This is intended as a non-pipeline-stalling alternative to WAITCNT, for use in multi-task programs. | 1* |
GETCNT | D | Gets CNTL into D. If another GETCNT is executed in the next clock cycle by the same task, it gets CNTH into D. | 1 |
WAITCNT | D,S | Wait for CNTL or CNT (WC), D += S. Waits for CNTL to be equal to D, then adds S/#n into D. | ? |
WAITPEQ | D,S WC | Wait for (pins & S) = D with timeout. Like WAITPEQ without WC, except the last-written D value becomes a CNTL timeout target, with C returning 0 if the WAITPEQ condition was met, or 1 if the timeout occurred first. | ? |
WAITPNE | D,S WC | Wait for (pins & S) <> D with timeout. Like WAITPNE without WC, except the last-written D value becomes a CNTL timeout target, with C returning 0 if the WAITPNE condition was met, or 1 if the timeout occurred first. | ? |
* 1 clock if task uses no more than every 4th time slot (4 clocks in single-task)
'Measure time using lower 32 bits of CNT
        GETCNT  ticks           'get CNTL into ticks
        <somecode>              'execute some code
        SUBCNT  ticks           'get CNTL minus ticks into ticks, <somecode> took ticks-1 to execute

'Measure time using full 64 bits of CNT (single task)
        GETCNT  ticks_low       'get CNT into {ticks_high, ticks_low}
        GETCNT  ticks_high
        <somecode>              'execute some code
        SUBCNT  ticks_low       'get CNT minus {ticks_high, ticks_low} into {ticks_high, ticks_low}
        SUBCNT  ticks_high

'Do something for some time
        GETCNT  ticks           'get CNTL
        ADD     ticks,#500      'add 500
loop    <somecode>              'execute some code
        CMPCNT  ticks WC        'check if 500 clocks have elapsed yet
if_nc   JMP     #loop           'if not, loop

'Do something every Nth clock (multi-task)
        GETCNT  ticks           'get CNTL
loop    ADD     ticks,#500      'add 500
        PASSCNT ticks           'wait for next 500th clock
        <somecode>              'execute some code
        jmp     #loop           'loop

'Do something every Nth clock (single-task)
        GETCNT  ticks           'get CNTL
        ADD     ticks,#500      'add initial 500
loop    WAITCNT ticks,#500      'wait for next 500th clock, add next 500
        <somecode>              'execute some code
        jmp     #loop           'loop

'Wait for pins to equal a value, with time-out
        GETCNT  ticks           'get CNTL
        ADD     ticks,#200      'allow 200 clock cycles for WAITPEQ
        WAITPEQ value,mask WC   'wait for (pins & mask) = value
if_c    JMP     #timeout        'if C=1 then timeout
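The C flag rule used by SUBCNT, CMPCNT and PASSCNT (C is the logical NOT of the MSB of CNTL minus D) is what keeps these timing loops correct when CNTL wraps from $FFFFFFFF to 0. The Python sketch below models a single SUBCNT on the lower long. The function name mirrors the mnemonic, and the model is an illustration of the described arithmetic rather than a cycle-accurate simulation.

```python
MASK32 = 0xFFFFFFFF

def subcnt(cntl, d):
    """Return (result, c) for one SUBCNT against the lower long of CNT.

    result is CNTL - D truncated to 32 bits.
    c is 1 when the MSB of the result is clear, i.e. the target D has
    been reached or passed, and 0 while the target is still ahead.
    """
    result = (cntl - d) & MASK32
    c = 0 if result & 0x80000000 else 1
    return result, c

def target_after(cntl, delay):
    """Compute a wrapped 32-bit time target, as GETCNT followed by ADD would."""
    return (cntl + delay) & MASK32
```

For instance, a target set 500 clocks after `0xFFFFFF00` wraps to `0x000000F4`. `subcnt(0xFFFFFF00, 0xF4)` still reports C=0 (not yet reached), and C becomes 1 once CNTL reaches `0xF4`. The comparison stays valid as long as the interval is shorter than 2^31 clocks.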
instruction mnem operand |
Every assembly instruction can optionally update the Z and/or C flags using the WZ and WC effects, and the writing of its result can be controlled using the NR and WR effects. Instructions can also be conditionally executed based on the Z and/or C flags; see P8X32A.
Topic | URL |
Hub Memory Instructions | |
Hub Control Instructions | |
COG RAM Indirection | |
COG Stack RAM | |
Multitasking | |
Pipeline | |
DECODx Instructions | |
QUAD Instructions | |
NOTE: This section is intended only for use by the editors of this document.
TASK | Documenters | Notes | Status |
Assembly Language Reference | Seairth | ||
Hardware associated doc | Peter Jakacki | ||
P2 document updates from Chip | |||
Scavenging useful notes and examples | |||
Assembler Language summary | Cluso99 | Similar to P1's summary | |
TASK | Notes | Status |