Updated: Sept 2015 - starting to make changes to document to reflect all that we know about P2-2015. Append with new information to be reintegrated into document.

As the document is updated the text will be changed from red to black to indicate its status.

Webpage URL: This is the webpage version which is automatically generated from the document so refresh your browser often as updates are frequent

IMPORTANT NOTE: This is an unofficial document maintained by various forum/community members.

TABLE OF CONTENTS

Code Protection and Encryption

Supported Languages

Packaging

Diagram: Pinout

Diagram: Schematic Symbol (pbj)

Table: Pin Definitions

Diagram: P1 QFP vs P2 TQFP Footprint

Memory

Hub Memory

Cog Memory

Diagram 2: Cog Memory and Registers

Hub

Hub Memory Instructions

Table: Hub Memory Instructions

PTR Expressions:

Table: PTR Expressions

Examples: Using the PTR

Table: Memory Addressing Example

PTRA/PTRB Instructions

Table: PTR Instructions

QUAD related Instructions

Read Cache

Mapping QUAD Registers

Hiding QUAD Registers

Table: QUAD related Instructions

Hub Control Instructions

Table: Hub Control Instructions

Indirect Registers

Table: INDA/INDB Usage Scheme

Table: INDA/INDB Instructions

Example: Indirect Pointer Usage

Stack RAM

Table: Stack RAM Instructions

Instruction Pipeline

Example: Single-task self-modifying code

Example: Single-task delayed branch

Example: Two-task delayed branch (SETTASK #%%1010 timing)

Table: Branching Instructions

Instruction-Block Repeating

Example: Using REP instruction

Table: REP Instructions

Example: Starting four tasks

Table: Task Instructions

Example: Register Remapping

Tips for coding multi-tasking programs

Tasks and the Pipeline

Avoiding Pipeline Stall

Other instruction alternatives:

Instructions to avoid in multi-tasking

I/O Ports

Table 15: Port Access Instructions

Table 16: Pin State Access Instructions

External RAM

Table 17: External RAM Instruction

InterChip Communication

Table 18: InterChip Communication Instructions

Cog Memory Remapping

Table 19: Cog Memory Remapping Instruction

InterCog Communication

Table 20: InterCog Communication Instruction

Pin Modes

Table 21: Pin Mode Access Instructions

Figure 2: Pin Modes

Video Generator

Table 22: Video Generator Access Instructions

DAC Hardware

Table 23: DAC Hardware Access Instructions

Texture Mapping

Table 24: Texture Mapping Instructions

CLUT or Stack RAM

Table 6: CLUT Instructions

Math

Table 7: Math Operation Instructions

Miscellaneous Hardware

LFSR

System Counter

Table 8: System Counter Instructions

Multiply Accumulate

Table 9: Multiply and Accumulate Instructions

Miscellaneous Instructions

Table 10: Extended Miscellaneous Instructions

Table 11: Extended Miscellaneous Flag Manipulation Instructions

Table 12: Extended Miscellaneous Flow Control Instructions

Table 13: Miscellaneous Instructions

Table 14: Register Map Setup

Counter Modules

Table 25: Counter Hardware Access Instructions

Byte/Word Field Mover

Table: Field mover configuration bits

Table: Byte/Word Field Mover Instructions

Hub Counter

Table: Hub Counter Instructions

Example: Hub Counter

Table: Instruction List

Effects and Condition Codes

Links

Assembler Reference Section

Assembler Instruction Summary Chart

Appendix A. Original Documentation Sources

Appendix Z. Style Guide and Templates

DOCUMENT TASK LIST

TO DO

Introduction

The Propeller 2 is a general-purpose 32-bit microcontroller with 8 symmetric processors called “cogs.” Each cog has 512 longs (2 KB) of memory from which it executes instructions. Most instructions execute in a single clock cycle, with certain math-intensive operations taking up to 31 clock cycles to complete. Additionally, there is a 4-stage pipeline, interrupt support, and smart I/O pins that operate in a variety of modes.

The hub allows each cog round-robin access to the main hub RAM; depending on the hub’s access window relative to the cog, access to hub RAM can take up to 7 clocks (if the access window was just missed) or as little as 0 clocks (if the cog is next in line for the access window). Additionally, the developer has the ability to set a one-time settable encryption key in the chip to protect code downloaded to the chip. On system startup the chip will use this protected key to decrypt the encrypted program that is stored externally in non-volatile EEPROM/FLASH. The encryption key is not accessible by any user code.

If no encryption has been set, the Propeller 2 will try to boot from serial, then from SPI flash, and finally present its monitor on pins 90 (tx) and 91 (rx).

Features

General

32-bit, general purpose multi-core microcontroller
8 identical processors (cogs)
128-pin TQFP package

20 kHz and 20 MHz internal RC oscillator.
External oscillator or 10MHz to 20MHz crystal.
The chip is expected to be clocked at 160 MHz in normal operation, across the full industrial temperature range. With all eight cogs running at full capacity, 1,280 MIPS can be achieved.

Clock Speed

160 MHz planned maximum clock speed
Internal RC: 20 kHz or 20 MHz (cannot use PLL)
External oscillator: DC to 160 MHz (without PLL) or 10 MHz to 32 MHz (with PLL) for system clock speed of 160 MHz maximum
PLL modes: 1x, 2x, 3x ... 15x, 16x input clock multiplier

Performance Metrics

4-stage pipeline
Most instructions are single cycle
1.28 BIPS (160 MIPS x 8 cogs) maximum instruction execution rate(1); assumes that all cogs are running, their pipelines are always full, and only single-cycle instructions are being executed

Memory

Main memory: 127,360B RAM + 3.7 KB ROM
Cog memory: 2 KB (512 longs) cog RAM + 256 long stack
Optional external 32-bit addressable SDRAM for run-time data workspace; code space is not extendable
Non-volatile application and data storage via external SPI EEPROM or SD card
Cogs can access Main Memory at each hub access window in units of 1 byte, 1 word, 1 long, or 4 contiguous quad-aligned longs.
Hub access window arrives for each cog in a round-robin fashion every 8 cycles.

Power Specification

Core voltage: 1.8 VDC
I/O pin voltage: 1.8 VDC–3.3 VDC
Current source or sink per I/O: 40 mA
Total current draw @ 1.8 VDC Core, 3.3 VDC I/O, 25° C: TBD

1.8 V Core – 3.3/1.8 V I/O pins. (Each group of 8 I/O pins is powered by a VP pin and GP).

I/O

92 I/O pins total: 84 fully general purpose I/O + 8 additional general purpose I/O available after boot-up
Each I/O pin is planned(1) to have internal:

Input ADC
Output DAC
True or inverted input/output
Differential input/output
Comparator
Schmitt input

Counter Modules

2 counter modules, each with 2 integrated waveform generators, per cog

Video Generation

Each cog has independent video generation hardware capable of VGA, Standard PAL/NTSC, and HD up to 1080p (at 30 Hz)

Code Protection and Encryption

Propeller application and data optionally encrypted in non-volatile storage

Supported Languages

Propeller 2 Spin and Propeller 2 Assembly
Propeller 2 Assembly is not fully backwards compatible with Propeller 1 Assembly
Some Propeller 1 Spin code may need to be ported to the Propeller 2

Packaging

Package type: (T)QFP-128
Package size: 14mm
Pin pitch: 0.4mm
No center pad
Diagram: Pinout	Diagram: Schematic Symbol (pbj)

Table: Pin Definitions

PIN	NAME	TYPE	NOTES
01	GND	GND
02	P0	I/O
03	P1	I/O
04	GP0	I/O GND
05	P2	I/O
06	P3	I/O
07	P4	I/O
08	P5	I/O
09	VP0	I/O PWR	1.8V-3.3V
10	P6	I/O
11	P7	I/O
12	P8	I/O
13	P9	I/O
14	GP1	I/O GND
15	P10	I/O
16	P11	I/O
17	P12	I/O
18	P13	I/O
19	VP1	I/O PWR	1.8V-3.3V
20	P14	I/O
21	P15	I/O
22	P16	I/O
23	P17	I/O
24	GP2	I/O GND
25	P18	I/O
26	P19	I/O
27	P20	I/O
28	P21	I/O
29	VP2	I/O PWR	1.8V-3.3V
30	P22	I/O
31	P23	I/O
32	VDD	PWR	1.8V

PIN	NAME	TYPE	NOTES
33	GND	GND
34	P24	I/O
35	P25	I/O
36	GP3	I/O GND
37	P26	I/O
38	P27	I/O
39	P28	I/O
40	P29	I/O
41	VP3	I/O PWR	1.8V-3.3V
42	P30	I/O
43	P31	I/O
44	P32	I/O
45	P33	I/O
46	GP4	I/O GND
47	P34	I/O
48	P35	I/O
49	P36	I/O
50	P37	I/O
51	VP4	I/O PWR	1.8V-3.3V
52	P38	I/O
53	P39	I/O
54	P40	I/O
55	P41	I/O
56	GP5	I/O GND
57	P42	I/O
58	P43	I/O
59	P44	I/O
60	P45	I/O
61	VP5	I/O PWR	1.8V-3.3V
62	P46	I/O
63	P47	I/O
64	VDD	PWR	1.8V

PIN	NAME	TYPE	NOTES
65	GND	GND
66	P48	I/O
67	P49	I/O
68	GP6	I/O GND
69	P50	I/O
70	P51	I/O
71	P52	I/O
72	P53	I/O
73	VP6	I/O PWR	1.8V-3.3V
74	P54	I/O
75	P55	I/O
76	P56	I/O
77	P57	I/O
78	GP7	I/O GND
79	P58	I/O
80	P59	I/O
81	P60	I/O
82	P61	I/O
83	VP7	I/O PWR	1.8V-3.3V
84	P62	I/O
85	P63	I/O
86	P64	I/O
87	P65	I/O
88	GP8	I/O GND
89	P66	I/O
90	P67	I/O
91	P68	I/O
92	P69	I/O
93	VP8	I/O PWR	1.8V-3.3V
94	P70	I/O
95	P71	I/O
96	VDD	PWR	1.8V

PIN	NAME	TYPE	NOTES
97	GND	GND
98	P72	I/O
99	P73	I/O
100	GP9	I/O GND
101	P74	I/O
102	P75	I/O
103	P76	I/O
104	P77	I/O
105	VP9	I/O PWR	1.8V-3.3V
106	P78	I/O
107	P79	I/O
108	P80	I/O
109	P81	I/O
110	GP10	I/O GND
111	P82	I/O
112	P83	I/O
113	P84	I/O
114	P85	I/O
115	VP10	I/O PWR	1.8V-3.3V
116	P86	I/O	SPI DO in
117	P87	I/O	SPI DI out
118	P88	I/O	SPI CK out
119	P89	I/O	SPI CS out
120	GP11	I/O GND
121	P90	I/O	TXD out
122	P91	I/O	RXD in
123	BOEn	IN	Brown out En
124	RESn	I/O	Reset
125	VP11	I/O PWR	1.8V-3.3V
126	XO	OUT	Crystal out
127	XI	IN	Crystal/Osc in
128	VDD	PWR	1.8V

Note: All CPU and I/O GNDs must be connected to power common.

Diagram: P1 QFP vs P2 TQFP Footprint

Note: Relative size

Memory

There are two primary types of memory, a shared HUB memory and individual COG memory.

Hub Memory

128K bytes of main memory shared by all cogs

cogs launch from this memory
cogs can access this memory as bytes, words, longs, and quads (4 longs)
$00000..$00E7F is ROM - contains Booter, SHA-256/HMAC, and Monitor
$00E80..$1FFFF is RAM - for application usage
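
As a quick cross-check of the address ranges above against the sizes quoted in the Features list, the arithmetic can be sketched in Python. This is only a check of the numbers given in this document; the constant names are made up for the illustration.

```python
# Hub address ranges as listed above (inclusive bounds).
ROM_START, ROM_END = 0x00000, 0x00E7F
RAM_START, RAM_END = 0x00E80, 0x1FFFF

rom_bytes = ROM_END - ROM_START + 1
ram_bytes = RAM_END - RAM_START + 1

print(rom_bytes)              # 3712 bytes of ROM (about 3.7 KB)
print(ram_bytes)              # 127360 bytes of RAM
print(rom_bytes + ram_bytes)  # 131072 bytes = 128K in total
```

The RAM figure matches the 127,360B listed under Features, and ROM plus RAM together fill the 128K hub address space.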

Diagram 1: Hub Memory and Registers

Cog Memory

Each of the eight cogs contains 512 longs of register RAM and 256 longs of stack RAM.

The 512 longs of register RAM is comprised of:

504 registers (including INDx) are generally available for instructions and data
Two of these registers can be used for indirect references to cog memory
Eight registers are reserved for pin I/O and control

Special function registers such as PTRx and SPAx are accessed via special instructions and are not part of the memory map.

The 256 longs of stack RAM for data and video usage features:

Accessible via push and pop operations
Video circuit can read data simultaneously and asynchronously

Diagram 2: Cog Memory and Registers

// P2 MEMORY MAP 24SEP2015
//
// addr read write name
// ---------------------------------------------------------------
// COG REGISTERS (9-bit addressable)
//
// 000 INA - INA / IJMP0
// 001 INB - INB / IRET0
// 002 RAM RAM+OUTA OUTA
// 003 RAM RAM+OUTB OUTB
// 004 RAM RAM+DIRA DIRA
// 005 RAM RAM+DIRB DIRB
// 006 PTRA PTRA PTRA
// 007 PTRB PTRB PTRB
//
// 008 RAM RAM user / ADRA
// 009 RAM RAM user / ADRB
// 00A RAM RAM user / IJMP1
// 00B RAM RAM user / IRET1
// 00C RAM RAM user / IJMP2
// 00D RAM RAM user / IRET2
// 00E RAM RAM user / IJMP3
// 00F RAM RAM user / IRET3
//
// 010-1FF RAM RAM user
// ---------------------------------------------------------------
// LUT
// 200-3FF RAM RAM user / cog-exec
//
// LUT (possible expansion)
// 400-5FF RAM RAM user / cog-exec
// ---------------------------------------------------------------
// HUB
// 00000-7FFFF RAM RAM user / hub-exec
//
// HUB (future expansion)
// 80000-FFFFF RAM RAM user / hub-exec
// ---------------------------------------------------------------
// HUB ROM
// 00000-03FFF (not accessible) boot
// ---------------------------------------------------------------

Hub

Each cog now features two 17-bit pointer registers called PTRA and PTRB and a 16-byte/8-word/4-long read cache. The pointer registers can be used for any hub memory read or write operation. They support automatic incrementing and decrementing, applied either before or after the operation.

Hub Memory Instructions

These instructions read and write hub memory.

All instructions use D as the data conduit, except WRQUAD/RDQUAD/RDQUADC, which use the four QUAD registers. The QUADs can be mapped into cog register space using the SETQUAD instruction or kept hidden, in which case they are still useful as a data conduit and as a read cache. If mapped, the QUADs overlay four contiguous cog registers which can begin at any double-even address (%xxxxxxx00). These overlaid registers can be read and written like any other registers, and can also be executed. Any write via D to the QUAD registers, when mapped, will affect the underlying cog registers as well. A RDQUAD/RDQUADC will affect the QUAD registers, but not the underlying cog registers.

The cached reads RDBYTEC/RDWORDC/RDLONGC/RDQUADC will do a RDQUAD if the current read address is outside of the 4-long window of the prior RDQUAD. Otherwise, they will immediately return cached data. The CACHEX instruction invalidates the cache, forcing a fresh RDQUAD next time a cached read executes.

Hub memory instructions must wait for their cog's hub cycle, which comes once every 8 clocks. The timing relationship between a cog's instruction stream and its hub cycle is generally indeterminate, causing these instructions to take varying numbers of clocks. Timing can be made deterministic, though, by intentionally spacing these instructions apart so that after the first in a series executes, the subsequent hub memory instructions fall on hub cycles, making them take the minimal number of clocks. The trick is to write useful code to go in between them.

WRBYTE/WRWORD/WRLONG/WRQUAD/RDQUAD complete on the hub cycle, making them take 1..8 clocks.

RDBYTE/RDWORD/RDLONG complete on the 2nd clock after the hub cycle, making them take 3..10 clocks.

RDBYTEC/RDWORDC/RDLONGC take only 1 clock if data is cached, otherwise 3..10 clocks.

RDQUADC takes only 1 clock if data is cached, otherwise 1..8 clocks.

After a RDQUAD, the QUAD registers are accessible via D and S on the 3rd clock and executable on the 5th clock.
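
The clock counts above can be summarized in a small Python model. This is a sketch, not a description of the silicon: the function name and the `wait` parameter (how many extra clocks the instruction spends waiting for its hub cycle) are assumptions introduced for the illustration.

```python
HUB_PERIOD = 8  # a cog's hub cycle comes around once every 8 clocks

def hub_clocks(kind, wait, cached=False):
    """Clocks taken by one hub memory instruction, per the figures above.

    kind   -- "write"  for WRBYTE/WRWORD/WRLONG/WRQUAD,
              "rdquad" for RDQUAD/RDQUADC,
              "read"   for RDBYTE/RDWORD/RDLONG and their cached forms
    wait   -- hypothetical: extra clocks spent waiting for the hub cycle, 0..7
    cached -- True when a cached read finds its data in the QUAD cache
    """
    if not 0 <= wait < HUB_PERIOD:
        raise ValueError("wait must be in 0..7")
    if cached:
        if kind == "write":
            raise ValueError("writes are never served from the cache")
        return 1                 # cache hit: no hub access needed
    if kind in ("write", "rdquad"):
        return 1 + wait          # completes on the hub cycle: 1..8
    if kind == "read":
        return 3 + wait          # completes 2 clocks after the hub cycle: 3..10
    raise ValueError("unknown kind")

print(hub_clocks("write", 0), hub_clocks("write", 7))  # 1 8
print(hub_clocks("read", 0), hub_clocks("read", 7))    # 3 10
```

Spacing hub instructions so that each one arrives with `wait` equal to 0 is what makes the timing deterministic.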

Table: Hub Memory Instructions

INSTRUCTION	DESCRIPTION
WRBYTE D,S	Write lower byte of D to hub memory at S
RDBYTE D,S	Read byte from hub memory at S into D
RDBYTEC D,S	Read cached byte at S into D
WRWORD D,S	Write lower word of D to hub memory at S
RDWORD D,S	Read word from hub memory at S into D
RDWORDC D,S	Read cached word at S into D
WRLONG D,S	Write D to hub memory at S
RDLONG D,S	Read long from hub memory at S into D
RDLONGC D,S	Read cached long at S into D
WRQUAD D	Write QUADs to hub memory at D
RDQUAD D	Read into QUADs from hub memory at D
RDQUADC D	Conditionally read into QUADs from hub memory at D

PTR Expressions:

INDEX	-32..+31	Simple offset
INDEX	0..31	++ Auto-increment range
INDEX	0..32	-- Auto-decrement range
SCALE	1	BYTE
SCALE	2	WORD
SCALE	4	LONG
SCALE	16	QUAD

Table: PTR Expressions

SUPNNNNNN PTR expression

000000000 PTRA 'use PTRA

100000000 PTRB 'use PTRB

011000001 PTRA++ 'use PTRA, PTRA += SCALE

111000001 PTRB++ 'use PTRB, PTRB += SCALE

011111111 PTRA-- 'use PTRA, PTRA -= SCALE

111111111 PTRB-- 'use PTRB, PTRB -= SCALE

010000001 ++PTRA 'use PTRA + SCALE, PTRA += SCALE

110000001 ++PTRB 'use PTRB + SCALE, PTRB += SCALE

010111111 --PTRA 'use PTRA - SCALE, PTRA -= SCALE

110111111 --PTRB 'use PTRB - SCALE, PTRB -= SCALE

000NNNNNN PTRA[INDEX] 'use PTRA + INDEX*SCALE

100NNNNNN PTRB[INDEX] 'use PTRB + INDEX*SCALE

011NNNNNN PTRA++[INDEX] 'use PTRA, PTRA += INDEX*SCALE

111NNNNNN PTRB++[INDEX] 'use PTRB, PTRB += INDEX*SCALE

011nnnnnn PTRA--[INDEX] 'use PTRA, PTRA -= INDEX*SCALE

111nnnnnn PTRB--[INDEX] 'use PTRB, PTRB -= INDEX*SCALE

010NNNNNN ++PTRA[INDEX] 'use PTRA + INDEX*SCALE, PTRA += INDEX*SCALE

110NNNNNN ++PTRB[INDEX] 'use PTRB + INDEX*SCALE, PTRB += INDEX*SCALE

010nnnnnn --PTRA[INDEX] 'use PTRA - INDEX*SCALE, PTRA -= INDEX*SCALE

110nnnnnn --PTRB[INDEX] 'use PTRB - INDEX*SCALE, PTRB -= INDEX*SCALE

S = 0 for PTRA, 1 for PTRB

U = 0 to keep PTRx same, 1 to update PTRx

P = 0 to use PTRx + INDEX*SCALE, 1 to use PTRx (post-modify)

NNNNNN = INDEX

nnnnnn = -INDEX
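
The bit layout above can be exercised with a short Python sketch that builds the 9-bit SUPNNNNNN field. The function name and its parameters are hypothetical; only the bit positions come from the table.

```python
def encode_ptr(ptr, index=0, update=False, post=False):
    """Build the 9-bit SUPNNNNNN field for a PTR expression.

    ptr    -- "A" for PTRA or "B" for PTRB (the S bit)
    index  -- signed INDEX, -32..+31; pass a negative value for the -- forms
    update -- True to write the modified address back to PTRx (the U bit)
    post   -- True to use PTRx unmodified and update it afterwards (the P bit)
    """
    if ptr not in ("A", "B"):
        raise ValueError("ptr must be 'A' or 'B'")
    if not -32 <= index <= 31:
        raise ValueError("index must fit in 6 signed bits")
    s = 1 if ptr == "B" else 0
    u = 1 if update else 0
    p = 1 if post else 0
    # The index is stored as a 6-bit two's complement value.
    return (s << 8) | (u << 7) | (p << 6) | (index & 0x3F)

print(f"{encode_ptr('B', 1, update=True, post=True):09b}")  # 111000001  PTRB++
print(f"{encode_ptr('A', -1, update=True):09b}")            # 010111111  --PTRA
print(f"{encode_ptr('B', 7):09b}")                          # 100000111  PTRB[7]
```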

Examples: Using the PTR

000000 Z01 1 CCCC DDDDDDDDD 000000000 RDBYTE D,PTRA 'read byte at PTRA into D

000001 000 1 CCCC DDDDDDDDD 111000001 WRWORD D,PTRB++ 'write lower word in D at PTRB, PTRB += 2

000010 Z01 1 CCCC DDDDDDDDD 011111111 RDLONG D,PTRA-- 'read long at PTRA into D, PTRA -= 4

000011 001 1 CCCC 110000001 010110001 RDQUAD ++PTRB 'read quad at PTRB+16 into QUADs, PTRB += 16

000000 000 1 CCCC DDDDDDDDD 010111111 WRBYTE D,--PTRA 'write lower byte in D at PTRA-1, PTRA -= 1

000001 000 1 CCCC DDDDDDDDD 100000111 WRWORD D,PTRB[7] 'write lower word in D to PTRB+7*2

000010 Z11 1 CCCC DDDDDDDDD 011001111 RDLONGC D,PTRA++[15] 'read cached long at PTRA into D, PTRA += 15*4

000011 001 1 CCCC 111111101 010110000 WRQUAD PTRB--[3] 'write QUADs at PTRB, PTRB -= 3*16

000000 000 1 CCCC DDDDDDDDD 010000110 WRBYTE D,++PTRA[6] 'write lower byte in D to PTRA+6*1, PTRA += 6*1

000001 Z01 1 CCCC DDDDDDDDD 110110110 RDWORD D,--PTRB[10] 'read word at PTRB-10*2 into D, PTRB -= 10*2

Bytes, words, longs, and quads are addressed as follows:

for WRBYTE/RDBYTE/RDBYTEC, address = %XXXXXXXXXXXXXXXXX (bits 16..0 are used)

for WRWORD/RDWORD/RDWORDC, address = %XXXXXXXXXXXXXXXX- (bits 16..1 are used)

for WRLONG/RDLONG/RDLONGC, address = %XXXXXXXXXXXXXXX-- (bits 16..2 are used)

for WRQUAD/RDQUAD/RDQUADC, address = %XXXXXXXXXXXXX---- (bits 16..4 are used)

Table: Memory Addressing Example

address byte word long quad

00000- 50 *7250 *706F7250 *0C7CCC030C7C200020302E32706F7250

00001- 72 7250 706F7250 0C7CCC030C7C200020302E32706F7250

00002- 6F *706F 706F7250 0C7CCC030C7C200020302E32706F7250

00003- 70 706F 706F7250 0C7CCC030C7C200020302E32706F7250

00004- 32 *2E32 *20302E32 0C7CCC030C7C200020302E32706F7250

00005- 2E 2E32 20302E32 0C7CCC030C7C200020302E32706F7250

00006- 30 *2030 20302E32 0C7CCC030C7C200020302E32706F7250

00007- 20 2030 20302E32 0C7CCC030C7C200020302E32706F7250

00008- 00 *2000 *0C7C2000 0C7CCC030C7C200020302E32706F7250

00009- 20 2000 0C7C2000 0C7CCC030C7C200020302E32706F7250

0000A- 7C *0C7C 0C7C2000 0C7CCC030C7C200020302E32706F7250

0000B- 0C 0C7C 0C7C2000 0C7CCC030C7C200020302E32706F7250

0000C- 03 *CC03 *0C7CCC03 0C7CCC030C7C200020302E32706F7250

0000D- CC CC03 0C7CCC03 0C7CCC030C7C200020302E32706F7250

0000E- 7C *0C7C 0C7CCC03 0C7CCC030C7C200020302E32706F7250

0000F- 0C 0C7C 0C7CCC03 0C7CCC030C7C200020302E32706F7250

00010- 45 *FE45 *0DC1FE45 *0D7CC6010C7CC6010CFCB6E30DC1FE45

00011- FE FE45 0DC1FE45 0D7CC6010C7CC6010CFCB6E30DC1FE45

00012- C1 *0DC1 0DC1FE45 0D7CC6010C7CC6010CFCB6E30DC1FE45

00013- 0D 0DC1 0DC1FE45 0D7CC6010C7CC6010CFCB6E30DC1FE45

00014- E3 *B6E3 *0CFCB6E3 0D7CC6010C7CC6010CFCB6E30DC1FE45

00015- B6 B6E3 0CFCB6E3 0D7CC6010C7CC6010CFCB6E30DC1FE45

00016- FC *0CFC 0CFCB6E3 0D7CC6010C7CC6010CFCB6E30DC1FE45

00017- 0C 0CFC 0CFCB6E3 0D7CC6010C7CC6010CFCB6E30DC1FE45

00018- 01 *C601 *0C7CC601 0D7CC6010C7CC6010CFCB6E30DC1FE45

00019- C6 C601 0C7CC601 0D7CC6010C7CC6010CFCB6E30DC1FE45

0001A- 7C *0C7C 0C7CC601 0D7CC6010C7CC6010CFCB6E30DC1FE45

0001B- 0C 0C7C 0C7CC601 0D7CC6010C7CC6010CFCB6E30DC1FE45

0001C- 01 *C601 *0D7CC601 0D7CC6010C7CC6010CFCB6E30DC1FE45

0001D- C6 C601 0D7CC601 0D7CC6010C7CC6010CFCB6E30DC1FE45

0001E- 7C *0D7C 0D7CC601 0D7CC6010C7CC6010CFCB6E30DC1FE45

0001F- 0D 0D7C 0D7CC601 0D7CC6010C7CC6010CFCB6E30DC1FE45

* new word/long/quad
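
The table can be reproduced with a few lines of Python: the low address bits are ignored according to the unit size, and the bytes are assembled least significant byte first. The helper name `hub_read` is invented for this sketch, and the byte values are copied from the first 16 rows of the table.

```python
# First 16 bytes of hub memory from the table above, lowest address first.
HUB = bytes.fromhex("50726F70322E302000207C0C03CC7C0C")

SIZES = {"byte": 1, "word": 2, "long": 4, "quad": 16}

def hub_read(address, unit):
    """Read one unit the way the table shows it, ignoring unused low address bits."""
    size = SIZES[unit]
    base = address & ~(size - 1)   # drop bit 0 for words, bits 1..0 for longs, bits 3..0 for quads
    return int.from_bytes(HUB[base:base + size], "little")

print(f"{hub_read(0x00003, 'word'):04X}")   # 706F
print(f"{hub_read(0x00006, 'long'):08X}")   # 20302E32
print(f"{hub_read(0x0000B, 'quad'):032X}")  # 0C7CCC030C7C200020302E32706F7250
```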

PTRA/PTRB Instructions

Each cog has two 17-bit pointers, PTRA and PTRB, which can be read, written, modified, and used to access hub memory.

At cog startup, the PTRA and PTRB registers are initialized as follows:

PTRA = %X_XXXXXXXX_XXXXXXXX, data from launching cog, usually a pointer

PTRB = %X_XXXXXXXX_XXXXXX00, long address in hub where cog code was loaded from

Table: PTR Instructions

INSTRUCTION	DESCRIPTION	CLOCK
GETPTRA D	get PTRA into D, C = PTRA[16]
GETPTRB D	get PTRB into D, C = PTRB[16]
SETPTRA D	set PTRA to D
SETPTRA #n	set PTRA to 0..511
SETPTRB D	set PTRB to D
SETPTRB #n	set PTRB to 0..511
ADDPTRA D	add D into PTRA
ADDPTRA #n	add 0..511 into PTRA
ADDPTRB D	add D into PTRB
ADDPTRB #n	add 0..511 into PTRB
SUBPTRA D	subtract D from PTRA
SUBPTRA #n	subtract 0..511 from PTRA
SUBPTRB D	subtract D from PTRB
SUBPTRB #n	subtract 0..511 from PTRB

QUAD related Instructions

Each cog has four QUAD registers which form a 128-bit conduit between the hub memory and the cog. This conduit can transfer four longs every 8 clocks via the WRQUAD/RDQUAD instructions.

Read Cache

It can also be used as a 4-long/8-word/16-byte read cache, utilized by RDBYTEC/RDWORDC/RDLONGC/RDQUADC.

Mapping QUAD Registers

Initially hidden, these QUAD registers are mappable into COG register space by using the SETQUAD instruction to set an address where the base register is to appear, with the other three registers following.

SETQUAZ works just like SETQUAD, but also clears the four QUAD registers.

Hiding QUAD Registers

To hide the QUAD registers, use SETQUAD to set an address which is $1F8, or higher.
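
The mapping rule can be sketched as a small Python helper that reports which cog registers the QUADs would overlay for a given SETQUAD address. The function name is hypothetical, and rejecting a base that is not double-even is a choice made for this sketch; this document does not say what the hardware does with such an address.

```python
QUAD_HIDE_BASE = 0x1F8  # SETQUAD addresses at or above this hide the QUADs

def quad_window(base):
    """Return the four cog register addresses overlaid by the QUADs, or None if hidden."""
    if not 0 <= base <= 0x1FF:
        raise ValueError("cog register addresses are 9 bits")
    if base >= QUAD_HIDE_BASE:
        return None                      # QUADs stay hidden
    if base & 0b11:
        # Assumption for this sketch: insist on a double-even base (%xxxxxxx00).
        raise ValueError("base must be double-even")
    return [base + offset for offset in range(4)]

print(quad_window(0x100))  # [256, 257, 258, 259]
print(quad_window(0x1F8))  # None
```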

Table: QUAD related Instructions

INSTRUCTION	DESCRIPTION	CLOCK
CACHEX	Invalidate QUAD cache	1
GETTOPS D	Get top bytes of QUADs into D	1
SETQUAD D	Set QUAD base address to D	1
SETQUAD #n	Set QUAD base address to 0..511	1
SETQUAZ D	Set QUAD base address to D and clear the QUAD registers	1
SETQUAZ #n	Set QUAD base address to 0..511 and clear the QUAD registers	1

Hub Control Instructions

These instructions are used to control hub circuits and cogs.

Hub instructions must wait for their cog's hub cycle, which comes once every 8 clocks. In cases where there is no result to wait for (ZCR = %000), these instructions complete on the hub cycle, making them take 1..8 clocks, depending on where the hub cycle is in relation to the instruction. In cases where a result is anticipated (ZCR <> %000), these instructions complete on the 1st clock after the hub cycle, making them take 2..9 clocks.

COGINIT D,S

COGINIT is used to start cogs. Any cog can be (re)started, whether it is idle or running. A cog can even execute a COGINIT to restart itself with a new program.

COGINIT uses D to specify a long address in hub memory that is the start of the program that is to be loaded into a cog, while S is a 17-bit parameter (usually an address) that will be conveyed to PTRA of the started cog. PTRB of the started cog will be set to the start address of its program that was loaded from hub memory.

SETCOG S

SETCOG must be executed before COGINIT to set the number of the cog to be started (0..7). If SETCOG sets a value with bit 3 set (%1xxx), this will cause the next idle cog to be started when COGINIT is executed, with the number of the cog started being returned in D, and the C flag returning 0 if okay, or 1 if no idle cog was available. At cog startup, SETCOG is initialized to %0000.

COGINIT Process

When a cog is started, $1F8 contiguous longs are read from hub memory (internally using RDLONGC) and written to cog registers $000..$1F7. The cog will then begin execution at $000. This process takes 1,016 clocks. (That's only 6.35us at 160MHz).
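
The load-time figures above can be checked with a little arithmetic in Python. The constant names are invented for the illustration. The note that 126 quad-sized reads at 8 clocks each would account for 1,008 of the 1,016 clocks is an inference from the numbers, not something this document states.

```python
LONGS_LOADED = 0x1F8      # longs copied to cog registers $000..$1F7
LOAD_CLOCKS = 1016        # load time stated above, in clocks
CLOCK_HZ = 160_000_000    # planned maximum system clock

load_us = LOAD_CLOCKS / CLOCK_HZ * 1e6
print(LONGS_LOADED)        # 504
print(round(load_us, 2))   # 6.35

# Assumption: if the loader fetched one quad (4 longs) per 8-clock hub cycle,
# the transfer itself would take 126 * 8 = 1008 clocks.
quad_reads = LONGS_LOADED // 4
print(quad_reads * 8)      # 1008
```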

Example: COGINIT

COGID COGNUM 'what cog am I?

SETCOG COGNUM 'set my cog number

COGINIT COGPGM,COGPTR 'restart me with the ROM Monitor

COGPGM LONG $0070C 'address of the ROM Monitor

COGPTR LONG 90<<9 + 91 'tx = P90, rx = P91

COGNUM RES 1

CLKSET D

CLKSET writes the lower 9 bits of D to the hub clock register:

Table: CLKSET Fields

Bit 8	Bits 7..4	Bits 3..2	Bits 1..0
RESET	PLL MULTIPLIER FOR XI PIN INPUT*	XI / XO PIN MODE	CLOCK SELECTOR
0: continued operation	0000: PLL disabled	00: XI reads low, XO floats	00: RCFAST (~20MHz)
1: hardware reset	0001: 2x multiplier	01: XI input, XO floats	01: RCSLOW (~20kHz)
	0010: 3x multiplier	10: XI/XO crystal oscillator with 15pF internal loading and 1M-ohm feedback	10: XTAL (10MHz-20MHz)
	...	11: XI/XO crystal oscillator with 30pF internal loading and 1M-ohm feedback	11: PLL
	1110: 15x multiplier
	1111: 16x multiplier

* XI/XO Pin Mode must be set for XI input or XI/XO crystal oscillator to use PLL.

Because the clock register is cleared to %0_0000_00_00 on reset, the chip starts up in RCFAST mode with both the crystal oscillator and the PLL disabled. Before switching to XTAL or PLL mode from RCFAST or RCSLOW, the crystal oscillator must be enabled and given 10ms to stabilize. The PLL stabilizes within 10us, so it can be enabled at the same time as the crystal oscillator. Once the crystal is stabilized, you can switch between XTAL and RCFAST/RCSLOW without any stability concerns. If the PLL is also enabled, you can switch freely among PLL, XTAL, and RCFAST/RCSLOW modes. You can change the PLL multiplier while in PLL mode, but beware that some frequency overshoot and undershoot will occur as the PLL settles to its new frequency. This only poses a hardware problem if you are switching upwards and the resulting overshoot might exceed the speed limit of the chip.
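
To make the field layout of the clock register concrete, here is a Python sketch that packs the four fields into the 9-bit value CLKSET expects. The function name, the dictionary keys, and the keyword arguments are all made up for the illustration; only the bit positions and field codes come from the table above.

```python
# Field codes from the CLKSET table; the key names are hypothetical.
XMODE = {"off": 0b00, "input": 0b01, "xtal_15pf": 0b10, "xtal_30pf": 0b11}
CLKSEL = {"rcfast": 0b00, "rcslow": 0b01, "xtal": 0b10, "pll": 0b11}

def clkset_value(multiplier=None, xmode="off", clksel="rcfast", reset=False):
    """Pack RESET, PLL multiplier, XI/XO mode and clock selector into 9 bits."""
    if multiplier is None:
        pll = 0b0000                     # PLL disabled
    elif 2 <= multiplier <= 16:
        pll = multiplier - 1             # 0001 = 2x ... 1111 = 16x
    else:
        raise ValueError("multiplier must be 2..16 or None")
    if pll and xmode == "off":
        raise ValueError("PLL needs XI input or crystal oscillator mode")
    return (int(reset) << 8) | (pll << 4) | (XMODE[xmode] << 2) | CLKSEL[clksel]

print(f"{clkset_value():09b}")                                 # 000000000  reset state
print(f"{clkset_value(16, 'xtal_15pf', 'pll'):09b}")           # 011111011
```

In practice the crystal and PLL would be enabled first with the selector left on RCFAST, and the selector switched only after the 10ms stabilization time described above.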

COGID D

COGID returns the number of the cog (0..7) into D.

COGSTOP D

COGSTOP stops the cog specified in D (0..7).

LOCKS

LOCKNEW D

LOCKRET D

LOCKSET D

LOCKCLR D

There are eight semaphore locks available in the chip which can be borrowed with LOCKNEW, returned with LOCKRET, set with LOCKSET, and cleared with LOCKCLR.

While any cog can set or clear any lock without using LOCKNEW or LOCKRET, LOCKNEW and LOCKRET are provided so that cog programs have a dynamic and simple means of acquiring and relinquishing the locks at run-time.

When a lock is set with LOCKSET, its state is set to 1 and its prior state is returned in C. LOCKCLR works the same way, but clears the lock's state to 0. By having the hub perform the atomic operation of setting/clearing and reporting the prior state, cogs can use locks to ensure that only one cog at a time has permission to do something. If a lock starts out cleared and multiple cogs vie for the lock by doing a 'LOCKSET locknum wc', the cog that gets C=0 back 'wins' and has exclusive access to some shared resource, while the other cogs get C=1 back. When the winning cog is done, it can do a 'LOCKCLR locknum' to clear the lock and give another cog the opportunity to get C=0 back.

LOCKNEW returns the next available lock into D, with C=1 if no lock was free.

LOCKRET frees the lock in D so that it can be checked out again by LOCKNEW.

LOCKSET sets the lock in D and returns its prior state in C.

LOCKCLR clears the lock in D and returns its prior state in C.
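
The lock semantics described above can be modeled in a few lines of Python. This is a behavioral sketch only: the class and method names are invented, handing out the lowest-numbered free lock is an assumption about what "next available" means, and the value returned in D when no lock is free is not specified here, so the model returns None.

```python
class HubLocks:
    """Toy model of the eight hub semaphore locks."""

    def __init__(self, count=8):
        self.state = [0] * count            # 0 = cleared, 1 = set
        self.checked_out = [False] * count  # tracked by LOCKNEW/LOCKRET

    def locknew(self):
        """Return (lock, c); c = 1 if no lock was free."""
        for number, used in enumerate(self.checked_out):
            if not used:                    # assumed: lowest free lock first
                self.checked_out[number] = True
                return number, 0
        return None, 1

    def lockret(self, number):
        self.checked_out[number] = False

    def lockset(self, number):
        """Set the lock and return its prior state (the C flag)."""
        prior = self.state[number]
        self.state[number] = 1
        return prior

    def lockclr(self, number):
        """Clear the lock and return its prior state (the C flag)."""
        prior = self.state[number]
        self.state[number] = 0
        return prior

locks = HubLocks()
lock, c = locks.locknew()
print(locks.lockset(lock))  # 0: this caller wins the lock
print(locks.lockset(lock))  # 1: a second caller finds it already set
print(locks.lockclr(lock))  # 1: the winner releases it
```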

CLKSET, COGID, COGINIT, COGSTOP, and the LOCKxxx instructions will take 1..8 clocks if their Z/C/R bits are all 0, meaning they don't have to wait for anything back from the hub (no Z, C, or D result). If they are going to receive some result back, they must wait for the next cycle to receive it. Hence, those instructions which get results back take 2..9 clocks.

Table: Hub Control Instructions

INSTRUCTION	DESCRIPTION	CLOCK
SETCOG D/#n	Set cog to be used by COGINIT, b3 = use next available
COGINIT D,S	launch cog at D, cog PTRA = S	1..8 (2..9 if a result is returned)
CLKSET D	set clock to D	1..8
COGID D	get cog number into D	2..9
COGSTOP D	stop cog in D	1..8
LOCKNEW D	get new lock into D, C = busy	2..9
LOCKRET D	return lock in D	1..8
LOCKSET D	set lock in D, C = prev state	2..9
LOCKCLR D	clear lock in D, C = prev state	2..9

Indirect Registers

Each cog has two indirect “registers”: INDA and INDB. INDA and INDB each consist of three hidden 9-bit registers: the pointer, the bottom limit, and the top limit. The bottom and top limits are inclusive values which set automatic wrapping boundaries for the pointer. This way, circular buffers can be established within cog RAM and accessed using simple INDA/INDB references.

INDA shares address $1F6 and INDB shares address $1F7. When either of these addresses is encountered in the D or S field, the value of the associated INDx register is used for the register address in place of the $1F6 or $1F7.

NOTE: It is still possible to access the actual registers at $1F6 and $1F7 (as opposed to the INDA and INDB registers) via the D or S field. To accomplish this, set INDA to $1F6 and set INDB to $1F7. These will still be considered indirect instructions. Operations on these registers do not affect the hidden pointer registers.

NOTE: The registers at $1F6 and $1F7 are treated the same as all other registers when interpreted as an instruction (i.e. executed).

SETINDA/SETINDB/SETINDS are used to set or adjust the pointer value(s) while forcing the associated bottom and top limits to $000 and $1FF, respectively.

FIXINDA/FIXINDB/FIXINDS set the pointer(s) to an initial value, while setting the bottom limit(s) to the lower of the initial and terminal values and the top limit(s) to the higher.

At cog startup, INDA and INDB are configured as if these instructions had been executed:

FIXINDA $1F6,$1F6 // Set pointer to $1F6, bottom to $1F6, top to $1F6

FIXINDB $1F7,$1F7 // Set pointer to $1F7, bottom to $1F7, top to $1F7

Because indirect addressing occurs very early in the pipeline and indirect pointers are affected earlier than the final stage where the conditional bit field (CCCC) normally comes into use, the CCCC field is repurposed for indirect operations. The top two bits of CCCC are used for indirect D and the bottom two bits are used for indirect S.

Unconditional Execution

All instructions which use indirect registers will execute unconditionally, regardless of the CCCC bits.

Here is the INDA/INDB usage scheme which repurposes the CCCC field:

Table: INDA/INDB Usage Scheme

OOOOOO ZCR I CCCC DDDDDDDDD SSSSSSSSS

xxxxxx xxx x 00xx 111110110 xxxxxxxxx D = INDA 'use INDA

xxxxxx xxx x 00xx 111110111 xxxxxxxxx D = INDB 'use INDB

xxxxxx xxx x 01xx 111110110 xxxxxxxxx D = INDA++ 'use INDA, INDA += 1

xxxxxx xxx x 01xx 111110111 xxxxxxxxx D = INDB++ 'use INDB, INDB += 1

xxxxxx xxx x 10xx 111110110 xxxxxxxxx D = INDA-- 'use INDA, INDA -= 1

xxxxxx xxx x 10xx 111110111 xxxxxxxxx D = INDB-- 'use INDB, INDB -= 1

xxxxxx xxx x 11xx 111110110 xxxxxxxxx D = ++INDA 'use INDA+1, INDA += 1

xxxxxx xxx x 11xx 111110111 xxxxxxxxx D = ++INDB 'use INDB+1, INDB += 1

xxxxxx xxx 0 xx00 xxxxxxxxx 111110110 S = INDA 'use INDA

xxxxxx xxx 0 xx00 xxxxxxxxx 111110111 S = INDB 'use INDB

xxxxxx xxx 0 xx01 xxxxxxxxx 111110110 S = INDA++ 'use INDA, INDA += 1

xxxxxx xxx 0 xx01 xxxxxxxxx 111110111 S = INDB++ 'use INDB, INDB += 1

xxxxxx xxx 0 xx10 xxxxxxxxx 111110110 S = INDA-- 'use INDA, INDA -= 1

xxxxxx xxx 0 xx10 xxxxxxxxx 111110111 S = INDB-- 'use INDB, INDB -= 1

xxxxxx xxx 0 xx11 xxxxxxxxx 111110110 S = ++INDA 'use INDA+1, INDA += 1

xxxxxx xxx 0 xx11 xxxxxxxxx 111110111 S = ++INDB 'use INDB+1, INDB += 1

If both D and S are the same indirect register, the two 2-bit fields in CCCC are OR'd together to get the post-modifier effect:

101000 001 0 0011 111110110 111110110 MOV INDA,++INDA 'Move @INDA+1 into @INDA, INDA += 1

100000 001 0 1100 111110111 111110111 ADD ++INDB,INDB 'Add @INDB into @INDB+1, INDB += 1

Note that only '++INDx,INDx' or 'INDx,++INDx' combinations can address different registers from the same INDx.
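
The way the CCCC field is split between D and S can be illustrated with a short Python decoder. The function and table names are hypothetical; the 2-bit codes and the $1F6/$1F7 addresses are taken from the usage scheme above.

```python
# 2-bit modifier codes from the usage scheme.
MODIFIER = {0b00: "INDx", 0b01: "INDx++", 0b10: "INDx--", 0b11: "++INDx"}
INDA_ADDR, INDB_ADDR = 0x1F6, 0x1F7

def indirect_modifiers(cccc, d, s, immediate=False):
    """Return (D modifier, S modifier); None where the field is not indirect."""
    d_mod = s_mod = None
    if d in (INDA_ADDR, INDB_ADDR):
        d_mod = MODIFIER[(cccc >> 2) & 0b11]   # top two bits of CCCC
    if not immediate and s in (INDA_ADDR, INDB_ADDR):
        s_mod = MODIFIER[cccc & 0b11]          # bottom two bits of CCCC
    return d_mod, s_mod

# MOV INDA,++INDA from the example above: CCCC = 0011
print(indirect_modifiers(0b0011, 0x1F6, 0x1F6))  # ('INDx', '++INDx')
# ADD ++INDB,INDB from the example above: CCCC = 1100
print(indirect_modifiers(0b1100, 0x1F7, 0x1F7))  # ('++INDx', 'INDx')
```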

Here are the instructions which are used to set the pointer and limit values for INDA and INDB:

Table: INDA/INDB Instructions

ENCODING	INSTRUCTION	DESCRIPTION	CLOCK
111000 000 0 0001 000000000 AAAAAAAAA	SETINDA #A	Sets INDA pointer to 0..511*	1
111000 000 0 0011 000000000 aaaaaaaaa	SETINDA a	Increments/decrements INDA pointer by -256..+255*	1
111000 000 0 0100 BBBBBBBBB 000000000	SETINDB #B	Sets INDB pointer to 0..511*	1
111000 000 0 1100 bbbbbbbbb 000000000	SETINDB b	Increments/decrements INDB pointer by -256..+255*	1
111000 000 0 0101 BBBBBBBBB AAAAAAAAA	SETINDS #B,#A	Sets INDB pointer to 0..511 and sets INDA pointer to 0..511*	1
111000 000 0 0111 BBBBBBBBB aaaaaaaaa	SETINDS #B,a	Sets INDB pointer to 0..511 and increments/decrements INDA pointer by -256..+255*	1
111000 000 0 1101 bbbbbbbbb AAAAAAAAA	SETINDS b,#A	Increments/decrements INDB pointer by -256..+255 and sets INDA pointer to 0..511*	1
111000 000 0 1111 bbbbbbbbb aaaaaaaaa	SETINDS b,a	Increments/decrements INDB pointer by -256..+255 and increments/decrements INDA pointer by -256..+255*	1
111001 000 0 0001 TTTTTTTTT IIIIIIIII	FIXINDA #T,#I	Sets the INDA pointer to an initial value, while setting the bottom limit to the lower of the initial and terminal values and the top limit to the higher.	1
111001 000 0 0100 TTTTTTTTT IIIIIIIII	FIXINDB #T,#I	Sets the INDB pointer to an initial value, while setting the bottom limit to the lower of the initial and terminal values and the top limit to the higher.	1
111001 000 0 0101 TTTTTTTTT IIIIIIIII	FIXINDS #T,#I	Sets the INDA and INDB pointers to an initial value, while setting the bottom limits to the lower of the initial and terminal values and the top limits to the higher.	1

* All SETINDx operations reset the associated bottom and top limit to $000 and $1FF, respectively

Example: Indirect Pointer Usage

111000 000 0 0001 000000000 000000101 SETINDA #5 'INDA = 5, bottom = 0, top = 511

111000 000 0 0011 000000000 000000011 SETINDA ++3 'INDA += 3, bottom = 0, top = 511

111000 000 0 1100 111111100 000000000 SETINDB --4 'INDB -= 4, bottom = 0, top = 511

111000 000 0 0111 000000111 000001000 SETINDS #7,++8 'INDB = 7, INDA += 8, bottoms = 0, tops = 511

111001 000 0 0001 000001111 000001000 FIXINDA #15,#8 'INDA = 8, bottom = 8, top = 15

111001 000 0 0100 000010000 000011111 FIXINDB #16,#31 'INDB = 31, bottom = 16, top = 31

111001 000 0 0101 001100011 000110010 FIXINDS #99,#50 'INDA/INDB = 50, bottoms = 50, tops = 99
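The pointer and limit values in the example above follow directly from the table descriptions. The Python sketch below reproduces that bookkeeping. The function names are hypothetical, the 9-bit wrap on a relative adjustment is an assumption, and the behaviour of the pointer when it reaches a limit is not modelled.

```python
# Sketch of the pointer/limit values produced by SETINDx and FIXINDx.
# Each function returns (pointer, bottom, top).

def setind_absolute(value):
    """SETINDx #value: set the pointer and reset the limits to $000/$1FF."""
    if not 0 <= value <= 511:
        raise ValueError("pointer must be 0..511")
    return value, 0x000, 0x1FF


def setind_relative(pointer, delta):
    """SETINDx ++n / --n: adjust the pointer and reset the limits."""
    if not -256 <= delta <= 255:
        raise ValueError("delta must be -256..+255")
    # Wrapping the result to 9 bits is an assumption of this sketch.
    return (pointer + delta) & 0x1FF, 0x000, 0x1FF


def fixind(terminal, initial):
    """FIXINDx #terminal,#initial: pointer = initial, limits span both values."""
    for value in (terminal, initial):
        if not 0 <= value <= 511:
            raise ValueError("values must be 0..511")
    return initial, min(initial, terminal), max(initial, terminal)


print(fixind(15, 8))    # FIXINDA #15,#8
print(fixind(16, 31))   # FIXINDB #16,#31
print(fixind(99, 50))   # FIXINDS #99,#50
```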

Stack RAM

Each cog has a 256-long stack RAM that is accessible via push and pop operations. Its contents are not initialized at either reset or cog startup. So, at cog startup, it will contain whatever it happened to power up with, or whatever was last written.

There are two stack pointers called SPA and SPB which are used to address the stack memory. Aside from automatically incrementing and decrementing via pushes and pops, SPA and SPB can be set, modified, read back, and checked:

SETSPA D/#n set SPA

SETSPB D/#n set SPB

ADDSPA D/#n add to SPA

ADDSPB D/#n add to SPB

SUBSPA D/#n subtract from SPA

SUBSPB D/#n subtract from SPB

GETSPA D get SPA, SPA==0 into Z, SPA.7 into C

GETSPB D get SPB, SPB==0 into Z, SPB.7 into C

GETSPD D get SPA minus SPB, SPA==SPB into Z, SPA<SPB into C

Data can be pushed and popped in both normal and reverse directions:

PUSHA D/#n push using SPA

PUSHB D/#n push using SPB

PUSHAR D/#n push using SPA, use pop addressing

PUSHBR D/#n push using SPB, use pop addressing

POPA D pop using SPA

POPB D pop using SPB

POPAR D pop using SPA, use push addressing

POPBR D pop using SPB, use push addressing
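The normal and reverse addressing modes can be illustrated with a small Python model. This is a sketch under stated assumptions rather than a description of the hardware: the class and method names are hypothetical, both pointers are assumed to start at 0, and the pointers are assumed to wrap modulo 256. The addressing itself follows the instruction table: a normal push writes [SPx++], a normal pop reads [--SPx], a reverse push writes [--SPx], and a reverse pop reads [SPx++].

```python
class StackRAM:
    """Toy model of one cog's 256-long stack RAM with pointers SPA and SPB."""

    SIZE = 256  # longs

    def __init__(self):
        # Real hardware does not initialise the RAM; zeros are used here
        # only so the model is deterministic.
        self.ram = [0] * self.SIZE
        self.sp = {"A": 0, "B": 0}  # starting values are an assumption

    def _step(self, which, delta):
        self.sp[which] = (self.sp[which] + delta) % self.SIZE

    def push(self, which, value):          # PUSHA / PUSHB: write [SPx++]
        self.ram[self.sp[which]] = value & 0xFFFFFFFF
        self._step(which, +1)

    def pop(self, which):                  # POPA / POPB: read [--SPx]
        self._step(which, -1)
        return self.ram[self.sp[which]]

    def push_reverse(self, which, value):  # PUSHAR / PUSHBR: write [--SPx]
        self._step(which, -1)
        self.ram[self.sp[which]] = value & 0xFFFFFFFF

    def pop_reverse(self, which):          # POPAR / POPBR: read [SPx++]
        value = self.ram[self.sp[which]]
        self._step(which, +1)
        return value


# LIFO: push and pop through the same pointer.
stack = StackRAM()
for v in (10, 20, 30):
    stack.push("A", v)
lifo = [stack.pop("A") for _ in range(3)]

# FIFO: write with PUSHA, read with POPBR, both pointers walking upward.
queue = StackRAM()
for v in (10, 20, 30):
    queue.push("A", v)
fifo = [queue.pop_reverse("B") for _ in range(3)]

print(lifo, fifo)
```

Pairing a normal push on one pointer with a reverse pop on the other is one way to get first-in, first-out behaviour from the same RAM.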

Aside from data, the program counter and flags can be pushed and popped using calls and returns:

CALLA D/#n call using SPA

CALLB D/#n call using SPB

CALLAD D/#n call using SPA, delay branch until three trailing instructions executed

CALLBD D/#n call using SPB, delay branch until three trailing instructions executed

RETA return using SPA

RETB return using SPB

RETAD return using SPA, delay branch until three trailing instructions executed

RETBD return using SPB, delay branch until three trailing instructions executed

Table: Stack RAM Instructions

instructions (stack RAM access is shown as [SPx++] and [--SPx]) clocks adj

000011 ZC1 1 CCCC DDDDDDDDD 000010101 GETSPD D 'SPA-SPB into D, Z/C as CHKSPD 1

000011 ZC1 1 CCCC DDDDDDDDD 000010110 GETSPA D 'SPA into D, Z/C as CHKSPA 1

000011 ZC1 1 CCCC DDDDDDDDD 000010111 GETSPB D 'SPB into D, Z/C as CHKSPB 1

000011 ZC1 1 CCCC DDDDDDDDD 000011000 POPAR D 'read [SPA++] into D, MSB into C 1

000011 ZC1 1 CCCC DDDDDDDDD 000011001 POPBR D 'read [SPB++] into D, MSB into C 1

000011 ZC1 1 CCCC DDDDDDDDD 000011010 POPA D 'read [--SPA] into D, MSB into C 1

000011 ZC1 1 CCCC DDDDDDDDD 000011011 POPB D 'read [--SPB] into D, MSB into C 1

000011 ZC0 1 CCCC 000000000 000011100 RETA 'read [--SPA] into Z/C/PC* 4

000011 ZC0 1 CCCC 000000000 000011101 RETB 'read [--SPB] into Z/C/PC* 4

000011 ZC0 1 CCCC 000000000 000011110 RETAD 'read [--SPA] into Z/C/PC* 1

000011 ZC0 1 CCCC 000000000 000011111 RETBD 'read [--SPB] into Z/C/PC* 1

000011 000 1 CCCC DDDDDDDDD 010100010 SETSPA D 'set SPA to D 1

000011 001 1 CCCC 0nnnnnnnn 010100010 SETSPA #n 'set SPA to n 1

000011 000 1 CCCC DDDDDDDDD 010100011 SETSPB D 'set SPB to D 1

000011 001 1 CCCC 0nnnnnnnn 010100011 SETSPB #n 'set SPB to n 1

000011 000 1 CCCC DDDDDDDDD 010100100 ADDSPA D 'add D into SPA 1

000011 001 1 CCCC 0nnnnnnnn 010100100 ADDSPA #n 'add n into SPA 1

000011 000 1 CCCC DDDDDDDDD 010100101 ADDSPB D 'add D into SPB 1

000011 001 1 CCCC 0nnnnnnnn 010100101 ADDSPB #n 'add n into SPB 1

000011 000 1 CCCC DDDDDDDDD 010100110 SUBSPA D 'subtract D from SPA 1

000011 001 1 CCCC 0nnnnnnnn 010100110 SUBSPA #n 'subtract n from SPA 1

000011 000 1 CCCC DDDDDDDDD 010100111 SUBSPB D 'subtract D from SPB 1

000011 001 1 CCCC 0nnnnnnnn 010100111 SUBSPB #n 'subtract n from SPB 1

000011 000 1 CCCC DDDDDDDDD 010101000 PUSHAR D 'write D into [--SPA] 1 ** +1

000011 001 1 CCCC nnnnnnnnn 010101000 PUSHAR #n 'write n into [--SPA] 1 ** +1

000011 000 1 CCCC DDDDDDDDD 010101001 PUSHBR D 'write D into [--SPB] 1 ** +1

000011 001 1 CCCC nnnnnnnnn 010101001 PUSHBR #n 'write n into [--SPB] 1 ** +1

000011 000 1 CCCC DDDDDDDDD 010101010 PUSHA D 'write D into [SPA++] 1 ** +1

000011 001 1 CCCC nnnnnnnnn 010101010 PUSHA #n 'write n into [SPA++] 1 ** +1

000011 000 1 CCCC DDDDDDDDD 010101011 PUSHB D 'write D into [SPB++] 1 ** +1

000011 001 1 CCCC nnnnnnnnn 010101011 PUSHB #n 'write n into [SPB++] 1 ** +1

000011 000 1 CCCC DDDDDDDDD 010101100 CALLA D 'write Z/C/PC* into [SPA++], PC=D 4 ** +1

000011 001 1 CCCC nnnnnnnnn 010101100 CALLA #n 'write Z/C/PC* into [SPA++], PC=n 4 ** +1

000011 000 1 CCCC DDDDDDDDD 010101101 CALLB D 'write Z/C/PC* into [SPB++], PC=D 4 ** +1

000011 001 1 CCCC nnnnnnnnn 010101101 CALLB #n 'write Z/C/PC* into [SPB++], PC=n 4 ** +1

000011 000 1 CCCC DDDDDDDDD 010101110 CALLAD D 'write Z/C/PC* into [SPA++], PC=D 1 ** +1

000011 001 1 CCCC nnnnnnnnn 010101110 CALLAD #n 'write Z/C/PC* into [SPA++], PC=n 1 ** +1

000011 000 1 CCCC DDDDDDDDD 010101111 CALLBD D 'write Z/C/PC* into [SPB++], PC=D 1 ** +1

000011 001 1 CCCC nnnnnnnnn 010101111 CALLBD #n 'write Z/C/PC* into [SPB++], PC=n 1 ** +1

* bit 10 is Z, bit 9 is C, bits 8..0 are PC, upper bits are ignored or cleared

** if a stack RAM write is immediately followed by a stack RAM read, add one clock
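The Z/C/PC word layout in the first footnote can be expressed as a pair of helper functions. This is a minimal Python sketch with hypothetical function names. It only packs and unpacks the three fields and treats the upper bits as ignored.

```python
def pack_return_word(z, c, pc):
    """Build the word that a call pushes: bit 10 = Z, bit 9 = C, bits 8..0 = PC."""
    if z not in (0, 1) or c not in (0, 1):
        raise ValueError("Z and C are single bits")
    if not 0 <= pc <= 0x1FF:
        raise ValueError("PC is a 9-bit cog address")
    return (z << 10) | (c << 9) | pc


def unpack_return_word(word):
    """Recover (Z, C, PC) the way a return would; upper bits are ignored."""
    return (word >> 10) & 1, (word >> 9) & 1, word & 0x1FF


word = pack_return_word(1, 0, 0x123)
print(hex(word), unpack_return_word(word))
```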

Instruction Pipeline

Each cog has a 4-stage pipeline which all instructions progress through, in order to execute:

1st stage - Read instruction from cog register RAM

2nd stage - Determine any indirect or remapped D and S addresses, update INDA and INDB

3rd stage - Read D and S from cog register RAM

4th stage - Execute instruction, write D to cog register RAM, update Z/C/PC and any other results

On every clock cycle, the instruction data in each stage advances to the next stage, unless the instruction executing in the 4th stage is stalling the pipeline because it's waiting for something (WRBYTE waits for the hub).

To keep D and S data current within the pipeline, the resultant D from the 4th stage is passed back to the 3rd stage to substitute for any obsoleted D or S data currently being read from the cog register RAM. The same is done for instruction data currently being read in the 1st stage, but this still leaves a two-stage gap between when a register is modified and when it can be executed:

Example: Single-task self-modifying code

MOVD :inst,top9 '(initially 4th stage) modify instruction

NOP '(initially 3rd stage) 1...

NOP '(initially 2nd stage) 2... at least two instructions in-between

:inst ADD A,B '(initially 1st stage) modified instruction executes

Tasks that execute no more frequently than every 3rd time slot don't need to observe this 2-instruction spacer rule when executing self-modifying code, because their instructions will always be sufficiently spread apart in the pipeline by other tasks' instructions, enabling a just-modified instruction to be properly read and executed in that task's next time slot. If fewer than two spacers are afforded to a modify-execute sequence, the old instruction will be read and executed, instead of the new one. This can be used to advantage for efficient overlapped modify-execute sequences.
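The spacer rule can be checked with a very small timing model. The Python sketch below is a simplification with hypothetical names. It assumes that nothing stalls, that the task's time slots are evenly spaced, and that the 4th-stage result is forwarded to an instruction being read in the 1st stage during the same clock, as described above.

```python
PIPELINE_STAGES = 4


def executes_modified(spacers, slot_period=1):
    """Return True if the target instruction is read after (or while) it is modified.

    spacers     -- same-task instructions between the modifier and the target
    slot_period -- clocks between consecutive instructions of this task
                   (1 for a single-task program)
    """
    if spacers < 0 or slot_period < 1:
        raise ValueError("spacers must be >= 0 and slot_period >= 1")
    # The modifier writes its result 3 clocks after it was read in stage 1.
    clocks_until_write = PIPELINE_STAGES - 1
    # The target is read this many clocks after the modifier was read.
    clocks_until_target_read = (spacers + 1) * slot_period
    return clocks_until_target_read >= clocks_until_write


for n in range(4):
    print(n, "spacers, single task:", executes_modified(n))
print("0 spacers, every 3rd slot:", executes_modified(0, slot_period=3))
```

Under these assumptions the model agrees with the text: a single-task program needs two spacers, while a task that runs every 3rd time slot needs none.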

When a branch instruction executes, that task's program counter is abruptly changed from what had been a steadily incrementing course, requiring that the pipeline be reloaded, beginning at the new program counter address. This can leave up to three instructions in the pipeline which were trailing the branch instruction and belong to the same task as the branch.

Normally, these trailing instructions are incidental data which are not intended for execution, and therefore must be cancelled within the pipeline, so that they pass through without doing anything. However, in some cases, it may be desirable to allow those instructions to execute, without cancellation, to increase pipeline efficiency.

To accommodate both cancelling and non-cancelling branches, branch instructions have two versions. The ones that end in the letter 'D' for 'delayed' are non-cancelling and take only one clock, but will execute any trailing pipelined instructions which belong to the branch's same task.

In a single-task program, three trailing instructions are executed before the delayed branch seems to take effect:

Example: Single-task delayed branch

JMPD #somewhere '(initially 4th stage) do a delayed jmp, then toggle P0 and cycle P1

NOTP #0 '(initially 3rd stage)

NOTP #1 '(initially 2nd stage)

NOTP #1 '(initially 1st stage) next instruction is loaded from 'somewhere'

In a two-task program with simple time slot allocation, only one trailing instruction is executed before the delayed branch seems to take effect:

Example: Two-task delayed branch (SETTASK #%%1010 timing)

JMPD #somewhere '(initially 4th stage) do a delayed jmp to 'somewhere', then toggle P0

NOTP #0 '(initially 2nd stage) next instruction is loaded from 'somewhere'

The branch instructions that don't end in the letter 'D' are what would be considered 'normal' branches, where the next instruction to execute after the branch would be the instruction which was branched to.
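The number of trailing instructions depends only on which tasks own the three time slots behind the branch. The Python sketch below counts them for a given slot pattern. The function names are hypothetical, and the model assumes an unstalled pipeline in which instructions enter in time-slot order.

```python
def trailing_same_task(slots, position):
    """Count same-task instructions in the three stages behind a branch.

    slots    -- task numbers in time-slot order (the pattern repeats cyclically)
    position -- index of the slot in which the branch was issued
    """
    task = slots[position % len(slots)]
    following = [slots[(position + k) % len(slots)] for k in (1, 2, 3)]
    return following.count(task)


def normal_branch_clocks(slots, position):
    """A cancelling branch costs its own clock plus one per cancelled instruction."""
    return 1 + trailing_same_task(slots, position)


print(trailing_same_task([0], 0))           # single task
print(trailing_same_task([0, 1], 0))        # two tasks alternating
print(trailing_same_task([0, 1, 2, 3], 0))  # four tasks in sequence
```

For a delayed branch the count is the number of trailing instructions that execute before the branch appears to take effect. For a normal branch it is the number of instructions that get cancelled.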

Table: Branching Instructions

Normal cancelling	Delayed non-cancelling	Normal cancelling	Delayed non-cancelling
JMP	JMPD	IJNZ	IJNZD
CALL	CALLD	DJZ	DJZD
RET	RETD	DJNZ	DJNZD
JMPRET	JMPRETD	TJZ	TJZD
CALLA	CALLAD	TJNZ	TJNZD
CALLB	CALLBD	JP	JPD
RETA	RETAD	JNP	JNPD
RETB	RETBD	PASSCNT
IJZ	IJZD	JMPTASK

Instruction-Block Repeating

Each cog has an instruction-block repeater that can variably repeat up to 64 instructions without any clock-cycle overhead.

REPD and REPS are used to initiate block repeats. These instructions specify how many times the trailing instruction block will be executed and how many instructions are in the block:

REPD #i - execute 1..64 instructions infinitely, requires 3 spacer instructions *
REPD D,#i - execute 1..64 instructions D+1 times, requires 3 spacer instructions *
REPD #n,#i - execute 1..64 instructions 1..512 times, requires 3 spacer instructions *

REPS #n,#i - execute 1..64 instructions 1..16384 times, requires 1 spacer instruction *

REPS differs from REPD by executing at the 2nd stage of the pipeline, instead of the 4th. By executing two stages early, it needs only one spacer instruction *.

Because of its earliness, no conditional execution is possible, so it always executes, allowing the CCCC bits to be repurposed, along with Z, to provide a 14-bit constant for the repeat count.

The instruction-block repeater will quit repeating the block if a branch instruction executes within the block. This rule does not currently apply to a JMPTASK which affects the task using the repeater - this will be fixed at the earliest opportunity.

There is only one REPS/REPD circuit, so REPS/REPD's cannot be nested.

* Spacer instructions are required in 1-task applications to allow the pipeline to prime before repeating can commence. If REPD is used by a task that uses no more than every 4th time slot, no spacers are needed, as three intervening instructions will be provided by the other task(s). If REPS is used by a task that uses no more than every 2nd time slot, no spacers are needed.

Example: Using REP instruction

Example (1-task):

REPD D,#1 'execute 1 instruction D+1 times

NOP '3 spacer instructions needed (could do something useful)
NOP
NOP

NOTP #0 'toggle P0, block repeats every 1 clock

Example (1-task):

REPS #10_000,#4 'execute 4 instructions 10,000 times (REPS count is limited to 1..16384)

NOP '1 spacer instruction needed (make the most of it)

NOTP #0 'toggle P0
NOTP #1 'toggle P1
NOTP #2 'toggle P2
NOTP #3 'toggle P3, block repeats every 4 clocks

Example (4-task, SETTASK #%%3210 timing):

task0 REPD #1 'task0 will own the block repeater (no need for spacers)
NOTP #0 'toggle P0 every 4 clocks

task1 NOTP #1 'toggle P1 every 8 clocks
JMP #task1

task2 NOTP #2 'toggle P2 every 8 clocks
JMP #task2

task3 NOTP #3 'toggle P3 every 8 clocks
JMP #task3

Table: REP Instructions

Mnemonic	Operand	Operation (iiiiii = #i-1, nnnnnnnnn/n___nnnn_nnnnnnnnn = #n-1)	Clocks
REPD	#i	execute 1..64 inst's infinitely	1
REPD	D,#i	execute 1..64 inst's D+1 times	1
REPD	#n,#i	execute 1..64 inst's 1x..512x	1
REPS	#n,#i	execute 1..64 inst's 1x..16384x	1

Note that the %iiiiii field represents 1..64 instructions, not the encoded 0..63. The %nnnnnnnnn/%n___nnnn_nnnnnnnnn fields are +1-based, too.
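The +1-based fields can be illustrated with a short Python helper. This is a sketch with hypothetical function names. It only checks the documented ranges and computes the stored field values, and it does not assemble a full instruction word. The clock estimate assumes every instruction in the block takes one clock.

```python
def encode_rep_fields(n, i, reps=True):
    """Return (n_field, i_field) as stored in the instruction.

    n    -- repeat count: 1..16384 for REPS, 1..512 for REPD #n,#i
    i    -- number of instructions in the block: 1..64
    reps -- True for REPS, False for REPD #n,#i
    """
    max_n = 16384 if reps else 512
    if not 1 <= i <= 64:
        raise ValueError("block length must be 1..64 instructions")
    if not 1 <= n <= max_n:
        raise ValueError("repeat count must be 1..%d" % max_n)
    return n - 1, i - 1


def block_clocks(n, i):
    """Clocks spent in the repeated block, assuming 1-clock instructions."""
    return n * i


print(encode_rep_fields(16384, 64))            # largest REPS block
print(encode_rep_fields(512, 1, reps=False))   # largest REPD #n count
print(block_clocks(100, 4))
```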

Multi-tasking

Each cog has four sets of flags and program counters (Z/C/PC), constituting four unique tasks that can execute and switch on each instruction cycle.

At cog startup, the tasks are initialized as follows:

task Z C PC

0 0 0 $000

1 0 0 $001

2 0 0 $002

3 0 0 $003

There are 16 rotating time slots in the TASK register that determine task sequence. Initially, all time slots are set to 0, causing task 0 to execute exclusively, starting at address $000:

Diagram: Task Time Slots (the 32-bit TASK register, b31..b00, holds 16 two-bit time slots)

The two LSB's of TASK always determine which task will next be queued in the pipeline for execution. After each instruction cycle, the TASK register is rotated right by two bits, recycling slot 0 to slot 15 and getting the next task into the 2 LSB's.

SETTASK

To enable other tasks, SETTASK is used to set the TASK register:

SETTASK D write D to the TASK register

SETTASK #n write {n[7:0], n[7:0], n[7:0], n[7:0]} to the TASK register

If a task is given no time slot, it doesn't execute and its flags and PC stay at initial values. If a task is given a time slot, it will execute and its flags and PC will be updated at every instruction, or time slot. If an active task's time slots are all taken away, that task's flags and PC remain in the state where they left off, until it is given another time slot.

When SETTASK issues a new time slot pattern, there are already three instructions in the pipeline, so the 4th instruction after SETTASK will be from the task specified in the two LSB's of the SETTASK operand.
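The time-slot mechanism can be modelled in a few lines of Python. This is a sketch with hypothetical function names. It reproduces the replication performed by SETTASK #n and the two-bit right rotation of the TASK register. It does not model the three instructions already in the pipeline when SETTASK executes.

```python
def settask_immediate(n):
    """SETTASK #n: copy n[7:0] into all four bytes of the TASK register."""
    return (n & 0xFF) * 0x01010101


def task_sequence(task_reg, count):
    """Return the next `count` task numbers, rotating TASK right by two bits each time."""
    sequence = []
    for _ in range(count):
        sequence.append(task_reg & 0b11)
        # Rotate right by 2: slot 0 is recycled into slot 15.
        task_reg = ((task_reg >> 2) | ((task_reg & 0b11) << 30)) & 0xFFFFFFFF
    return sequence


# The %%3210 notation is base 4, so it can be parsed with int(..., 4).
task = settask_immediate(int("3210", 4))
print(hex(task), task_sequence(task, 8))
print(task_sequence(settask_immediate(int("1010", 4)), 4))
```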

JMPTASK

To immediately force any of the four PC's to a new address, JMPTASK can be used. JMPTASK uses a 4-bit mask to select which PC's are going to be written. Mask bits 0..3 represent PC's 0..3. The mask value %1010 would write PC 3 and PC 1, while %0100 would write PC 2, only.

JMPTASK D,#mask force PC's in mask to D

JMPTASK #addr,#mask force PC's in mask to #addr

For every PC/task affected by a JMPTASK instruction, all affected-task instructions currently in the pipeline are cancelled. This ensures that once JMPTASK executes, the next instruction from each affected task will be from the new address.
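The mask semantics can be illustrated with a short Python sketch. The function names are hypothetical, and only the PC update is modelled, not the pipeline cancellation.

```python
def pcs_selected(mask):
    """Return the task numbers whose PCs a 4-bit JMPTASK mask selects."""
    if not 0 <= mask <= 0b1111:
        raise ValueError("mask is a 4-bit value")
    return [task for task in range(4) if mask & (1 << task)]


def apply_jmptask(pcs, address, mask):
    """Return the four PCs after JMPTASK forces the selected ones to `address`."""
    selected = pcs_selected(mask)
    return [address & 0x1FF if task in selected else pc
            for task, pc in enumerate(pcs)]


print(pcs_selected(0b1010))                            # PC 1 and PC 3
print(pcs_selected(0b0100))                            # PC 2 only
print(apply_jmptask([0, 1, 2, 3], 0x040, 0b1000))      # only PC 3 changes
```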

Here is an example in which all four tasks are started and each task toggles an I/O pin at a different rate:

Example: Starting four tasks

ORG

JMP #task0 'task 0 begins here when the cog starts (this JMP takes 4 clocks)

JMP #task1 'task 1 begins here after task 0 executes SETTASK (this JMP takes 1 clock)

JMP #task2 'task 2 begins here after task 0 executes SETTASK (this JMP takes 1 clock)

JMP #task3 'task 3 begins here after task 0 executes SETTASK (this JMP takes 1 clock)

ctwardell suggests a correction

JMPTASK #task1,#%0010 'task 1 begins here after task 0 executes SETTASK (%0010 = Set PC1)

JMPTASK #task2,#%0100 'task 2 begins here after task 0 executes SETTASK (%0100 = Set PC2)

JMPTASK #task3,#%1000 'task 3 begins here after task 0 executes SETTASK (%1000 = Set PC3)

task0 SETTASK #%%3210 'enable all tasks (TASK = %11_10_01_00_11_10_01_00_11_10_01_00_11_10_01_00)

:loop NOTP #0 'task 0, toggle pin 0 (loops every 8 clocks)

JMP #:loop '(this JMP takes 1 clock)

task1 NOTP #1 'task 1, toggle pin 1 (loops every 12 clocks)

NOP

JMP #task1 '(this JMP takes 1 clock)

task2 NOTP #2 'task 2, toggle pin 2 (loops every 16 clocks)

NOP

NOP

JMP #task2 '(this JMP takes 1 clock)

task3 NOTP #3 'task 3, toggle pin 3 (loops every 20 clocks)

NOP

NOP

NOP

JMP #task3 '(this JMP takes 1 clock)

Note: When a normal branch instruction (JMP, CALL, RET, etc.) executes in the fourth and final stage of the pipeline, all instructions progressing through the lower three stages, which belong to the same task as the branch instruction, are cancelled. This inhibits execution of incidental data that was trailing the branch instruction.

The delayed branch instructions (JMPD, CALLD, RETD, etc.) don't do any pipeline instruction cancellation and exist to provide 1-clock branches to single-task programs, where the three instructions following the branch are allowed to execute before the new instruction stream begins to execute.

For single-task programs, normal branches take 4 clocks: 1 clock for the branch and 3 clocks for the cancelled instructions to come through the pipeline before the new instruction stream begins to execute.

For multi-tasking programs that use all four tasks in sequence (i.e. SETTASK #%%3210), there are never any same-task instructions in the pipeline that would require cancellation due to branching, so all branches take just 1 clock.

Table: Task Instructions

ENCODING	INSTRUCTION	DESCRIPTION	CLOCK
000011 000 1 CCCC DDDDDDDDD 01001mmmm	JMPTASK D,#mask	Set PC's in mask to D
000011 001 1 CCCC nnnnnnnnn 01001mmmm	JMPTASK #n,#mask	Set PC's in mask to 0..511
000011 000 1 CCCC DDDDDDDDD 011001011	SETTASK D	Set TASK to D
000011 001 1 CCCC nnnnnnnnn 011001011	SETTASK #n	Set TASK to n[7:0] copied 4x

Register Remapping

Here's a little program that kicks off four tasks running the same code, but with different variable sets.

Register remapping is set up to remap 4 sets of 4 registers, according to the task executing. For tasks 0..3, hard addresses 0..3 remap to 0..3, 4..7, 8..11, or 12..15.

Example: Register Remapping

dat
org                        'longs are like nop's, get skipped

pin long        0                'task 0 data
count long        1
delay long        0
extra long        0

long        1                'task 1 data
long        5
long        0
long        0

long        2                'task 2 data
long        13
long        0
long        0

long        3                'task 3 data
long        29
long        0
long        0

setmap        #%1_010_010        'remap registers by task, 4 sets, 4 registers each
settask        #%%3210                'enable all tasks
jmptask        #loop,#%1111        'before any newly-started tasks get to execute stage, jump all tasks to loop

loop notp        pin                'toggle task x pin
mov        delay,count        'get task x delay
djnz        delay,#$        'count down delay
jmp        #loop                'loop (count + 3 clocks)
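The address translation used by this example can be sketched in Python. The function name is hypothetical, the decoding of the SETMAP operand itself is not modelled, and leaving addresses outside the remapped window unchanged is an assumption of the sketch.

```python
def remap_register(address, task, set_size=4, sets=4):
    """Map a hard register address to the register actually used by `task`."""
    if not 0 <= task < sets:
        raise ValueError("task must be 0..%d" % (sets - 1))
    if 0 <= address < set_size:
        return task * set_size + address
    return address  # assumption: other addresses are not remapped


# Register contents as laid out in the example (pin, count, delay, extra per task).
registers = [0, 1, 0, 0,
             1, 5, 0, 0,
             2, 13, 0, 0,
             3, 29, 0, 0]
COUNT = 1  # hard address of 'count'

# Each task reads its own 'count' through the same hard address.
counts = [registers[remap_register(COUNT, task)] for task in range(4)]
print(counts)
```

All four tasks run the same code at `loop`, but each one toggles its own pin and uses its own delay because addresses 0..3 resolve to a different register set per task.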

Tips for coding multi-tasking programs

While all tasks in a multi-tasking program can execute atomic instructions without any inter-task conflict, remember that there's only one of each of the following cog resources and only one task can use it at a time:

SPA, SPB
INDA, INDB
PTRA, PTRB
ACCA, ACCB
32x32 multiplier
64/32 divider
64-bit square rooter
CORDIC computer
CTRA, CTRB
VID
PIX (not usable in multi-tasking, requires single-task timing)
XFR
SER
Bitfield mover

Tasks and the Pipeline

When writing multi-task programs, be aware that instructions that take multiple clocks will stall the pipeline and have a ripple effect on the tasks' timing. This may be impossible to avoid, as some task might need to access hub memory, and those instructions are not single-clock.

Avoiding Pipeline Stall

The WAITCNT/WAITPEQ/WAITPNE instructions should be coded discretely using 1-clock instructions, to avoid stalling the pipeline for excessive amounts of time.

The following instructions (WC versions) will take 1 clock, instead of potentially many, and return 1 in C if they were successful:

SNDSER D WC	attempt to send serial
RCVSER D WC	attempt to receive serial
GETMULL D WC	attempt to get lower multiplier result
GETMULH D WC	attempt to get upper multiplier result
GETDIVQ D WC	attempt to get divider quotient result
GETDIVR D WC	attempt to get divider remainder result
GETSQRT D WC	attempt to get square root result
GETQX D WC	attempt to get CORDIC X result
GETQY D WC	attempt to get CORDIC Y result
GETQZ D WC	attempt to get CORDIC Z result

Other instruction alternatives:

POLCTRA WC	returns 1 in C if CTRA rolled over, use instead of SYNCTRA
POLCTRB WC	returns 1 in C if CTRB rolled over, use instead of SYNCTRB
POLVID WC	returns 1 in C if WAITVID is ready, use to execute WAITVID without stalling
PASSCNT D	jumps to itself if some amount of time has not passed, use instead of WAITCNT
JP/JNP D,S	jumps based on pin states, use instead of WAITPEQ/WAITPNE
DJNZ D,#$	loops until done, use instead of NOP D/#n

Instructions to avoid in multi-tasking

The following instructions will not work in a multi-tasking program:

REPS/REPD	operate by subtracting a value from the PC every n clocks - single-task only
GETPIX	needs steady pipeline delays for perspective divider time - single-task only

I/O Ports

There are now 4 I/O ports built into the system – 3 are physical 32-bit I/O ports and 1 is an internal 32-bit I/O port. The I/O pins connected to each port can be configured separately.

Table 15: Port Access Instructions

Mnemonic	Operand	Operation
SETPORTA	D/#n	Assign PORTA to physical I/O ports (0-2) or internal I/O port 3 given register “D (0-511)” or number “n (0-3)”.
SETPORTB	D/#n	Assign PORTB to physical I/O ports (0-2) or internal I/O port 3 given register “D (0-511)” or number “n (0-3)”.
SETPORTC	D/#n	Assign PORTC to physical I/O ports (0-2) or internal I/O port 3 given register “D (0-511)” or number “n (0-3)”.
SETPORTD	D/#n	Assign PORTD to physical I/O ports (0-2) or internal I/O port 3 given register “D (0-511)” or number “n (0-3)”.

Table 16: Pin State Access Instructions

Mnemonic	Operand	Operation
GETP	D/#n	Get pin number given by register “D (0-511)” or “n (0-127)” into !Z or C flags.
GETPN	D/#n	Get pin number given by register “D (0-511)” or “n (0-127)” into Z or !C flags.
OFFP	D/#n	Toggle pin number given by register “D (0-511)” or “n (0-127)” off or on. DIR
NOTP	D/#n	Invert pin number given by the value in register “D (0-511)” or “n (0-127)”. OUT
CLRP	D/#n	Clear pin number given by the value in register “D (0-511)” or “n (0-127)”. OUT
SETP	D/#n	Set pin number given by the value in register “D (0-511)” or “n (0-127)”. OUT
SETPC	D/#n	Set pin number given by the value in register “D (0-511)” or “n (0-127)” to C
SETPNC	D/#n	Set pin number given by the value in register “D (0-511)” or “n (0-127)” to !C
SETPZ	D/#n	Set pin number given by the value in register “D (0-511)” or “n (0-127)” to Z
SETPNZ	D/#n	Set pin number given by the value in register “D (0-511)” or “n (0-127)” to !Z

External RAM

Each cog now features the ability, with the help of the I/O pins, to quickly stream parallel data in or out of the I/O pins aligned to a clock source. Data is streamed to/from the CLUT or WRQUAD overlay. From there it can be quickly fed to the video generator or to the internal HUB RAM. XFR feeds data 16 bits or 32 bits at a time at the system clock speed.

Table 17: External RAM Instruction

Mnemonic	Operand	Operation
SETXFR	D/#n	Setup the direction of the data stream, the source and destination of the data stream, and the size of the data stream given D or “n (0-63)”.

InterChip Communication

Each cog now also features high-speed serial transfer and receive hardware for interchip communication. The hardware requires three I/O pins (SO, SI, CLK).

Table 18: InterChip Communication Instructions

Mnemonic	Operand	Operation
SNDSER	D	Sends a long (D) out of the special chip-to-chip serial port. Blocks until the long is sent. Use C flag to avoid blocking.
RCVSER	D	Receives a long (D) in from the special chip-to-chip serial port. Blocks until the long is received. Use C flag to avoid blocking.
SETSER	D/#n	Sets up the serial port I/O pins to use for SO, SI, and CLK given D or “n (0-63)”.

Cog Memory Remapping

Cogs now have the ability to remap their internal memory to help facilitate context switching between register banks. Instead of having to save a bunch of internal registers to switch running programs, all references to a set of registers can be changed instantaneously.

Table 19: Cog Memory Remapping Instruction

Mnemonic	Operand	Operation
SETMAP	D/#n	Remap one cog register space to another cog register space given D or n.

InterCog Communication

Cogs now have the ability to communicate directly to each other using the internal I/O Port D, which connects each cog to every other cog.

Table 20: InterCog Communication Instruction

Mnemonic	Operand	Operation
SETXCH	D/#n	Reconfigure Port D I/O masks given D or n to select which cogs to listen to.

Pin Modes

Each I/O pin is now capable of setting itself into many different modes to more easily interface with the analog world. By default, each I/O starts up in the basic robust digital I/O state. However, once configured, the I/O pin can be used for external RAM memory transfer, as an ADC, as a DAC, a Schmitt trigger, or a comparator, etc. See Figure 2 for a table of pin modes and their associated properties.

Table 21: Pin Mode Access Instructions

Mnemonic	Operand	Operation
SETPORT	D/#n	Assign which port the CFGPINS instruction will configure given register “D (0-511)” or number “n (0-3)”.
CFGPINS	D,S	Setup pins masked by register “D (0-511)” to register “S (0-511)”. The pin configuration modes are below.

NOTE: PinA is the pin being set. PinB is its neighbor (all I/O pins have a cross-coupled neighbor). Input is the Boolean statement for what the pin returns when read. Output is the statement for what the pin outputs when it is an output (some modes output their input to make feedback relaxation oscillators, etc.). Each pin’s high and low drivers can be configured to work in many different modes. Pins can also re-clock data sent to them locally to remove jitter in data. Every pin is set up by a 13-bit configuration value.

Figure 2: Pin Modes

Code	Mode	Input	PinA Output	PinB	Compare
0000_CIOHHHLLL	General I/O	PinA Logic	OUT	-	-
0001_CIOHHHLLL	General I/O	PinA Logic	INPUT	-	-
0010_CIOHHHLLL	General I/O	PinB Logic	INPUT	-	-
0011_CIOHHHLLL	General I/O	PinB Logic	INPUT	1MΩ PinA	-
0100_CIOHHHLLL	General I/O	PinA Schmitt	OUT	-	-
0101_CIOHHHLLL	General I/O	PinA Schmitt	INPUT	-	-
0110_CIOHHHLLL	General I/O	PinB Schmitt	INPUT	-	-
0111_CIOHHHLLL	General I/O	PinB Schmitt	INPUT	1MΩ PinA	-
1000_CIOHHHLLL	General I/O	PinA > VIO/2	OUT	-	FAST
1001_CIOHHHLLL	General I/O	PinA > VIO/2	INPUT	-	FAST
1010_CIOHHHLLL	General I/O	PinB > VIO/2	INPUT	-	FAST
1011_CIOHHHLLL	General I/O	PinB > VIO/2	INPUT	1MΩ PinA	FAST
1100_CIOHHHLLL	General I/O	PinA > PinB	OUT	-	PRECISE
1101_CIOHHHLLL	General I/O	PinA > PinB	INPUT	-	PRECISE
1110_CIOHHHLLL	General I/O	PinA > PinB	INPUT	1MΩ PinA	PRECISE
1111_0LLLLLLLL	Compare Level	PinA > VIO/256*L	-	-	PRECISE
1111_1000xxxxx	ADC Diff, 100kΩ	PinA > VIO/2 10kΩ	100kΩ, !IN	10kΩ VIO/2	FAST
1111_10010xxxx	ADC Precise, DIR/OUT = Cal	ADC	7MΩ	-	FAST
1111_10011xxxx	ADC FAST, DIR/OUT = Cal	ADC	400kΩ	-	FAST
1111_101VxxCCC	DAC 75Ω, V=Video, C=Cog	1	75Ω	-	-
1111_110HHHLLL	SDRAM DATA I/O	PinA Logic	FAST, OUT	-	-
1111_111HHHLLL	SDRAM Clock Out	1	FAST, OUT=1	-	-

HHH LLL	DRIVE
000	FAST
001	SLOW
010	1500Ω
011	10kΩ
100	100kΩ
101	100μA
110	10μA
111	FLOAT

C	OUT/IN
0	LIVE
1	CLOCKED

I/O	IN/OUT
0	TRUE
1	INVERTED
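For the General I/O modes, the 13-bit configuration value can be assembled from its fields as in the Python sketch below. The function and dictionary names are hypothetical, and the bit positions are an assumption read left to right from the "MMMM_CIOHHHLLL" code column (mode in bits 12..9, then C, I, O, HHH, LLL). Mode %1111 uses different sub-fields and is not covered.

```python
# Drive codes from the HHH/LLL legend of Figure 2.
DRIVE = {
    "FAST": 0b000, "SLOW": 0b001, "1500Ω": 0b010, "10kΩ": 0b011,
    "100kΩ": 0b100, "100μA": 0b101, "10μA": 0b110, "FLOAT": 0b111,
}


def pin_config(mode, c, i, o, hhh, lll):
    """Pack a General I/O configuration as MMMM_C_I_O_HHH_LLL (13 bits).

    Assumed layout: mode = bits 12..9, C = bit 8, I = bit 7, O = bit 6,
    HHH = bits 5..3 (high driver), LLL = bits 2..0 (low driver).
    """
    if not 0b0000 <= mode <= 0b1110:
        raise ValueError("only modes %0000..%1110 use the CIOHHHLLL layout")
    for bit in (c, i, o):
        if bit not in (0, 1):
            raise ValueError("C, I and O are single bits")
    for drive in (hhh, lll):
        if not 0 <= drive <= 0b111:
            raise ValueError("HHH and LLL are 3-bit fields")
    return (mode << 9) | (c << 8) | (i << 7) | (o << 6) | (hhh << 3) | lll


# Schmitt input on PinA, clocked, fast high drive, floating low drive.
value = pin_config(0b0100, 1, 0, 0, DRIVE["FAST"], DRIVE["FLOAT"])
print(format(value, "013b"))
```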

Video Generator

Each cog has a video generator capable of generating composite, component, s-video, and VGA video. The video generator is fed pixel data through the waitvid instruction and uses the pixel data to look up colors to output from the CLUT. The video generator understands R.G.B.A.X color grouping and can handle RGB565/555/444/etc formatted data.

Table 22: Video Generator Access Instructions

Mnemonic	Operand	Operation
SETVID	D/#n	Setup the video generator according to D or n to output video from the CLUT.
SETVIDY	D/#n	Setup the video generator color matrix transform term Y according to D or n.
SETVIDI	D/#n	Setup the video generator color matrix transform term I according to D or n.
SETVIDQ	D/#n	Setup the video generator color matrix transform term Q according to D or n.

DAC Hardware

Each cog has four DACs capable of SIN/COS wave output, saw tooth wave output, triangle wave output, and square wave output. Additionally, the video generator, when operational, will use the four DACs to produce video output. Please refer to the information below.

CFGDAC – 00 = 9-bit level with 9-bit dither.
CFGDAC – 01 = 9-bit level from counter with 9-bit dither from counter.

o DAC0 = CTRASIN, DAC1 = CTRACOS, DAC2 = CTRBSIN, DAC3 = CTRBCOS

CFGDAC – 10 = 9-bit level from counter with 9-bit dither from counter.

o DAC0/2 = CTRASIN + CTRBSIN, DAC1/3 = CTRACOS + CTRBCOS

CFGDAC – 11 = Video generator controlled.

o DAC0 = SYNC, DAC1 = Q/B, DAC2 = I/G, DAC3 = Y/R

Table 23: DAC Hardware Access Instructions

Mnemonic	Operand	Operation
CFGDAC0	D/#n	Configure DAC0 to D or n. See above.
CFGDAC1	D/#n	Configure DAC1 to D or n. See above.
CFGDAC2	D/#n	Configure DAC2 to D or n. See above.
CFGDAC3	D/#n	Configure DAC3 to D or n. See above.
SETDAC0	D/#n	Set DAC0 to top 18 bits of D/n.
SETDAC1	D/#n	Set DAC1 to top 18 bits of D/n.
SETDAC2	D/#n	Set DAC2 to top 18 bits of D/n.
SETDAC3	D/#n	Set DAC3 to top 18 bits of D/n.
CFGDACS	D/#n	Configure DACs to D or n. See above
SETDACS	D/#n	Set DACs to top 18 bits of D/n

Texture Mapping

Each cog has texture mapping hardware to assist the video generator with displaying textures and performing color blending on screen.

Table 24: Texture Mapping Instructions

Mnemonic	Operand	Operation
GETPIX	D	Store texture pointer address in D.
SETPIX	D/#n	Set texture size and address to D/n
SETPIXU	D/#n	Set texture pointer x address to D/n.
SETPIXV	D/#n	Set texture pointer y address to D/n.
SETPIXZ	D/#n	Set texture pointer z address to D/n.
SETPIXR	D/#n	Set texture pointer R blending to D/n
SETPIXG	D/#n	Set texture pointer G blending to D/n
SETPIXB	D/#n	Set texture pointer B blending to D/n
SETPIXA	D/#n	Set texture pointer A blending to D/n

CLUT or Stack RAM

Each cog now features a 256 Long Color Look Up Table (CLUT) designed for use with the video generator in each cog. While the video generator is in use each long in the CLUT holds R.G.B.A.Z information for the video generator to display video with. When the video generator is not in use the CLUT may be used as a general-purpose memory scratch space, or as a 256 Long FIFO buffer, or as a call stack and evaluation stack (at the same time). The CLUT has two pointers used to index it called SPA and SPB.

Table 6: CLUT Instructions

Mnemonic	OPR	CLK	Operation
GETSPD	D	1	SPA-SPB into D, Z/C as CHKSPD
GETSPA	D	1	SPA into D, Z/C as CHKSPA
GETSPB	D	1	SPB into D, Z/C as CHKSPB
POPAR	D	1	Store CLUT[SPA] in register “D (0-511)” and then increment SPA
POPBR	D	1	Store CLUT[SPB] in register “D (0-511)” and then increment SPB
POPA	D	1	Decrement SPA and then store CLUT[SPA] in register “D (0-511)”.
POPB	D	1	Decrement SPB and then store CLUT[SPB] in register “D (0-511)”.
RETA			Decrement SPA and then jump to instruction (CLUT[SPA] & 0x1FF). Flush pipeline before jump – results in a two-cycle loss.
RETB			Decrement SPB and then jump to instruction (CLUT[SPB] & 0x1FF). Flush pipeline before jump – results in a two-cycle loss.
RETAD			Decrement SPA and then jump to instruction (CLUT[SPA] & 0x1FF). Do not flush pipeline before jump – must be executed two instructions before intended jump space.
RETBD			Decrement SPB and then jump to instruction (CLUT[SPB] & 0x1FF). Do not flush pipeline before jump – must be executed two instructions before intended jump space.
SETSPA	D/#n	1	Set SPA to register “D (0-511)” or “n (0-511)”
SETSPB	D/#n	1	Set SPB to register “D (0-511)” or “n (0-511)”
ADDSPA	D/#n	1	Add to SPA register “D (0-511)” or “n (0-511)”.
ADDSPB	D/#n	1	Add to SPB register “D (0-511)” or “n (0-511)”.
SUBSPA	D/#n	1	Subtract from SPA register “D (0-511)” or “n (0-511)”.
SUBSPB	D/#n	1	Subtract from SPB register “D (0-511)” or “n (0-511)”.
PUSHAR	D/#n	1..2	Decrement SPA and then store register “D (0-511)” in CLUT[SPA].
PUSHBR	D/#n	1..2	Decrement SPB and then store register “D (0-511)” in CLUT[SPB].
PUSHA	D/#n	1..2	Store register “D (0-511)” in CLUT[SPA] and then increment SPA.
PUSHB	D/#n	1..2	Store register “D (0-511)” in CLUT[SPB] and then increment SPB.
CALLA	D/#n	4..5	Store Z/C/PC* in CLUT[SPA] and then increment SPA and then jump to the address in register “D (0-511)” or address “n (0-511)”. Flush pipeline before jump – results in a two-cycle loss. D version doesn’t flush.
CALLB	D/#n	4..5	Store Z/C/PC* in CLUT[SPB] and then increment SPB and then jump to the address in register “D (0-511)” or address “n (0-511)”. Flush pipeline before jump – results in a two-cycle loss. D version doesn’t flush.
CALLAD	D/#n	4..5	Store Z/C/PC* in CLUT[SPA] and then increment SPA and then jump to the address in register “D (0-511)” or address “n (0-511)”...
CALLBD	D/#n	4..5	Store Z/C/PC* in CLUT[SPB] and then increment SPB and then jump to the address in register “D (0-511)” or address “n (0-511)”...
GETSPD	D	4..5	Stores ((SPA - SPB) & 0x7F) in register “D (0-511)”. FOR FIFO MODE.
GETSPA	D	4..5	Stores SPA in register “D (0-511)”.
GETSPB	D	4..5	Stores SPB in register “D (0-511)”.
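The push/pop ordering in the table above can be modeled in a few lines of Python. This is only a sketch of the described behavior, not of the hardware: the class name ClutStack, its method names, and the assumption that the pointer wraps modulo 256 are illustrative and do not come from the table.

```python
class ClutStack:
    """Toy model of the 256-long CLUT addressed through one pointer (SPA)."""

    SIZE = 256  # 256 longs, per the CLUT description

    def __init__(self):
        self.mem = [0] * self.SIZE
        self.sp = 0  # stands in for SPA; wrap-around is an assumption

    def pusha(self, value):
        # PUSHA: store at CLUT[SPA], then increment SPA
        self.mem[self.sp] = value & 0xFFFFFFFF
        self.sp = (self.sp + 1) % self.SIZE

    def popa(self):
        # POPA: decrement SPA, then read CLUT[SPA]
        self.sp = (self.sp - 1) % self.SIZE
        return self.mem[self.sp]

    def pushar(self, value):
        # PUSHAR: decrement SPA, then store at CLUT[SPA]
        self.sp = (self.sp - 1) % self.SIZE
        self.mem[self.sp] = value & 0xFFFFFFFF

    def popar(self):
        # POPAR: read CLUT[SPA], then increment SPA
        value = self.mem[self.sp]
        self.sp = (self.sp + 1) % self.SIZE
        return value


stack = ClutStack()
stack.pusha(0x11)
stack.pusha(0x22)
last_in = stack.popa()   # a PUSHA/POPA pair behaves as a LIFO stack
first_in = stack.popa()
```

PUSHA pairs with POPA (stack grows upward) and PUSHAR pairs with POPAR (stack grows downward); mixing the two directions is what would give FIFO-style access.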

Math

Each cog now features the ability to perform 32-bit multi-cycle multiplies, 32-bit multi-cycle divides, 32-bit multi-cycle square roots, and 32-bit CORDIC transcendental operations. All of the advanced multi-cycle math operations use separate state machines that run concurrently with processor execution.

Note: The CORDIC algorithm rotates a point in the XY plane by a given angle. Look at X/Y/A results for SIN/COS/TAN/ARCSIN/ARCCOS/ARCTAN values of X/Y/A.
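To illustrate the rotation the note refers to, here is a floating-point sketch of the general CORDIC technique. It is not the cog's implementation: the hardware works in fixed point, and the function name cordic_rotate, the radian angle format, and the default of 32 iterations are assumptions made for this example.

```python
import math

def cordic_rotate(x, y, angle, iterations=32):
    """Rotate (x, y) by `angle` radians (|angle| <= pi/2) using CORDIC steps."""
    # Pre-compute the cumulative gain of the shift-and-add micro-rotations
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    z = angle
    for i in range(iterations):
        # Drive the residual angle z toward 0 by +/- atan(2^-i) each step
        d = 1.0 if z >= 0 else -1.0
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * math.atan(2.0 ** -i)
    # Undo the gain so the vector length is preserved
    return x / gain, y / gain, z

cos30, sin30, residual = cordic_rotate(1.0, 0.0, math.radians(30))
```

Rotating the unit vector (1, 0) yields the cosine and sine of the angle in X and Y, which is the relationship the note describes.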

Table 7: Math Operation Instructions

Mnemonic	Operand	Operation
GETMULL	D	Store the bottom 32 bits of the 32x32 bit multiply in register “D (0-511)”, waits for multiply FSM if not done yet.
GETMULH	D	Store the top 32 bits of the 32x32 bit multiply in register “D (0-511)”, waits for multiply FSM if not done yet.
GETDIVQ	D	Store the quotient of the divide in register “D (0-511)”, waits for divide FSM if not done yet.
GETDIVR	D	Store the remainder of the divide in register “D (0-511)”, waits for divide FSM if not done yet.
GETSQRT	D	Store the result of the square root in register “D (0-511)”, waits for square root FSM if not done yet.
GETQX	D	Store the result of the CORDIC X part in register “D (0-511)”, waits for CORDIC FSM if not done yet.
GETQY	D	Store the result of the CORDIC Y part in register “D (0-511)”, waits for CORDIC FSM if not done yet.
GETQZ	D	Store the result of the CORDIC A part in register “D (0-511)”, waits for CORDIC FSM if not done yet.
SETMULA	D/#n	Setup long A to be multiplied by long B given the value in register “D (0-511)” or number “n (0-511)”. Will take 16 cycles.
SETMULB	D/#n	Setup long B to be multiplied by long A given the value in register “D (0-511)” or number “n (0-511)”. Starts multiply.
SETDIVA	D/#n	Setup the dividend long given the value in register “D (0-511)” or number “n (0-511)”. Will take 16 cycles.
SETDIVB	D/#n	Setup the divisor long given the value in register “D (0-511)” or number “n (0-511)”. Starts divide.
SETQI	D/#n	Set iteration override to 0..31 (otherwise, iteration counts are load-dependent)
SETQZ	D/#n	Setup the CORDIC state machine with the angle given by the value in register “D (0-511)” or number “n (0-511)”.
QROTATE	D,S	Start the CORDIC rotation operation given the value in register “D (0-511)” or “S (0-31)” iterations.
QARCTAN	D,S	Start the CORDIC arc tangent operation given the value in register “D (0-511)” or “S (0-31)” iterations.
QEXP	D/#n	Start the CORDIC exponential operation given the value in register “D (0-511)” or “n (0-31)” iterations.
QLOG	D/#n	Start the CORDIC logarithmic operation given the value in register “D (0-511)” or “n (0-31)” iterations.
QSINCOS	D,S	Get sine and cosine of angle D with magnitude S (use GETQX D & GETQY D after)
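The way the multiply and divide results are split across two reads can be sketched as below. The helper names mul32 and div32 are hypothetical, and unsigned operands are assumed because the table does not state signedness or divide-by-zero behavior.

```python
MASK32 = 0xFFFFFFFF

def mul32(a, b):
    """Return (low, high) longs of the unsigned 32x32 product, as GETMULL/GETMULH would read them."""
    product = (a & MASK32) * (b & MASK32)
    return product & MASK32, (product >> 32) & MASK32

def div32(dividend, divisor):
    """Return (quotient, remainder), as GETDIVQ/GETDIVR would read them; unsigned assumed."""
    dividend &= MASK32
    divisor &= MASK32
    if divisor == 0:
        # Divide-by-zero behavior is not described in this document
        raise ZeroDivisionError("divisor is zero")
    return dividend // divisor, dividend % divisor

low, high = mul32(0xFFFFFFFF, 0xFFFFFFFF)
quotient, remainder = div32(1000, 7)
```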

Miscellaneous Hardware

LFSR

Each cog has a free running LFSR (Linear Feedback Shift Register) and System Counter that change every clock cycle. Each access of the LFSR taps into a 32-bit-wide sequence of numbers that is traversed in a pseudo-random order, for a sequence length of about 2^32.

System Counter

The system counter counts the number of clock ticks since power-up. It is a 64-bit counter; the LFSR is 32 bits wide.

Table 8: System Counter Instructions

Mnemonic	Operand	Operation
GETCNT	D	Store the bottom 32 bits of the System Counter (CNT) in register “D (0-511)”. If executed again (with no instruction in between the two executions), store the top 32 bits of the System Counter in register “D (0-511)”. If a rollover occurs between accesses, TOP-1 is stored.
SUBCNT	D	Subtracts the system count value when the GETCNT instruction was last executed from the current system count value. Results are stored in the register referenced by “D (0-511)”.
GETLFSR	D	Store the LFSR value in register “D (0-511)”.

Multiply Accumulate

Each cog additionally has a single cycle 24-bit hardware multiplier capable of unsigned and signed multiplications. The multiplication also adds into a 64-bit register ACCx for MAC ops.

Table 9: Multiply and Accumulate Instructions

Mnemonic	Operand	Operation
MACA	D,S	Multiply unsigned register “D (0-511)” and unsigned register “S (0-511)” or an immediate value (0-511) and add to the 64-bit accumulator A.
MACB	D,S	Multiply unsigned register “D (0-511)” and unsigned register “S (0-511)” or an immediate value (0-511) and add to the 64-bit accumulator B.
MUL	D,S	Multiply unsigned register “D (0-511)” and unsigned register “S (0-511)” or an immediate value (0-511) and store in register D.
SCL	D,S	Scale the result of the multiplication of two 24-bit numbers (D,S) to fit into the 32-bit destination register specified by “D (0-511)”.
CLRACCA		Zero Multiply Accumulator A (ACCA).
CLRACCB		Zero Multiply Accumulator B (ACCB).
CLRACCS		Zero both multiply accumulators (accumulator A and B).
GETACCA	D	Store the bottom 32 Bits of the A accumulator in register “D (0-511)”. If executed again (no instruction in between previous execution) store the top 32 Bits of the A accumulator in register “D (0-511)”.
GETACCB	D	Store the bottom 32 Bits of the B accumulator in register “D (0-511)”. If executed again (no instruction in between previous execution) store the top 32 Bits of the B accumulator in register “D (0-511)”.
SETACCA	D,S	Sets the high and low values of the 64-bit accumulator A. The value contained in register “D (0-511)” sets the low long while the value contained in “S (0-511)” sets the high long.
SETACCB	D,S	Sets the high and low values of the 64-bit accumulator B. The value contained in register “D (0-511)” sets the low long while the value contained in “S (0-511)” sets the high long.
FITACCA		Shifts accumulator A’s high long right into the low long so that the high long is MSB justified (discarding the low bits). Accumulator A’s high long is then replaced with the number of bit places required to MSB justify Accumulator A’s original value.
FITACCB		Shifts accumulator B’s high long right into the low long so that the high long is MSB justified (discarding the low bits). Accumulator B’s high long is then replaced with the number of bit places required to MSB justify Accumulator B’s original value.
FITACCS		Similar operation to FITACCA/FITACCB. Examines both accumulator A and B and right shifts both accumulators so that the greater value of the two accumulators is MSB justified. The number of bits shifted is written to both accumulator’s high long. This has the effect of scaling both accumulators equally.
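A multiply-accumulate loop can be modeled as follows. This is a sketch under assumptions: the function names mac and read_acc are illustrative, operands are treated as unsigned, and the accumulator is assumed to wrap modulo 2^64.

```python
MASK32 = 0xFFFFFFFF
MASK64 = (1 << 64) - 1

def mac(acc, d, s):
    """Add the unsigned product d*s to a 64-bit accumulator (wraps modulo 2**64)."""
    return (acc + d * s) & MASK64

def read_acc(acc):
    """Return (low long, high long), the values two back-to-back GETACCA reads would give."""
    return acc & MASK32, (acc >> 32) & MASK32

samples = [100_000, 200_000, 300_000]
coeffs = [3, 2, 1]
acc = 0                            # CLRACCA
for sample, coeff in zip(samples, coeffs):
    acc = mac(acc, sample, coeff)  # MACA sample, coeff
low, high = read_acc(acc)

# A single product of two full 24-bit values already spills into the high long
big_low, big_high = read_acc(mac(0, 0xFFFFFF, 0xFFFFFF))
```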

Miscellaneous Instructions

Each cog additionally features a number of new instructions to make many common operations much easier to perform than before. Most of the new instructions are in the extended instruction set, while a few of the new instructions are in the original set.

Table 10: Extended Miscellaneous Instructions

Mnemonic	Operand	Operation
DECOD5	D	Overwrite register “D (0-511)” with decoded D[4:0] repeated 1 time. (e.g. $00000001 << D[4:0])
DECOD4	D	Overwrite register “D (0-511)” with decoded D[3:0] repeated 2 times. (e.g. $00010001 << D[3:0])
DECOD3	D	Overwrite register “D (0-511)” with decoded D[2:0] repeated 4 times. (e.g. $01010101 << D[2:0])
DECOD2	D	Overwrite register “D (0-511)” with decoded D[1:0] repeated 8 times. (e.g. $11111111 << D[1:0])
BLMASK	D	Overwrite register “D (0-511)” with a bit length mask specified by D[5:0].
NOT	D	Overwrite register “D (0-511)” with the bitwise inverted register “D (0-511)”
ONECNT	D	Overwrite register “D (0-511)” with the count of ones in register D.
ZERCNT	D	Overwrite register “D (0-511)” with the count of zeros in register D.
INCPAT	D	Overwrite register “D (0-511)” with the next bit pattern that keeps the number of ones and zeros the same in register D.
DECPAT	D	Overwrite register “D (0-511)” with the previous bit pattern that keeps the number of ones and zeros the same in register D.
BINGRY	D	Overwrite the binary pattern in register “D (0-511)” with its Gray code pattern.
GRYBIN	D	Overwrite the Gray code pattern in register “D (0-511)” with its binary pattern.
MERGEW	D	Merge the high word and the low word of register “D (0-511)” into each other and overwrite register D with the new value. Bits of the low word occupy bit spaces 0, 2, 4, etc. Bits of the high word occupy bit spaces 1, 3, 5, etc. (Interleave)
SPLITW	D	Split the bits of register “D (0-511)” into a high word and low word and overwrite register D with the new value. Bits of the low word come from bit spaces 0, 2, 4, etc. Bits of the high word come from bit spaces 1, 3, 5, etc. (De-interleave)
SEUSSF	D	Overwrite register “D (0-511)” with a pseudo random bit pattern seeded from the value in register D. After 32 forward iterations, the original bit pattern is returned.
SEUSSR	D	Overwrite register “D (0-511)” with a pseudo random bit pattern seeded from the value in register D. After 32 reversed iterations, the original bit pattern is returned.
ISOB	D.b	Isolate bit “b (0-31)” of register “D (0-511).”
NOTB	D.b	Invert bit “b (0-31)” of register “D (0-511).”
CLRB	D.b	Clear bit “b (0-31)” of register “D (0-511).”
SETB	D.b	Set bit “b (0-31)” of register “D (0-511).”
SETBC	D.b	Set bit “b (0-31)” of register “D (0-511)” to C.
SETBNC	D.b	Set bit “b (0-31)” of register “D (0-511)” to NC.
SETBZ	D.b	Set bit “b (0-31)” of register “D (0-511)” to Z.
SETBNZ	D.b	Set bit “b (0-31)” of register “D (0-511)” to NZ.
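Several of the bit-manipulation rows above are easier to follow as reference models. The Python functions below follow the table's wording (the DECOD patterns come from the "e.g." notes); the function names and the single decod helper with a field_bits parameter are illustrative choices, and the models say nothing about timing.

```python
MASK32 = 0xFFFFFFFF

def decod(d, field_bits):
    """DECOD5/4/3/2: decode the low `field_bits` bits of d, repeated across the long."""
    patterns = {5: 0x00000001, 4: 0x00010001, 3: 0x01010101, 2: 0x11111111}
    shift = d & ((1 << field_bits) - 1)
    return (patterns[field_bits] << shift) & MASK32

def onecnt(d):
    """ONECNT: number of set bits in the 32-bit value."""
    return bin(d & MASK32).count("1")

def bingry(d):
    """BINGRY: binary to Gray code."""
    d &= MASK32
    return d ^ (d >> 1)

def grybin(g):
    """GRYBIN: Gray code back to binary (XOR of the value with all its right shifts)."""
    g &= MASK32
    b = 0
    while g:
        b ^= g
        g >>= 1
    return b

def mergew(d):
    """MERGEW: low-word bits go to even positions, high-word bits to odd positions."""
    low, high = d & 0xFFFF, (d >> 16) & 0xFFFF
    out = 0
    for i in range(16):
        out |= ((low >> i) & 1) << (2 * i)
        out |= ((high >> i) & 1) << (2 * i + 1)
    return out

def splitw(d):
    """SPLITW: inverse of mergew (even bits to the low word, odd bits to the high word)."""
    low = high = 0
    for i in range(16):
        low |= ((d >> (2 * i)) & 1) << i
        high |= ((d >> (2 * i + 1)) & 1) << i
    return (high << 16) | low
```

For example, decod(3, 4) gives $00080008, and splitw(mergew(x)) returns x for any 32-bit x.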

Table 11: Extended Miscellaneous Flag Manipulation Instructions

Mnemonic	Operand	Operation
PUSHZC	D	Push the Z and C flags into D[1:0] and pop D[31:30] into Z and C through WZ and WC.
POPZC	D	Pop D[1:0] into the Z and C flags through WZ and WC and push Z and C into D[31:30].
SETZC	D/#n	Set the Z and C flags with D[1:0] through WZ and WC effects.

Table 12: Extended Miscellaneous Flow Control Instructions

Mnemonic	Operand	Operation
REPD	D/#n, #i	Delayed repeat: repeats the following “i (0-31)” instructions the number of times given by the value in register “D (0-511)” or “n (0-511)”. The pipeline causes a delay of three instructions before the repeated set of instructions begins to execute.
NOPX	D/#n	Repeat the NOP instruction the value in register “D(0-511)” or “n(0-511)” times.
SETSKIP	D/#n	Executes up to the next 32 instructions as NOPs described by the set bit pattern of a register “D(0-511)” or literal “N(0-63)”.
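The effect of a skip pattern can be pictured with a small model. It rests on an assumption the table does not state: bit 0 of the pattern is taken to apply to the first instruction after SETSKIP, bit 1 to the second, and so on. The function name run_with_skip and the instruction strings are illustrative only.

```python
def run_with_skip(instructions, skip_mask):
    """Return the instructions that still execute under a SETSKIP-style mask.

    Assumption: bit 0 of skip_mask applies to the first following instruction,
    bit 1 to the second, and so on; a set bit turns that instruction into a NOP.
    """
    executed = []
    for index, instruction in enumerate(instructions):
        skipped = index < 32 and (skip_mask >> index) & 1
        if not skipped:
            executed.append(instruction)
    return executed

program = ["ADD a,#1", "SUB b,#1", "SHL c,#2", "NOT d"]
kept = run_with_skip(program, 0b0101)  # skip the 1st and 3rd instructions
```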

Table 13: Miscellaneous Instructions

Mnemonic	Operand	Operation
ENC	D,S	Store encoded S in D.
JMPRET	D,S	See P8X32A – No instruction change.
ROR	D,S	See P8X32A – No instruction change.
ROL	D,S	See P8X32A – No instruction change.
SHR	D,S	See P8X32A – No instruction change.
SHL	D,S	See P8X32A – No instruction change.
RCR	D,S	See P8X32A – No instruction change.
RCL	D,S	See P8X32A – No instruction change.
SAR	D,S	See P8X32A – No instruction change.
REV	D,S	See P8X32A – No instruction change.
MINS	D,S	See P8X32A – No instruction change.
MAXS	D,S	See P8X32A – No instruction change.
MIN	D,S	See P8X32A – No instruction change.
MAX	D,S	See P8X32A – No instruction change.
MOVS	D,S	See P8X32A – No instruction change.
MOVD	D,S	See P8X32A – No instruction change.
MOVI	D,S	See P8X32A – No instruction change.
JMPRETD	D,S	See P8X32A – No instruction change. Do not flush pipeline before jump – must be executed two instructions before intended jump space.
AND	D,S	See P8X32A – No instruction change.
ANDN	D,S	See P8X32A – No instruction change.
OR	D,S	See P8X32A – No instruction change.
XOR	D,S	See P8X32A – No instruction change.
MUXC	D,S	See P8X32A – No instruction change.
MUXNC	D,S	See P8X32A – No instruction change.
MUXZ	D,S	See P8X32A – No instruction change.
MUXNZ	D,S	See P8X32A – No instruction change.
ADD	D,S	See P8X32A – No instruction change.
SUB	D,S	See P8X32A – No instruction change.
ADDABS	D,S	See P8X32A – No instruction change.
SUBABS	D,S	See P8X32A – No instruction change.
SUMC	D,S	See P8X32A – No instruction change.
SUMNC	D,S	See P8X32A – No instruction change.
SUMZ	D,S	See P8X32A – No instruction change.
SUMNZ	D,S	See P8X32A – No instruction change.
MOV	D,S	See P8X32A – No instruction change.
NEG	D,S	See P8X32A – No instruction change.
ABS	D,S	See P8X32A – No instruction change.
ABSNEG	D,S	See P8X32A – No instruction change.
NEGC	D,S	See P8X32A – No instruction change.
NEGNC	D,S	See P8X32A – No instruction change.
NEGZ	D,S	See P8X32A – No instruction change.
NEGNZ	D,S	See P8X32A – No instruction change.
CMPS	D,S	See P8X32A – No instruction change.
CMPSX	D,S	See P8X32A – No instruction change.
ADDX	D,S	See P8X32A – No instruction change.
SUBX	D,S	See P8X32A – No instruction change.
ADDS	D,S	See P8X32A – No instruction change.
SUBS	D,S	See P8X32A – No instruction change.
ADDSX	D,S	See P8X32A – No instruction change.
SUBSX	D,S	See P8X32A – No instruction change.
SUBR	D,S	Subtract D from S and store in D
CMPSUB	D,S	See P8X32A – No instruction change.
INCMOD	D,S	Increment D between 0 and S. Wraps around to 0 when above S
DECMOD	D,S	Decrement D between S and 0. Wraps around to S when below 0.
IJZ	D,S	Increment D and jump to S if D is zero
IJZD	D,S	Increment D and jump to S if D is zero. Do not flush pipeline before jump – must be executed two instructions before intended jump space.
IJNZ	D,S	Increment D and jump to S if D is not zero
IJNZD	D,S	Increment D and jump to S if D is not zero. Do not flush pipeline before jump – must be executed two instructions before intended jump space.
DJZ	D,S	Decrement D and jump to S if D is zero
DJZD	D,S	Decrement D and jump to S if D is zero. Do not flush pipeline before jump – must be executed two instructions before intended jump space.
DJNZ	D,S	Decrement D and jump to S if D is not zero.
DJNZD	D,S	Decrement D and jump to S if D is not zero. Do not flush pipeline before jump – must be executed two instructions before intended jump space.
TJZ	D,S	See P8X32A – No instruction change.
TJZD	D,S	See P8X32A – No instruction change. Do not flush pipeline before jump – must be executed two instructions before intended jump space.
TJNZ	D,S	See P8X32A – No instruction change.
TJNZD	D,S	See P8X32A – No instruction change. Do not flush pipeline before jump – must be executed two instructions before intended jump space.
SETINDA	D,S	Setup indirection register address A bottom range and top range where D is the top of the range and S is the bottom range. The indirection register will allow access to cog registers in this range.
SETINDB	D,S	Setup indirection register address B bottom range and top range where D is the top of the range and S is the bottom range. The indirection register will allow access to cog registers in this range.
WAITVID	D,S	Wait to pass pixels to the video generator.
WAITCNT	D,S	Wait for the CNT[31:0] register to equal D and then add S to D and store in D. If WC is specified then wait for CNT[63:32] to equal D.
WAITPEQ	D,S	See P8X32A – No instruction change.
WAITPNE	D,S	See P8X32A – No instruction change.
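INCMOD and DECMOD are convenient for ring-buffer indices. The sketch below models the wrap-around as worded in the table; how the hardware treats a D that is already outside 0..S is not described, so the >= and <= comparisons are an assumption, and the function names simply mirror the mnemonics.

```python
def incmod(d, s):
    """INCMOD: step d upward through 0..s, wrapping to 0 after s."""
    return 0 if d >= s else d + 1

def decmod(d, s):
    """DECMOD: step d downward through s..0, wrapping to s after 0."""
    return s if d <= 0 else d - 1

# Walk a 4-entry ring-buffer index forward, then backward
forward = []
index = 0
for _ in range(6):
    forward.append(index)
    index = incmod(index, 3)

backward = []
index = 0
for _ in range(6):
    backward.append(index)
    index = decmod(index, 3)
```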

Register Map

Each cog has 10 memory-mapped registers that allow control over I/O pins and indirection. The OUTx and INx registers have now been combined to form the PIN registers. The IND registers allow indirect register access to avoid self-modifying code. All other REGs are free.

Table 14: Register Map Setup

Register	Location	Operation
INDA	$1F6	When read or written, accesses the cog memory address set by SETINDA. Auto-increments after being accessed. Condition codes are not allowed to be used with INDA register access.
INDB	$1F7	When read or written, accesses the cog memory address set by SETINDB. Auto-increments after being accessed. Condition codes are not allowed to be used with INDB register access.
PINA	$1F8	When written, changes the state of the I/O pins attached to port A. When read, returns the state of the I/O pins attached to port A.
PINB	$1F9	When written, changes the state of the I/O pins attached to port B. When read, returns the state of the I/O pins attached to port B.
PINC	$1FA	When written, changes the state of the I/O pins attached to port C. When read, returns the state of the I/O pins attached to port C.
PIND	$1FB	When written, changes the state of the I/O pins attached to port D. When read, returns the state of the I/O pins attached to port D.
DIRA	$1FC	Enables or disables the output functionality of PORTA. Input reading is never disabled.
DIRB	$1FD	Enables or disables the output functionality of PORTB. Input reading is never disabled.
DIRC	$1FE	Enables or disables the output functionality of PORTC. Input reading is never disabled.
DIRD	$1FF	Enables or disables the output functionality of PORTD. Input reading is never disabled.

Counter Modules

Each cog has two counter modules – CTRA and CTRB. Each counter module has a FRQ, PHS, SIN, and COS register. The counter modules control the SIN and COS registers to track the phase and power of a signal. The FRQ and PHS registers work the same. Each counter module also has logic modes, which allow it to accumulate given different logic equations involving a selected pin A and pin B – see P8X32A. The counter modes now also feature quadrature encoder accumulation and automatic PWM generation.

Table 25: Counter Hardware Access Instructions

Mnemonic	Operand	Operation
GETPHSA	D	Store PHSA in D
GETPHZA	D	Store PHSA in D and zero PHSA.
GETCOSA	D	Store COSA in D
GETSINA	D	Store SINA in D
GETPHSB	D	Store PHSB in D
GETPHZB	D	Store PHSB in D and zero PHSB
GETCOSB	D	Store COSB in D
GETSINB	D	Store SINB in D
SETCTRA	D/#n	Set CTRA mode to D/n.
SETWAVA	D/#n	Set CTRA wave mode to D/n.
SETFRQA	D/#n	Set FRQA to D/n
SETPHSA	D/#n	Set PHSA to D/n.
ADDPHSA	D/#n	Add D/n to PHSA.
SUBPHSA	D/#n	Subtract D/n from PHSA.
SYNCTRA		Wait for PHSA to overflow
CAPCTRA		Remove current sum from PHSA
SETCTRB	D/#n	Set CTRB mode to D/n
SETWAVB	D/#n	Set CTRB wave mode to D/n.
SETFRQB	D/#n	Set FRQB to D/n
SETPHSB	D/#n	Set PHSB to D/n.
ADDPHSB	D/#n	Add D/n to PHSB.
SUBPHSB	D/#n	Subtract D/n from PHSB.
SYNCTRB		Wait for PHSB to overflow
CAPCTRB		Remove current sum from PHSB
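Choosing an FRQ value for a desired overflow rate can be sketched as below. The model assumes that PHS accumulates FRQ on every clock, as it does in the P8X32A's NCO modes; the function names and the 64 MHz clock (picked only because it makes the arithmetic exact) are assumptions for this example.

```python
def frq_for(target_hz, clock_hz):
    """FRQ value that makes a 32-bit PHS overflow target_hz times per second."""
    return round(target_hz * (1 << 32) / clock_hz) & 0xFFFFFFFF

def count_overflows(frq, clocks):
    """Step PHS += FRQ once per clock and count the 32-bit overflows."""
    phs = 0
    overflows = 0
    for _ in range(clocks):
        phs += frq
        if phs >> 32:
            overflows += 1
            phs &= 0xFFFFFFFF
    return overflows

frq = frq_for(1_000_000, 64_000_000)  # 1 MHz from an assumed 64 MHz clock
wraps = count_overflows(frq, 6_400)   # 6400 clocks = 100 us
```

Each overflow corresponds to one point at which SYNCTRA (wait for PHSA to overflow) would release.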

Byte/Word Field Mover

<forum>

Each cog has a field mover that can move a byte or word from any field in S into any field in D. To use the field mover, you must first configure it using SETF. Then, you can use MOVF to perform the moves.

SETF uses a 9-bit value %W_DDdd_SSss to configure the field mover:

Table: Field mover configuration bits

W	word/byte	DD	D field mode	dd	D field pointer	SS	S field mode	ss	S field pointer
0	byte mode	%00	D field pointer stays same after MOVF	%00	byte 0 / word 0	%00	S field pointer stays same after MOVF	%00	byte 0 / word 0
1	word mode	%01	D field pointer stays same after MOVF, D rotates left by byte/word	%01	byte 1 / word 0	%01	S field pointer stays same after MOVF	%01	byte 1 / word 0
		%10	D field pointer increments after MOVF	%10	byte 2 / word 1	%10	S field pointer increments after MOVF	%10	byte 2 / word 1
		%11	D field pointer decrements after MOVF	%11	byte 3 / word 1	%11	S field pointer decrements after MOVF	%11	byte 3 / word 1

On cog startup, SETF is initialized to %0_0100_0000, so that MOVF will rotate D left by 8 bits and then fill the bottom byte with the lower byte in S.
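The %W_DDdd_SSss layout and the startup behavior can be checked with a short model. The function names decode_setf and movf_default are illustrative, and movf_default covers only the startup configuration described above, not the other field modes.

```python
def decode_setf(config):
    """Split the 9-bit SETF value %W_DDdd_SSss into its named fields."""
    return {
        "W": (config >> 8) & 1,    # 0 = byte mode, 1 = word mode
        "DD": (config >> 6) & 3,   # D field mode
        "dd": (config >> 4) & 3,   # D field pointer
        "SS": (config >> 2) & 3,   # S field mode
        "ss": config & 3,          # S field pointer
    }

def movf_default(d, s):
    """MOVF under the startup configuration: rotate D left by 8, insert the low byte of S."""
    d &= 0xFFFFFFFF
    rotated = ((d << 8) | (d >> 24)) & 0xFFFFFFFF
    return (rotated & 0xFFFFFF00) | (s & 0xFF)

startup = decode_setf(0b0_0100_0000)
packed = 0
for byte in (0x12, 0x34, 0x56, 0x78):
    packed = movf_default(packed, byte)  # four MOVFs pack four bytes into one long
```

With the startup configuration, four consecutive MOVFs therefore assemble four bytes into one long, most significant byte first.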

Table: Byte/Word Field Mover Instructions

Mnemonic	Operand	Operation	Clocks
SETF	D	Configure field mover with D	1
SETF	#n	Configure field mover with 0..511	1
MOVF	D,S	Move field from S into D	1
MOVF	D,#n	Move field from 0..511 into D	1

Hub Counter

The hub contains a 64-bit counter called CNT that increments on each clock cycle. Each cog can use CNT to mark time in various ways. On chip reset, the ROM Booter initializes CNT to $00000000_00000000. For the purpose of describing the cog instructions which relate to CNT, the lower long of CNT is alternately called CNTL and the upper long, delayed by one clock cycle, is called CNTH. The one-clock delay of CNTH enables proper reading of the entire CNT value when two instructions must be used in sequence to access its bottom and top longs.

Table: Hub Counter Instructions

Mnemonic	Operand	Operation (iiiiii = #i-1, nnnnnnnnn/n___nnnn_nnnnnnnnn = #n-1)	Clocks
SUBCNT	D	Subtracts D from CNTL, then CNTH. Gets CNTL minus D into D. If another SUBCNT is executed in the next clock cycle by the same task, it gets CNTH minus D minus the carry from the previous SUBCNT into D. In either case, the logical NOT of the MSB of the D result (not the carry) goes into C, so C=1 indicates that CNTL (or CNT) has exceeded the original D value(s).	1
CMPCNT	D	Compares D to CNTL, then CNTH. Same as SUBCNT, but doesn't store the D result(s). Useful for periodically checking whether a time target has been reached yet.	1
PASSCNT	D	Loops until CNTL passes D. Jumps to self if the MSB of CNTL minus D is 1. In other words, loops until CNTL exceeds D. This is intended as a non-pipeline-stalling alternative to WAITCNT, for use in multi-task programs.	1*
GETCNT	D	Get CNTL into D. If another GETCNT is executed in the next clock cycle by the same task, it gets CNTH into D.	1
WAITCNT	D,S	Wait for CNTL or CNT (WC), D += S. Waits for CNTL to be equal to D, then adds S/#n into D.	?
WAITPEQ	D,S WC	Wait for (pins & S) = D with timeout. Like WAITPEQ without WC, except the last-written D value becomes a CNTL timeout target, with C returning 0 if the WAITPEQ condition was met, or 1 if the timeout occurred first.	?
WAITPNE	D,S WC	Wait for (pins & S) <> D with timeout. Like WAITPNE without WC, except the last-written D value becomes a CNTL timeout target, with C returning 0 if the WAITPNE condition was met, or 1 if the timeout occurred first.	?

* 1 clock if task uses no more than every 4th time slot (4 clocks in single-task)
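The MSB test used by SUBCNT, CMPCNT and PASSCNT is what keeps elapsed-time checks correct across a CNTL rollover. The Python model below covers only the 32-bit (single-instruction) case and uses illustrative function names; it is a sketch of the arithmetic described in the table, not of the pipeline behavior.

```python
MASK32 = 0xFFFFFFFF

def subcnt(cntl, d):
    """First SUBCNT: return (CNTL - D) mod 2**32 and the C flag (NOT of the result MSB)."""
    result = (cntl - d) & MASK32
    c = 0 if result >> 31 else 1
    return result, c

def passcnt_loops(cntl, d):
    """PASSCNT keeps looping while the MSB of (CNTL - D) is 1."""
    return bool(((cntl - d) & MASK32) >> 31)

start = 0xFFFFFF00                  # GETCNT shortly before CNTL rolls over
now = (start + 0x300) & MASK32      # 0x300 clocks later, CNTL has wrapped
elapsed, c = subcnt(now, start)
target = (start + 500) & MASK32     # time target 500 clocks after start
```

Because the comparison is made on the wrapped difference rather than on the raw values, a target that lies past the rollover point still compares correctly, provided the interval is shorter than 2^31 clocks.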

Example: Hub Counter

'Measure time using lower 32 bits of CNT
        GETCNT ticks 'get CNTL into ticks
        <somecode> 'execute some code
        SUBCNT ticks 'get CNTL minus ticks into ticks, <somecode> took ticks-1 to execute

'Measure time using full 64 bits of CNT (single task)
        GETCNT ticks_low 'get CNT into {ticks_high, ticks_low}
        GETCNT ticks_high
        <somecode> 'execute some code
        SUBCNT ticks_low 'get CNT minus {ticks_high, ticks_low} into {ticks_high, ticks_low}
        SUBCNT ticks_high

'Do something for some time
        GETCNT ticks 'get CNTL
        ADD ticks,#500 'add 500
loop    <somecode> 'execute some code
        CMPCNT ticks WC 'check if 500 clocks have elapsed yet
if_nc   JMP #loop 'if not, loop

'Do something every Nth clock (multi-task)
        GETCNT ticks 'get CNTL
loop    ADD ticks,#500 'add 500
        PASSCNT ticks 'wait for next 500th clock
        <somecode> 'execute some code
        JMP #loop 'loop

'Do something every Nth clock (single-task)
        GETCNT ticks 'get CNTL
        ADD ticks,#500 'add initial 500
loop    WAITCNT ticks,#500 'wait for next 500th clock, add next 500
        <somecode> 'execute some code
        JMP #loop 'loop

'Wait for pins to equal a value, with time-out
        GETCNT ticks 'get CNTL
        ADD ticks,#200 'allow 200 clock cycles for WAITPEQ
        WAITPEQ value,mask WC 'wait for (pins & mask) = value
if_c    JMP #timeout 'if C=1 then timeout

Table: Instruction List

instruction                                         mnem         operand
-------------------------------------------------------------------------------------------------
000000 ZC0 I CCCC DDDDDDDDD SUPIIIIII                WRBYTE        D,S/PTR                (waits for hub)
000000 Z01 I CCCC DDDDDDDDD SUPIIIIII                RDBYTE        D,S/PTR                (waits for hub)
000000 Z11 I CCCC DDDDDDDDD SUPIIIIII                RDBYTEC        D,S/PTR                (waits for hub if cache miss)

000001 ZC0 I CCCC DDDDDDDDD SUPIIIIII                WRWORD        D,S/PTR                (waits for hub)
000001 Z01 I CCCC DDDDDDDDD SUPIIIIII                RDWORD        D,S/PTR                (waits for hub)
000001 Z11 I CCCC DDDDDDDDD SUPIIIIII                RDWORDC        D,S/PTR                (waits for hub if cache miss)

000010 ZC0 I CCCC DDDDDDDDD SUPIIIIII                WRLONG        D,S/PTR                (waits for hub)
000010 Z01 I CCCC DDDDDDDDD SUPIIIIII                RDLONG        D,S/PTR                (waits for hub)
000010 Z11 I CCCC DDDDDDDDD SUPIIIIII                RDLONGC        D,S/PTR                (waits for hub if cache miss)

000011 ZCR 0 CCCC DDDDDDDDD SSSSSSSSS                COGINIT        D,S                (waits for hub)

000011 ZCR 1 CCCC DDDDDDDDD 000000000                CLKSET        D                (waits for hub)
000011 ZCR 1 CCCC DDDDDDDDD 000000001                COGID        D                (waits for hub)
000011 ZCR 1 CCCC DDDDDDDDD 000000010         ( COGINIT        D )                (waits for hub)
000011 ZCR 1 CCCC DDDDDDDDD 000000011                COGSTOP        D                (waits for hub)
000011 ZCR 1 CCCC DDDDDDDDD 000000100                LOCKNEW        D                (waits for hub)
000011 ZCR 1 CCCC DDDDDDDDD 000000101                LOCKRET        D                (waits for hub)
000011 ZCR 1 CCCC DDDDDDDDD 000000110                LOCKSET        D                (waits for hub)
000011 ZCR 1 CCCC DDDDDDDDD 000000111                LOCKCLR        D                (waits for hub)

000011 ZCR 1 CCCC 000000000 000001000                CACHEX
000011 ZCR 1 CCCC 000000001 000001000                CLRACCA
000011 ZCR 1 CCCC 000000010 000001000                CLRACCB
000011 ZCR 1 CCCC 000000011 000001000                CLRACCS
000011 ZCR 1 CCCC 000000101 000001000                FITACCA                        (waits for mac)
000011 ZCR 1 CCCC 000000110 000001000                FITACCB                        (waits for mac)
000011 ZCR 1 CCCC 000000111 000001000                FITACCS                        (waits for mac)

000011 ZC0 1 CCCC DDDDDDDDD 000001001                SNDSER        D                (waits for tx if !wc)
000011 ZC1 1 CCCC DDDDDDDDD 000001001                RCVSER        D                (waits for rx if !wc)

000011 ZCR 1 CCCC DDDDDDDDD 000001010                PUSHZC        D
000011 ZCR 1 CCCC DDDDDDDDD 000001011                POPZC        D

000011 ZCR 1 CCCC DDDDDDDDD 000001100                SUBCNT        D                (subtracts D from cnt[31:0], then cnth if same thread)
000011 ZC0 1 CCCC DDDDDDDDD 000001101                PASSCNT        D                (loops if (cnt[31:0] - D) msb set)
000011 ZC1 1 CCCC DDDDDDDDD 000001101                GETCNT        D                (gets cnt[31:0], then cnth if same thread)
000011 ZCR 1 CCCC DDDDDDDDD 000001110                GETACCA        D                (gets acca[31:0], then acca[63:32], waits for mac)
000011 ZCR 1 CCCC DDDDDDDDD 000001111                GETACCB        D                (gets accb[31:0], then accb[63:32], waits for mac)

000011 ZCR 1 CCCC DDDDDDDDD 000010000                GETLFSR        D
000011 ZCR 1 CCCC DDDDDDDDD 000010001                GETTOPS        D                (GETTOPS wc,nr = POLVID wc)
000011 ZCR 1 CCCC DDDDDDDDD 000010010                GETPTRA        D
000011 ZCR 1 CCCC DDDDDDDDD 000010011                GETPTRB        D

000011 ZCR 1 CCCC DDDDDDDDD 000010100                GETPIX        D                (waits two clocks)
000011 ZCR 1 CCCC DDDDDDDDD 000010101                GETSPD        D
000011 ZCR 1 CCCC DDDDDDDDD 000010110                GETSPA        D
000011 ZCR 1 CCCC DDDDDDDDD 000010111                GETSPB        D

000011 ZCR 1 CCCC DDDDDDDDD 000011000                POPAR        D
000011 ZCR 1 CCCC DDDDDDDDD 000011001                POPBR        D
000011 ZCR 1 CCCC DDDDDDDDD 000011010                POPA        D
000011 ZCR 1 CCCC DDDDDDDDD 000011011                POPB        D
000011 ZCR 1 CCCC 000000000 000011100                RETA
000011 ZCR 1 CCCC 000000000 000011101                RETB
000011 ZCR 1 CCCC 000000000 000011110                RETAD
000011 ZCR 1 CCCC 000000000 000011111                RETBD

000011 ZCR 1 CCCC DDDDDDDDD 000100000                DECOD2        D
000011 ZCR 1 CCCC DDDDDDDDD 000100001                DECOD3        D
000011 ZCR 1 CCCC DDDDDDDDD 000100010                DECOD4        D
000011 ZCR 1 CCCC DDDDDDDDD 000100011                DECOD5        D
000011 ZCR 1 CCCC DDDDDDDDD 000100100                BLMASK        D
000011 ZCR 1 CCCC DDDDDDDDD 000100101                NOT        D
000011 ZCR 1 CCCC DDDDDDDDD 000100110                ONECNT        D                (waits one clock)
000011 ZCR 1 CCCC DDDDDDDDD 000100111                ZERCNT        D                (waits one clock)
000011 ZCR 1 CCCC DDDDDDDDD 000101000                INCPAT        D                (waits three clocks)
000011 ZCR 1 CCCC DDDDDDDDD 000101001                DECPAT        D                (waits three clocks)
000011 ZCR 1 CCCC DDDDDDDDD 000101010                BINGRY        D
000011 ZCR 1 CCCC DDDDDDDDD 000101011                GRYBIN        D                (waits one clock)
000011 ZCR 1 CCCC DDDDDDDDD 000101100                MERGEW        D
000011 ZCR 1 CCCC DDDDDDDDD 000101101                SPLITW        D
000011 ZCR 1 CCCC DDDDDDDDD 000101110                SEUSSF        D
000011 ZCR 1 CCCC DDDDDDDDD 000101111                SEUSSR        D

000011 ZCR 1 CCCC DDDDDDDDD 000110000                GETMULL        D                (waits for mul if !wc)
000011 ZCR 1 CCCC DDDDDDDDD 000110001                GETMULH        D                (waits for mul if !wc)
000011 ZCR 1 CCCC DDDDDDDDD 000110010                GETDIVQ        D                (waits for div if !wc)
000011 ZCR 1 CCCC DDDDDDDDD 000110011                GETDIVR        D                (waits for div if !wc)
000011 ZCR 1 CCCC DDDDDDDDD 000110100                GETSQRT        D                (waits for sqrt if !wc)
000011 ZCR 1 CCCC DDDDDDDDD 000110101                GETQX        D                (waits for cordic if !wc)
000011 ZCR 1 CCCC DDDDDDDDD 000110110                GETQY        D                (waits for cordic if !wc)
000011 ZCR 1 CCCC DDDDDDDDD 000110111                GETQZ        D                (waits for cordic if !wc)

000011 ZCR 1 CCCC DDDDDDDDD 000111000                GETPHSA        D                (GETPHSA wc,nr = POLCTRA wc)
000011 ZCR 1 CCCC DDDDDDDDD 000111001                GETPHZA        D                (clears phsa)
000011 ZCR 1 CCCC DDDDDDDDD 000111010                GETCOSA        D
000011 ZCR 1 CCCC DDDDDDDDD 000111011                GETSINA        D

000011 ZCR 1 CCCC DDDDDDDDD 000111100                GETPHSB        D                (GETPHSB wc,nr = POLCTRB wc)
000011 ZCR 1 CCCC DDDDDDDDD 000111101                GETPHZB        D                (clears phsb)
000011 ZCR 1 CCCC DDDDDDDDD 000111110                GETCOSB        D
000011 ZCR 1 CCCC DDDDDDDDD 000111111                GETSINB        D

000011 Z00 1 CCCC 111111111 001iiiiii                REPD        #i                (infinite repeat)
000011 Z0N 1 CCCC nnnnnnnnn 001iiiiii                REPD        D/#n,#i
000011 n11 1 nnnn nnnnnnnnn 001iiiiii                REPS        #n,#i

000011 ZCN 1 CCCC nnnnnnnnn 01000----                <empty>

000011 ZCN 1 CCCC nnnnnnnnn 01001tttt                JMPTASK        D/#n,#t

000011 ZCN 1 CCCC nnnnnnnnn 010100000                NOPX        D/#n                (waits)
000011 ZCN 1 CCCC nnnnnnnnn 010100001                SETZC        D/#n                (d[1:0] into z/c via wz/wc)
000011 ZCN 1 CCCC Dnnnnnnnn 010100010                SETSPA        D/#n
000011 ZCN 1 CCCC Dnnnnnnnn 010100011                SETSPB        D/#n
000011 ZCN 1 CCCC Dnnnnnnnn 010100100                ADDSPA        D/#n
000011 ZCN 1 CCCC Dnnnnnnnn 010100101                ADDSPB        D/#n
000011 ZCN 1 CCCC Dnnnnnnnn 010100110                SUBSPA        D/#n
000011 ZCN 1 CCCC Dnnnnnnnn 010100111                SUBSPB        D/#n

000011 ZCN 1 CCCC nnnnnnnnn 010101000                PUSHAR        D/#n
000011 ZCN 1 CCCC nnnnnnnnn 010101001                PUSHBR        D/#n
000011 ZCN 1 CCCC nnnnnnnnn 010101010                PUSHA        D/#n
000011 ZCN 1 CCCC nnnnnnnnn 010101011                PUSHB        D/#n
000011 ZCN 1 CCCC nnnnnnnnn 010101100                CALLA        D/#n
000011 ZCN 1 CCCC nnnnnnnnn 010101101                CALLB        D/#n
000011 ZCN 1 CCCC nnnnnnnnn 010101110                CALLAD        D/#n
000011 ZCN 1 CCCC nnnnnnnnn 010101111                CALLBD        D/#n

000011 ZCN 1 CCCC SUPIIIIII 010110000                WRQUAD        D/PTR                (waits for hub)
000011 Z0N 1 CCCC SUPIIIIII 010110001                RDQUAD        D/PTR                (waits for hub)
000011 Z1N 1 CCCC SUPIIIIII 010110001                RDQUADC        D/PTR                (waits for hub if cache miss)
000011 ZCN 1 CCCC nnnnnnnnn 010110010                SETPTRA        D/#n
000011 ZCN 1 CCCC nnnnnnnnn 010110011                SETPTRB        D/#n
000011 ZCN 1 CCCC nnnnnnnnn 010110100                ADDPTRA        D/#n
000011 ZCN 1 CCCC nnnnnnnnn 010110101                ADDPTRB        D/#n
000011 ZCN 1 CCCC nnnnnnnnn 010110110                SUBPTRA        D/#n
000011 ZCN 1 CCCC nnnnnnnnn 010110111                SUBPTRB        D/#n

000011 ZCN 1 CCCC nnnnnnnnn 010111000                SETPIX        D/#n
000011 ZCN 1 CCCC nnnnnnnnn 010111001                SETPIXU        D/#n
000011 ZCN 1 CCCC nnnnnnnnn 010111010                SETPIXV        D/#n
000011 ZCN 1 CCCC nnnnnnnnn 010111011                SETPIXZ        D/#n
000011 ZCN 1 CCCC nnnnnnnnn 010111100                SETPIXA        D/#n
000011 ZCN 1 CCCC nnnnnnnnn 010111101                SETPIXR        D/#n
000011 ZCN 1 CCCC nnnnnnnnn 010111110                SETPIXG        D/#n
000011 ZCN 1 CCCC nnnnnnnnn 010111111                SETPIXB        D/#n

000011 Z0N 1 CCCC nnnnnnnnn 011000000                SETMULU        D/#n
000011 Z1N 1 CCCC nnnnnnnnn 011000000                SETMULA        D/#n
000011 ZCN 1 CCCC nnnnnnnnn 011000001                SETMULB        D/#n
000011 Z0N 1 CCCC nnnnnnnnn 011000010                SETDIVU        D/#n                (loads [31:0], then [63:32])
000011 Z1N 1 CCCC nnnnnnnnn 011000010                SETDIVA        D/#n                (loads [31:0], then [63:32])
000011 ZCN 1 CCCC nnnnnnnnn 011000011                SETDIVB        D/#n
000011 ZCN 1 CCCC nnnnnnnnn 011000100                SETSQRH        D/#n
000011 ZCN 1 CCCC nnnnnnnnn 011000101                SETSQRL        D/#n
000011 ZCN 1 CCCC nnnnnnnnn 011000110                SETQI        D/#n
000011 ZCN 1 CCCC nnnnnnnnn 011000111                SETQZ        D/#n

000011 ZCN 1 CCCC nnnnnnnnn 011001000                QLOG        D/#n
000011 ZCN 1 CCCC nnnnnnnnn 011001001                QEXP        D/#n
000011 ZCN 1 CCCC nnnnnnnnn 011001010                SETF        D/#n
000011 ZCN 1 CCCC nnnnnnnnn 011001011                SETTASK        D/#n

000011 ZCN 1 CCCC DDDDDDDnn 011001100                CFGDAC0        D/#n
000011 ZCN 1 CCCC DDDDDDDnn 011001101                CFGDAC1        D/#n
000011 ZCN 1 CCCC DDDDDDDnn 011001110                CFGDAC2        D/#n
000011 ZCN 1 CCCC DDDDDDDnn 011001111                CFGDAC3        D/#n

000011 ZCN 1 CCCC nnnnnnnnn 011010000                SETDAC0        D/#n
000011 ZCN 1 CCCC nnnnnnnnn 011010001                SETDAC1        D/#n
000011 ZCN 1 CCCC nnnnnnnnn 011010010                SETDAC2        D/#n
000011 ZCN 1 CCCC nnnnnnnnn 011010011                SETDAC3        D/#n

000011 ZCN 1 CCCC Dnnnnnnnn 011010100                CFGDACS        D/#n
000011 ZCN 1 CCCC nnnnnnnnn 011010101                SETDACS        D/#n

000011 ZCN 1 CCCC DDnnnnnnn 011010110                GETP        D/#n                (pin into !z/c via wz/wc)
000011 ZCN 1 CCCC DDnnnnnnn 011010111                GETNP        D/#n                (pin into z/!c via wz/wc)

000011 ZCN 1 CCCC DDnnnnnnn 011011000                OFFP        D/#n
000011 ZCN 1 CCCC DDnnnnnnn 011011001                NOTP        D/#n
000011 ZCN 1 CCCC DDnnnnnnn 011011010                CLRP        D/#n
000011 ZCN 1 CCCC DDnnnnnnn 011011011                SETP        D/#n
000011 ZCN 1 CCCC DDnnnnnnn 011011100                SETPC        D/#n
000011 ZCN 1 CCCC DDnnnnnnn 011011101                SETPNC        D/#n
000011 ZCN 1 CCCC DDnnnnnnn 011011110                SETPZ        D/#n
000011 ZCN 1 CCCC DDnnnnnnn 011011111                SETPNZ        D/#n

000011 ZCN 1 CCCC DDDDDnnnn 011100000                SETCOG        D/#n
000011 ZCN 1 CCCC DDDnnnnnn 011100001                SETMAP        D/#n
000011 Z0N 1 CCCC nnnnnnnnn 011100010                SETQUAD        D/#n
000011 Z1N 1 CCCC nnnnnnnnn 011100010                SETQUAZ        D/#n
000011 ZCN 1 CCCC DDnnDDDDD 011100011                SETPORT        D/#n
000011 ZCN 1 CCCC DDnnDDDDD 011100100                SETPORA        D/#n
000011 ZCN 1 CCCC DDnnDDDDD 011100101                SETPORB        D/#n
000011 ZCN 1 CCCC DDnnDDDDD 011100110                SETPORC        D/#n
000011 ZCN 1 CCCC DDnnDDDDD 011100111                SETPORD        D/#n

000011 ZCN 1 CCCC nnnnnnnnn 011101000                SETXCH        D/#n
000011 ZCN 1 CCCC DDDnnnnnn 011101001                SETXFR        D/#n
000011 ZCN 1 CCCC DDDDDDDDD 011101010                SETSER        D/#n
000011 ZCN 1 CCCC DDDnnnnnn 011101011                SETSKIP        D/#n
000011 ZCN 1 CCCC nnnnnnnnn 011101100                SETVID        D/#n
000011 ZCN 1 CCCC nnnnnnnnn 011101101                SETVIDY        D/#n
000011 ZCN 1 CCCC nnnnnnnnn 011101110                SETVIDI        D/#n
000011 ZCN 1 CCCC nnnnnnnnn 011101111                SETVIDQ        D/#n

000011 ZCN 1 CCCC nnnnnnnnn 011110000                SETCTRA        D/#n
000011 ZCN 1 CCCC nnnnnnnnn 011110001                SETWAVA        D/#n
000011 ZCN 1 CCCC nnnnnnnnn 011110010                SETFRQA        D/#n
000011 ZCN 1 CCCC nnnnnnnnn 011110011                SETPHSA        D/#n
000011 ZCN 1 CCCC nnnnnnnnn 011110100                ADDPHSA        D/#n
000011 ZCN 1 CCCC nnnnnnnnn 011110101                SUBPHSA        D/#n
000011 ZCN 1 CCCC nnnnnnnnn 011110110                SYNCTRA                        (waits for ctra)
000011 ZCN 1 CCCC nnnnnnnnn 011110111                CAPCTRA

000011 ZCN 1 CCCC nnnnnnnnn 011111000                SETCTRB        D/#n
000011 ZCN 1 CCCC nnnnnnnnn 011111001                SETWAVB        D/#n
000011 ZCN 1 CCCC nnnnnnnnn 011111010                SETFRQB        D/#n
000011 ZCN 1 CCCC nnnnnnnnn 011111011                SETPHSB        D/#n
000011 ZCN 1 CCCC nnnnnnnnn 011111100                ADDPHSB        D/#n
000011 ZCN 1 CCCC nnnnnnnnn 011111101                SUBPHSB        D/#n
000011 ZCN 1 CCCC nnnnnnnnn 011111110                SYNCTRB                        (waits for ctrb)
000011 ZCN 1 CCCC nnnnnnnnn 011111111                CAPCTRB

000011 ZCR 1 CCCC DDDDDDDDD 1000bbbbb                ISOB        D,b
000011 ZCR 1 CCCC DDDDDDDDD 1001bbbbb                NOTB        D,b
000011 ZCR 1 CCCC DDDDDDDDD 1010bbbbb                CLRB        D,b
000011 ZCR 1 CCCC DDDDDDDDD 1011bbbbb                SETB        D,b
000011 ZCR 1 CCCC DDDDDDDDD 1100bbbbb                SETBC        D,b
000011 ZCR 1 CCCC DDDDDDDDD 1101bbbbb                SETBNC        D,b
000011 ZCR 1 CCCC DDDDDDDDD 1110bbbbb                SETBZ        D,b
000011 ZCR 1 CCCC DDDDDDDDD 1111bbbbb                SETBNZ        D,b

000100 000 I CCCC DDDDDDDDD SSSSSSSSS                SETACCA        D,S
000100 010 I CCCC DDDDDDDDD SSSSSSSSS                SETACCB        D,S
000100 100 I CCCC DDDDDDDDD SSSSSSSSS                MACA        D,S
000100 110 I CCCC DDDDDDDDD SSSSSSSSS                MACB        D,S
000100 ZC1 I CCCC DDDDDDDDD SSSSSSSSS                MUL        D,S                (waits one clock)

000101 000 I CCCC DDDDDDDDD SSSSSSSSS                MOVF        D,S
000101 010 I CCCC DDDDDDDDD SSSSSSSSS                QSINCOS        D,S
000101 100 I CCCC DDDDDDDDD SSSSSSSSS                QARCTAN        D,S
000101 110 I CCCC DDDDDDDDD SSSSSSSSS                QROTATE        D,S
000101 ZC1 I CCCC DDDDDDDDD SSSSSSSSS                SCL        D,S                (waits one clock)

000110 ZCR I CCCC DDDDDDDDD SSSSSSSSS                ENC        D,S
000111 ZCR I CCCC DDDDDDDDD SSSSSSSSS                JMPRET        D,S

001000 ZCR I CCCC DDDDDDDDD SSSSSSSSS                ROR        D,S
001001 ZCR I CCCC DDDDDDDDD SSSSSSSSS                ROL        D,S
001010 ZCR I CCCC DDDDDDDDD SSSSSSSSS                SHR        D,S
001011 ZCR I CCCC DDDDDDDDD SSSSSSSSS                SHL        D,S
001100 ZCR I CCCC DDDDDDDDD SSSSSSSSS                RCR        D,S
001101 ZCR I CCCC DDDDDDDDD SSSSSSSSS                RCL        D,S
001110 ZCR I CCCC DDDDDDDDD SSSSSSSSS                SAR        D,S
001111 ZCR I CCCC DDDDDDDDD SSSSSSSSS                REV        D,S

010000 ZCR I CCCC DDDDDDDDD SSSSSSSSS                MINS        D,S
010001 ZCR I CCCC DDDDDDDDD SSSSSSSSS                MAXS        D,S
010010 ZCR I CCCC DDDDDDDDD SSSSSSSSS                MIN        D,S
010011 ZCR I CCCC DDDDDDDDD SSSSSSSSS                MAX        D,S
010100 ZCR I CCCC DDDDDDDDD SSSSSSSSS                MOVS        D,S
010101 ZCR I CCCC DDDDDDDDD SSSSSSSSS                MOVD        D,S
010110 ZCR I CCCC DDDDDDDDD SSSSSSSSS                MOVI        D,S
010111 ZCR I CCCC DDDDDDDDD SSSSSSSSS                JMPRETD        D,S

011000 ZCR I CCCC DDDDDDDDD SSSSSSSSS                AND        D,S
011001 ZCR I CCCC DDDDDDDDD SSSSSSSSS                ANDN        D,S
011010 ZCR I CCCC DDDDDDDDD SSSSSSSSS                OR        D,S
011011 ZCR I CCCC DDDDDDDDD SSSSSSSSS                XOR        D,S
011100 ZCR I CCCC DDDDDDDDD SSSSSSSSS                MUXC        D,S
011101 ZCR I CCCC DDDDDDDDD SSSSSSSSS                MUXNC        D,S
011110 ZCR I CCCC DDDDDDDDD SSSSSSSSS                MUXZ        D,S
011111 ZCR I CCCC DDDDDDDDD SSSSSSSSS                MUXNZ        D,S

100000 ZCR I CCCC DDDDDDDDD SSSSSSSSS                ADD        D,S
100001 ZCR I CCCC DDDDDDDDD SSSSSSSSS                SUB        D,S
100010 ZCR I CCCC DDDDDDDDD SSSSSSSSS                ADDABS        D,S
100011 ZCR I CCCC DDDDDDDDD SSSSSSSSS                SUBABS        D,S
100100 ZCR I CCCC DDDDDDDDD SSSSSSSSS                SUMC        D,S
100101 ZCR I CCCC DDDDDDDDD SSSSSSSSS                SUMNC        D,S
100110 ZCR I CCCC DDDDDDDDD SSSSSSSSS                SUMZ        D,S
100111 ZCR I CCCC DDDDDDDDD SSSSSSSSS                SUMNZ        D,S

101000 ZCR I CCCC DDDDDDDDD SSSSSSSSS                MOV        D,S
101001 ZCR I CCCC DDDDDDDDD SSSSSSSSS                NEG        D,S
101010 ZCR I CCCC DDDDDDDDD SSSSSSSSS                ABS        D,S
101011 ZCR I CCCC DDDDDDDDD SSSSSSSSS                ABSNEG        D,S
101100 ZCR I CCCC DDDDDDDDD SSSSSSSSS                NEGC        D,S
101101 ZCR I CCCC DDDDDDDDD SSSSSSSSS                NEGNC        D,S
101110 ZCR I CCCC DDDDDDDDD SSSSSSSSS                NEGZ        D,S
101111 ZCR I CCCC DDDDDDDDD SSSSSSSSS                NEGNZ        D,S

110000 ZCR I CCCC DDDDDDDDD SSSSSSSSS                CMPS        D,S
110001 ZCR I CCCC DDDDDDDDD SSSSSSSSS                CMPSX        D,S
110010 ZCR I CCCC DDDDDDDDD SSSSSSSSS                ADDX        D,S
110011 ZCR I CCCC DDDDDDDDD SSSSSSSSS                SUBX        D,S
110100 ZCR I CCCC DDDDDDDDD SSSSSSSSS                ADDS        D,S
110101 ZCR I CCCC DDDDDDDDD SSSSSSSSS                SUBS        D,S
110110 ZCR I CCCC DDDDDDDDD SSSSSSSSS                ADDSX        D,S
110111 ZCR I CCCC DDDDDDDDD SSSSSSSSS                SUBSX        D,S

111000 ZCR I CCCC DDDDDDDDD SSSSSSSSS                SUBR        D,S
111001 ZCR I CCCC DDDDDDDDD SSSSSSSSS                CMPSUB        D,S
111010 ZCR I CCCC DDDDDDDDD SSSSSSSSS                INCMOD        D,S
111011 ZCR I CCCC DDDDDDDDD SSSSSSSSS                DECMOD        D,S

111000 000 I BBAA DDDDDDDDD SSSSSSSSS                SETINDx        D,S                (SETINDA S / SETINDB D / SETINDS D,S)
111001 000 I 0B0A DDDDDDDDD SSSSSSSSS                FIXINDx        D,S                (FIXINDA D,S / FIXINDB D,S / FIXINDS D,S)
111010 000 I CCCC DDDDDDDDD SSSSSSSSS                CFGPINS        D,S                (waits for alt)
111011 000 I CCCC DDDDDDDDD SSSSSSSSS                WAITVID        D,S                (waits for vid)

111100 00R I CCCC DDDDDDDDD SSSSSSSSS                IJZ        D,S
111100 01R I CCCC DDDDDDDDD SSSSSSSSS                IJZD        D,S
111100 10R I CCCC DDDDDDDDD SSSSSSSSS                IJNZ        D,S
111100 11R I CCCC DDDDDDDDD SSSSSSSSS                IJNZD        D,S

111101 00R I CCCC DDDDDDDDD SSSSSSSSS                DJZ        D,S
111101 01R I CCCC DDDDDDDDD SSSSSSSSS                DJZD        D,S
111101 10R I CCCC DDDDDDDDD SSSSSSSSS                DJNZ        D,S
111101 11R I CCCC DDDDDDDDD SSSSSSSSS                DJNZD        D,S

111110 000 I CCCC DDDDDDDDD SSSSSSSSS                TJZ        D,S
111110 010 I CCCC DDDDDDDDD SSSSSSSSS                TJZD        D,S
111110 100 I CCCC DDDDDDDDD SSSSSSSSS                TJNZ        D,S
111110 110 I CCCC DDDDDDDDD SSSSSSSSS                TJNZD        D,S

111110 001 I CCCC DDDDDDDDD SSSSSSSSS                JP        D,S
111110 011 I CCCC DDDDDDDDD SSSSSSSSS                JPD        D,S
111110 101 I CCCC DDDDDDDDD SSSSSSSSS                JNP        D,S
111110 111 I CCCC DDDDDDDDD SSSSSSSSS                JNPD        D,S

111111 0CR I CCCC DDDDDDDDD SSSSSSSSS                WAITCNT        D,S                (waits for cnt32, +cnt64 if wc)
111111 1C0 I CCCC DDDDDDDDD SSSSSSSSS                WAITPEQ        D,S                (waits for pins, +cnt32 if wc)
111111 1C1 I CCCC DDDDDDDDD SSSSSSSSS                WAITPNE        D,S                (waits for pins, +cnt32 if wc)
-------------------------------------------------------------------------------------------------
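
The eight single-bit instructions at the end of the 000011 group (ISOB through SETBNZ) share one layout in the table above: the top bit of the 9-bit S field is 1, the next three bits select the operation, and the low five bits hold the bit number b. The following is a minimal Python sketch of that decode, based only on those table rows. The name decode_bitop is hypothetical and not part of any Parallax tool, and the sketch assumes the caller has already matched the 000011 opcode and the I bit.

```python
# Mnemonics in table order; the list index equals S field bits [7:5].
BIT_OPS = ["ISOB", "NOTB", "CLRB", "SETB", "SETBC", "SETBNC", "SETBZ", "SETBNZ"]

def decode_bitop(s_field):
    """Return (mnemonic, bit_number) for a 9-bit S field of the form 1ooobbbbb.

    Returns None when bit 8 is clear, i.e. the field belongs to another
    instruction family in the 000011 group.
    """
    if not 0 <= s_field <= 0x1FF:
        raise ValueError("S field must fit in 9 bits")
    if not (s_field >> 8) & 1:
        return None
    op = (s_field >> 5) & 0b111   # three operation-select bits
    bit = s_field & 0b11111       # five-bit bit number b
    return BIT_OPS[op], bit

print(decode_bitop(0b101100011))  # ('SETB', 3)
```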

ZCR                effects
-------------------------------------------------------------------------------------------------
000                nz, nc, nr
001                nz, nc, wr
010                nz, wc, nr
011                nz, wc, wr
100                wz, nc, nr
101                wz, nc, wr
110                wz, wc, nr
111                wz, wc, wr
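
As an illustration of the table above, each ZCR bit independently selects one effect: Z picks wz over nz, C picks wc over nc, and R picks wr over nr. The short Python sketch below reproduces the eight rows from that rule. The function name zcr_effects is hypothetical.

```python
def zcr_effects(zcr):
    """Map a 3-bit ZCR field to its effect names, e.g. 0b101 -> 'wz, nc, wr'."""
    if not 0 <= zcr <= 0b111:
        raise ValueError("ZCR must fit in 3 bits")
    z = "wz" if zcr & 0b100 else "nz"   # Z bit: write the Z flag or not
    c = "wc" if zcr & 0b010 else "nc"   # C bit: write the C flag or not
    r = "wr" if zcr & 0b001 else "nr"   # R bit: write the result or not
    return f"{z}, {c}, {r}"

for value in range(8):
    print(f"{value:03b}  {zcr_effects(value)}")
```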

CCCC    condition       CCCC    (easier-to-read list)
-------------------------------------------------------------------------------------------------
0000    never           1111    always          (default)
0001    nc & nz         1100    if_c                            if_b
0010    nc & z          0011    if_nc                           if_ae
0011    nc              1010    if_z                            if_e
0100     c & nz         0101    if_nz                           if_ne
0101    nz              1000    if_c_and_z      if_z_and_c
0110     c <> z         0100    if_c_and_nz     if_nz_and_c
0111    nc | nz         0010    if_nc_and_z     if_z_and_nc
1000     c & z          0001    if_nc_and_nz    if_nz_and_nc    if_a
1001     c = z          1110    if_c_or_z       if_z_or_c       if_be
1010     z              1101    if_c_or_nz      if_nz_or_c
1011    nc | z          1011    if_nc_or_z      if_z_or_nc
1100     c              0111    if_nc_or_nz     if_nz_or_nc
1101     c | nz         1001    if_c_eq_z       if_z_eq_c
1110     c | z          0110    if_c_ne_z       if_z_ne_c
1111    always          0000    never
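
Reading the left-hand list, the CCCC field behaves like a four-entry truth table indexed by the current flags: bit 0 covers nc & nz, bit 1 covers nc & z, bit 2 covers c & nz, and bit 3 covers c & z. That interpretation is inferred from the list rather than stated in the source, so treat the Python sketch below as an assumption. The function name condition_met is hypothetical.

```python
def condition_met(cccc, c, z):
    """Return True if an instruction with this CCCC field would execute.

    Assumes CCCC is a truth table whose bit index is (C << 1) | Z,
    which matches every row of the condition list above.
    """
    if not 0 <= cccc <= 0b1111:
        raise ValueError("CCCC must fit in 4 bits")
    index = (int(bool(c)) << 1) | int(bool(z))
    return bool((cccc >> index) & 1)

# if_c (1100) executes whenever C is set, regardless of Z.
print(condition_met(0b1100, c=1, z=0))  # True
print(condition_met(0b1100, c=0, z=1))  # False
```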

CCCC        inda/indb - CCCC=1111 after first stage of pipeline if inda/indb used (indx=inda/indb)
-------------------------------------------------------------------------------------------------
xx00        source indx
xx01        source indx++
xx10        source indx--
xx11        source ++indx

00xx        destination indx
01xx        destination indx++
10xx        destination indx--
11xx        destination ++indx
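
When INDA/INDB is used, the two tables above split the CCCC field into a source mode (low two bits) and a destination mode (high two bits). The Python sketch below decodes both halves. The name decode_ind_modes is hypothetical, and the sketch does not check whether S or D actually names INDA/INDB; it only translates the bits.

```python
# Two-bit mode encodings shared by the source and destination halves.
MODES = {0b00: "indx", 0b01: "indx++", 0b10: "indx--", 0b11: "++indx"}

def decode_ind_modes(cccc):
    """Split a 4-bit CCCC field into its indirect source/destination modes."""
    if not 0 <= cccc <= 0b1111:
        raise ValueError("CCCC must fit in 4 bits")
    return {
        "source": MODES[cccc & 0b11],              # bits [1:0], the xxNN rows
        "destination": MODES[(cccc >> 2) & 0b11],  # bits [3:2], the NNxx rows
    }

print(decode_ind_modes(0b0110))
```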

I        SSSSSSSSS        source operand
-------------------------------------------------------------------------------------------------
0        SSSSSSSSS        register
1        #SSSSSSSSS        immediate, zero-extended

        DDDDDDDDD        destination operand
-------------------------------------------------------------------------------------------------
        DDDDDDDDD        register
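
The general-form rows of the instruction table list six fields: a 6-bit opcode, ZCR, I, CCCC, a 9-bit D and a 9-bit S, 32 bits in total. The Python sketch below splits a word into those fields, assuming they are packed most-significant first in the order printed (as on the P1). The name split_fields and the sample word are hypothetical.

```python
def split_fields(word):
    """Split a 32-bit instruction word into the fields used by the table.

    Assumed layout, most-significant bit first:
    opcode[31:26] ZCR[25:23] I[22] CCCC[21:18] D[17:9] S[8:0]
    """
    if not 0 <= word <= 0xFFFFFFFF:
        raise ValueError("instruction word must fit in 32 bits")
    return {
        "opcode": (word >> 26) & 0x3F,
        "zcr": (word >> 23) & 0x7,
        "i": (word >> 22) & 0x1,      # 1 = S is an immediate, 0 = S is a register
        "cccc": (word >> 18) & 0xF,
        "d": (word >> 9) & 0x1FF,
        "s": word & 0x1FF,
    }

# Sample word: opcode 100000 (ADD), ZCR 001, I=1, CCCC 1111, D=$010, S=#5.
fields = split_fields(0x80FC2005)
source = f"#{fields['s']}" if fields["i"] else f"register ${fields['s']:03X}"
print(fields, source)
```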

Effects and Condition Codes

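Instructions whose encoding includes the Z, C and R bits (shown as ZCR in the table above) can optionally update the Z and/or C flags by specifying the WZ and WC effects. Likewise, the NR and WR effects control whether the result is written to the destination register. Instructions can also be executed conditionally, depending on the state of the Z and/or C flags; see the P8X32A documentation for the same mechanism on the Propeller 1.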
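
To make the paragraph above concrete, the Python sketch below packs effects and a condition into an instruction word for a general-form instruction. The field layout (opcode, ZCR, I, CCCC, D, S from the most-significant bit down) is assumed from the order the table prints them in, the default of wr=True is an assumption mirroring P1 assembler behavior for instructions such as ADD, and the function name encode is hypothetical. Only a few condition names from the table are included.

```python
# A subset of the condition names from the CCCC table.
CONDITIONS = {
    "always": 0b1111, "never": 0b0000,
    "if_c": 0b1100, "if_nc": 0b0011,
    "if_z": 0b1010, "if_nz": 0b0101,
}

def encode(opcode, d, s, immediate=False, wz=False, wc=False, wr=True, cond="always"):
    """Pack one general-form instruction into a 32-bit word (assumed layout)."""
    if not (0 <= opcode <= 0x3F and 0 <= d <= 0x1FF and 0 <= s <= 0x1FF):
        raise ValueError("field out of range")
    zcr = (int(wz) << 2) | (int(wc) << 1) | int(wr)
    return (
        (opcode << 26) | (zcr << 23) | (int(immediate) << 22)
        | (CONDITIONS[cond] << 18) | (d << 9) | s
    )

ADD = 0b100000  # opcode taken from the instruction table
print(hex(encode(ADD, 0x010, 5, immediate=True)))
print(hex(encode(ADD, 0x010, 5, immediate=True, wz=True, wc=True, wr=False, cond="if_z")))
```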

Appendix A. Original Documentation Sources

Topic	URL
Hub Memory Instructions	http://forums.parallax.com/showthread.php?144199-Propeller-II-Emulation-of-the-P2-on-DE0-NANO-amp-DE2-115-FPGA-boards&p=1146196#post1146196 http://forums.parallax.com/showthread.php?144432-The-unofficial-P2-documentation-project&p=1148999#post1148999
Hub Control Instructions	http://forums.parallax.com/showthread.php?144199-Propeller-II-Emulation-of-the-P2-on-DE0-NANO-amp-DE2-115-FPGA-boards&p=1146196#post1146196 http://forums.parallax.com/showthread.php?144432-The-unofficial-P2-documentation-project&p=1148785#post1148785
COG RAM Indirection	http://forums.parallax.com/showthread.php?144199-Propeller-II-Emulation-of-the-P2-on-DE0-NANO-amp-DE2-115-FPGA-boards&p=1146196#post1146196
COG Stack RAM	http://forums.parallax.com/showthread.php?144199-Propeller-II-Emulation-of-the-P2-on-DE0-NANO-amp-DE2-115-FPGA-boards&p=1146196#post1146196
Multitasking	http://forums.parallax.com/showthread.php?144199-Propeller-II-Emulation-of-the-P2-on-DE0-NANO-amp-DE2-115-FPGA-boards&p=1146196#post1146196
Pipeline	http://forums.parallax.com/showthread.php?144199-Propeller-II-Emulation-of-the-P2-on-DE0-NANO-amp-DE2-115-FPGA-boards&p=1148452#post1148452
DECODx Instructions	http://forums.parallax.com/showthread.php?144199-Propeller-II-Emulation-of-the-P2-on-DE0-NANO-amp-DE2-115-FPGA-boards&p=1148852#post1148852
QUAD Instructions	http://forums.parallax.com/showthread.php?144432-The-unofficial-P2-documentation-project&p=1149359#post1149359

Appendix Z. Style Guide and Templates

NOTE: This section is intended only for use by the editors of this document.

DOCUMENT TASK LIST

TASK	Documenters	Notes	Status
Assembly Language Reference	Seairth
Hardware associated doc	Peter Jakacki
P2 document updates from Chip
Scavenging useful notes and examples
Assembler Language summary	Cluso99	Similar to P1's summary

TO DO

TASK	Notes	Status