Updated: Sept 2015 - starting to make changes to the document to reflect all that we know about P2-2015. Append with new information to be reintegrated into the document.

As the document is updated the text will be changed from red to black to indicate its status.

Webpage URL:  This is the webpage version which is automatically generated from the document so refresh your browser often as updates are frequent

IMPORTANT NOTE: This is an unofficial document maintained by various forum/community members.


TABLE OF CONTENTS

Introduction

Features

General

Clock Speed

Performance Metrics

Memory

Power Specification

I/O

Counter Modules

Video Generation

Code Protection and Encryption

Supported Languages

Packaging

Diagram: Pinout

Diagram: Schematic Symbol (pbj)

Table: Pin Definitions

Diagram: P1 QFP vs P2 TQFP Footprint

Memory

Hub Memory

Cog Memory

Diagram 2: Cog Memory and Registers

Hub

Hub Memory Instructions

Table: Hub Memory Instructions

PTR Expressions:

Table: PTR Expressions

Examples: Using the PTR

Table:  Memory Addressing Example

PTRA/PTRB Instructions

Table: PTR Instructions

QUAD related Instructions

Read Cache

Mapping QUAD Registers

Hiding QUAD Registers

Table: QUAD related Instructions

Hub Control Instructions

COGINIT D,S

SETCOG S

COGINIT Process

Example: COGINIT

CLKSET  D

Table: CLKSET Fields

COGID   D

COGSTOP D

LOCKS

Table: Hub Control Instructions

Indirect Registers

Table: INDA/INDB Usage Scheme

Table: INDA/INDB Instructions

Example: Indirect Pointer Usage

Stack RAM

Table: Stack RAM Instructions

Instruction Pipeline

Example: Single-task self-modifying code

Example: Single-task delayed branch

Example: Two-task delayed branch (SETTASK #%%1010 timing)

Table: Branching Instructions

Instruction-Block Repeating

Example: Using REP instruction

Table: REP Instructions

Multi-tasking

Task Time Slots

SETTASK

JMPTASK

Example: Starting four tasks

Table: Task Instructions

Register Remapping

Example: Register Remapping

Tips for coding multi-tasking programs

Tasks and the Pipeline

Avoiding Pipeline Stall

Other instruction alternatives:

Instructions to avoid in multi-tasking

I/O Ports

Table 15: Port Access Instructions

Table 16: Pin State Access Instructions

External RAM

Table 17: External RAM Instruction

InterChip Communication

Table 18: InterChip Communication Instructions

Cog Memory Remapping

Table 19: Cog Memory Remapping Instruction

InterCog Communication

Table 20: InterCog Communication Instruction

Pin Modes

Table 21: Pin Mode Access Instructions

Figure 2: Pin Modes

Video Generator

Table 22: Video Generator Access Instructions

DAC Hardware

Table 23: DAC Hardware Access Instructions

Texture Mapping

Table 24: Texture Mapping Instructions

CLUT or Stack RAM

Table 6: CLUT Instructions

Math

Table 7: Math Operation Instructions

Miscellaneous Hardware

LFSR

System Counter

Table 8: System Counter Instructions

Multiply Accumulate

Table 9: Multiply and Accumulate Instructions

Miscellaneous Instructions

Table 10: Extended Miscellaneous Instructions

Table 11: Extended Miscellaneous Flag Manipulation Instructions

Table 12: Extended Miscellaneous Flow Control Instructions

Table 13: Miscellaneous Instructions

Register Map

Table 14: Register Map Setup

Counter Modules

Table 25: Counter Hardware Access Instructions

Byte/Word Field Mover

Table: Field mover configuration bits

Table: Byte/Word Field Mover Instructions

Hub Counter

Table: Hub Counter Instructions

Example: Hub Counter

Table: Instruction List

Effects and Condition Codes

Links

Assembler Reference Section

Assembler Instruction Summary Chart

Appendix A. Original Documentation Sources

Appendix Z. Style Guide and Templates

DOCUMENT TASK LIST

TO DO


Introduction

The Propeller 2 is a general-purpose 32-bit microcontroller with 8 symmetric processors called “cogs.” Each cog has 512 longs (2 KB) of memory from which it executes instructions. Most instructions execute in a single clock cycle, with certain math-intensive operations taking up to 31 clock cycles to complete. The chip also has a 4-stage pipeline, interrupt support, and smart I/O pins that operate in a variety of modes.

The hub allows each cog round-robin access to the main hub RAM; depending on the hub’s access window relative to the cog, access to hub RAM can take up to 7 clocks (if the access window was just missed) or as little as 0 clocks (if the cog is next in line for the access window). Additionally, the developer has the ability to set a one-time settable encryption key in the chip to protect code downloaded to the chip. On system startup the chip will use this protected key to decrypt the encrypted program that is stored externally in non-volatile EEPROM/FLASH. The encryption key is not accessible by any user code.

If no encryption has been set, the Propeller 2 will try to boot from Serial, then from SPI Flash, and finally present its monitor on pins P90 (tx) and P91 (rx).

Features

General

  • 32-bit, general purpose multi-core microcontroller
  • 8 identical processors (cogs)
  • 128-pin TQFP package
  • 20 kHz and 20 MHz internal RC oscillators.
  • External oscillator or 10 MHz to 20 MHz crystal.
  • The chip is expected to be clocked at 160 MHz in normal operation, across the full industrial temperature range. With all eight cogs running at full capacity, 1,280 MIPS can be achieved.

Clock Speed

  • 160 MHz planned maximum clock speed
  • Internal RC: 20 kHz or 20 MHz (cannot use PLL)
  • External oscillator: DC to 160 MHz (without PLL) or 10 MHz to 32 MHz (with PLL) for system clock speed of 160 MHz maximum
  • PLL modes: 1x, 2x, 3x ... 15x, 16x input clock multiplier

Performance Metrics

  • 4-stage pipeline
  • Most instructions are single cycle
  • 1.28 BIPS (160 MIPS x 8 cogs) maximum instruction execution rate(1); assumes that all cogs are running, their pipelines are always full, and only single-cycle instructions are being executed

Memory

  • Main memory: 127,360B RAM + 3.7 KB ROM
  • Cog memory: 2 KB (512 longs) cog RAM + 256 long stack
  • Optional external 32-bit addressable SDRAM for run-time data workspace; code space is not extendable
  • Non-volatile application and data storage via external SPI EEPROM or SD card
  • Cogs can access Main Memory at each hub access window in units of 1 byte, 1 word, 1 long, or 4 contiguous quad-aligned longs.
  • Hub access window arrives for each cog in a round-robin fashion every 8 cycles.
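As a rough illustration of the last two points, the peak hub transfer rate for one cog follows from one transfer per 8-clock hub window. The sketch below is plain Python arithmetic, not Propeller code; the function name is made up for this example, and the 160 MHz figure is the planned clock from the Clock Speed list, so the result is an upper bound under those assumptions.

```python
def peak_hub_bytes_per_second(clock_hz, bytes_per_window=16, clocks_per_window=8):
    """Upper bound on hub throughput for one cog: one transfer per hub window."""
    return clock_hz * bytes_per_window // clocks_per_window

# One quad (4 longs = 16 bytes) every 8 clocks at the planned 160 MHz clock.
quad_rate = peak_hub_bytes_per_second(160_000_000)
# Same window, but moving only one long (4 bytes) per access.
long_rate = peak_hub_bytes_per_second(160_000_000, bytes_per_window=4)
print(quad_rate, long_rate)
```

Using quad transfers moves four times as much data per window as long transfers, which is why the quad-aligned access matters for bulk copies.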

Power Specification

  • Core voltage: 1.8 VDC
  • I/O pin voltage: 1.8 VDC–3.3 VDC
  • Current source or sink per I/O: 40 mA
  • Total current draw @ 1.8 VDC Core, 3.3 VDC I/O, 25° C: TBD
  • 1.8 V Core – 3.3/1.8 V I/O pins. (Each group of 8 I/O pins is powered by a VP pin and GP).

I/O

  • 92 I/O pins total: 84 fully general purpose I/O + 8 additional general purpose I/O available after boot-up
  • Each I/O pin is planned(1) to have internal:
      • Input ADC
      • Output DAC
      • True or inverted input/output
      • Differential input/output
      • Comparator
      • Schmitt input

Counter Modules

  • 2 counter modules, each with 2 integrated waveform generators, per cog

Video Generation

  • Each cog has independent video generation hardware capable of VGA, Standard PAL/NTSC, and HD up to 1080p (at 30 Hz)

Code Protection and Encryption

  • Propeller application and data optionally encrypted in non-volatile storage

Supported Languages

  • Propeller 2 Spin and Propeller 2 Assembly
  • Propeller 2 Assembly is not fully backwards compatible with Propeller 1 Assembly
  • Some Propeller 1 Spin code may need to be ported to the Propeller 2


Packaging

  • Package Type: (T)QFP-128
  • Package Size: 14mm
  • Pin pitch: 0.4mm
  • No center pad

Diagram: Pinout

Diagram: Schematic Symbol (pbj)


Table: Pin Definitions

PIN   NAME   TYPE      NOTES
01    GND    GND
02    P0     I/O
03    P1     I/O
04    GP0    I/O GND
05    P2     I/O
06    P3     I/O
07    P4     I/O
08    P5     I/O
09    VP0    I/O PWR   1.8V-3.3V
10    P6     I/O
11    P7     I/O
12    P8     I/O
13    P9     I/O
14    GP1    I/O GND
15    P10    I/O
16    P11    I/O
17    P12    I/O
18    P13    I/O
19    VP1    I/O PWR   1.8V-3.3V
20    P14    I/O
21    P15    I/O
22    P16    I/O
23    P17    I/O
24    GP2    I/O GND
25    P18    I/O
26    P19    I/O
27    P20    I/O
28    P21    I/O
29    VP2    I/O PWR   1.8V-3.3V
30    P22    I/O
31    P23    I/O
32    VDD    PWR       1.8V
33    GND    GND
34    P24    I/O
35    P25    I/O
36    GP3    I/O GND
37    P26    I/O
38    P27    I/O
39    P28    I/O
40    P29    I/O
41    VP3    I/O PWR   1.8V-3.3V
42    P30    I/O
43    P31    I/O
44    P32    I/O
45    P33    I/O
46    GP4    I/O GND
47    P34    I/O
48    P35    I/O
49    P36    I/O
50    P37    I/O
51    VP4    I/O PWR   1.8V-3.3V
52    P38    I/O
53    P39    I/O
54    P40    I/O
55    P41    I/O
56    GP5    I/O GND
57    P42    I/O
58    P43    I/O
59    P44    I/O
60    P45    I/O
61    VP5    I/O PWR   1.8V-3.3V
62    P46    I/O
63    P47    I/O
64    VDD    PWR       1.8V
65    GND    GND
66    P48    I/O
67    P49    I/O
68    GP6    I/O GND
69    P50    I/O
70    P51    I/O
71    P52    I/O
72    P53    I/O
73    VP6    I/O PWR   1.8V-3.3V
74    P54    I/O
75    P55    I/O
76    P56    I/O
77    P57    I/O
78    GP7    I/O GND
79    P58    I/O
80    P59    I/O
81    P60    I/O
82    P61    I/O
83    VP7    I/O PWR   1.8V-3.3V
84    P62    I/O
85    P63    I/O
86    P64    I/O
87    P65    I/O
88    GP8    I/O GND
89    P66    I/O
90    P67    I/O
91    P68    I/O
92    P69    I/O
93    VP8    I/O PWR   1.8V-3.3V
94    P70    I/O
95    P71    I/O
96    VDD    PWR       1.8V
97    GND    GND
98    P72    I/O
99    P73    I/O
100   GP9    I/O GND
101   P74    I/O
102   P75    I/O
103   P76    I/O
104   P77    I/O
105   VP9    I/O PWR   1.8V-3.3V
106   P78    I/O
107   P79    I/O
108   P80    I/O
109   P81    I/O
110   GP10   I/O GND
111   P82    I/O
112   P83    I/O
113   P84    I/O
114   P85    I/O
115   VP10   I/O PWR   1.8V-3.3V
116   P86    I/O       SPI DO in
117   P87    I/O       SPI DI out
118   P88    I/O       SPI CK out
119   P89    I/O       SPI CS out
120   GP11   I/O GND
121   P90    I/O       TXD out
122   P91    I/O       RXD in
123   BOEn   IN        Brown out En
124   RESn   I/O       Reset
125   VP11   I/O PWR   1.8V-3.3V
126   XO     OUT       Crystal out
127   XI     IN        Crystal/Osc in
128   VDD    PWR       1.8V

Note: All CPU and I/O GNDs must be connected to power common.

Diagram: P1 QFP vs P2 TQFP Footprint

Note: Relative size


Memory

There are two primary types of memory, a shared HUB memory and individual COG memory.


Hub Memory

   128K bytes of main memory shared by all cogs

Diagram 1: Hub Memory and Registers


Cog Memory

Each of the eight cogs contains 512 longs of register RAM and 256 longs of stack RAM.

The 512 longs of register RAM is comprised of:

Special function registers such as PTRx and SPAx are accessed via special instructions and are not part of the memory map.

The 256 longs of stack RAM for data and video usage features:

Diagram 2: Cog Memory and Registers

// P2 MEMORY MAP 24SEP2015
//
//      addr            read            write           name
//      ---------------------------------------------------------------
// COG REGISTERS (9-bit addressable)
//
//      000             INA             -               INA / IJMP0
//      001             INB             -               INB / IRET0
//      002             RAM             RAM+OUTA        OUTA
//      003             RAM             RAM+OUTB        OUTB
//      004             RAM             RAM+DIRA        DIRA
//      005             RAM             RAM+DIRB        DIRB
//      006             PTRA            PTRA            PTRA
//      007             PTRB            PTRB            PTRB
//
//      008             RAM             RAM             user / ADRA
//      009             RAM             RAM             user / ADRB
//      00A             RAM             RAM             user / IJMP1
//      00B             RAM             RAM             user / IRET1
//      00C             RAM             RAM             user / IJMP2
//      00D             RAM             RAM             user / IRET2
//      00E             RAM             RAM             user / IJMP3
//      00F             RAM             RAM             user / IRET3
//
//      010-1FF         RAM             RAM             user
//      ---------------------------------------------------------------
// LUT
//      200-3FF         RAM             RAM             user / cog-exec
//
// LUT (possible expansion)
//      400-5FF         RAM             RAM             user / cog-exec
//      ---------------------------------------------------------------
// HUB
//      00000-7FFFF     RAM             RAM             user / hub-exec
//
// HUB (future expansion)
//      80000-FFFFF     RAM             RAM             user / hub-exec
//      ---------------------------------------------------------------
// HUB ROM
//      00000-03FFF     (not accessible)                boot
//      ---------------------------------------------------------------

Hub

Each cog now features two 17-bit pointer registers called PTRA and PTRB and a 16-byte/8-word/4-long read cache. The pointer registers can be used for any hub memory read or write operation. They support auto-incrementing and auto-decrementing, applied either before or after the access.

Hub Memory Instructions

These instructions read and write hub memory.

All instructions use D as the data conduit, except WRQUAD/RDQUAD/RDQUADC, which use the four QUAD registers. The QUADs can be mapped into cog register space using the SETQUAD instruction or kept hidden, in which case they are still useful as a data conduit and as a read cache. If mapped, the QUADs overlay four contiguous cog registers which can begin at any double-even address (%xxxxxxx00). These overlaid registers can be read and written like any other registers, and can also be executed. Any write via D to the QUAD registers, when mapped, will affect the underlying cog registers as well. A RDQUAD/RDQUADC will affect the QUAD registers, but not the underlying cog registers.

The cached reads RDBYTEC/RDWORDC/RDLONGC/RDQUADC will do a RDQUAD if the current read address is outside of the 4-long window of the prior RDQUAD. Otherwise, they will immediately return cached data. The CACHEX instruction invalidates the cache, forcing a fresh RDQUAD next time a cached read executes.
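The hit/miss rule described above can be sketched as a small Python model. This is an illustration only, not the hardware logic: the class and method names are invented for the example, hub memory is modelled as a plain bytes object, and only the byte-sized cached read is shown.

```python
class QuadReadCache:
    """Toy model of the 4-long read cache behind RDBYTEC/RDWORDC/RDLONGC/RDQUADC."""

    def __init__(self, hub):
        self.hub = hub          # bytes-like image of hub memory (assumption)
        self.base = None        # quad-aligned address of the cached window, None = invalid
        self.data = b""         # copy of the 16 cached bytes
        self.quad_reads = 0     # number of RDQUAD-style fetches performed

    def cachex(self):
        """CACHEX: invalidate the cache so the next cached read refetches."""
        self.base = None

    def rdbytec(self, addr):
        window = addr & ~0xF                    # 16-byte window containing addr
        if window != self.base:                 # miss: fetch the whole quad
            self.base = window
            self.data = self.hub[window:window + 16]
            self.quad_reads += 1
        return self.data[addr & 0xF]            # hit: served from the cached copy


cache = QuadReadCache(bytes(range(64)))
first_window = [cache.rdbytec(a) for a in range(16)]   # one fetch covers all 16 bytes
```

Sequential byte reads therefore cost one hub fetch per 16 bytes in this model, which is the case the cached instructions are meant to speed up.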

Hub memory instructions must wait for their cog's hub cycle, which comes once every 8 clocks. The timing relationship between a cog's instruction stream and its hub cycle is generally indeterminate, causing these instructions to take varying numbers of clocks. Timing can be made deterministic, though, by intentionally spacing these instructions apart so that after the first in a series executes, the subsequent hub memory instructions fall on hub cycles, making them take the minimal number of clocks. The trick is to write useful code to go in between them.

After a RDQUAD, the QUAD registers are accessible via D and S on the 3rd clock and executable on the 5th clock.

Table: Hub Memory Instructions

INSTRUCTION      DESCRIPTION
WRBYTE  D,S      Write lower byte of D to hub memory at S
RDBYTE  D,S      Read byte from hub memory at S into D
RDBYTEC D,S      Read cached byte at S into D
WRWORD  D,S      Write lower word of D to hub memory at S
RDWORD  D,S      Read word from hub memory at S into D
RDWORDC D,S      Read cached word at S into D
WRLONG  D,S      Write D to hub memory at S
RDLONG  D,S      Read long from hub memory at S into D
RDLONGC D,S      Read cached long at S into D
WRQUAD  D        Write QUADs to hub memory at D
RDQUAD  D        Read into QUADs from hub memory at D
RDQUADC D        Conditionally read into QUADs from hub memory at D

PTR Expressions:

INDEX   -32..+31   Simple offset
INDEX   0..31      ++ Auto-increment range
INDEX   0..32      -- Auto-decrement range
SCALE   1          BYTE
SCALE   2          WORD
SCALE   4          LONG
SCALE   16         QUAD

Table: PTR Expressions

 SUPNNNNNN     PTR expression

 000000000     PTRA              'use PTRA

 100000000     PTRB              'use PTRB

 011000001     PTRA++            'use PTRA,                PTRA += SCALE

 111000001     PTRB++            'use PTRB,                PTRB += SCALE

 011111111     PTRA--            'use PTRA,                PTRA -= SCALE

 111111111     PTRB--            'use PTRB,                PTRB -= SCALE

 010000001     ++PTRA            'use PTRA + SCALE,        PTRA += SCALE

 110000001     ++PTRB            'use PTRB + SCALE,        PTRB += SCALE

 010111111     --PTRA            'use PTRA - SCALE,        PTRA -= SCALE

 110111111     --PTRB            'use PTRB - SCALE,        PTRB -= SCALE

 000NNNNNN     PTRA[INDEX]       'use PTRA + INDEX*SCALE

 100NNNNNN     PTRB[INDEX]       'use PTRB + INDEX*SCALE

 011NNNNNN     PTRA++[INDEX]     'use PTRA,                PTRA += INDEX*SCALE

 111NNNNNN     PTRB++[INDEX]     'use PTRB,                PTRB += INDEX*SCALE

 011nnnnnn     PTRA--[INDEX]     'use PTRA,                PTRA -= INDEX*SCALE

 111nnnnnn     PTRB--[INDEX]     'use PTRB,                PTRB -= INDEX*SCALE

 010NNNNNN     ++PTRA[INDEX]     'use PTRA + INDEX*SCALE,  PTRA += INDEX*SCALE

 110NNNNNN     ++PTRB[INDEX]     'use PTRB + INDEX*SCALE,  PTRB += INDEX*SCALE

 010nnnnnn     --PTRA[INDEX]     'use PTRA - INDEX*SCALE,  PTRA -= INDEX*SCALE

 110nnnnnn     --PTRB[INDEX]     'use PTRB - INDEX*SCALE,  PTRB -= INDEX*SCALE

   S = 0 for PTRA, 1 for PTRB

   U = 0 to keep PTRx same, 1 to update PTRx

   P = 0 to use PTRx + INDEX*SCALE, 1 to use PTRx (post-modify)

   NNNNNN = INDEX

   nnnnnn = -INDEX
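The SUPNNNNNN layout in the table above can be checked with a short Python sketch. It is an illustration under stated assumptions, not an assembler: the function name and argument names are invented here, and the index is passed as a signed value, so the '--' forms use a negative index.

```python
def encode_ptr(ptr, index=0, update=False, post=False):
    """Build the 9-bit SUPNNNNNN field for a PTRA/PTRB expression.

    ptr    -- "A" or "B" (S bit)
    index  -- signed INDEX, -32..+31 (negative for the '--' forms)
    update -- True to write the modified pointer back (U bit)
    post   -- True to address with the unmodified pointer (P bit)
    """
    if ptr not in ("A", "B"):
        raise ValueError("ptr must be 'A' or 'B'")
    if not -32 <= index <= 31:
        raise ValueError("INDEX must fit in 6 signed bits")
    s = 1 if ptr == "B" else 0
    return (s << 8) | (int(update) << 7) | (int(post) << 6) | (index & 0x3F)


print(format(encode_ptr("A", 1, update=True, post=True), "09b"))   # PTRA++
print(format(encode_ptr("B", -10, update=True), "09b"))            # --PTRB[10]
```

The SCALE factor does not appear in the field; it is implied by the instruction (byte, word, long, or quad) that uses the expression.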

Examples: Using the PTR

000000 Z01 1 CCCC DDDDDDDDD 000000000     RDBYTE  D,PTRA         'read byte at PTRA into D

000001 000 1 CCCC DDDDDDDDD 111000001     WRWORD  D,PTRB++       'write lower word in D at PTRB,      PTRB += 2

000010 Z01 1 CCCC DDDDDDDDD 011111111     RDLONG  D,PTRA--       'read long at PTRA into D,           PTRA -= 4

000011 001 1 CCCC 110000001 010110001     RDQUAD  ++PTRB         'read quad at PTRB+16 into QUADs,    PTRB += 16

000000 000 1 CCCC DDDDDDDDD 010111111     WRBYTE  D,--PTRA       'write lower byte in D at PTRA-1,    PTRA -= 1

000001 000 1 CCCC DDDDDDDDD 100000111     WRWORD  D,PTRB[7]      'write lower word in D to PTRB+7*2

000010 Z11 1 CCCC DDDDDDDDD 011001111     RDLONGC D,PTRA++[15]   'read cached long at PTRA into D,    PTRA += 15*4

000011 001 1 CCCC 111111101 010110000     WRQUAD  PTRB--[3]      'write QUADs at PTRB,                PTRB -= 3*16

000000 000 1 CCCC DDDDDDDDD 010000110     WRBYTE  D,++PTRA[6]    'write lower byte in D to PTRA+6*1,  PTRA += 6*1

000001 Z01 1 CCCC DDDDDDDDD 110110110     RDWORD  D,--PTRB[10]   'read word at PTRB-10*2 into D,      PTRB -= 10*2

Bytes, words, longs, and quads are addressed as follows:

   for WRBYTE/RDBYTE/RDBYTEC, address = %XXXXXXXXXXXXXXXXX (bits 16..0 are used)

   for WRWORD/RDWORD/RDWORDC, address = %XXXXXXXXXXXXXXXX- (bits 16..1 are used)

   for WRLONG/RDLONG/RDLONGC, address = %XXXXXXXXXXXXXXX-- (bits 16..2 are used)

   for WRQUAD/RDQUAD/RDQUADC, address = %XXXXXXXXXXXXX---- (bits 16..4 are used)

Table:  Memory Addressing Example

address  byte  word    long        quad

00000-   50   *7250   *706F7250   *0C7CCC030C7C200020302E32706F7250

00001-   72    7250    706F7250    0C7CCC030C7C200020302E32706F7250

00002-   6F   *706F    706F7250    0C7CCC030C7C200020302E32706F7250

00003-   70    706F    706F7250    0C7CCC030C7C200020302E32706F7250

00004-   32   *2E32   *20302E32    0C7CCC030C7C200020302E32706F7250

00005-   2E    2E32    20302E32    0C7CCC030C7C200020302E32706F7250

00006-   30   *2030    20302E32    0C7CCC030C7C200020302E32706F7250

00007-   20    2030    20302E32    0C7CCC030C7C200020302E32706F7250

00008-   00   *2000   *0C7C2000    0C7CCC030C7C200020302E32706F7250

00009-   20    2000    0C7C2000    0C7CCC030C7C200020302E32706F7250

0000A-   7C   *0C7C    0C7C2000    0C7CCC030C7C200020302E32706F7250

0000B-   0C    0C7C    0C7C2000    0C7CCC030C7C200020302E32706F7250

0000C-   03   *CC03   *0C7CCC03    0C7CCC030C7C200020302E32706F7250

0000D-   CC    CC03    0C7CCC03    0C7CCC030C7C200020302E32706F7250

0000E-   7C   *0C7C    0C7CCC03    0C7CCC030C7C200020302E32706F7250

0000F-   0C    0C7C    0C7CCC03    0C7CCC030C7C200020302E32706F7250

00010-   45   *FE45   *0DC1FE45   *0D7CC6010C7CC6010CFCB6E30DC1FE45

00011-   FE    FE45    0DC1FE45    0D7CC6010C7CC6010CFCB6E30DC1FE45

00012-   C1   *0DC1    0DC1FE45    0D7CC6010C7CC6010CFCB6E30DC1FE45

00013-   0D    0DC1    0DC1FE45    0D7CC6010C7CC6010CFCB6E30DC1FE45

00014-   E3   *B6E3   *0CFCB6E3    0D7CC6010C7CC6010CFCB6E30DC1FE45

00015-   B6    B6E3    0CFCB6E3    0D7CC6010C7CC6010CFCB6E30DC1FE45

00016-   FC   *0CFC    0CFCB6E3    0D7CC6010C7CC6010CFCB6E30DC1FE45

00017-   0C    0CFC    0CFCB6E3    0D7CC6010C7CC6010CFCB6E30DC1FE45

00018-   01   *C601   *0C7CC601    0D7CC6010C7CC6010CFCB6E30DC1FE45

00019-   C6    C601    0C7CC601    0D7CC6010C7CC6010CFCB6E30DC1FE45

0001A-   7C   *0C7C    0C7CC601    0D7CC6010C7CC6010CFCB6E30DC1FE45

0001B-   0C    0C7C    0C7CC601    0D7CC6010C7CC6010CFCB6E30DC1FE45

0001C-   01   *C601   *0D7CC601    0D7CC6010C7CC6010CFCB6E30DC1FE45

0001D-   C6    C601    0D7CC601    0D7CC6010C7CC6010CFCB6E30DC1FE45

0001E-   7C   *0D7C    0D7CC601    0D7CC6010C7CC6010CFCB6E30DC1FE45

0001F-   0D    0D7C    0D7CC601    0D7CC6010C7CC6010CFCB6E30DC1FE45

* new word/long/quad
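The addressing rules and the first 16 bytes of the table above can be reproduced with a few lines of Python. This is a sketch for checking the table by hand: the helper name is made up, hub memory is modelled as a bytes object holding the byte column of the table, and values are assembled little-endian as the table shows.

```python
# Bytes at hub addresses $00000..$0000F, copied from the table above.
HUB = bytes.fromhex("50726F70322E302000207C0C03CC7C0C")

def hub_read(hub, addr, size):
    """Read size bytes (1, 2, 4 or 16) little-endian, ignoring the low address bits."""
    if size not in (1, 2, 4, 16):
        raise ValueError("size must be 1, 2, 4 or 16")
    base = addr & ~(size - 1)           # drop the address bits that are not used
    return int.from_bytes(hub[base:base + size], "little")

print(hex(hub_read(HUB, 0x3, 2)))       # word read at an odd address
print(hex(hub_read(HUB, 0x5, 4)))       # long read at an unaligned address
```

Reading at any address inside an aligned word, long, or quad returns the same value, which is why the table repeats values down each column.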


PTRA/PTRB Instructions

Each cog has two 17-bit pointers, PTRA and PTRB, which can be read, written, modified, and used to access hub memory.

At cog startup, the PTRA and PTRB registers are initialized as follows:

   PTRA = %X_XXXXXXXX_XXXXXXXX, data from launching cog, usually a pointer

   PTRB = %X_XXXXXXXX_XXXXXX00, long address in hub where cog code was loaded from

Table: PTR Instructions

INSTRUCTION    DESCRIPTION                      CLOCK
GETPTRA D      get PTRA into D, C = PTRA[16]    1
GETPTRB D      get PTRB into D, C = PTRB[16]    1
SETPTRA D      set PTRA to D                    1
SETPTRA #n     set PTRA to 0..511               1
SETPTRB D      set PTRB to D                    1
SETPTRB #n     set PTRB to 0..511               1
ADDPTRA D      add D into PTRA                  1
ADDPTRA #n     add 0..511 into PTRA             1
ADDPTRB D      add D into PTRB                  1
ADDPTRB #n     add 0..511 into PTRB             1
SUBPTRA D      subtract D from PTRA             1
SUBPTRA #n     subtract 0..511 from PTRA        1
SUBPTRB D      subtract D from PTRB             1
SUBPTRB #n     subtract 0..511 from PTRB        1


QUAD related Instructions

Each cog has four QUAD registers which form a 128-bit conduit between the hub memory and the cog. This conduit can transfer four longs every 8 clocks via the WRQUAD/RDQUAD instructions.

Read Cache

The QUAD registers can also be used as a 4-long/8-word/16-byte read cache, utilized by RDBYTEC/RDWORDC/RDLONGC/RDQUADC.

Mapping QUAD Registers

Initially hidden, these QUAD registers are mappable into COG register space by using the SETQUAD instruction to set an address where the base register is to appear, with the other three registers following.

SETQUAZ works just like SETQUAD, but also clears the four QUAD registers.

Hiding QUAD Registers

To hide the QUAD registers, use SETQUAD to set an address of $1F8 or higher.
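The mapping and hiding rules can be summarised in a small Python sketch. It is only a model of the rules as stated: the function name is invented, and rejecting a base address that is not double-even is an assumption of this sketch, since the source does not say what the hardware does in that case.

```python
def quad_window(base):
    """Return the four cog register addresses overlaid by the QUADs, or None if hidden."""
    if not 0 <= base <= 0x1FF:
        raise ValueError("base must be a 9-bit register address")
    if base >= 0x1F8:
        return None                      # $1F8 or higher hides the QUADs
    if base & 0b11:
        # Assumption: the sketch refuses unaligned bases instead of guessing.
        raise ValueError("base must be a double-even address (%xxxxxxx00)")
    return [base + i for i in range(4)]


print(quad_window(0x100))    # mapped: base register plus the three that follow
print(quad_window(0x1F8))    # hidden
```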

Table: QUAD related Instructions

INSTRUCTION    DESCRIPTION                                                     CLOCK
CACHEX         Invalidate QUAD cache                                           1
GETTOPS D      Get top bytes of QUADs into D                                   1
SETQUAD D      Set QUAD base address to D                                      1
SETQUAD #n     Set QUAD base address to 0..511                                 1
SETQUAZ D      Set QUAD base address to D and clear the QUAD registers         1
SETQUAZ #n     Set QUAD base address to 0..511 and clear the QUAD registers    1

Hub Control Instructions

These instructions are used to control hub circuits and cogs.

Hub instructions must wait for their cog's hub cycle, which comes once every 8 clocks. In cases where there is no result to wait for (ZCR = %000), these instructions complete on the hub cycle, making them take 1..8 clocks, depending on where the hub cycle is in relation to the instruction. In cases where a result is anticipated (ZCR <> %000), these instructions complete on the 1st clock after the hub cycle, making them take 2..9 clocks.
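The two clock ranges follow directly from where the hub cycle falls. The Python sketch below models that relationship; the function and parameter names are invented for the example, and the distance to the next hub cycle is passed in directly rather than derived from a cog number.

```python
HUB_PERIOD = 8  # clocks between hub cycles for one cog

def hub_instruction_clocks(clocks_until_hub_cycle, returns_result):
    """Clocks taken by a hub instruction.

    clocks_until_hub_cycle -- 0..7, how far away the cog's next hub cycle is
    returns_result         -- True when any of Z, C or R is set (ZCR <> %000)
    """
    if not 0 <= clocks_until_hub_cycle < HUB_PERIOD:
        raise ValueError("wait must be 0..7")
    clocks = clocks_until_hub_cycle + 1     # completes on the hub cycle: 1..8
    if returns_result:
        clocks += 1                         # result arrives one clock later: 2..9
    return clocks


no_result = [hub_instruction_clocks(w, False) for w in range(HUB_PERIOD)]
with_result = [hub_instruction_clocks(w, True) for w in range(HUB_PERIOD)]
print(no_result, with_result)
```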

COGINIT D,S

COGINIT is used to start cogs. Any cog can be (re)started, whether it is idle or running. A cog can even execute a COGINIT to restart itself with a new program.

COGINIT uses D to specify a long address in hub memory that is the start of the program that is to be loaded into a cog, while S is a 17-bit parameter (usually an address) that will be conveyed to PTRA of the started cog. PTRB of the started cog will be set to the start address of its program that was loaded from hub memory.

SETCOG S

SETCOG must be executed before COGINIT to set the number of the cog to be started (0..7). If SETCOG sets a value with bit 3 set (%1xxx), this will cause the next idle cog to be started when COGINIT is executed, with the number of the cog started being returned in D, and the C flag returning 0 if okay, or 1 if no idle cog was available. At cog startup, SETCOG is initialized to %0000.

COGINIT Process

When a cog is started, $1F8 contiguous longs are read from hub memory (internally using RDLONGC) and written to cog registers $000..$1F7. The cog will then begin execution at $000. This process takes 1,016 clocks. (That's only 6.35us at 160MHz).
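The figures in this paragraph can be cross-checked with simple arithmetic. The Python below only restates numbers from the text; the breakdown into hub cycles is an observation of this sketch (the source quotes the total of 1,016 clocks without itemising it).

```python
LONGS_LOADED = 0x1F8          # longs copied to cog registers $000..$1F7
LOAD_CLOCKS = 1016            # figure quoted in the text
CLOCK_HZ = 160_000_000        # planned maximum system clock

load_time_us = LOAD_CLOCKS / CLOCK_HZ * 1_000_000
hub_cycles = LOAD_CLOCKS // 8                 # 8 clocks per hub cycle
quads_needed = LONGS_LOADED // 4              # if moved four longs at a time
print(LONGS_LOADED, hub_cycles, quads_needed, round(load_time_us, 2))
```

The total works out to 127 hub cycles, one more than the 126 quad-sized transfers needed to move 504 longs.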

Example: COGINIT

       COGID   COGNUM           'what cog am I?
       SETCOG  COGNUM           'set my cog number
       COGINIT COGPGM,COGPTR    'restart me with the ROM Monitor

COGPGM LONG    $0070C           'address of the ROM Monitor
COGPTR LONG    90<<9 + 91       'tx = P90, rx = P91
COGNUM RES     1

CLKSET  D

CLKSET writes the lower 9 bits of D to the hub clock register:

Table: CLKSET Fields

Bit 8: RESET
   0: continued operation
   1: hardware reset

Bits 7..4: PLL MULTIPLIER FOR XI PIN INPUT*
   0000: PLL disabled
   0001: 2x multiplier
   0010: 3x multiplier
   ...
   1110: 15x multiplier
   1111: 16x multiplier

Bits 3..2: XI / XO PIN MODE
   00: XI reads low, XO floats
   01: XI input, XO floats
   10: XI/XO crystal oscillator with 15pF internal loading and 1M-ohm feedback
   11: XI/XO crystal oscillator with 30pF internal loading and 1M-ohm feedback

Bits 1..0: CLOCK SELECTOR
   00: RCFAST (~20MHz)
   01: RCSLOW (~20KHz)
   10: XTAL (10MHz-20MHz)
   11: PLL

* XI/XO Pin Mode must be set for XI input or XI/XO crystal oscillator to use PLL.

Because the clock register is cleared to %0_0000_00_00 on reset, the chip starts up in RCFAST mode with both the crystal oscillator and the PLL disabled. Before switching to XTAL or PLL mode from RCFAST or RCSLOW, the crystal oscillator must be enabled and given 10ms to stabilize. The PLL stabilizes within 10us, so it can be enabled at the same time as the crystal oscillator. Once the crystal is stabilized, you can switch between XTAL and RCFAST/RCSLOW without any stability concerns. If the PLL is also enabled, you can switch freely among PLL, XTAL, and RCFAST/RCSLOW modes. You can change the PLL multiplier while in PLL mode, but beware that some frequency overshoot and undershoot will occur as the PLL settles to its new frequency. This only poses a hardware problem if you are switching upwards and the resulting overshoot might exceed the speed limit of the chip.
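To make the field layout and the two-step start-up concrete, here is a Python sketch that packs the CLKSET fields and builds the two values such a sequence might write. The helper names and the choice of an 8x multiplier with the 15pF crystal mode are assumptions made for illustration; the 10ms wait between the two writes would happen in the cog program and is not modelled.

```python
RCFAST, RCSLOW, XTAL, PLL = 0b00, 0b01, 0b10, 0b11   # clock selector, bits 1..0

def clkset_value(clock_sel, xi_xo_mode=0b00, pll_field=0b0000, reset=0):
    """Pack the four CLKSET fields into the 9-bit value written from D."""
    if not (0 <= clock_sel <= 3 and 0 <= xi_xo_mode <= 3
            and 0 <= pll_field <= 15 and reset in (0, 1)):
        raise ValueError("field out of range")
    return (reset << 8) | (pll_field << 4) | (xi_xo_mode << 2) | clock_sel

def pll_field_for(multiplier):
    """Table encoding: %0001 = 2x ... %1111 = 16x (%0000 disables the PLL)."""
    if not 2 <= multiplier <= 16:
        raise ValueError("multiplier must be 2..16")
    return multiplier - 1

# Step 1: stay on RCFAST, enable the crystal (15pF mode) and an 8x PLL.
warm_up = clkset_value(RCFAST, xi_xo_mode=0b10, pll_field=pll_field_for(8))
# Step 2 (after the crystal has had its 10ms to stabilize): select the PLL.
run = clkset_value(PLL, xi_xo_mode=0b10, pll_field=pll_field_for(8))
print(format(warm_up, "09b"), format(run, "09b"))
```

Only the clock selector bits differ between the two values, so the crystal and PLL keep running undisturbed across the switch.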

COGID   D

COGID returns the number of the cog (0..7) into D.

COGSTOP D

COGSTOP stops the cog specified in D (0..7).

LOCKS

LOCKNEW D

LOCKRET D

LOCKSET D

LOCKCLR D

There are eight semaphore locks available in the chip which can be borrowed with LOCKNEW, returned with LOCKRET, set with LOCKSET, and cleared with LOCKCLR.

While any cog can set or clear any lock without using LOCKNEW or LOCKRET, LOCKNEW and LOCKRET are provided so that cog programs have a dynamic and simple means of acquiring and relinquishing the locks at run-time.

When a lock is set with LOCKSET, its state is set to 1 and its prior state is returned in C. LOCKCLR works the same way, but clears the lock's state to 0. By having the hub perform the atomic operation of setting/clearing and reporting the prior state, cogs can utilize locks to ensure that only one cog has permission to do something at once. If a lock starts out cleared and multiple cogs vie for the lock by doing a 'LOCKSET locknum  wc', the cog that gets C=0 back 'wins' and has exclusive access to some shared resource, while the other cogs get C=1 back. When the winning cog is done, it can do a 'LOCKCLR locknum' to clear the lock and give another cog the opportunity to get C=0 back.

LOCKNEW returns the next available lock into D, with C=1 if no lock was free.

LOCKRET frees the lock in D so that it can be checked out again by LOCKNEW.

LOCKSET sets the lock in D and returns its prior state in C.

LOCKCLR clears the lock in D and returns its prior state in C.
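The lock behaviour described above can be modelled in a few lines of Python. This is a toy model, not the hub logic: the class name is invented, handing out the lowest-numbered free lock is an assumption, and the value returned in D when no lock is free is not stated in the source, so the sketch returns None for it.

```python
class HubLocks:
    """Toy model of the eight hub semaphore locks."""

    def __init__(self):
        self.state = [0] * 8        # 1 = set
        self.in_use = [False] * 8   # checked out via LOCKNEW

    def locknew(self):
        """Return (lock_number, C); C = 1 when no lock was free."""
        for n in range(8):
            if not self.in_use[n]:
                self.in_use[n] = True
                return n, 0
        return None, 1

    def lockret(self, n):
        """Free lock n so LOCKNEW can hand it out again."""
        self.in_use[n] = False

    def lockset(self, n):
        """Set the lock and return its prior state (the C flag)."""
        prior, self.state[n] = self.state[n], 1
        return prior

    def lockclr(self, n):
        """Clear the lock and return its prior state (the C flag)."""
        prior, self.state[n] = self.state[n], 0
        return prior


locks = HubLocks()
lock, c = locks.locknew()
winner_c = locks.lockset(lock)   # first cog to try sees the lock was clear
loser_c = locks.lockset(lock)    # a second cog sees it already set
print(lock, c, winner_c, loser_c)
```

Because setting the lock and reporting its prior state happen as one step, exactly one contender sees C=0 until the holder clears the lock.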

CLKSET, COGID, COGINIT, COGSTOP, and the LOCKxxx instructions will take 1..8 clocks if their Z/C/R bits are all 0, meaning they don't have to wait for anything back from the hub (no Z, C, or D result). If they are going to receive some result back, they must wait for the next cycle to receive it. Hence, those instructions which get results back take 2..9 clocks.

Table: Hub Control Instructions

INSTRUCTION    DESCRIPTION                                               CLOCK
SETCOG D/#n    Set cog to be used by COGINIT, b3 = use next available
COGINIT D,S    launch cog at D, cog PTRA = S                             1..9
CLKSET  D      set clock to D                                            1..8
COGID   D      get cog number into D                                     2..9
COGSTOP D      stop cog in D                                             1..8
LOCKNEW D      get new lock into D, C = busy                             2..9
LOCKRET D      return lock in D                                          1..8
LOCKSET D      set lock in D, C = prev state                             2..9
LOCKCLR D      clear lock in D, C = prev state                           2..9


Indirect Registers

Each cog has two indirect “registers”: INDA and INDB. INDA and INDB each consist of three hidden 9-bit registers: the pointer, the bottom limit, and the top limit. The bottom and top limits are inclusive values which set automatic wrapping boundaries for the pointer. This way, circular buffers can be established within cog RAM and accessed using simple INDA/INDB references.

INDA shares address $1F6 and INDB shares address $1F7. When either of these addresses is encountered in the D or S field, the value of the associated INDx register is used for the register address in place of the $1F6 or $1F7.

NOTE: It is still possible to access the actual registers at $1F6 and $1F7 (as opposed to the INDA and INDB registers) via the D or S field.  To accomplish this, set INDA to $1F6 and set INDB to $1F7.  These will still be considered indirect instructions.  Operations on these registers do not affect the hidden pointer registers.

NOTE: The registers at $1F6 and $1F7 are treated the same as all other registers when interpreted as an instruction (i.e. executed).

SETINDA/SETINDB/SETINDS are used to set or adjust the pointer value(s) while forcing the associated bottom and top limits to $000 and $1FF, respectively.

FIXINDA/FIXINDB/FIXINDS set the pointer(s) to an initial value, while setting the bottom limit(s) to the lower of the initial and terminal values and the top limit(s) to the higher.

At cog startup, INDA and INDB are configured as if these instructions had been executed:

FIXINDA $1F6,$1F6    // Set pointer to $1F6, bottom to $1F6, top to $1F6

FIXINDB $1F7,$1F7    // Set pointer to $1F7, bottom to $1F7, top to $1F7
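The pointer-and-limits behaviour can be illustrated with a small Python model. It is a sketch under assumptions: the class and method names are invented, and the exact wrap rule (stepping past the top limit lands on the bottom limit, and vice versa) is this sketch's reading of "inclusive wrapping boundaries", not something the source spells out.

```python
class IndirectPointer:
    """Toy model of one INDx register: pointer plus inclusive bottom/top limits."""

    def __init__(self, start=0x1F6):
        self.fix(start, start)              # mirrors the startup configuration of INDA

    def set(self, pointer):
        """SETINDx-style: set the pointer, limits forced to $000 and $1FF."""
        self.pointer, self.bottom, self.top = pointer & 0x1FF, 0x000, 0x1FF

    def fix(self, initial, terminal):
        """FIXINDx-style: pointer = initial, limits = lower/higher of the two."""
        self.pointer = initial & 0x1FF
        self.bottom = min(initial, terminal) & 0x1FF
        self.top = max(initial, terminal) & 0x1FF

    def increment(self):
        # Assumed wrap rule: stepping past the top returns to the bottom.
        self.pointer = self.bottom if self.pointer == self.top else self.pointer + 1

    def decrement(self):
        # Assumed wrap rule: stepping below the bottom returns to the top.
        self.pointer = self.top if self.pointer == self.bottom else self.pointer - 1


inda = IndirectPointer()
inda.fix(0x10, 0x13)                # 4-register circular buffer at $010..$013
visited = []
for _ in range(6):
    visited.append(inda.pointer)
    inda.increment()
print([hex(a) for a in visited])
```

With the startup configuration, bottom and top are equal, so in this model the pointer stays on its own address no matter how it is stepped.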

Because indirect addressing occurs very early in the pipeline and indirect pointers are affected earlier than the final stage where the conditional bit field (CCCC) normally comes into use, the CCCC field is repurposed for indirect operations. The top two bits of CCCC are used for indirect D and the bottom two bits are used for indirect S.

Unconditional Execution

All instructions which use indirect registers will execute unconditionally, regardless of the CCCC bits.

Here is the INDA/INDB usage scheme which repurposes the CCCC field:

Table: INDA/INDB Usage Scheme

OOOOOO ZCR I CCCC DDDDDDDDD SSSSSSSSS

xxxxxx xxx x 00xx 111110110 xxxxxxxxx        D = INDA        'use INDA

xxxxxx xxx x 00xx 111110111 xxxxxxxxx        D = INDB        'use INDB

xxxxxx xxx x 01xx 111110110 xxxxxxxxx        D = INDA++      'use INDA,      INDA += 1

xxxxxx xxx x 01xx 111110111 xxxxxxxxx        D = INDB++      'use INDB,      INDB += 1

xxxxxx xxx x 10xx 111110110 xxxxxxxxx        D = INDA--      'use INDA,      INDA -= 1

xxxxxx xxx x 10xx 111110111 xxxxxxxxx        D = INDB--      'use INDB,      INDB -= 1

xxxxxx xxx x 11xx 111110110 xxxxxxxxx        D = ++INDA      'use INDA+1,    INDA += 1

xxxxxx xxx x 11xx 111110111 xxxxxxxxx        D = ++INDB      'use INDB+1,    INDB += 1

xxxxxx xxx 0 xx00 xxxxxxxxx 111110110        S = INDA        'use INDA

xxxxxx xxx 0 xx00 xxxxxxxxx 111110111        S = INDB        'use INDB

xxxxxx xxx 0 xx01 xxxxxxxxx 111110110        S = INDA++      'use INDA,      INDA += 1

xxxxxx xxx 0 xx01 xxxxxxxxx 111110111        S = INDB++      'use INDB,      INDB += 1

xxxxxx xxx 0 xx10 xxxxxxxxx 111110110        S = INDA--      'use INDA,      INDA -= 1

xxxxxx xxx 0 xx10 xxxxxxxxx 111110111        S = INDB--      'use INDB,      INDB -= 1

xxxxxx xxx 0 xx11 xxxxxxxxx 111110110        S = ++INDA      'use INDA+1,    INDA += 1

xxxxxx xxx 0 xx11 xxxxxxxxx 111110111        S = ++INDB      'use INDB+1,    INDB += 1

If both D and S are the same indirect register, the two 2-bit fields in CCCC are OR'd together to get the post-modifier effect:

101000 001 0 0011 111110110 111110110        MOV INDA,++INDA    'Move @INDA+1 into @INDA,   INDA += 1

100000 001 0 1100 111110111 111110111        ADD ++INDB,INDB    'Add @INDB into @INDB+1,    INDB += 1

Note that only '++INDx,INDx' or 'INDx,++INDx' combinations can address different registers from the same INDx.
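As a cross-check on the usage scheme above, here is a minimal Python sketch that decodes the repurposed CCCC field for a 32-bit instruction word. It is only an illustrative model: the function names (parse_bits, decode_indirect) are made up for this example, and the sketch does not model the OR'ing of the two fields when D and S name the same INDx register.

```python
# Illustrative model only: decodes the repurposed CCCC field for INDA/INDB operands.
NAMES = {0x1F6: "INDA", 0x1F7: "INDB"}            # %111110110 and %111110111
MODES = {0b00: "{r}", 0b01: "{r}++", 0b10: "{r}--", 0b11: "++{r}"}


def parse_bits(text):
    """Turn a spaced bit pattern such as '101000 001 0 ...' into an integer."""
    return int(text.replace(" ", ""), 2)


def decode_indirect(instr):
    """Return (d_mode, s_mode); each is e.g. 'INDA++', or None if not indirect."""
    immediate = (instr >> 22) & 1                  # I bit: S is a literal when set
    cccc = (instr >> 18) & 0xF                     # top 2 bits -> D, bottom 2 bits -> S
    d = (instr >> 9) & 0x1FF
    s = instr & 0x1FF
    d_mode = MODES[cccc >> 2].format(r=NAMES[d]) if d in NAMES else None
    s_mode = None
    if s in NAMES and not immediate:
        s_mode = MODES[cccc & 0b11].format(r=NAMES[s])
    return d_mode, s_mode


# The two same-register examples from the text above.
print(decode_indirect(parse_bits("101000 001 0 0011 111110110 111110110")))
print(decode_indirect(parse_bits("100000 001 0 1100 111110111 111110111")))
```

Feeding in the two bit patterns from the text reproduces the operand forms 'INDA,++INDA' and '++INDB,INDB'.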

Here are the instructions which are used to set the pointer and limit values for INDA and INDB:

Table: INDA/INDB Instructions

ENCODING                                 INSTRUCTION      CLOCK    DESCRIPTION
111000 000 0 0001 000000000 AAAAAAAAA    SETINDA #A       1        Sets INDA pointer to 0..511*
111000 000 0 0011 000000000 aaaaaaaaa    SETINDA a        1        Increments/decrements INDA pointer by -256..+255*
111000 000 0 0100 BBBBBBBBB 000000000    SETINDB #B       1        Sets INDB pointer to 0..511*
111000 000 0 1100 bbbbbbbbb 000000000    SETINDB b        1        Increments/decrements INDB pointer by -256..+255*
111000 000 0 0101 BBBBBBBBB AAAAAAAAA    SETINDS #B,#A    1        Sets INDB pointer to 0..511 and sets INDA pointer to 0..511*
111000 000 0 0111 BBBBBBBBB aaaaaaaaa    SETINDS #B,a     1        Sets INDB pointer to 0..511 and increments/decrements INDA pointer by -256..+255*
111000 000 0 1101 bbbbbbbbb AAAAAAAAA    SETINDS b,#A     1        Increments/decrements INDB pointer by -256..+255 and sets INDA pointer to 0..511*
111000 000 0 1111 bbbbbbbbb aaaaaaaaa    SETINDS b,a      1        Increments/decrements INDB pointer by -256..+255 and increments/decrements INDA pointer by -256..+255*
111001 000 0 0001 TTTTTTTTT IIIIIIIII    FIXINDA #T,#I    1        Sets the INDA pointer to an initial value, while setting the bottom limit to the lower of the initial and terminal values and the top limit to the higher.
111001 000 0 0100 TTTTTTTTT IIIIIIIII    FIXINDB #T,#I    1        Sets the INDB pointer to an initial value, while setting the bottom limit to the lower of the initial and terminal values and the top limit to the higher.
111001 000 0 0101 TTTTTTTTT IIIIIIIII    FIXINDS #T,#I    1        Sets the INDA and INDB pointers to an initial value, while setting the bottom limits to the lower of the initial and terminal values and the top limits to the higher.

* All SETINDx operations reset the associated bottom and top limit to $000 and $1FF, respectively

Example: Indirect Pointer Usage

111000 000 0 0001 000000000 000000101        SETINDA #5        'INDA = 5, bottom = 0, top = 511

111000 000 0 0011 000000000 000000011        SETINDA ++3       'INDA += 3, bottom = 0, top = 511

111000 000 0 1100 111111100 000000000        SETINDB --4       'INDB -= 4, bottom = 0, top = 511

111000 000 0 0111 000000111 000001000        SETINDS #7,++8    'INDB = 7, INDA += 8, bottoms = 0, tops = 511

111001 000 0 0001 000001111 000001000        FIXINDA #15,#8    'INDA = 8, bottom = 8, top = 15

111001 000 0 0100 000010000 000011111        FIXINDB #16,#31   'INDB = 31, bottom = 16, top = 31

111001 000 0 0101 001100011 000110010        FIXINDS #99,#50   'INDA/INDB = 50, bottoms = 50, tops = 99
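The bit patterns in the example above follow mechanically from the encoding table. The Python sketch below rebuilds a few of them and also models the limit rule for FIXINDx. The helper names (field9, encode, setinda, setindb, fixinda, fix_limits) are hypothetical and exist only for this illustration; relative adjustments are assumed to be stored as 9-bit two's complement, which is what the SETINDB --4 example shows.

```python
# Illustrative encoder for a few SETINDx/FIXINDx forms; not an assembler.
def field9(value, relative=False):
    """9-bit field: absolute 0..511, or relative -256..+255 as two's complement."""
    lo, hi = (-256, 255) if relative else (0, 511)
    if not lo <= value <= hi:
        raise ValueError(f"{value} is outside {lo}..{hi}")
    return value & 0x1FF


def encode(opcode, cccc, d, s):
    """Format OOOOOO ZCR I CCCC D S, with ZCR and I all zero as in the table."""
    return f"{opcode:06b} 000 0 {cccc:04b} {d:09b} {s:09b}"


def setinda(a, relative=False):
    return encode(0b111000, 0b0011 if relative else 0b0001, 0, field9(a, relative))


def setindb(b, relative=False):
    return encode(0b111000, 0b1100 if relative else 0b0100, field9(b, relative), 0)


def fixinda(terminal, initial):
    return encode(0b111001, 0b0001, field9(terminal), field9(initial))


def fix_limits(terminal, initial):
    """Return (pointer, bottom, top) as left by a FIXINDx instruction."""
    return initial, min(initial, terminal), max(initial, terminal)


print(setinda(5))                  # SETINDA #5
print(setindb(-4, relative=True))  # SETINDB --4
print(fixinda(15, 8))              # FIXINDA #15,#8
print(fix_limits(16, 31))          # limits after FIXINDB #16,#31
```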


Stack RAM

Each cog has a 256-long stack RAM that is accessible via push and pop operations. Its contents are not initialized at either reset or cog startup. So, at cog startup, it will contain whatever it happened to power up with, or whatever was last written.

There are two stack pointers called SPA and SPB which are used to address the stack memory. Aside from automatically incrementing and decrementing via pushes and pops, SPA and SPB can be set, modified, read back, and checked:

SETSPA  D/#n      set SPA

SETSPB  D/#n      set SPB

ADDSPA  D/#n      add to SPA

ADDSPB  D/#n      add to SPB

SUBSPA  D/#n      subtract from SPA

SUBSPB  D/#n      subtract from SPB

GETSPA  D         get SPA, SPA==0 into Z, SPA.7 into C

GETSPB  D         get SPB, SPB==0 into Z, SPB.7 into C

GETSPD  D         get SPA minus SPB, SPA==SPB into Z, SPA<SPB into C

Data can be pushed and popped in both normal and reverse directions:

PUSHA   D/#n      push using SPA

PUSHB   D/#n      push using SPB

PUSHAR  D/#n      push using SPA, use pop addressing

PUSHBR  D/#n      push using SPB, use pop addressing

POPA    D         pop using SPA

POPB    D         pop using SPB

POPAR   D         pop using SPA, use push addressing

POPBR   D         pop using SPB, use push addressing

Aside from data, the program counter and flags can be pushed and popped using calls and returns:

CALLA   D/#n      call using SPA

CALLB   D/#n      call using SPB

CALLAD  D/#n      call using SPA, delay branch until three trailing instructions executed

CALLBD  D/#n      call using SPB, delay branch until three trailing instructions executed

RETA              return using SPA

RETB              return using SPB

RETAD             return using SPA, delay branch until three trailing instructions executed

RETBD             return using SPB, delay branch until three trailing instructions executed

Table: Stack RAM Instructions

instructions (stack RAM access is shown as [SPx++] and [--SPx])                            clocks    adj

000011 ZC1 1 CCCC DDDDDDDDD 000010101        GETSPD  D       'SPA-SPB into D, Z/C as CHKSPD     1

000011 ZC1 1 CCCC DDDDDDDDD 000010110        GETSPA  D       'SPA into D, Z/C as CHKSPA         1

000011 ZC1 1 CCCC DDDDDDDDD 000010111        GETSPB  D       'SPB into D, Z/C as CHKSPB         1

000011 ZC1 1 CCCC DDDDDDDDD 000011000        POPAR   D       'read [SPA++] into D, MSB into C   1

000011 ZC1 1 CCCC DDDDDDDDD 000011001        POPBR   D       'read [SPB++] into D, MSB into C   1

000011 ZC1 1 CCCC DDDDDDDDD 000011010        POPA    D       'read [--SPA] into D, MSB into C   1

000011 ZC1 1 CCCC DDDDDDDDD 000011011        POPB    D       'read [--SPB] into D, MSB into C   1

000011 ZC0 1 CCCC 000000000 000011100        RETA            'read [--SPA] into Z/C/PC*         4

000011 ZC0 1 CCCC 000000000 000011101        RETB            'read [--SPB] into Z/C/PC*         4

000011 ZC0 1 CCCC 000000000 000011110        RETAD           'read [--SPA] into Z/C/PC*         1

000011 ZC0 1 CCCC 000000000 000011111        RETBD           'read [--SPB] into Z/C/PC*         1

000011 000 1 CCCC DDDDDDDDD 010100010        SETSPA  D       'set SPA to D                      1

000011 001 1 CCCC 0nnnnnnnn 010100010        SETSPA  #n      'set SPA to n                      1

000011 000 1 CCCC DDDDDDDDD 010100011        SETSPB  D       'set SPB to D                      1

000011 001 1 CCCC 0nnnnnnnn 010100011        SETSPB  #n      'set SPB to n                      1

000011 000 1 CCCC DDDDDDDDD 010100100        ADDSPA  D       'add D into SPA                    1

000011 001 1 CCCC 0nnnnnnnn 010100100        ADDSPA  #n      'add n into SPA                    1

000011 000 1 CCCC DDDDDDDDD 010100101        ADDSPB  D       'add D into SPB                    1

000011 001 1 CCCC 0nnnnnnnn 010100101        ADDSPB  #n      'add n into SPB                    1

000011 000 1 CCCC DDDDDDDDD 010100110        SUBSPA  D       'subtract D from SPA               1

000011 001 1 CCCC 0nnnnnnnn 010100110        SUBSPA  #n      'subtract n from SPA               1

000011 000 1 CCCC DDDDDDDDD 010100111        SUBSPB  D       'subtract D from SPB               1

000011 001 1 CCCC 0nnnnnnnn 010100111        SUBSPB  #n      'subtract n from SPB               1

000011 000 1 CCCC DDDDDDDDD 010101000        PUSHAR  D       'write D into [--SPA]              1 **  +1

000011 001 1 CCCC nnnnnnnnn 010101000        PUSHAR  #n      'write n into [--SPA]              1 **  +1

000011 000 1 CCCC DDDDDDDDD 010101001        PUSHBR  D       'write D into [--SPB]              1 **  +1

000011 001 1 CCCC nnnnnnnnn 010101001        PUSHBR  #n      'write n into [--SPB]              1 **  +1

000011 000 1 CCCC DDDDDDDDD 010101010        PUSHA   D       'write D into [SPA++]              1 **  +1

000011 001 1 CCCC nnnnnnnnn 010101010        PUSHA   #n      'write n into [SPA++]              1 **  +1

000011 000 1 CCCC DDDDDDDDD 010101011        PUSHB   D       'write D into [SPB++]              1 **  +1

000011 001 1 CCCC nnnnnnnnn 010101011        PUSHB   #n      'write n into [SPB++]              1 **  +1

000011 000 1 CCCC DDDDDDDDD 010101100        CALLA   D       'write Z/C/PC* into [SPA++], PC=D  4 **  +1

000011 001 1 CCCC nnnnnnnnn 010101100        CALLA   #n      'write Z/C/PC* into [SPA++], PC=n  4 **  +1

000011 000 1 CCCC DDDDDDDDD 010101101        CALLB   D       'write Z/C/PC* into [SPB++], PC=D  4 **  +1

000011 001 1 CCCC nnnnnnnnn 010101101        CALLB   #n      'write Z/C/PC* into [SPB++], PC=n  4 **  +1

000011 000 1 CCCC DDDDDDDDD 010101110        CALLAD  D       'write Z/C/PC* into [SPA++], PC=D  1 **  +1

000011 001 1 CCCC nnnnnnnnn 010101110        CALLAD  #n      'write Z/C/PC* into [SPA++], PC=n  1 **  +1

000011 000 1 CCCC DDDDDDDDD 010101111        CALLBD  D       'write Z/C/PC* into [SPB++], PC=D  1 **  +1

000011 001 1 CCCC nnnnnnnnn 010101111        CALLBD  #n      'write Z/C/PC* into [SPB++], PC=n  1 **  +1

* bit 10 is Z, bit 9 is C, bits 8..0 are PC, upper bits are ignored or cleared

** if a stack RAM write is immediately followed by a stack RAM read, add one clock
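To make the [SPx++] and [--SPx] notation concrete, here is a small Python model of the stack RAM and of the long stored by calls. It is a sketch under assumptions that the text does not state: the stack pointers are treated as 8-bit values that wrap around the 256 longs, and both start at 0. The class and function names are invented for the example, and no timing is modelled.

```python
# A simplified model of one cog's stack RAM; not cycle-accurate.
class StackRam:
    SIZE = 256                                   # 256 longs per cog

    def __init__(self):
        # Real hardware leaves the RAM uninitialized; zeros are used here for clarity.
        self.ram = [0] * self.SIZE
        self.sp = {"A": 0, "B": 0}               # assumption: both pointers start at 0

    def push(self, which, value):                # PUSHA/PUSHB: write into [SPx++]
        self.ram[self.sp[which]] = value & 0xFFFFFFFF
        self.sp[which] = (self.sp[which] + 1) % self.SIZE

    def pop(self, which):                        # POPA/POPB: read [--SPx]
        self.sp[which] = (self.sp[which] - 1) % self.SIZE
        return self.ram[self.sp[which]]

    def push_reverse(self, which, value):        # PUSHAR/PUSHBR: write into [--SPx]
        self.sp[which] = (self.sp[which] - 1) % self.SIZE
        self.ram[self.sp[which]] = value & 0xFFFFFFFF

    def pop_reverse(self, which):                # POPAR/POPBR: read [SPx++]
        value = self.ram[self.sp[which]]
        self.sp[which] = (self.sp[which] + 1) % self.SIZE
        return value


def pack_return(z, c, pc):
    """Long written by a call: bit 10 = Z, bit 9 = C, bits 8..0 = PC."""
    return ((z & 1) << 10) | ((c & 1) << 9) | (pc & 0x1FF)


def unpack_return(value):
    """Inverse of pack_return; upper bits are ignored."""
    return (value >> 10) & 1, (value >> 9) & 1, value & 0x1FF


stack = StackRam()
stack.push("A", pack_return(z=1, c=0, pc=0x123))   # what a CALLA would store
print(unpack_return(stack.pop("A")))               # what a RETA would restore
```

Because the normal and reverse forms move the pointer in opposite directions, a value written with PUSHAR is read back with POPAR, just as PUSHA pairs with POPA.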


Instruction Pipeline



Each cog has a 4-stage pipeline which all instructions progress through, in order to execute:

  1st stage            - Read instruction from cog register RAM

  2nd stage            - Determine any indirect or remapped D and S addresses, update INDA and INDB

  3rd stage            - Read D and S from cog register RAM

  4th stage            - Execute instruction, write D to cog register RAM, update Z/C/PC and any other results

On every clock cycle, the instruction data in each stage advances to the next stage, unless the instruction executing in the 4th stage is stalling the pipeline because it's waiting for something (for example, WRBYTE waits for the hub).

To keep D and S data current within the pipeline, the resultant D from the 4th stage is passed back to the 3rd stage to substitute for any obsoleted D or S data currently being read from the cog register RAM. The same is done for instruction data currently being read in the 1st stage, but this still leaves a two-stage gap between when a register is modified and when it can be executed:

Example: Single-task self-modifying code

        MOVD    :inst,top9         '(initially 4th stage) modify instruction

        NOP                        '(initially 3rd stage) 1...

        NOP                        '(initially 2nd stage) 2... at least two instructions in-between

:inst   ADD     A,B                '(initially 1st stage) modified instruction executes

Tasks that execute no more frequently than every 3rd time slot don't need to observe this 2-instruction spacer rule when executing self-modifying code, because their instructions will always be sufficiently spread apart in the pipeline by other tasks' instructions, enabling a just-modified instruction to be properly read and executed in that task's next time slot. If fewer than two spacers are afforded to a modify-execute sequence, the old instruction will be read and executed, instead of the new one. This can be used to advantage for efficient overlapped modify-execute sequences.

When a branch instruction executes, that task's program counter is abruptly changed from what had been a steadily incrementing course, requiring that the pipeline be reloaded, beginning at the new program counter address. This can leave up to three instructions in the pipeline which were trailing the branch instruction and belong to the same task as the branch.

Normally, these trailing instructions are incidental data which are not intended for execution, and therefore must be cancelled within the pipeline, so that they pass through without doing anything. However, in some cases, it may be desirable to allow those instructions to execute, without cancellation, to increase pipeline efficiency.

To accommodate both cancelling and non-cancelling branches, branch instructions have two versions. The ones that end in the letter 'D' for 'delayed' are non-cancelling and take only one clock, but will execute any trailing pipelined instructions which belong to the branch's same task.

In a single-task program, three trailing instructions are executed before the delayed branch seems to take effect:

Example: Single-task delayed branch

        JMPD    #somewhere      '(initially 4th stage) do a delayed jmp, then toggle P0 and cycle P1

        NOTP    #0              '(initially 3rd stage)

        NOTP    #1              '(initially 2nd stage)

        NOTP    #1              '(initially 1st stage) next instruction is loaded from 'somewhere'

In a two-task program with simple time slot allocation, only one trailing instruction is executed before the delayed branch seems to take effect:

Example: Two-task delayed branch (SETTASK #%%1010 timing)

        JMPD    #somewhere      '(initially 4th stage) do a delayed jmp to 'somewhere', then toggle P0

        NOTP    #0              '(initially 2nd stage) next instruction is loaded from 'somewhere'

The branch instructions that don't end in the letter 'D' are what would be considered 'normal' branches, where the next instruction to execute after the branch would be the instruction which was branched to.

Table: Branching Instructions

Normal cancelling    Delayed non-cancelling    Normal cancelling    Delayed non-cancelling
JMP                  JMPD                      IJNZ                 IJNZD
CALL                 CALLD                     DJZ                  DJZD
RET                  RETD                      DJNZ                 DJNZD
JMPRET               JMPRETD                   TJZ                  TJZD
CALLA                CALLAD                    TJNZ                 TJNZD
CALLB                CALLBD                    JP                   JPD
RETA                 RETAD                     JNP                  JNPD
RETB                 RETBD                     PASSCNT              -
IJZ                  IJZD                      JMPTASK              -


Instruction-Block Repeating


Each cog has an instruction-block repeater that can variably repeat up to 64 instructions without any clock-cycle overhead.

REPD and REPS are used to initiate block repeats. These instructions specify how many times the trailing instruction block will be executed and how many instructions are in the block:

REPD    #i       - execute 1..64 instructions infinitely, requires 3 spacer instructions *
REPD    D,#i     - execute 1..64 instructions D+1 times, requires 3 spacer instructions *
REPD    #n,#i    - execute 1..64 instructions 1..512 times, requires 3 spacer instructions *

REPS    #n,#i    - execute 1..64 instructions 1..16384 times, requires 1 spacer instruction *

REPS differs from REPD by executing at the 2nd stage of the pipeline, instead of the 4th. By executing two stages early, it needs only one spacer instruction *.

Because of its earliness, no conditional execution is possible, so it always executes, allowing the CCCC bits to be repurposed, along with Z, to provide a 14-bit constant for the repeat count.

The instruction-block repeater will quit repeating the block if a branch instruction executes within the block. This rule does not currently apply to a JMPTASK which affects the task using the repeater - this will be fixed at the earliest opportunity.

There is only one REPS/REPD circuit, so REPS/REPD's cannot be nested.


* Spacer instructions are required in 1-task applications to allow the pipeline to prime before repeating can commence. If REPD is used by a task that uses no more than every 4th time slot, no spacers are needed, as three intervening instructions will be provided by the other task(s). If REPS is used by a task that uses no more than every 2nd time slot, no spacers are needed.

Example: Using REP instruction

Example (1-task):

       REPD    D,#1            'execute 1 instruction D+1 times

       NOP                     '3 spacer instructions needed (could do something useful)
       NOP
       NOP

       NOTP    #0              'toggle P0, block repeats every 1 clock


Example (1-task):

       REPS    #20_000,#4      'execute 4 instructions 20,000 times

       NOP                     '1 spacer instruction needed (make the most of it)

       NOTP    #0              'toggle P0
       NOTP    #1              'toggle P1
       NOTP    #2              'toggle P2
       NOTP    #3              'toggle P3, block repeats every 4 clocks


Example (4-task, SETTASK #%%3210 timing):

task0   REPD    #1             'task0 will own the block repeater (no need for spacers)
       NOTP    #0             'toggle P0 every 4 clocks

task1   NOTP    #1             'toggle P1 every 8 clocks
       JMP     #task1

task2   NOTP    #2             'toggle P2 every 8 clocks
       JMP     #task2

task3   NOTP    #3             'toggle P3 every 8 clocks
       JMP     #task3

Table: REP Instructions

Mnemonic    Operand    Operation (iiiiii = #i-1, nnnnnnnnn/n___nnnn_nnnnnnnnn = #n-1)    Clocks
REPD        #i         execute 1..64 inst's infinitely                                    1
REPD        D,#i       execute 1..64 inst's D+1 times                                     1
REPD        #n,#i      execute 1..64 inst's 1x..512x                                      1
REPS        #n,#i      execute 1..64 inst's 1x..16384x                                    1

Note that the %iiiiii field represents 1..64 instructions, not the encoded 0..63. The %nnnnnnnnn/%n___nnnn_nnnnnnnnn fields are +1-based, too.


 Multi-tasking

Each cog has four sets of flags and program counters (Z/C/PC), constituting four unique tasks that can execute and switch on each instruction cycle.

At cog startup, the tasks are initialized as follows:

task Z  C  PC

0    0  0  $000

1    0  0  $001

2    0  0  $002

3    0  0  $003

There are 16 rotating time slots in the TASK register that determine task sequence. Initially, all time slots are set to 0, causing task 0 to execute exclusively, starting at address $000:

Task Time Slots

16 TIME SLOTS               15  14  13  12  11  10   9   8   7   6   5   4   3   2   1   0
TASK REGISTER (b31..b00)    00  00  00  00  00  00  00  00  00  00  00  00  00  00  00  00

The two LSB's of TASK always determine which task will next be queued in the pipeline for execution. After each instruction cycle, the TASK register is rotated right by two bits, recycling slot 0 to slot 15 and getting the next task into the 2 LSB's.  

SETTASK

To enable other tasks, SETTASK is used to set the TASK register:

SETTASK D               write D to the TASK register

SETTASK #n              write {n[7:0], n[7:0], n[7:0], n[7:0]} to the TASK register

If a task is given no time slot, it doesn't execute and its flags and PC stay at initial values. If a task is given a time slot, it will execute and its flags and PC will be updated at every instruction, or time slot. If an active task's time slots are all taken away, that task's flags and PC remain in the state where they left off, until it is given another time slot.

When SETTASK issues a new time slot pattern, there are already three instructions in the pipeline, so the 4th instruction after SETTASK will be from the task specified in the two LSB's of the SETTASK operand.
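The slot rotation and the SETTASK #n replication described above can be modelled in a few lines of Python. This is only a sketch: the function names are invented, pipeline latency is ignored, and reading %%3210 as base-4 digits is inferred from the TASK expansion shown in the four-task example.

```python
# Illustrative model of the 32-bit TASK register and its 16 two-bit time slots.
def quaternary(digits):
    """Value of a %%-style literal, e.g. '3210' (base-4 digits, slot 0 rightmost)."""
    return int(digits, 4)


def settask_immediate(n):
    """SETTASK #n: the low 8 bits of n copied into all four bytes of TASK."""
    return (n & 0xFF) * 0x01010101


def schedule(task_reg, cycles):
    """Task numbers queued over the next `cycles` instruction cycles."""
    order = []
    for _ in range(cycles):
        order.append(task_reg & 0b11)                       # two LSBs pick the task
        low = task_reg & 0b11
        task_reg = ((task_reg >> 2) | (low << 30)) & 0xFFFFFFFF   # rotate right by 2
    return order


task = settask_immediate(quaternary("3210"))
print(hex(task))              # all four bytes hold %11_10_01_00
print(schedule(task, 8))      # tasks 0,1,2,3 in rotation
print(schedule(settask_immediate(quaternary("1010")), 4))   # two tasks alternating
```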

JMPTASK

To immediately force any of the four PC's to a new address, JMPTASK can be used. JMPTASK uses a 4-bit mask to select which PC's are going to be written. Mask bits 0..3 represent PC's 0..3. The mask value %1010 would write PC 3 and PC 1, while %0100 would write PC 2, only.

JMPTASK D,#mask         force PC's in mask to D

JMPTASK #addr,#mask     force PC's in mask to #addr

For every PC/task affected by a JMPTASK instruction, all affected-task instructions currently in the pipeline are cancelled. This ensures that once JMPTASK executes, the next instruction from each affected task will be from the new address.
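The mask semantics can be sketched in Python as follows. The function name jmptask and the list-of-PCs representation are assumptions made for illustration; pipeline cancellation is not modelled.

```python
# Illustrative model of JMPTASK's 4-bit mask: bit k selects PC k.
def jmptask(pcs, addr, mask):
    """Return the four PCs after JMPTASK; only masked PCs are overwritten."""
    return [addr & 0x1FF if (mask >> task) & 1 else pc
            for task, pc in enumerate(pcs)]


startup_pcs = [0x000, 0x001, 0x002, 0x003]   # PCs at cog startup
print(jmptask(startup_pcs, 0x040, 0b1010))   # writes PC 3 and PC 1
print(jmptask(startup_pcs, 0x040, 0b0100))   # writes PC 2 only
```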

Here is an example in which all four tasks are started and each task toggles an I/O pin at a different rate:

Example: Starting four tasks

       ORG

       JMP     #task0          'task 0 begins here when the cog starts (this JMP takes 4 clocks)

       JMP     #task1          'task 1 begins here after task 0 executes SETTASK (this JMP takes 1 clock)

       JMP     #task2          'task 2 begins here after task 0 executes SETTASK (this JMP takes 1 clock)

       JMP     #task3          'task 3 begins here after task 0 executes SETTASK (this JMP takes 1 clock)

ctwardell suggests a correction:

JMPTASK #task1,#%0010 'task 1 begins here after task 0 executes SETTASK (%0010 = Set PC1)

JMPTASK #task2,#%0100 'task 2 begins here after task 0 executes SETTASK (%0100 = Set PC2)

JMPTASK #task3,#%1000 'task 3 begins here after task 0 executes SETTASK (%1000 = Set PC3)

task0  SETTASK #%%3210         'enable all tasks (TASK = %11_10_01_00_11_10_01_00_11_10_01_00_11_10_01_00)

:loop  NOTP    #0              'task 0, toggle pin 0       (loops every 8 clocks)

       JMP     #:loop          '(this JMP takes 1 clock)

task1  NOTP    #1              'task 1, toggle pin 1       (loops every 12 clocks)

       NOP

       JMP     #task1          '(this JMP takes 1 clock)

task2  NOTP    #2              'task 2, toggle pin 2       (loops every 16 clocks)

       NOP                    

       NOP

       JMP     #task2          '(this JMP takes 1 clock)

task3  NOTP    #3              'task 3, toggle pin 3       (loops every 20 clocks)

       NOP

       NOP

       NOP

       JMP     #task3          '(this JMP takes 1 clock)



Note: When a normal branch instruction (JMP, CALL, RET, etc.) executes in the fourth and final stage of the pipeline, all instructions progressing through the lower three stages, which belong to the same task as the branch instruction, are cancelled. This inhibits execution of incidental data that was trailing the branch instruction.

The delayed branch instructions (JMPD, CALLD, RETD, etc.) don't do any pipeline instruction cancellation and exist to provide 1-clock branches to single-task programs, where the three instructions following the branch are allowed to execute before the new instruction stream begins to execute.

For single-task programs, normal branches take 4 clocks: 1 clock for the branch and 3 clocks for the cancelled instructions to come through the pipeline before the new instruction stream begins to execute.

For multi-tasking programs that use all four tasks in sequence (i.e. SETTASK #%%3210), there are never any same-task instructions in the pipeline that would require cancellation due to branching, so all branches take just 1 clock.
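As a rough check on the cases described in this note, the Python sketch below counts how many of the three instructions trailing a branch in the pipeline belong to the branching task, for a repeating time-slot pattern. It is a simplification with an invented function name: it assumes a steady, unstalled pipeline and only looks at which task owns each of the next three slots.

```python
# Illustrative count of same-task instructions trailing a branch in the pipeline.
def trailing_same_task(slots, branch_index=0):
    """slots: repeating list of task numbers, in execution order.

    Returns how many of the three slots after the branch belong to its task.
    A normal branch cancels these; a delayed branch lets them execute.
    """
    task = slots[branch_index % len(slots)]
    following = [slots[(branch_index + k) % len(slots)] for k in (1, 2, 3)]
    return sum(1 for t in following if t == task)


print(trailing_same_task([0]))            # single task
print(trailing_same_task([0, 1]))         # two tasks alternating (SETTASK #%%1010)
print(trailing_same_task([0, 1, 2, 3]))   # four tasks in sequence (SETTASK #%%3210)
```

For the single-task case the count is three, which matches the 4-clock normal branch (1 clock for the branch plus 3 cancelled instructions); for four tasks in sequence the count is zero, which matches the 1-clock branches.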

Table: Task Instructions

ENCODING                                 INSTRUCTION         DESCRIPTION                     CLOCK
000011 000 1 CCCC DDDDDDDDD 01001mmmm    JMPTASK D,#mask     Set PC's in mask to D           1
000011 001 1 CCCC nnnnnnnnn 01001mmmm    JMPTASK #n,#mask    Set PC's in mask to 0..511      1
000011 000 1 CCCC DDDDDDDDD 011001011    SETTASK D           Set TASK to D                   1
000011 001 1 CCCC nnnnnnnnn 011001011    SETTASK #n          Set TASK to n[7:0] copied 4x    1

Register Remapping

Here's a little program that kicks off four tasks running the same code, but with different variable sets.

Register remapping is set up to remap 4 sets of 4 registers, according to the task executing. For tasks 0..3, hard addresses 0..3 remap to 0..3, 4..7, 8..11, or 12..15.

Example: Register Remapping

dat
        org                             'longs are like nop's, get skipped

pin     long    0                       'task 0 data
count   long    1
delay   long    0
extra   long    0

        long    1                       'task 1 data
        long    5
        long    0
        long    0

        long    2                       'task 2 data
        long    13
        long    0
        long    0

        long    3                       'task 3 data
        long    29
        long    0
        long    0

        setmap  #%1_010_010             'remap registers by task, 4 sets, 4 registers each
        settask #%%3210                 'enable all tasks
        jmptask #loop,#%1111            'before any newly-started tasks get to execute stage, jump all tasks to loop

loop    notp    pin                     'toggle task x pin
        mov     delay,count             'get task x delay
        djnz    delay,#$                'count down delay
        jmp     #loop                   'loop (count + 3 clocks)
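The address translation used by this example (4 sets of 4 registers, selected by task) can be sketched in Python as shown below. The function name and parameters are hypothetical, the SETMAP operand encoding itself is not modelled, and the sketch assumes that addresses outside the remapped range 0..3 pass through unchanged.

```python
# Illustrative model of per-task register remapping for the example above.
def remap(addr, task, sets=4, regs_per_set=4):
    """Map a hard register address to the physical register for `task`."""
    if addr < regs_per_set:
        return (task % sets) * regs_per_set + addr
    return addr   # assumption: addresses outside the remapped window are untouched


PIN, COUNT, DELAY, EXTRA = 0, 1, 2, 3     # hard addresses used by the shared code
for task in range(4):
    print(task, [remap(a, task) for a in (PIN, COUNT, DELAY, EXTRA)])
```

With this mapping, the single 'mov delay,count' in the loop reads registers 1, 5, 9, or 13 depending on which task executes it, which is why each task sees its own count value.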

Tips for coding multi-tasking programs

While all tasks in a multi-tasking program can execute atomic instructions without any inter-task conflict, remember that there's only one of each of the following cog resources and only one task can use it at a time:

Tasks and the Pipeline

When writing multi-task programs, be aware that instructions that take multiple clocks will stall the pipeline and have a ripple effect on the tasks' timing. This may be impossible to avoid, as some task might need to access hub memory, and those instructions are not single-clock.

Avoiding Pipeline Stall

The WAITCNT/WAITPEQ/WAITPNE instructions should be coded discretely using 1-clock instructions, to avoid stalling the pipeline for excessive amounts of time.

The following instructions (WC versions) will take 1 clock, instead of potentially many, and return 1 in C if they were successful:

SNDSER  D  WC     attempt to send serial
RCVSER  D  WC     attempt to receive serial
GETMULL D  WC     attempt to get lower multiplier result
GETMULH D  WC     attempt to get upper multiplier result
GETDIVQ D  WC     attempt to get divider quotient result
GETDIVR D  WC     attempt to get divider remainder result
GETSQRT D  WC     attempt to get square root result
GETQX   D  WC     attempt to get CORDIC X result
GETQY   D  WC     attempt to get CORDIC Y result
GETQZ   D  WC     attempt to get CORDIC Z result

Other instruction alternatives:

POLCTRA    WC     returns 1 in C if CTRA rolled over, use instead of SYNCTRA
POLCTRB    WC     returns 1 in C if CTRB rolled over, use instead of SYNCTRB
POLVID     WC     returns 1 in C if WAITVID is ready, use to execute WAITVID without stalling
PASSCNT D         jumps to itself if some amount of time has not passed, use instead of WAITCNT
JP/JNP  D,S       jumps based on pin states, use instead of WAITPEQ/WAITPNE
DJNZ    D,#$      loops until done, use instead of NOP D/#n

 

Instructions to avoid in multi-tasking

The following instructions will not work in a multi-tasking program:

REPS/REPD         operate by subtracting a value from the PC every n clocks - single-task only
GETPIX            needs steady pipeline delays for perspective divider time - single-task only



I/O Ports

There are now 4 I/O ports built into the system – 3 are physical 32-bit I/O ports and 1 is an internal 32-bit I/O port. The I/O pins connected to each port can be configured separately.

Table 15: Port Access Instructions

Mnemonic    Operand    Operation
SETPORTA    D/#n       Assign PORTA to physical I/O ports (0-2) or internal I/O port 3 given register “D (0-511)” or number “n (0-3)”.
SETPORTB    D/#n       Assign PORTB to physical I/O ports (0-2) or internal I/O port 3 given register “D (0-511)” or number “n (0-3)”.
SETPORTC    D/#n       Assign PORTC to physical I/O ports (0-2) or internal I/O port 3 given register “D (0-511)” or number “n (0-3)”.
SETPORTD    D/#n       Assign PORTD to physical I/O ports (0-2) or internal I/O port 3 given register “D (0-511)” or number “n (0-3)”.

Table 16: Pin State Access Instructions

Mnemonic    Operand    Operation
GETP        D/#n       Get pin number given by register “D (0-511)” or “n (0-127)” into !Z or C flags.
GETPN       D/#n       Get pin number given by register “D (0-511)” or “n (0-127)” into Z or !C flags.
OFFP        D/#n       Toggle pin number given by register “D (0-511)” or “n (0-127)” off or on. DIR
NOTP        D/#n       Invert pin number given by the value in register “D (0-511)” or “n (0-127)”. OUT
CLRP        D/#n       Clear pin number given by the value in register “D (0-511)” or “n (0-127)”. OUT
SETP        D/#n       Set pin number given by the value in register “D (0-511)” or “n (0-127)”. OUT
SETPC       D/#n       Set pin number given by the value in register “D (0-511)” or “n (0-127)” to C
SETPNC      D/#n       Set pin number given by the value in register “D (0-511)” or “n (0-127)” to !C
SETPZ       D/#n       Set pin number given by the value in register “D (0-511)” or “n (0-127)” to Z
SETPNZ      D/#n       Set pin number given by the value in register “D (0-511)” or “n (0-127)” to !Z

External RAM

Each cog now features the ability, with the help of the I/O pins, to quickly stream parallel data in or out of the I/O pins aligned to a clock source. Data is streamed to/from the CLUT or WRQUAD overlay. From there it can be quickly fed to the video generator or to the internal HUB RAM. XFR feeds data 16 bits or 32 bits at a time at the system clock speed.

Table 17: External RAM Instruction

Mnemonic    Operand    Operation
SETXFR      D/#n       Setup the direction of the data stream, the source and destination of the data stream, and the size of the data stream given D or “n (0-63)”.

InterChip Communication

Each cog now also features high-speed serial transfer and receive hardware for interchip communication. The hardware requires three I/O pins (SO, SI, CLK).

Table 18: InterChip Communication Instructions

Mnemonic    Operand    Operation
SNDSER      D          Sends a long (D) out of the special chip-to-chip serial port. Blocks until the long is sent. Use C flag to avoid blocking.
RCVSER      D          Receives a long (D) in from the special chip-to-chip serial port. Blocks until the long is received. Use C flag to avoid blocking.
SETSER      D/#n       Sets up the serial port I/O pins to use for SO, SI, and CLK given D or “n (0-63)”.

Cog Memory Remapping

Cogs now have the ability to remap their internal memory to help facilitate context switching between register banks. Instead of having to save a set of internal registers to switch running programs, all references to a set of registers can be changed instantaneously.

Table 19: Cog Memory Remapping Instruction

Mnemonic    Operand    Operation
SETMAP      D/#n       Remap one cog register space to another cog register space given D or n.

InterCog Communication

Cogs now have the ability to communicate directly to each other using the internal I/O Port D, which connects each cog to every other cog.

Table 20: InterCog Communication Instruction

Mnemonic    Operand    Operation
SETXCH      D/#n       Reconfigure Port D I/O masks given D or n to select which cogs to listen to.

Pin Modes

Each I/O pin is now capable of setting itself into many different modes to more easily interface with the analog world. By default, each I/O starts up in the basic robust digital I/O state. However, once configured, the I/O pin can be used for external RAM memory transfer, as an ADC, as a DAC, a Schmitt trigger, or a comparator, etc. See Figure 2 for a table of pin modes and their associated properties.

Table 21: Pin Mode Access Instructions

Mnemonic    Operand    Operation
SETPORT     D/#n       Assign which port the CFGPINS instruction will configure given register “D (0-511)” or number “n (0-3)”.
CFGPINS     D,S        Setup pins masked by register “D (0-511)” to register “S (0-511)”. The pin configuration modes are below.

NOTE: PinA is the pin being set. PinB is its neighbor (all I/O pins have a cross-coupled neighbor). Input is the Boolean statement for what the pin returns when read. Output is the statement for what the pin outputs when it is an output (some modes output their input to make feedback relaxation oscillators, etc). Each pin’s high and low drivers can be configured to work in many different modes. Pins can also re-clock data sent to them locally to remove jitter in data. Every pin is set up by a 13-bit configuration value.

Figure 2: Pin Modes

Code

Mode

Input

PinA Output

PinB

Compare

HHH

LLL

DRIVE

0000_CIOHHHLLL

General I/O

PinA Logic

OUT

-

-

000

FAST

0001_CIOHHHLLL

General I/O

PinA Logic

INPUT

-

-

001

SLOW

0010_CIOHHHLLL

General I/O

PinB Logic

INPUT

-

-

010

1500Ω

0011_CIOHHHLLL

General I/O

PinB Logic

INPUT

1MΩ PinA

-

011

10kΩ

0100_CIOHHHLLL

General I/O

PinA Schmitt

OUT

-

-

100

100kΩ

0101_CIOHHHLLL

General I/O

PinA Schmitt

INPUT

-

-

101

100μA

0110_CIOHHHLLL

General I/O

PinB Schmitt

INPUT

-

-

110

10μA

0111_CIOHHHLLL

General I/O

PinB Schmitt

INPUT

1MΩ PinA

-

111

FLOAT

1000_CIOHHHLLL

General I/O

PinA > VIO/2

OUT

-

FAST

C

OUT/IN

1001_CIOHHHLLL

General I/O

PinA > VIO/2

INPUT

-

FAST

0

LIVE

1010_CIOHHHLLL

General I/O

PinB > VIO/2

INPUT

-

FAST

1

CLOCKED

1011_CIOHHHLLL

General I/O

PinB > VIO/2

INPUT

1MΩ PinA

FAST

I/O

IN/OUT

1100_CIOHHHLLL

General I/O

PinA > PinB

OUT

-

PRECISE

0

TRUE

1101_CIOHHHLLL

General I/O

PinA > PinB

INPUT

-

PRECISE

1

INVERTED

1110_CIOHHHLLL

General I/O

PinA > PinB

INPUT

1MΩ PinA

PRECISE

1111_0LLLLLLLL

Compare Level

PinA > VIO/256*L

-

-

PRECISE

1111_1000xxxxx

ADC Diff, 100kΩ

PinA >  VIO/2 10kΩ

100kΩ, !IN

10kΩ VIO/2

FAST

1111_10010xxxx

ADC Precise,  DIR/OUT = Cal

ADC

7MΩ

-

FAST

1111_10011xxxx

ADC FAST,  DIR/OUT = Cal

ADC

400kΩ

-

FAST

1111_101VxxCCC

DAC 75Ω, V=Video, C=Cog

1

75Ω

-

-

1111_110HHHLLL

SDRAM DATA I/O

PinA Logic

FAST, OUT

-

-

1111_111HHHLLL

SDRAM Clock Out

1

FAST, OUT=1

-

-
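The 13-bit layout of the general I/O codes in Figure 2 (mmmm_C_I_O_HHH_LLL) can be sketched as a small packing helper. This is a Python model for illustration only, not PASM. The function name, the keyword names, and the reading of the legend (C: 0 = LIVE, 1 = CLOCKED; I and O: 0 = TRUE, 1 = INVERTED; HHH/LLL drive codes 000 = FAST through 111 = FLOAT) are assumptions taken from the flattened figure.

```python
# Drive codes as read from the HHH/LLL legend in Figure 2 (assumed reading).
DRIVE = {
    "FAST": 0b000, "SLOW": 0b001, "1500R": 0b010, "10K": 0b011,
    "100K": 0b100, "100UA": 0b101, "10UA": 0b110, "FLOAT": 0b111,
}


def pin_config(mode, clocked=False, invert_in=False, invert_out=False,
               high="FAST", low="FAST"):
    """Pack a general-I/O configuration as mmmm_C_I_O_HHH_LLL (13 bits).

    Hypothetical helper: the field order follows the code column of Figure 2.
    """
    if not 0 <= mode <= 0b1110:
        raise ValueError("general I/O codes run from %0000 to %1110")
    return ((mode << 9) | (int(clocked) << 8) | (int(invert_in) << 7)
            | (int(invert_out) << 6) | (DRIVE[high] << 3) | DRIVE[low])


# PinA Schmitt input, clocked, slow high driver, floating low driver.
cfg = pin_config(0b0100, clocked=True, high="SLOW", low="FLOAT")
print(format(cfg, "013b"))  # prints 0100100001111
```

A value built this way would be the S operand of CFGPINS; the %1111_xxxxxxxxx codes (compare level, ADC, DAC, SDRAM) use different field layouts and are not covered by this sketch.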

Video Generator

Each cog has a video generator capable of generating composite, component, s-video, and VGA video. The video generator is fed pixel data through the waitvid instruction and uses the pixel data to look up colors to output from the CLUT. The video generator understands R.G.B.A.X color grouping and can handle RGB565/555/444/etc formatted data.

Table 22: Video Generator Access Instructions

Mnemonic

Operand

Operation

SETVID

D/#n

Setup the video generator according to D or n to output video from the CLUT.

SETVIDY

D/#n

Setup the video generator color matrix transform term Y according to D or n.

SETVIDI  

D/#n

Setup the video generator color matrix transform term I according to D or n.

SETVIDQ

D/#n

Setup the video generator color matrix transform term Q according to D or n.

DAC Hardware

Each cog has four DACs capable of SIN/COS wave output, saw tooth wave output, triangle wave output, and square wave output. Additionally, the video generator, when operational, will use the four DACs to produce video output. Please refer to the information below.

o DAC0 = CTRASIN, DAC1 = CTRACOS, DAC2 = CTRBSIN, DAC3 = CTRBCOS

o DAC0/2 = CTRASIN + CTRBSIN, DAC1/3 = CTRACOS + CTRBCOS

o DAC0 = SYNC, DAC1 = Q/B, DAC2 = I/G, DAC3 = Y/R

Table 23: DAC Hardware Access Instructions

Mnemonic

Operand

Operation

CFGDAC0 

D/#n

Configure DAC0 to D or n. See above.

CFGDAC1

D/#n

Configure DAC1 to D or n. See above.

CFGDAC2

D/#n

Configure DAC2 to D or n. See above.

CFGDAC3

D/#n

Configure DAC3 to D or n. See above.

SETDAC0

D/#n

Set DAC0 to top 18 bits of D/n.

SETDAC1

D/#n

Set DAC1 to top 18 bits of D/n.

SETDAC2

D/#n

Set DAC2 to top 18 bits of D/n.

SETDAC3

D/#n

Set DAC3 to top 18 bits of D/n.

CFGDACS

D/#n

Configure DACs to D or n. See above

SETDACS

D/#n

Set DACs to top 18 bits of D/n

Texture Mapping

Each cog has texture mapping hardware to assist the video generator with displaying textures and performing color blending on screen.

Table 24: Texture Mapping Instructions

Mnemonic

Operand

Operation

GETPIX  

D

Store texture pointer address in D.

SETPIX

D/#n

Set texture size and address to D/n

SETPIXU

D/#n

Set texture pointer x address to D/n.

SETPIXV

D/#n

Set texture pointer y address to D/n.

SETPIXZ

D/#n

Set texture pointer z address to D/n.

SETPIXR

D/#n

Set texture pointer R blending to D/n

SETPIXG

D/#n

Set texture pointer G blending to D/n

SETPIXB

D/#n

Set texture pointer B blending to D/n

SETPIXA

D/#n

Set texture pointer A blending to D/n


CLUT or Stack RAM

Each cog now features a 256 Long Color Look Up Table (CLUT) designed for use with the video generator in each cog. While the video generator is in use each long in the CLUT holds R.G.B.A.Z information for the video generator to display video with. When the video generator is not in use the CLUT may be used as a general-purpose memory scratch space, or as a 256 Long FIFO buffer, or as a call stack and evaluation stack (at the same time). The CLUT has two pointers used to index it called SPA and SPB.

Table 6: CLUT Instructions

Mnemonic

OPR

CLK

Operation

GETSPD

D

1

SPA-SPB into D, Z/C as CHKSPD

GETSPA

D

1

SPA into D, Z/C as CHKSPA

GETSPB

D

1

SPB into D, Z/C as CHKSPB

POPAR

D

1

Store CLUT[SPA] in register “D (0-511)” and then increment SPA

POPBR

D

1

Store CLUT[SPB] in register “D (0-511)” and then increment SPB

POPA

D

1

Decrement SPA and then store CLUT[SPA] in register “D (0-511)”.

POPB

D

1

Decrement SPB and then store CLUT[SPB] in register “D (0-511)”.

RETA

Decrement SPA and then jump to instruction (CLUT[SPA] & 0x1FF).

Flush pipeline before jump – results in a two-cycle loss.

RETB

Decrement SPB and then jump to instruction (CLUT[SPB] & 0x1FF).

Flush pipeline before jump – results in a two-cycle loss.

RETAD

Decrement SPA and then jump to instruction (CLUT[SPA] & 0x1FF).

Do not flush pipeline before jump – must be executed two instructions before intended jump space.

RETBD

Decrement SPB and then jump to instruction (CLUT[SPB] & 0x1FF).

Do not flush pipeline before jump – must be executed two instructions before intended jump space.

SETSPA

D/#n

1

Set SPA to register “D (0-511)” or “n (0-511)”

SETSPB

D/#n

1

Set SPB to register “D (0-511)” or “n (0-511)”

ADDSPA

D/#n

1

Add to SPA register “D (0-511)” or “n (0-511)”.

ADDSPB

D/#n

1

Add to SPB register “D (0-511)” or “n (0-511)”.

SUBSPA

D/#n

1

Subtract from SPA register “D (0-511)” or “n (0-511)”.

SUBSPB

D/#n

1

Subtract from SPB register “D (0-511)” or “n (0-511)”.

PUSHAR

D/#n

1..2

Decrement SPA and then store register “D (0-511)” in CLUT[SPA].

PUSHBR

D/#n

1..2

Decrement SPB and then store register “D (0-511)” in CLUT[SPB].

PUSHA

D/#n

1..2

Store register “D (0-511)” in CLUT[SPA] and then increment SPA.

PUSHB

D/#n

1..2

Store register “D (0-511)” in CLUT[SPB] and then increment SPB.

CALLA

D/#n

4..5

Store Z/C/PC* in CLUT[SPA] and then increment SPA and then jump to the address in register “D (0-511)” or address “n (0-511)”.

Flush pipeline before jump – results in a two-cycle loss. D version doesn’t flush.

CALLB

D/#n

4..5

Store Z/C/PC* in CLUT[SPB] and then increment SPB and then jump to the address in register “D (0-511)” or address “n (0-511)”.

Flush pipeline before jump – results in a two-cycle loss. D version doesn’t flush.

CALLAD

D/#n

4..5

Store Z/C/PC* in CLUT[SPA] and then increment SPA and then jump to the address in register “D (0-511)” or address “n (0-511)”...

CALLBD

D/#n

4..5

Store Z/C/PC* in CLUT[SPB] and then increment SPB and then jump to the address in register “D (0-511)” or address “n (0-511)”...

GETSPD

D

4..5

Stores ((SPA - SPB) & 0x7F) in register “D (0-511)”. FOR FIFO MODE.

GETSPA

D

4..5

Stores SPA in register “D (0-511)”.

GETSPB

D

4..5

Stores SPB in register “D (0-511)”.
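The push and pop ordering in the table above can be sketched with a toy model. This is Python for illustration, not PASM; the class name is hypothetical, only SPA is modelled (SPB would behave the same way), and wrapping the pointer at 256 entries is an assumption.

```python
class Clut:
    """Toy model of the 256-long CLUT used as a stack through SPA."""

    def __init__(self):
        self.mem = [0] * 256
        self.spa = 0

    def pusha(self, value):
        # PUSHA: store in CLUT[SPA], then increment SPA.
        self.mem[self.spa] = value & 0xFFFFFFFF
        self.spa = (self.spa + 1) & 0xFF  # wrap at 256 is an assumption

    def popa(self):
        # POPA: decrement SPA, then read CLUT[SPA].
        self.spa = (self.spa - 1) & 0xFF
        return self.mem[self.spa]

    def pushar(self, value):
        # PUSHAR: decrement SPA, then store in CLUT[SPA].
        self.spa = (self.spa - 1) & 0xFF
        self.mem[self.spa] = value & 0xFFFFFFFF

    def popar(self):
        # POPAR: read CLUT[SPA], then increment SPA.
        value = self.mem[self.spa]
        self.spa = (self.spa + 1) & 0xFF
        return value


clut = Clut()
for v in (10, 20, 30):
    clut.pusha(v)
print([clut.popa() for _ in range(3)])  # prints [30, 20, 10]
```

PUSHA pairs with POPA for a stack that grows upward, and PUSHAR pairs with POPAR for one that grows downward, which is how one CLUT could hold a call stack and an evaluation stack at the same time.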


Math

Each cog now features the ability to perform 32-bit multi-cycle multiplies, 32-bit multi-cycle divides, 32-bit multi-cycle square roots, and 32-bit CORDIC transcendental operations. All of the advanced multi-cycle math operations use separate state machines that run concurrently with processor execution.

Note: The CORDIC algorithm rotates a point in the XY plane by a given angle. Look at X/Y/A results for SIN/COS/TAN/ARCSIN/ARCCOS/ARCTAN values of X/Y/A.

Table 7: Math Operation Instructions

Mnemonic

Operand

Operation

GETMULL

D

Store the bottom 32 bits of the 32x32 bit multiply in register “D (0-511)”, waits for multiply FSM if not done yet.

GETMULH

D

Store the top 32 bits of the 32x32 bit multiply in register “D (0-511)”, waits for multiply FSM if not done yet.

GETDIVQ

D

Store the quotient of the divide in register “D (0-511)”, waits for divide FSM if not done yet.

GETDIVR

D

Store the remainder of the divide in register “D (0-511)”, waits for divide FSM if not done yet.

GETSQRT

D

Store the result of the square root in register “D (0-511)”, waits for square root FSM if not done yet.

GETQX

D

Store the result of the CORDIC X part in register “D (0-511)”, waits for CORDIC FSM if not done yet.

GETQY

D

Store the result of the CORDIC Y part in register “D (0-511)”, waits for CORDIC FSM if not done yet.

GETQZ

D

Store the result of the CORDIC A part in register “D (0-511)”, waits for CORDIC FSM if not done yet.

SETMULA

D/#n

Setup long A to be multiplied by long B given the value in register “D (0-511)” or number “n (0-511)”.

Will take 16 cycles.

SETMULB

D/#n

Setup long B to be multiplied by long A given the value in register “D (0-511)” or number “n (0-511)”.

Starts multiply.

SETDIVA

D/#n

Setup the dividend long given the value in register “D (0-511)” or number “n (0-511)”.

Will take 16 cycles.

SETDIVB

D/#n

Setup the divisor long given the value in register “D (0-511)” or number “n (0-511)”. Starts divide.

SETQI

D/#n

Set iteration override to 0..31 (otherwise, iteration counts are load-dependent)

SETQZ

D/#n

Setup the CORDIC state machine with the angle given by the value in register “D (0-511)” or number “n (0-511)”.

QROTATE

D,S

Start the CORDIC rotation operation given the value in register “D (0-511)” or “S (0-31)” iterations.

QARCTAN

D,S

Start the CORDIC arc tangent operation given the value in register “D (0-511)” or “S (0-31)” iterations.

QEXP

D/#n

Start the CORDIC exponential operation given the value in register “D (0-511)” or “n (0-31)” iterations.

QLOG

D/#n

Start the CORDIC logarithmic operation given the value in register “D (0-511)” or “n (0-31)” iterations.

QSINCOS

D,S

Get sine and cosine of angle D with magnitude S (use GETQX D & GETQY D after)
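The way the multiply and divide results are split across the GET instructions can be sketched as follows. This is a Python model for illustration, not PASM; the function names are hypothetical, and treating the operands as unsigned is an assumption, since the table does not state signedness.

```python
MASK32 = 0xFFFFFFFF


def mul32(a, b):
    """Return (low, high) longs of a 32x32 product, as GETMULL/GETMULH would."""
    product = (a & MASK32) * (b & MASK32)  # unsigned operands assumed
    return product & MASK32, product >> 32


def div32(dividend, divisor):
    """Return (quotient, remainder), as GETDIVQ/GETDIVR would."""
    if divisor == 0:
        # Divide-by-zero behaviour is not described in the source.
        raise ZeroDivisionError("divisor is zero")
    return divmod(dividend & MASK32, divisor & MASK32)


low, high = mul32(0xFFFFFFFF, 2)
print(hex(low), hex(high))  # prints 0xfffffffe 0x1
print(div32(100, 7))        # prints (14, 2)
```

On the hardware the state machine runs concurrently with the cog, so other instructions can execute between the SET and GET instructions; the GET instruction waits only if the result is not ready yet.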

Miscellaneous Hardware

LFSR

Each cog has a free-running LFSR (Linear Feedback Shift Register) and System Counter that change every clock cycle. Each access of the LFSR taps into a 32-bit-wide sequence of numbers that is traversed in pseudo-random order, for a sequence length of 2^32.

System Counter

The system counter counts the number of clock ticks since power-up. It is a 64-bit counter; the LFSR is 32 bits.

Table 8: System Counter Instructions

Mnemonic

Operand

Operation

GETCNT

D

Store the bottom 32 bits of the System Counter (CNT) in register “D (0-511)”. If executed again (with no instruction in between), store the top 32 bits of the System Counter in register “D (0-511)”.

If a roll over occurs between accesses TOP-1 is stored.

SUBCNT

D

Subtracts the system count value when the GETCNT instruction was last executed from the current system count value. Results are stored in the register referenced by “D (0-511)”.

GETLFSR

D

Store the LFSR value in register “D (0-511)”.

Multiply Accumulate

Each cog additionally has a single cycle 24-bit hardware multiplier capable of unsigned and signed multiplications. The multiplication also adds into a 64-bit register ACCx for MAC ops.

Table 9: Multiply and Accumulate Instructions

Mnemonic

Operand

Operation

MACA

D,S

Multiply unsigned register “D (0-511)” and unsigned register “S (0-511)” or an immediate value (0-511) and add to the 64-bit accumulator A.

MACB

D,S

Multiply unsigned register “D (0-511)” and unsigned register “S (0-511)” or an immediate value (0-511) and add to the 64-bit accumulator B.

MUL

D,S

Multiply unsigned register “D (0-511)” and unsigned register “S (0-511)” or an immediate value (0-511) and store in register D.

SCL

D,S

Scale the result of the multiplication of two 24-bit numbers (D,S) to fit into the 32-bit destination register specified by “D (0-511)”.

CLRACCA

Zero Multiply Accumulator A (ACCA).

CLRACCB

Zero Multiply Accumulator B (ACCB).

CLRACCS

Zero both multiply accumulators (accumulator A and B).

GETACCA

D

Store the bottom 32 Bits of the A accumulator in register “D (0-511)”. If executed again (no instruction in between previous execution) store the top 32 Bits of the A accumulator in register “D (0-511)”.

GETACCB

D

Store the bottom 32 Bits of the B accumulator in register “D (0-511)”. If executed again (no instruction in between previous execution) store the top 32 Bits of the B accumulator in register “D (0-511)”.

SETACCA

D,S

Sets the high and low values of the 64-bit accumulator A. The value contained in register “D (0-511)” sets the low long while the value contained in “S (0-511)” sets the high long.

SETACCB

D,S

Sets the high and low values of the 64-bit accumulator B. The value contained in register “D (0-511)” sets the low long while the value contained in “S (0-511)” sets the high long.

FITACCA

Shifts accumulator A’s high long right into the low long so that the high long is MSB justified (discarding the low bits). Accumulator A’s high long is then replaced with the number of bit places required to MSB justify Accumulator A’s original value.

FITACCB

Shifts accumulator B’s high long right into the low long so that the high long is MSB justified (discarding the low bits). Accumulator B’s high long is then replaced with the number of bit places required to MSB justify Accumulator B’s original value.

FITACCS

Similar operation to FITACCA/FITACCB. Examines both accumulator A and B and right shifts both accumulators so that the greater value of the two accumulators is MSB justified. The number of bits shifted is written to both accumulators’ high longs. This has the effect of scaling both accumulators equally.

Miscellaneous Instructions

Each cog additionally features a number of new instructions to make many common operations much easier to perform than before. Most of the new instructions are in the extended instruction set, while a few of the new instructions are in the original set.

Table 10: Extended Miscellaneous Instructions

Mnemonic

Operand

Operation

DECOD5

D

Overwrite register “D (0-511)” with decoded D[4:0] repeated 1 time. (e.g. $00000001 << D[4:0])

DECOD4

D

Overwrite register “D (0-511)” with decoded D[3:0] repeated 2 times. (e.g. $00010001 << D[3:0])

DECOD3

D

Overwrite register “D (0-511)” with decoded D[2:0] repeated 4 times. (e.g. $01010101 << D[2:0])

DECOD2

D

Overwrite register “D (0-511)” with decoded D[1:0] repeated 8 times. (e.g. $11111111 << D[1:0])

BLMASK

D

Overwrite register “D (0-511)” with a bit length mask specified by D[5:0].

NOT

D

Overwrite register “D (0-511)” with the bitwise inverted register “D (0-511)”

ONECNT

D

Overwrite register “D (0-511)” with the count of ones in register D.

ZERCNT

D

Overwrite register “D (0-511)” with the count of zeros in register D.

INCPAT

D

Overwrite register “D (0-511)” with the next bit pattern that keeps the number of ones and zeros the same in register D.

DECPAT

D

Overwrite register “D (0-511)” with the previous bit pattern that keeps the number of ones and zeros the same in register D.

BINGRY

D

Overwrite the binary pattern in register “D (0-511)” with its Gray code pattern.

GRYBIN

D

Overwrite the Gray code pattern in register “D (0-511)” with its binary pattern.

MERGEW

D

Merge the high word and the low word of register “D (0-511)” into each other and overwrite register D with the new value. Bits of the low word occupy bit spaces 0, 2, 4, etc. Bits of the high word occupy bit spaces 1, 3, 5, etc. (Interleave)

SPLITW

D

Split the bits of register “D (0-511)” into a high word and low word and overwrite register D with the new value. Bits of the low word come from bit spaces 0, 2, 4, etc. Bits of the high word come from bit spaces 1, 3, 5, etc. (De-interleave)

SEUSSF

D

Overwrite register “D (0-511)” with a pseudo random bit pattern seeded from the value in register D. After 32 forward iterations, the original bit pattern is returned.

SEUSSR

D

Overwrite register “D (0-511)” with a pseudo random bit pattern seeded from the value in register D. After 32 reversed iterations, the original bit pattern is returned.

ISOB

D.b

Isolate bit “b (0-31)” of register “D (0-511).”

NOTB

D.b

Invert bit “b (0-31)” of register “D (0-511).”

CLRB

D.b

Clear bit “b (0-31)” of register “D (0-511).”

SETB

D.b

Set bit “b (0-31)” of register “D (0-511).”

SETBC

D.b

Set bit “b (0-31)” of register “D (0-511)” to C.

SETBNC

D.b

Set bit “b (0-31)” of register “D (0-511)” to NC.

SETBZ

D.b

Set bit “b (0-31)” of register “D (0-511)” to Z.

SETBNZ

D.b

Set bit “b (0-31)” of register “D (0-511)” to NZ.
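Several of the bit-manipulation entries in Table 10 can be sketched as short Python models. These are illustrations, not PASM. The function names mirror the mnemonics but are hypothetical, and each model follows the table's wording only (for example, the shift patterns given for DECOD5 through DECOD2).

```python
MASK32 = 0xFFFFFFFF


def decod(d, bits):
    """DECOD5/4/3/2: decode the low `bits` bits of d, repeated across 32 bits."""
    base = {5: 0x00000001, 4: 0x00010001, 3: 0x01010101, 2: 0x11111111}[bits]
    return (base << (d & ((1 << bits) - 1))) & MASK32


def onecnt(d):
    """ONECNT: number of one bits in d."""
    return bin(d & MASK32).count("1")


def zercnt(d):
    """ZERCNT: number of zero bits in d."""
    return 32 - onecnt(d)


def bingry(d):
    """BINGRY: binary to Gray code."""
    d &= MASK32
    return d ^ (d >> 1)


def grybin(d):
    """GRYBIN: Gray code back to binary (prefix XOR over all higher bits)."""
    d &= MASK32
    shift = 1
    while shift < 32:
        d ^= d >> shift
        shift <<= 1
    return d


def mergew(d):
    """MERGEW: low-word bits go to even positions, high-word bits to odd."""
    low, high = d & 0xFFFF, (d >> 16) & 0xFFFF
    out = 0
    for i in range(16):
        out |= ((low >> i) & 1) << (2 * i)
        out |= ((high >> i) & 1) << (2 * i + 1)
    return out


def splitw(d):
    """SPLITW: even bits form the low word, odd bits form the high word."""
    low = high = 0
    for i in range(16):
        low |= ((d >> (2 * i)) & 1) << i
        high |= ((d >> (2 * i + 1)) & 1) << i
    return (high << 16) | low


print(hex(decod(3, 4)), hex(mergew(0x0000FFFF)))  # prints 0x80008 0x55555555
```

MERGEW and SPLITW are inverses of each other in this model, as are BINGRY and GRYBIN, which gives a quick way to sanity-check a reimplementation.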


Table 11: Extended Miscellaneous Flag Manipulation Instructions

Mnemonic

Operand

Operation

PUSHZC

D

Push the Z and C flags into D[1:0] and pop D[31:30] into Z and C through WZ and WC.

POPZC

D

Pop D[1:0] into the Z and C flags and push D[31:30] into Z and C through WZ and WC.

SETZC

D/#n, #i

Set the Z and C flags with D[1:0] through WZ and WC effects.

Table 12: Extended Miscellaneous Flow Control Instructions

Mnemonic

Operand

Operation

REPD

D/#n

Delayed repeat of the following “i (0-31)” instructions, repeated the number of times given by the value in register “D(0-511)” or “n(0-511)”. The pipeline causes a delay of three instructions before the repeated set of instructions begins to execute.

NOPX

D/#n

Repeat the NOP instruction the number of times given by the value in register “D(0-511)” or “n(0-511)”.

SETSKIP

D/#n

Executes up to the next 32 instructions as NOPs described by the set bit pattern of a register “D(0-511)” or literal “N(0-63)”.
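The effect of SETSKIP on the instructions that follow it can be sketched as below. This is a Python model for illustration, not PASM; the function name is hypothetical, and mapping bit 0 of the pattern to the first following instruction is an assumption, since the table does not give the bit order.

```python
def apply_setskip(mask, instructions):
    """Return the stream as executed: set bits turn instructions into NOPs."""
    executed = []
    for i, ins in enumerate(instructions):
        # Only the next 32 instructions are covered by the pattern.
        skipped = i < 32 and (mask >> i) & 1 == 1
        executed.append("NOP" if skipped else ins)
    return executed


print(apply_setskip(0b0101, ["ADD", "SUB", "SHL", "MOV"]))
# prints ['NOP', 'SUB', 'NOP', 'MOV']
```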

Table 13: Miscellaneous Instructions

Mnemonic

Operand

Operation

ENC

D,S

Store encoded S in D.

JMPRET

D,S

See P8X32A – No instruction change.

ROR

D,S

See P8X32A – No instruction change.

ROL

D,S

See P8X32A – No instruction change.

SHR

D,S

See P8X32A – No instruction change.

SHL

D,S

See P8X32A – No instruction change.

RCR

D,S

See P8X32A – No instruction change.

RCL

D,S

See P8X32A – No instruction change.

SAR

D,S

See P8X32A – No instruction change.

REV

D,S

See P8X32A – No instruction change.

MINS

D,S

See P8X32A – No instruction change.

MAXS

D,S

See P8X32A – No instruction change.

MIN

D,S

See P8X32A – No instruction change.

MAX

D,S

See P8X32A – No instruction change.

MOVS

D,S

See P8X32A – No instruction change.

MOVD

D,S

See P8X32A – No instruction change.

MOVI

D,S

See P8X32A – No instruction change.

JMPRETD

D,S

See P8X32A – No instruction change. Do not flush pipeline before jump – must be executed two instructions before intended jump space.

AND

D,S

See P8X32A – No instruction change.

ANDN

D,S

See P8X32A – No instruction change.

OR

D,S

See P8X32A – No instruction change.

XOR

D,S

See P8X32A – No instruction change.

MUXC

D,S

See P8X32A – No instruction change.

MUXNC

D,S

See P8X32A – No instruction change.

MUXZ

D,S

See P8X32A – No instruction change.

MUXNZ

D,S

See P8X32A – No instruction change.

ADD

D,S

See P8X32A – No instruction change.

SUB

D,S

See P8X32A – No instruction change.

ADDABS

D,S

See P8X32A – No instruction change.

SUBABS

D,S

See P8X32A – No instruction change.

SUMC

D,S

See P8X32A – No instruction change.

SUMNC

D,S

See P8X32A – No instruction change.

SUMZ

D,S

See P8X32A – No instruction change.

SUMNZ

D,S

See P8X32A – No instruction change.

MOV

D,S

See P8X32A – No instruction change.

NEG

D,S

See P8X32A – No instruction change.

ABS

D,S

See P8X32A – No instruction change.

ABSNEG

D,S

See P8X32A – No instruction change.

NEGC

D,S

See P8X32A – No instruction change.

NEGNC

D,S

See P8X32A – No instruction change.

NEGZ

D,S

See P8X32A – No instruction change.

NEGNZ

D,S

See P8X32A – No instruction change.

CMPS

D,S

See P8X32A – No instruction change.

CMPSX

D,S

See P8X32A – No instruction change.

ADDX

D,S

See P8X32A – No instruction change.

SUBX

D,S

See P8X32A – No instruction change.

ADDS

D,S

See P8X32A – No instruction change.

SUBS

D,S

See P8X32A – No instruction change.

ADDSX

D,S

See P8X32A – No instruction change.

SUBSX

D,S

See P8X32A – No instruction change.

SUBR

D,S

Subtract D from S and store in D

CMPSUB

D,S

See P8X32A – No instruction change.

INCMOD

D,S

Increment D between 0 and S. Wraps around to 0 when above S

DECMOD

D,S

Decrement D between S and 0. Wraps around to S when below 0.

IJZ

D,S

Increment D and jump to S if D is zero

IJZD

D,S

Increment D and jump to S if D is zero. Do not flush pipeline before jump – must be executed two instructions before intended jump space.

IJNZ

D,S

Increment D and jump to S if D is not zero

IJNZD

D,S

Increment D and jump to S if D is not zero. Do not flush pipeline before jump – must be executed two instructions before intended jump space.

DJZ

D,S

Decrement D and jump to S if D is zero

DJZD

D,S

Decrement D and jump to S if D is zero. Do not flush pipeline before jump – must be executed two instructions before intended jump space.

DJNZ

D,S

Decrement D and jump to S if D is not zero.

DJNZD

D,S

Decrement D and jump to S if D is not zero. Do not flush pipeline before jump – must be executed two instructions before intended jump space.

TJZ

D,S

See P8X32A – No instruction change.

TJZD

D,S

See P8X32A – No instruction change. Do not flush pipeline before jump – must be executed two instructions before intended jump space.

TJNZ

D,S

See P8X32A – No instruction change.

TJNZD

D,S

See P8X32A – No instruction change. Do not flush pipeline before jump – must be executed two instructions before intended jump space.

SETINDA

D,S

Setup indirection register address A bottom range and top range where D is the top of the range and S is the bottom range. The indirection register will allow access to cog registers in this range.

SETINDB

D,S

Setup indirection register address B bottom range and top range where D is the top of the range and S is the bottom range. The indirection register will allow access to cog registers in this range.

WAITVID

D,S

Wait to pass pixels to the video generator.

WAITCNT

D,S

Wait for the CNT[31:0] register to equal D and then add S to D and store in D. If WC is specified then wait for CNT[63:32] to equal D.

WAITPEQ

D,S

See P8X32A – No instruction change.

WAITPNE

D,S

See P8X32A – No instruction change.
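The wrap-around behaviour described for INCMOD and DECMOD in the table above can be sketched as follows. This is a Python model for illustration, not PASM; the function names are hypothetical, and wrapping exactly when D has reached S (for INCMOD) or 0 (for DECMOD) is one reading of the table's wording.

```python
def incmod(d, s):
    """INCMOD: step D upward through 0..S, wrapping to 0 after S."""
    return 0 if d >= s else d + 1


def decmod(d, s):
    """DECMOD: step D downward through S..0, wrapping to S after 0."""
    return s if d <= 0 else d - 1


d = 0
up = []
for _ in range(5):
    d = incmod(d, 3)
    up.append(d)
print(up)  # prints [1, 2, 3, 0, 1]
```

This is the usual pattern for stepping a ring-buffer index without a separate compare and conditional move.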

Register Map

Each cog has 10 memory-mapped registers that allow control over I/O pins and indirection. The OUTx and INx registers have now been combined to form the PIN registers. The IND registers allow indirect register access to avoid self-modifying code. All other registers are free.

Table 14: Register Map Setup

Register

Location

Operation

INDA

$1F6

When read or written, accesses the cog memory address set by SETINDA. Auto-increments after being accessed. Condition codes are not allowed to be used with INDA register access.

INDB

$1F7

When read or written, accesses the cog memory address set by SETINDB. Auto-increments after being accessed. Condition codes are not allowed to be used with INDB register access.

PINA

$1F8

When written changes the state of the I/O pin attached to port A. When read, returns the state of the I/O port attached to PINA.

PINB

$1F9

When written, changes the state of the I/O pin attached to port B. When read, returns the state of the I/O port attached to PINB.

PINC

$1FA

When written, changes the state of the I/O pin attached to port C. When read, returns the state of the I/O port attached to PINC.

PIND

$1FB

When written, changes the state of the I/O pin attached to port D. When read, returns the state of the I/O port attached to PIND.

DIRA

$1FC

Enables or disables the output functionality of PORTA. Input reading is never disabled.

DIRB

$1FD

Enables or disables the output functionality of PORTB. Input reading is never disabled.

DIRC

$1FE

Enables or disables the output functionality of PORTC. Input reading is never disabled.

DIRD

$1FF

Enables or disables the output functionality of PORTD. Input reading is never disabled.

Counter Modules

Each cog has two counter modules – CTRA and CTRB. Each counter module has a FRQ, PHS, SIN, and COS register. The counter modules control the SIN and COS registers to track the phase and power of a signal. The FRQ and PHS registers work the same. Each counter module also has logic modes, which allow it to accumulate given different logic equations involving a selected pin A and pin B – see P8X32A. The counter modes now also feature quadrature encoder accumulation and automatic PWM generation.

Table 25: Counter Hardware Access Instructions

Mnemonic

Operand

Operation

GETPHSA

D

Store PHSA in D

GETPHZA

D

Store PHSA in D and zero PHSA.

GETCOSA

D

Store COSA in D

GETSINA

D

Store SINA in D

GETPHSB

D

Store PHSB in D

GETPHZB

D

Store PHSB in D and zero PHSB

GETCOSB

D

Store COSB in D

GETSINB

D

Store SINB in D

SETCTRA

D/#n

Set CTRA mode to D/n.

SETWAVA

D/#n

Set CTRA wave mode to D/n.

SETFRQA

D/#n

Set FRQA to D/n

SETPHSA

D/#n

Set PHSA to D/n.

ADDPHSA

D/#n

Add D/n to PHSA

SUBPHSA

D/#n

Subtract D/n from PHSA.

SYNCTRA

Wait for PHSA to overflow

CAPCTRA

Remove current sum from PHSA

SETCTRB

D/#n

Set CTRB mode to D/n

SETWAVB

D/#n

Set CTRB wave mode to D/n.

SETFRQB

D/#n

Set FRQB to D/n

SETPHSB

D/#n

Set PHSB to D/n

ADDPHSB

D/#n

Add D/n to PHSB

SUBPHSB

D/#n

Subtract D/n from PHSB

SYNCTRB

Wait for PHSB to overflow

CAPCTRB

Remove current sum from PHSB


Byte/Word Field Mover

<forum>

Each cog has a field mover that can move a byte or word from any field in S into any field in D. To use the field mover, you must first configure it using SETF. Then, you can use MOVF to perform the moves.

SETF uses a 9-bit value %W_DDdd_SSss to configure the field mover:

Table: Field mover configuration bits

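Bits    Field               Value   Meaning
-----------------------------------------------------------------------------------------------
W       word/byte           0       byte mode
W       word/byte           1       word mode
DD      D field mode        %00     D field pointer stays same after MOVF
DD      D field mode        %01     D field pointer stays same after MOVF, D rotates left by byte/word
DD      D field mode        %10     D field pointer increments after MOVF
DD      D field mode        %11     D field pointer decrements after MOVF
dd      D field pointer     %00     byte 0 / word 0
dd      D field pointer     %01     byte 1 / word 0
dd      D field pointer     %10     byte 2 / word 1
dd      D field pointer     %11     byte 3 / word 1
SS      S field mode        %00     S field pointer stays same after MOVF
SS      S field mode        %01     S field pointer stays same after MOVF
SS      S field mode        %10     S field pointer increments after MOVF
SS      S field mode        %11     S field pointer decrements after MOVF
ss      S field pointer     %00     byte 0 / word 0
ss      S field pointer     %01     byte 1 / word 0
ss      S field pointer     %10     byte 2 / word 1
ss      S field pointer     %11     byte 3 / word 1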

On cog startup, SETF is initialized to %0_0100_0000, so that MOVF will rotate D left by 8 bits and then fill the bottom byte with the lower byte in S.

Table: Byte/Word Field Mover Instructions

Mnemonic

Operand

Operation

Clocks

SETF

D

Configure field mover with D

1

SETF

#n

Configure field mover with 0..511

1

MOVF

D,S

Move field from S into D

1

MOVF

D,#n

Move field from 0..511 into D

1
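The default field-mover behaviour (rotate D left by 8 bits, then fill the bottom byte from the lower byte of S) can be sketched as follows. This is a Python model for illustration, not PASM; the function name and keyword names are hypothetical, and only the static pointer modes (%00 and %01) are modelled, because the source does not spell out how the incrementing and decrementing modes step through fields.

```python
MASK32 = 0xFFFFFFFF


def movf(d, s, word=False, d_rotate=True, d_ptr=0, s_ptr=0):
    """Model one MOVF with static field pointers.

    The defaults correspond to the start-up SETF value %0_0100_0000.
    """
    width = 16 if word else 8
    field_mask = (1 << width) - 1
    # In word mode, pointer values %00/%01 select word 0 and %10/%11 word 1.
    d_index = d_ptr >> 1 if word else d_ptr
    s_index = s_ptr >> 1 if word else s_ptr
    d &= MASK32
    if d_rotate:
        # DD = %01: D rotates left by one byte/word before the field is written.
        d = ((d << width) | (d >> (32 - width))) & MASK32
    field = (s >> (s_index * width)) & field_mask
    d &= ~(field_mask << (d_index * width)) & MASK32
    return d | (field << (d_index * width))


print(hex(movf(0x11223344, 0xAABBCCDD)))  # prints 0x223344dd
```

With the default configuration, four consecutive MOVF instructions shift four bytes into D, most significant first, which suits assembling a long from a byte stream.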


Hub Counter

The hub contains a 64-bit counter called CNT that increments on each clock cycle. Each cog can use CNT to mark time in various ways. On chip reset, the ROM Booter initializes CNT to $00000000_00000000. For the purpose of describing the cog instructions which relate to CNT, the lower long of CNT is alternately called CNTL and the upper long, delayed by one clock cycle, is called CNTH. The one-clock delay of CNTH enables proper reading of the entire CNT value when two instructions must be used in sequence to access its bottom and top longs.

Table: Hub Counter Instructions

Mnemonic

Operand

Operation (iiiiii = #i-1, nnnnnnnnn/n___nnnn_nnnnnnnnn = #n-1)

Clocks

SUBCNT

D

Subtracts D from CNTL, then CNTH

Get CNTL minus D into D. If another SUBCNT is executed in the next clock cycle by the same task, it gets CNTH minus D minus carry from previous SUBCNT into D. In either case, the logical not of the MSB of the D result (not the carry) goes into C, indicating by C=1 if CNTL (or CNT) has exceeded the original D value(s).

1

CMPCNT

D

Compares D to CNTL, then CNTH

Same as SUBCNT, but doesn't store the D result(s). Useful for periodic checking if a time target has been reached yet.

1

PASSCNT

D

Loops until CNTL passes D

Jump to self if MSB of CNTL minus D is 1. In other words, loop until CNTL exceeds D. This is intended as a non-pipeline-stalling alternative to WAITCNT, for use in multi-task programs.

1*

GETCNT

D

Get CNTL into D. If another GETCNT is executed in the next clock cycle by the same task, it gets CNTH into D.

1

WAITCNT

D,S

Wait for CNTL or CNT (WC), D += S

Wait for CNTL to be equal to D. Adds S/#n into D.

?

WAITPEQ

D,S WC

Wait for (pins & S) = D with timeout

Like WAITPEQ without WC, except the last-written D value becomes a CNTL timeout target, with C returning 0 if the WAITPEQ condition was met, or 1 if the timeout occurred first.

?

WAITPNE

D,S WC

Wait for (pins & S) <> D with timeout

Like WAITPNE without WC, except the last-written D value becomes a CNTL timeout target, with C returning 0 if the WAITPNE condition was met, or 1 if the timeout occurred first.

?

* 1 clock if task uses no more than every 4th time slot (4 clocks in single-task)

Example: Hub Counter

        'Measure time using lower 32 bits of CNT
        GETCNT  ticks           'get CNTL into ticks
        <somecode>              'execute some code
        SUBCNT  ticks           'get CNTL minus ticks into ticks, <somecode> took ticks-1 to execute

        'Measure time using full 64 bits of CNT (single task)
        GETCNT  ticks_low       'get CNT into {ticks_high, ticks_low}
        GETCNT  ticks_high
        <somecode>              'execute some code
        SUBCNT  ticks_low       'get CNT minus {ticks_high, ticks_low} into {ticks_high, ticks_low}
        SUBCNT  ticks_high

        'Do something for some time
        GETCNT  ticks           'get CNTL
        ADD     ticks,#500      'add 500
loop    <somecode>              'execute some code
        CMPCNT  ticks       WC  'check if 500 clocks have elapsed yet
if_nc   JMP     #loop           'if not, loop

        'Do something every Nth clock (multi-task)
        GETCNT  ticks           'get CNTL
loop    ADD     ticks,#500      'add 500
        PASSCNT ticks           'wait for next 500th clock
        <somecode>              'execute some code
        JMP     #loop           'loop

        'Do something every Nth clock (single-task)
        GETCNT  ticks           'get CNTL
        ADD     ticks,#500      'add initial 500
loop    WAITCNT ticks,#500      'wait for next 500th clock, add next 500
        <somecode>              'execute some code
        JMP     #loop           'loop

        'Wait for pins to equal a value, with time-out
        GETCNT  ticks           'get CNTL
        ADD     ticks,#200      'allow 200 clock cycles for WAITPEQ
        WAITPEQ value,mask  WC  'wait for (pins & mask) = value
if_c    JMP     #timeout        'if C=1 then timeout
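The reason SUBCNT and CMPCNT keep working across a CNTL rollover can be sketched with 32-bit modular arithmetic. This is a Python model for illustration, not PASM; the function name is hypothetical, and it covers only the single-long case (the CNTH step with carry is not modelled).

```python
MASK32 = 0xFFFFFFFF


def subcnt(cntl, d):
    """Model one SUBCNT: return (CNTL minus D modulo 2^32, C flag)."""
    result = (cntl - d) & MASK32
    # C is the logical NOT of the result's MSB, as the table describes.
    c = 0 if result >> 31 else 1
    return result, c


start = 0xFFFFFFF0                  # GETCNT taken just before CNTL rolls over
now = (start + 0x40) & MASK32       # 64 clocks later, CNTL has wrapped to $30
elapsed, c = subcnt(now, start)
print(elapsed, c)                   # prints 64 1

# A target that is still 500 clocks away gives C=0.
print(subcnt(100, 600)[1])          # prints 0
```

Because the test uses the sign bit of the difference, it is only meaningful while the target is less than 2^31 clocks away from the current count.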


Table: Instruction List

instruction                                                    mnem             operand
-------------------------------------------------------------------------------------------------
000000 ZC0 I CCCC DDDDDDDDD SUPIIIIII                WRBYTE        D,S/PTR                (waits for hub)
000000 Z01 I CCCC DDDDDDDDD SUPIIIIII                RDBYTE        D,S/PTR                (waits for hub)
000000 Z11 I CCCC DDDDDDDDD SUPIIIIII                RDBYTEC        D,S/PTR                (waits for hub if cache miss)

000001 ZC0 I CCCC DDDDDDDDD SUPIIIIII                WRWORD        D,S/PTR                (waits for hub)
000001 Z01 I CCCC DDDDDDDDD SUPIIIIII                RDWORD        D,S/PTR                (waits for hub)
000001 Z11 I CCCC DDDDDDDDD SUPIIIIII                RDWORDC        D,S/PTR                (waits for hub if cache miss)

000010 ZC0 I CCCC DDDDDDDDD SUPIIIIII                WRLONG        D,S/PTR                (waits for hub)
000010 Z01 I CCCC DDDDDDDDD SUPIIIIII                RDLONG        D,S/PTR                (waits for hub)
000010 Z11 I CCCC DDDDDDDDD SUPIIIIII                RDLONGC        D,S/PTR                (waits for hub if cache miss)

000011 ZCR 0 CCCC DDDDDDDDD SSSSSSSSS                COGINIT        D,S                (waits for hub)


000011 ZCR 1 CCCC DDDDDDDDD 000000000                CLKSET        D                (waits for hub)
000011 ZCR 1 CCCC DDDDDDDDD 000000001                COGID        D                (waits for hub)
000011 ZCR 1 CCCC DDDDDDDDD 000000010              ( COGINIT        D )                (waits for hub)
000011 ZCR 1 CCCC DDDDDDDDD 000000011                COGSTOP        D                (waits for hub)
000011 ZCR 1 CCCC DDDDDDDDD 000000100                LOCKNEW        D                (waits for hub)
000011 ZCR 1 CCCC DDDDDDDDD 000000101                LOCKRET        D                (waits for hub)
000011 ZCR 1 CCCC DDDDDDDDD 000000110                LOCKSET        D                (waits for hub)
000011 ZCR 1 CCCC DDDDDDDDD 000000111                LOCKCLR        D                (waits for hub)

000011 ZCR 1 CCCC 000000000 000001000                CACHEX
000011 ZCR 1 CCCC 000000001 000001000                CLRACCA
000011 ZCR 1 CCCC 000000010 000001000                CLRACCB
000011 ZCR 1 CCCC 000000011 000001000                CLRACCS
000011 ZCR 1 CCCC 000000101 000001000                FITACCA                        (waits for mac)
000011 ZCR 1 CCCC 000000110 000001000                FITACCB                        (waits for mac)
000011 ZCR 1 CCCC 000000111 000001000                FITACCS                        (waits for mac)

000011 ZC0 1 CCCC DDDDDDDDD 000001001                SNDSER        D                (waits for tx if !wc)
000011 ZC1 1 CCCC DDDDDDDDD 000001001                RCVSER        D                (waits for rx if !wc)

000011 ZCR 1 CCCC DDDDDDDDD 000001010                PUSHZC        D
000011 ZCR 1 CCCC DDDDDDDDD 000001011                POPZC        D

000011 ZCR 1 CCCC DDDDDDDDD 000001100                SUBCNT        D                (subtracts D from cnt[31:0], then cntl if same thread)
000011 ZC0 1 CCCC DDDDDDDDD 000001101                PASSCNT        D                (loops if (cnt[31:0] - D) msb set)
000011 ZC1 1 CCCC DDDDDDDDD 000001101                GETCNT        D                (gets cnt[31:0], then cntl if same thread)
000011 ZCR 1 CCCC DDDDDDDDD 000001110                GETACCA        D                (gets acca[31:0], then acca[63:32], waits for mac)
000011 ZCR 1 CCCC DDDDDDDDD 000001111                GETACCB        D                (gets accb[31:0], then accb[63:32], waits for mac)

000011 ZCR 1 CCCC DDDDDDDDD 000010000                GETLFSR        D
000011 ZCR 1 CCCC DDDDDDDDD 000010001                GETTOPS        D                (GETTOPS wc,nr = POLVID wc)
000011 ZCR 1 CCCC DDDDDDDDD 000010010                GETPTRA        D
000011 ZCR 1 CCCC DDDDDDDDD 000010011                GETPTRB        D

000011 ZCR 1 CCCC DDDDDDDDD 000010100                GETPIX        D                (waits two clocks)
000011 ZCR 1 CCCC DDDDDDDDD 000010101                GETSPD        D
000011 ZCR 1 CCCC DDDDDDDDD 000010110                GETSPA        D
000011 ZCR 1 CCCC DDDDDDDDD 000010111                GETSPB        D

000011 ZCR 1 CCCC DDDDDDDDD 000011000                POPAR        D
000011 ZCR 1 CCCC DDDDDDDDD 000011001                POPBR        D
000011 ZCR 1 CCCC DDDDDDDDD 000011010                POPA        D
000011 ZCR 1 CCCC DDDDDDDDD 000011011                POPB        D
000011 ZCR 1 CCCC 000000000 000011100                RETA
000011 ZCR 1 CCCC 000000000 000011101                RETB
000011 ZCR 1 CCCC 000000000 000011110                RETAD
000011 ZCR 1 CCCC 000000000 000011111                RETBD

000011 ZCR 1 CCCC DDDDDDDDD 000100000                DECOD2        D
000011 ZCR 1 CCCC DDDDDDDDD 000100001                DECOD3        D
000011 ZCR 1 CCCC DDDDDDDDD 000100010                DECOD4        D
000011 ZCR 1 CCCC DDDDDDDDD 000100011                DECOD5        D
000011 ZCR 1 CCCC DDDDDDDDD 000100100                BLMASK        D
000011 ZCR 1 CCCC DDDDDDDDD 000100101                NOT        D
000011 ZCR 1 CCCC DDDDDDDDD 000100110                ONECNT        D                (waits one clock)
000011 ZCR 1 CCCC DDDDDDDDD 000100111                ZERCNT        D                (waits one clock)
000011 ZCR 1 CCCC DDDDDDDDD 000101000                INCPAT        D                (waits three clocks)
000011 ZCR 1 CCCC DDDDDDDDD 000101001                DECPAT        D                (waits three clocks)
000011 ZCR 1 CCCC DDDDDDDDD 000101010                BINGRY        D
000011 ZCR 1 CCCC DDDDDDDDD 000101011                GRYBIN        D                (waits one clock)
000011 ZCR 1 CCCC DDDDDDDDD 000101100                MERGEW        D
000011 ZCR 1 CCCC DDDDDDDDD 000101101                SPLITW        D
000011 ZCR 1 CCCC DDDDDDDDD 000101110                SEUSSF        D
000011 ZCR 1 CCCC DDDDDDDDD 000101111                SEUSSR        D

000011 ZCR 1 CCCC DDDDDDDDD 000110000                GETMULL        D                (waits for mul if !wc)
000011 ZCR 1 CCCC DDDDDDDDD 000110001                GETMULH        D                (waits for mul if !wc)
000011 ZCR 1 CCCC DDDDDDDDD 000110010                GETDIVQ        D                (waits for div if !wc)
000011 ZCR 1 CCCC DDDDDDDDD 000110011                GETDIVR        D                (waits for div if !wc)
000011 ZCR 1 CCCC DDDDDDDDD 000110100                GETSQRT        D                (waits for sqrt if !wc)
000011 ZCR 1 CCCC DDDDDDDDD 000110101                GETQX        D                (waits for cordic if !wc)
000011 ZCR 1 CCCC DDDDDDDDD 000110110                GETQY        D                (waits for cordic if !wc)
000011 ZCR 1 CCCC DDDDDDDDD 000110111                GETQZ        D                (waits for cordic if !wc)

000011 ZCR 1 CCCC DDDDDDDDD 000111000                GETPHSA        D                (GETPHSA wc,nr = POLCTRA wc)
000011 ZCR 1 CCCC DDDDDDDDD 000111001                GETPHZA        D                (clears phsa)
000011 ZCR 1 CCCC DDDDDDDDD 000111010                GETCOSA        D
000011 ZCR 1 CCCC DDDDDDDDD 000111011                GETSINA        D

000011 ZCR 1 CCCC DDDDDDDDD 000111100                GETPHSB        D                (GETPHSB wc,nr = POLCTRB wc)
000011 ZCR 1 CCCC DDDDDDDDD 000111101                GETPHZB        D                (clears phsb)
000011 ZCR 1 CCCC DDDDDDDDD 000111110                GETCOSB        D
000011 ZCR 1 CCCC DDDDDDDDD 000111111                GETSINB        D

000011 Z00 1 CCCC 111111111 001iiiiii                REPD        #i                (infinite repeat)
000011 Z0N 1 CCCC nnnnnnnnn 001iiiiii                REPD        D/#n,#i
000011 n11 1 nnnn nnnnnnnnn 001iiiiii                REPS        #n,#i

000011 ZCN 1 CCCC nnnnnnnnn 01000----                <empty>

000011 ZCN 1 CCCC nnnnnnnnn 01001tttt                JMPTASK        D/#n,#t

000011 ZCN 1 CCCC nnnnnnnnn 010100000                NOPX        D/#n                (waits)
000011 ZCN 1 CCCC nnnnnnnnn 010100001                SETZC        D/#n                (d[1:0] into z/c via wz/wc)
000011 ZCN 1 CCCC Dnnnnnnnn 010100010                SETSPA        D/#n
000011 ZCN 1 CCCC Dnnnnnnnn 010100011                SETSPB        D/#n
000011 ZCN 1 CCCC Dnnnnnnnn 010100100                ADDSPA        D/#n
000011 ZCN 1 CCCC Dnnnnnnnn 010100101                ADDSPB        D/#n
000011 ZCN 1 CCCC Dnnnnnnnn 010100110                SUBSPA        D/#n
000011 ZCN 1 CCCC Dnnnnnnnn 010100111                SUBSPB        D/#n

000011 ZCN 1 CCCC nnnnnnnnn 010101000                PUSHAR        D/#n
000011 ZCN 1 CCCC nnnnnnnnn 010101001                PUSHBR        D/#n
000011 ZCN 1 CCCC nnnnnnnnn 010101010                PUSHA        D/#n
000011 ZCN 1 CCCC nnnnnnnnn 010101011                PUSHB        D/#n
000011 ZCN 1 CCCC nnnnnnnnn 010101100                CALLA        D/#n
000011 ZCN 1 CCCC nnnnnnnnn 010101101                CALLB        D/#n
000011 ZCN 1 CCCC nnnnnnnnn 010101110                CALLAD        D/#n
000011 ZCN 1 CCCC nnnnnnnnn 010101111                CALLBD        D/#n

000011 ZCN 1 CCCC SUPIIIIII 010110000                WRQUAD        D/PTR                (waits for hub)
000011 Z0N 1 CCCC SUPIIIIII 010110001                RDQUAD        D/PTR                (waits for hub)
000011 Z1N 1 CCCC SUPIIIIII 010110001                RDQUADC        D/PTR                (waits for hub if cache miss)
000011 ZCN 1 CCCC nnnnnnnnn 010110010                SETPTRA        D/#n
000011 ZCN 1 CCCC nnnnnnnnn 010110011                SETPTRB        D/#n
000011 ZCN 1 CCCC nnnnnnnnn 010110100                ADDPTRA        D/#n
000011 ZCN 1 CCCC nnnnnnnnn 010110101                ADDPTRB        D/#n
000011 ZCN 1 CCCC nnnnnnnnn 010110110                SUBPTRA        D/#n
000011 ZCN 1 CCCC nnnnnnnnn 010110111                SUBPTRB        D/#n

000011 ZCN 1 CCCC nnnnnnnnn 010111000                SETPIX        D/#n
000011 ZCN 1 CCCC nnnnnnnnn 010111001                SETPIXU        D/#n
000011 ZCN 1 CCCC nnnnnnnnn 010111010                SETPIXV        D/#n
000011 ZCN 1 CCCC nnnnnnnnn 010111011                SETPIXZ        D/#n
000011 ZCN 1 CCCC nnnnnnnnn 010111100                SETPIXA        D/#n
000011 ZCN 1 CCCC nnnnnnnnn 010111101                SETPIXR        D/#n
000011 ZCN 1 CCCC nnnnnnnnn 010111110                SETPIXG        D/#n
000011 ZCN 1 CCCC nnnnnnnnn 010111111                SETPIXB        D/#n

000011 Z0N 1 CCCC nnnnnnnnn 011000000                SETMULU        D/#n
000011 Z1N 1 CCCC nnnnnnnnn 011000000                SETMULA        D/#n
000011 ZCN 1 CCCC nnnnnnnnn 011000001                SETMULB        D/#n
000011 Z0N 1 CCCC nnnnnnnnn 011000010                SETDIVU        D/#n                (loads [31:0], then [63:32])
000011 Z1N 1 CCCC nnnnnnnnn 011000010                SETDIVA        D/#n                (loads [31:0], then [63:32])
000011 ZCN 1 CCCC nnnnnnnnn 011000011                SETDIVB        D/#n
000011 ZCN 1 CCCC nnnnnnnnn 011000100                SETSQRH        D/#n
000011 ZCN 1 CCCC nnnnnnnnn 011000101                SETSQRL        D/#n
000011 ZCN 1 CCCC nnnnnnnnn 011000110                SETQI        D/#n
000011 ZCN 1 CCCC nnnnnnnnn 011000111                SETQZ        D/#n

000011 ZCN 1 CCCC nnnnnnnnn 011001000                QLOG        D/#n
000011 ZCN 1 CCCC nnnnnnnnn 011001001                QEXP        D/#n
000011 ZCN 1 CCCC nnnnnnnnn 011001010                SETF        D/#n
000011 ZCN 1 CCCC nnnnnnnnn 011001011                SETTASK        D/#n

000011 ZCN 1 CCCC DDDDDDDnn 011001100                CFGDAC0        D/#n
000011 ZCN 1 CCCC DDDDDDDnn 011001101                CFGDAC1        D/#n
000011 ZCN 1 CCCC DDDDDDDnn 011001110                CFGDAC2        D/#n
000011 ZCN 1 CCCC DDDDDDDnn 011001111                CFGDAC3        D/#n

000011 ZCN 1 CCCC nnnnnnnnn 011010000                SETDAC0        D/#n
000011 ZCN 1 CCCC nnnnnnnnn 011010001                SETDAC1        D/#n
000011 ZCN 1 CCCC nnnnnnnnn 011010010                SETDAC2        D/#n
000011 ZCN 1 CCCC nnnnnnnnn 011010011                SETDAC3        D/#n

000011 ZCN 1 CCCC Dnnnnnnnn 011010100                CFGDACS        D/#n
000011 ZCN 1 CCCC nnnnnnnnn 011010101                SETDACS        D/#n

000011 ZCN 1 CCCC DDnnnnnnn 011010110                GETP        D/#n                (pin into !z/c via wz/wc)
000011 ZCN 1 CCCC DDnnnnnnn 011010111                GETNP        D/#n                (pin into z/!c via wz/wc)

000011 ZCN 1 CCCC DDnnnnnnn 011011000                OFFP        D/#n
000011 ZCN 1 CCCC DDnnnnnnn 011011001                NOTP        D/#n
000011 ZCN 1 CCCC DDnnnnnnn 011011010                CLRP        D/#n
000011 ZCN 1 CCCC DDnnnnnnn 011011011                SETP        D/#n
000011 ZCN 1 CCCC DDnnnnnnn 011011100                SETPC        D/#n
000011 ZCN 1 CCCC DDnnnnnnn 011011101                SETPNC        D/#n
000011 ZCN 1 CCCC DDnnnnnnn 011011110                SETPZ        D/#n
000011 ZCN 1 CCCC DDnnnnnnn 011011111                SETPNZ        D/#n

000011 ZCN 1 CCCC DDDDDnnnn 011100000                SETCOG        D/#n
000011 ZCN 1 CCCC DDDnnnnnn 011100001                SETMAP        D/#n
000011 Z0N 1 CCCC nnnnnnnnn 011100010                SETQUAD        D/#n
000011 Z1N 1 CCCC nnnnnnnnn 011100010                SETQUAZ        D/#n
000011 ZCN 1 CCCC DDnnDDDDD 011100011                SETPORT        D/#n
000011 ZCN 1 CCCC DDnnDDDDD 011100100                SETPORA        D/#n
000011 ZCN 1 CCCC DDnnDDDDD 011100101                SETPORB        D/#n
000011 ZCN 1 CCCC DDnnDDDDD 011100110                SETPORC        D/#n
000011 ZCN 1 CCCC DDnnDDDDD 011100111                SETPORD        D/#n

000011 ZCN 1 CCCC nnnnnnnnn 011101000                SETXCH        D/#n
000011 ZCN 1 CCCC DDDnnnnnn 011101001                SETXFR        D/#n
000011 ZCN 1 CCCC DDDDDDDDD 011101010                SETSER        D/#n
000011 ZCN 1 CCCC DDDnnnnnn 011101011                SETSKIP        D/#n
000011 ZCN 1 CCCC nnnnnnnnn 011101100                SETVID        D/#n
000011 ZCN 1 CCCC nnnnnnnnn 011101101                SETVIDY        D/#n
000011 ZCN 1 CCCC nnnnnnnnn 011101110                SETVIDI        D/#n
000011 ZCN 1 CCCC nnnnnnnnn 011101111                SETVIDQ        D/#n

000011 ZCN 1 CCCC nnnnnnnnn 011110000                SETCTRA        D/#n
000011 ZCN 1 CCCC nnnnnnnnn 011110001                SETWAVA        D/#n
000011 ZCN 1 CCCC nnnnnnnnn 011110010                SETFRQA        D/#n
000011 ZCN 1 CCCC nnnnnnnnn 011110011                SETPHSA        D/#n
000011 ZCN 1 CCCC nnnnnnnnn 011110100                ADDPHSA        D/#n
000011 ZCN 1 CCCC nnnnnnnnn 011110101                SUBPHSA        D/#n
000011 ZCN 1 CCCC nnnnnnnnn 011110110                SYNCTRA                        (waits for ctra)
000011 ZCN 1 CCCC nnnnnnnnn 011110111                CAPCTRA

000011 ZCN 1 CCCC nnnnnnnnn 011111000                SETCTRB        D/#n
000011 ZCN 1 CCCC nnnnnnnnn 011111001                SETWAVB        D/#n
000011 ZCN 1 CCCC nnnnnnnnn 011111010                SETFRQB        D/#n
000011 ZCN 1 CCCC nnnnnnnnn 011111011                SETPHSB        D/#n
000011 ZCN 1 CCCC nnnnnnnnn 011111100                ADDPHSB        D/#n
000011 ZCN 1 CCCC nnnnnnnnn 011111101                SUBPHSB        D/#n
000011 ZCN 1 CCCC nnnnnnnnn 011111110                SYNCTRB                        (waits for ctrb)
000011 ZCN 1 CCCC nnnnnnnnn 011111111                CAPCTRB

000011 ZCR 1 CCCC DDDDDDDDD 1000bbbbb                ISOB        D,b
000011 ZCR 1 CCCC DDDDDDDDD 1001bbbbb                NOTB        D,b
000011 ZCR 1 CCCC DDDDDDDDD 1010bbbbb                CLRB        D,b
000011 ZCR 1 CCCC DDDDDDDDD 1011bbbbb                SETB        D,b
000011 ZCR 1 CCCC DDDDDDDDD 1100bbbbb                SETBC        D,b
000011 ZCR 1 CCCC DDDDDDDDD 1101bbbbb                SETBNC        D,b
000011 ZCR 1 CCCC DDDDDDDDD 1110bbbbb                SETBZ        D,b
000011 ZCR 1 CCCC DDDDDDDDD 1111bbbbb                SETBNZ        D,b

000100 000 I CCCC DDDDDDDDD SSSSSSSSS                SETACCA        D,S
000100 010 I CCCC DDDDDDDDD SSSSSSSSS                SETACCB        D,S
000100 100 I CCCC DDDDDDDDD SSSSSSSSS                MACA        D,S
000100 110 I CCCC DDDDDDDDD SSSSSSSSS                MACB        D,S
000100 ZC1 I CCCC DDDDDDDDD SSSSSSSSS                MUL        D,S                (waits one clock)

000101 000 I CCCC DDDDDDDDD SSSSSSSSS                MOVF        D,S
000101 010 I CCCC DDDDDDDDD SSSSSSSSS                QSINCOS        D,S
000101 100 I CCCC DDDDDDDDD SSSSSSSSS                QARCTAN        D,S
000101 110 I CCCC DDDDDDDDD SSSSSSSSS                QROTATE        D,S
000101 ZC1 I CCCC DDDDDDDDD SSSSSSSSS                SCL        D,S                (waits one clock)

000110 ZCR I CCCC DDDDDDDDD SSSSSSSSS                ENC        D,S
000111 ZCR I CCCC DDDDDDDDD SSSSSSSSS                JMPRET        D,S

001000 ZCR I CCCC DDDDDDDDD SSSSSSSSS                ROR        D,S
001001 ZCR I CCCC DDDDDDDDD SSSSSSSSS                ROL        D,S
001010 ZCR I CCCC DDDDDDDDD SSSSSSSSS                SHR        D,S
001011 ZCR I CCCC DDDDDDDDD SSSSSSSSS                SHL        D,S
001100 ZCR I CCCC DDDDDDDDD SSSSSSSSS                RCR        D,S
001101 ZCR I CCCC DDDDDDDDD SSSSSSSSS                RCL        D,S
001110 ZCR I CCCC DDDDDDDDD SSSSSSSSS                SAR        D,S
001111 ZCR I CCCC DDDDDDDDD SSSSSSSSS                REV        D,S

010000 ZCR I CCCC DDDDDDDDD SSSSSSSSS                MINS        D,S
010001 ZCR I CCCC DDDDDDDDD SSSSSSSSS                MAXS        D,S
010010 ZCR I CCCC DDDDDDDDD SSSSSSSSS                MIN        D,S
010011 ZCR I CCCC DDDDDDDDD SSSSSSSSS                MAX        D,S
010100 ZCR I CCCC DDDDDDDDD SSSSSSSSS                MOVS        D,S
010101 ZCR I CCCC DDDDDDDDD SSSSSSSSS                MOVD        D,S
010110 ZCR I CCCC DDDDDDDDD SSSSSSSSS                MOVI        D,S
010111 ZCR I CCCC DDDDDDDDD SSSSSSSSS                JMPRETD        D,S

011000 ZCR I CCCC DDDDDDDDD SSSSSSSSS                AND        D,S
011001 ZCR I CCCC DDDDDDDDD SSSSSSSSS                ANDN        D,S
011010 ZCR I CCCC DDDDDDDDD SSSSSSSSS                OR        D,S
011011 ZCR I CCCC DDDDDDDDD SSSSSSSSS                XOR        D,S
011100 ZCR I CCCC DDDDDDDDD SSSSSSSSS                MUXC        D,S
011101 ZCR I CCCC DDDDDDDDD SSSSSSSSS                MUXNC        D,S
011110 ZCR I CCCC DDDDDDDDD SSSSSSSSS                MUXZ        D,S
011111 ZCR I CCCC DDDDDDDDD SSSSSSSSS                MUXNZ        D,S

100000 ZCR I CCCC DDDDDDDDD SSSSSSSSS                ADD        D,S
100001 ZCR I CCCC DDDDDDDDD SSSSSSSSS                SUB        D,S
100010 ZCR I CCCC DDDDDDDDD SSSSSSSSS                ADDABS        D,S
100011 ZCR I CCCC DDDDDDDDD SSSSSSSSS                SUBABS        D,S
100100 ZCR I CCCC DDDDDDDDD SSSSSSSSS                SUMC        D,S
100101 ZCR I CCCC DDDDDDDDD SSSSSSSSS                SUMNC        D,S
100110 ZCR I CCCC DDDDDDDDD SSSSSSSSS                SUMZ        D,S
100111 ZCR I CCCC DDDDDDDDD SSSSSSSSS                SUMNZ        D,S

101000 ZCR I CCCC DDDDDDDDD SSSSSSSSS                MOV        D,S
101001 ZCR I CCCC DDDDDDDDD SSSSSSSSS                NEG        D,S
101010 ZCR I CCCC DDDDDDDDD SSSSSSSSS                ABS        D,S
101011 ZCR I CCCC DDDDDDDDD SSSSSSSSS                ABSNEG        D,S
101100 ZCR I CCCC DDDDDDDDD SSSSSSSSS                NEGC        D,S
101101 ZCR I CCCC DDDDDDDDD SSSSSSSSS                NEGNC        D,S
101110 ZCR I CCCC DDDDDDDDD SSSSSSSSS                NEGZ        D,S
101111 ZCR I CCCC DDDDDDDDD SSSSSSSSS                NEGNZ        D,S

110000 ZCR I CCCC DDDDDDDDD SSSSSSSSS                CMPS        D,S
110001 ZCR I CCCC DDDDDDDDD SSSSSSSSS                CMPSX        D,S
110010 ZCR I CCCC DDDDDDDDD SSSSSSSSS                ADDX        D,S
110011 ZCR I CCCC DDDDDDDDD SSSSSSSSS                SUBX        D,S
110100 ZCR I CCCC DDDDDDDDD SSSSSSSSS                ADDS        D,S
110101 ZCR I CCCC DDDDDDDDD SSSSSSSSS                SUBS        D,S
110110 ZCR I CCCC DDDDDDDDD SSSSSSSSS                ADDSX        D,S
110111 ZCR I CCCC DDDDDDDDD SSSSSSSSS                SUBSX        D,S

111000 ZCR I CCCC DDDDDDDDD SSSSSSSSS                SUBR        D,S
111001 ZCR I CCCC DDDDDDDDD SSSSSSSSS                CMPSUB        D,S
111010 ZCR I CCCC DDDDDDDDD SSSSSSSSS                INCMOD        D,S
111011 ZCR I CCCC DDDDDDDDD SSSSSSSSS                DECMOD        D,S

111000 000 I BBAA DDDDDDDDD SSSSSSSSS                SETINDx        D,S                (SETINDA S   / SETINDB D   / SETINDS D,S)
111001 000 I 0B0A DDDDDDDDD SSSSSSSSS                FIXINDx        D,S                (FIXINDA D,S / FIXINDB D,S / FIXINDS D,S)
111010 000 I CCCC DDDDDDDDD SSSSSSSSS                CFGPINS        D,S                (waits for alt)
111011 000 I CCCC DDDDDDDDD SSSSSSSSS                WAITVID        D,S                (waits for vid)

111100 00R I CCCC DDDDDDDDD SSSSSSSSS                IJZ        D,S
111100 01R I CCCC DDDDDDDDD SSSSSSSSS                IJZD        D,S
111100 10R I CCCC DDDDDDDDD SSSSSSSSS                IJNZ        D,S
111100 11R I CCCC DDDDDDDDD SSSSSSSSS                IJNZD        D,S

111101 00R I CCCC DDDDDDDDD SSSSSSSSS                DJZ        D,S
111101 01R I CCCC DDDDDDDDD SSSSSSSSS                DJZD        D,S
111101 10R I CCCC DDDDDDDDD SSSSSSSSS                DJNZ        D,S
111101 11R I CCCC DDDDDDDDD SSSSSSSSS                DJNZD        D,S

111110 000 I CCCC DDDDDDDDD SSSSSSSSS                TJZ        D,S
111110 010 I CCCC DDDDDDDDD SSSSSSSSS                TJZD        D,S
111110 100 I CCCC DDDDDDDDD SSSSSSSSS                TJNZ        D,S
111110 110 I CCCC DDDDDDDDD SSSSSSSSS                TJNZD        D,S

111110 001 I CCCC DDDDDDDDD SSSSSSSSS                JP        D,S
111110 011 I CCCC DDDDDDDDD SSSSSSSSS                JPD        D,S
111110 101 I CCCC DDDDDDDDD SSSSSSSSS                JNP        D,S
111110 111 I CCCC DDDDDDDDD SSSSSSSSS                JNPD        D,S

111111 0CR I CCCC DDDDDDDDD SSSSSSSSS                WAITCNT        D,S                (waits for cnt32, +cnt64 if wc)
111111 1C0 I CCCC DDDDDDDDD SSSSSSSSS                WAITPEQ        D,S                (waits for pins, +cnt32 if wc)
111111 1C1 I CCCC DDDDDDDDD SSSSSSSSS                WAITPNE        D,S                (waits for pins, +cnt32 if wc)
-------------------------------------------------------------------------------------------------
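
The general row layout in the table above (6-bit opcode, ZCR, I, CCCC, 9-bit D, 9-bit S) can be sketched as bit-field packing. This Python sketch assumes the fields are listed most-significant bit first, as on the P8X32A. The helper names encode, decode and Fields are hypothetical, and rows that reuse the D or S field for other purposes are not modelled.

```python
from collections import namedtuple

Fields = namedtuple("Fields", "opcode zcr i cccc d s")

def decode(word):
    """Split a 32-bit instruction word into the six fields used by the table."""
    return Fields(
        opcode=(word >> 26) & 0x3F,
        zcr=(word >> 23) & 0x7,
        i=(word >> 22) & 0x1,
        cccc=(word >> 18) & 0xF,
        d=(word >> 9) & 0x1FF,
        s=word & 0x1FF,
    )

def encode(opcode, zcr, i, cccc, d, s):
    """Pack the six fields back into one 32-bit word."""
    return (opcode << 26) | (zcr << 23) | (i << 22) | (cccc << 18) | (d << 9) | s

# ADD D,#500 with ZCR=001 (wr) and CCCC=1111 (always).
# D=0x010 is an arbitrary register address chosen for illustration.
word = encode(0b100000, 0b001, 1, 0b1111, 0x010, 500)
print(hex(word))
print(decode(word))
```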


ZCR                effects
-------------------------------------------------------------------------------------------------
000                nz, nc, nr
001                nz, nc, wr
010                nz, wc, nr
011                nz, wc, wr
100                wz, nc, nr
101                wz, nc, wr
110                wz, wc, nr
111                wz, wc, wr
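
Each bit of the ZCR field selects one effect independently, so the table above can be derived rather than memorised. A minimal Python sketch, with zcr_effects as a hypothetical helper name:

```python
def zcr_effects(zcr):
    """Map a 3-bit ZCR value to its (z, c, r) effect names."""
    z = "wz" if zcr & 0b100 else "nz"
    c = "wc" if zcr & 0b010 else "nc"
    r = "wr" if zcr & 0b001 else "nr"
    return z, c, r

# Print all eight combinations in the same order as the table.
for value in range(8):
    print(format(value, "03b"), ", ".join(zcr_effects(value)))
```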


CCCC    condition       (easier-to-read list)
-------------------------------------------------------------------------------------------------
0000    never           1111    always          (default)
0001    nc  &  nz       1100    if_c                            if_b
0010    nc  &  z        0011    if_nc                           if_ae
0011    nc              1010    if_z                            if_e
0100     c  &  nz       0101    if_nz                           if_ne
0101    nz              1000    if_c_and_z      if_z_and_c
0110     c  <> z        0100    if_c_and_nz     if_nz_and_c
0111    nc  |  nz       0010    if_nc_and_z     if_z_and_nc
1000     c  &  z        0001    if_nc_and_nz    if_nz_and_nc    if_a
1001     c  =  z        1110    if_c_or_z       if_z_or_c       if_be
1010     z              1101    if_c_or_nz      if_nz_or_c
1011    nc  |  z        1011    if_nc_or_z      if_z_or_nc
1100     c              0111    if_nc_or_nz     if_nz_or_nc
1101     c  |  nz       1001    if_c_eq_z       if_z_eq_c
1110     c  |  z        0110    if_c_ne_z       if_z_ne_c
1111    always          0000    never
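
Read as a truth table, each CCCC bit corresponds to one combination of the C and Z flags: bit 0 to nc & nz, bit 1 to nc & z, bit 2 to c & nz and bit 3 to c & z (codes 0001, 0010, 0100 and 1000 above). Every other code is the OR of the combinations it accepts. The Python sketch below evaluates a condition that way. The function name condition_passes is hypothetical, and this is a reading of the table, not a description of the hardware.

```python
def condition_passes(cccc, c, z):
    """Evaluate a 4-bit CCCC code against the current C and Z flags (each 0 or 1)."""
    index = (c << 1) | z          # selects one of the four flag combinations
    return bool((cccc >> index) & 1)

IF_C = 0b1100
IF_NC_AND_NZ = 0b0001             # alias if_a
IF_C_NE_Z = 0b0110

print(condition_passes(IF_C, c=1, z=0))           # True
print(condition_passes(IF_NC_AND_NZ, c=0, z=1))   # False
print(condition_passes(IF_C_NE_Z, c=0, z=1))      # True
```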

CCCC        inda/indb - CCCC=1111 after first stage of pipeline if inda/indb used (indx=inda/indb)
-------------------------------------------------------------------------------------------------
xx00        source indx
xx01        source indx++
xx10        source indx--
xx11        source ++indx

00xx        destination indx
01xx        destination indx++
10xx        destination indx--
11xx        destination ++indx
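
When INDA or INDB is used, the table above reuses the CCCC field as two 2-bit mode selectors: the low pair for the source and the high pair for the destination. A minimal Python sketch of that split follows. The helper decode_ind_modes is hypothetical, and each half applies only if the corresponding operand actually refers to INDA or INDB.

```python
MODES = ("indx", "indx++", "indx--", "++indx")

def decode_ind_modes(cccc):
    """Return (destination_mode, source_mode) for a 4-bit CCCC field."""
    dest = MODES[(cccc >> 2) & 0b11]   # bits 3:2, the "nnxx" rows
    src = MODES[cccc & 0b11]           # bits 1:0, the "xxnn" rows
    return dest, src

print(decode_ind_modes(0b0110))   # ('indx++', 'indx--')
```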


I        SSSSSSSSS        source operand
-------------------------------------------------------------------------------------------------
0        SSSSSSSSS        register
1        #SSSSSSSSS        immediate, zero-extended


        DDDDDDDDD        destination operand
-------------------------------------------------------------------------------------------------
        DDDDDDDDD        register
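
The two operand tables above can be summarised as follows: D always names a register, while S names a register when I=0 and is a 9-bit literal, zero-extended to 32 bits, when I=1. In this Python sketch the register file is a plain list of 512 integers, which is an assumption made for illustration, and the function names are hypothetical.

```python
def source_value(i, s, registers):
    """Resolve the source operand: immediate when I=1, register contents when I=0."""
    if i:
        return s & 0x1FF          # 9-bit literal, upper 23 bits are zero
    return registers[s]

def destination_value(d, registers):
    """The destination field always addresses a register."""
    return registers[d]

registers = [0] * 512
registers[5] = 1234
print(source_value(0, 5, registers))     # 1234 (contents of register 5)
print(source_value(1, 500, registers))   # 500  (the literal #500)
```

A 9-bit immediate tops out at 511, which is why literals such as #500 and #200 in the timing examples fit directly in the S field.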


Effects and Condition Codes

Instructions with a ZCR field can update the Z and/or C flag by specifying the WZ and/or WC effects. The result is written to the destination register under WR and discarded under NR. Instructions can also be executed conditionally, depending on the state of the Z and/or C flags, using the CCCC condition field; see the P8X32A documentation.



Links

 Assembler Reference Section

Assembler Instruction Summary Chart

Appendix A. Original Documentation Sources

Topic

URL

Hub Memory Instructions

http://forums.parallax.com/showthread.php?144199-Propeller-II-Emulation-of-the-P2-on-DE0-NANO-amp-DE2-115-FPGA-boards&p=1146196#post1146196

http://forums.parallax.com/showthread.php?144432-The-unofficial-P2-documentation-project&p=1148999#post1148999

Hub Control Instructions

http://forums.parallax.com/showthread.php?144199-Propeller-II-Emulation-of-the-P2-on-DE0-NANO-amp-DE2-115-FPGA-boards&p=1146196#post1146196

http://forums.parallax.com/showthread.php?144432-The-unofficial-P2-documentation-project&p=1148785#post1148785

COG RAM Indirection

http://forums.parallax.com/showthread.php?144199-Propeller-II-Emulation-of-the-P2-on-DE0-NANO-amp-DE2-115-FPGA-boards&p=1146196#post1146196

COG Stack RAM

http://forums.parallax.com/showthread.php?144199-Propeller-II-Emulation-of-the-P2-on-DE0-NANO-amp-DE2-115-FPGA-boards&p=1146196#post1146196

Multitasking

http://forums.parallax.com/showthread.php?144199-Propeller-II-Emulation-of-the-P2-on-DE0-NANO-amp-DE2-115-FPGA-boards&p=1146196#post1146196

Pipeline

http://forums.parallax.com/showthread.php?144199-Propeller-II-Emulation-of-the-P2-on-DE0-NANO-amp-DE2-115-FPGA-boards&p=1148452#post1148452

DECODx Instructions

http://forums.parallax.com/showthread.php?144199-Propeller-II-Emulation-of-the-P2-on-DE0-NANO-amp-DE2-115-FPGA-boards&p=1148852#post1148852

QUAD Instructions

http://forums.parallax.com/showthread.php?144432-The-unofficial-P2-documentation-project&p=1149359#post1149359


Appendix Z. Style Guide and Templates

NOTE: This section is intended only for use by the editors of this document.


DOCUMENT TASK LIST

TASK

Documenters

Notes

Status

Assembly Language Reference

Seairth

Hardware associated doc

Peter Jakacki

P2 document updates from Chip

Scavenging useful notes and examples

Assembler Language summary

Cluso99

Similar to P1's summary

TO DO

TASK

Notes

Status