1 of 11

Solving the�“Grecian Computer Puzzle”�using IBM z/Architecture Vector Instructions

1

2 of 11

Rotor Values

Rotor 1 (top)

Rotor 2

Rotor 4

Rotor 5 (base)

Rotor 3

8

3

6

15

7

10

6

11

9

11

11

14

6

7

15

17

12

7

4

3

6

5

13

9

7

13

21

17

6

15

4

9

18

10

8

7

8

12

21

9

5

9

1

14

16

4

26

11

22

10

8

3

17

14

11

14

13

12

9

6

7

3

6

2

12

9

12

6

9

7

9

20

3

26

1

2

16

9

12

3

19

10

8

14

14

11

14

11

14

11

10

11

12

13

14

15

4

14

14

21

21

9

9

4

10

7

16

8

7

8

8

11

11

14

11

14

7

8

9

5

6

6

6

3

3

4

12

2

5

3

4

2

3 of 11

Initial Rotor VR Specification

Initial ds 0L

* Rotor 1 (top, 1 track, with holes)

dc al1(03,00,06,00,10,00,07,00,15,00,08,00,00,00,00,00)

* Rotor 2 (2 tracks, with holes)

dc al1(03,00,06,00,11,11,06,11,00,06,17,07,00,00,00,00)

dc al1(00,07,15,00,00,14,00,09,00,12,00,04,00,00,00,00)

* Rotor 3 (3 tracks, with holes)

dc al1(04,05,00,07,08,09,13,09,07,13,21,17,00,00,00,00)

dc al1(26,14,01,12,00,21,06,15,04,09,18,11,00,00,00,00)

dc al1(00,16,00,09,00,05,00,10,00,08,00,22,00,00,00,00)

* Rotor 4 (4 tracks, with holes)

dc al1(02,07,00,09,00,07,14,11,00,08,00,16,00,00,00,00)

dc al1(00,09,20,12,03,06,00,14,12,03,08,09,00,00,00,00)

dc al1(12,03,26,06,00,02,13,09,00,17,19,03,00,00,00,00)

dc al1(00,01,00,09,00,12,00,06,00,10,00,10,00,00,00,00)

* Rotor 5 (base, 4 tracks, no holes)

dc al1(11,14,11,14,14,11,14,11,14,11,11,14,00,00,00,00)

dc al1(07,08,09,10,11,12,13,14,15,04,05,06,00,00,00,00)

dc al1(06,03,03,14,14,21,21,09,09,04,04,06,00,00,00,00)

dc al1(12,02,05,10,07,16,08,07,08,08,03,04,00,00,00,00)

Hole designated by zero.

Rightmost 4 elements ignored.

3

4 of 11

Some Handy Equates for VRs

* Equates for vector registers

vzeroes equ 0 Register of zeroes.

vtemp equ 1 Work register

vanswer equ 2 Answer register

vmask equ 3 VPERM 4th operand.

a1 equ 10 Addend for track 1.

a2 equ 11 Addend for track 2.

a3 equ 12 Addend for track 3.

a4 equ 13 Addend for track 4.

r1t1 equ 16 Rotor 1, track 1.

r2t1 equ 17 Rotor 2, track 1.

r2t2 equ 18 Rotor 2, track 2.

r3t1 equ 19 Rotor 3, track 1.

r3t2 equ 20 Rotor 3, track 2.

r3t3 equ 21 Rotor 3, track 3.

r4t1 equ 22 Rotor 4, track 1.

r4t2 equ 23 Rotor 4, track 2.

r4t3 equ 24 Rotor 4, track 3.

r4t4 equ 25 Rotor 4, track 4.

r5t1 equ 26 Base, track 1.

r5t2 equ 27 Base, track 2.

r5t3 equ 28 Base, track 3.

r5t4 equ 29 Base, track 4.

4

5 of 11

Loading the Initial State

vzero zeroes Load up some handy zeroes.

vl vanswer,forty_two,4 Load up correct answers.

vl vmask,mask,4 Load array to rotate bytes.

vlm r1t1,r5t4,initial,4 Load up tracks into VRs.

ds 0L Align to quadword.

zeroes dc al1(00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00)

forty_two dc al1(42,42,42,42,42,42,42,42,42,42,42,42,00,00,00,00)

mask dc al1(01,02,03,04,05,06,07,08,09,10,11,00,12,13,14,15)

loops dc f’0’

5

6 of 11

Setting Up Loops to Rotate the Rotors

* Load the counters for the individual rotation loops.

lhi 4,12

rotor_loop4 ds 0h

lhi 3,12

rotor_loop3 ds 0h

lhi 2,12

rotor_loop2 ds 0h

lhi 1,12

rotor_loop1 ds 0h

asi loops,1 Increment loop counter.

6

7 of 11

Extracting the Addend for Track 1

* Extract addend for track 1 (a1)

vchb vtemp,r1t1,vzeroes

vsel a1,r1t1,r2t1,vtemp

vchb vtemp,a1,vzeroes

vsel a1,a1,r3t1,vtemp

vchb vtemp,a1,vzeroes

vsel a1,a1,r4t1,vtemp

vchb vtemp,a1,vzeroes

vsel a1,a1,r5t1,vtemp

Compare rotor 1 track 1 to zeroes.

If a byte is greater than zero, the corresponding byte of vtemp is set to X’FF’; otherwise, the corresponding byte of vtemp is set to zeroes.

Use the bytes of vtemp as selector to initially set addend1 (a1). Pick bytes from rotor 1 track 1 if nonzero, or pick bytes from rotor 2 track 1 if zero.

Repeat the process of inspecting the addend for remaining bytes of zeroes. Where zeros are found, VSEL will pick from the track at the next lower rotor.

7

8 of 11

Extracting the Addends Tracks 2-4

* Extract addend for track 2 (a2).

vchb vtemp,r2t2,vzeroes

vsel a2,r2t2,r3t2,vtemp

vchb vtemp,a2,vzeroes

vsel a2,a2,r4t2,vtemp

vchb vtemp,a2,vzeroes

vsel a2,a2,r5t2,vtemp

* Extract addend for track 3 (a3).

vchb vtemp,r3t3,vzeroes

vsel a3,r3t3,r4t3,vtemp

vchb vtemp,a3,vzeroes

vsel a3,a3,r5t3,vtemp

* Extract addend for track 4 (a4).

vchb vtemp,r4t4,vzeroes

vsel a4,r4t4,r5t4,vtemp

8

9 of 11

Found the Answer to�Life, the Universe, and Everything?

* Elements all add up to 42?

vab vtemp,a1,a2

vab vtemp,vtemp,a3

vab vtemp,vtemp,a4

vceqbs vtemp,vtemp,vanswer

je gotta_winner

9

10 of 11

Rotating the Rotors

* Rotate rotor 1.

rotate ds 0h

vperm r1t1,r1t1,vzeroes,vmask

jct 1,rotor_loop1

* Rotate rotor 2.

vperm r2t1,r2t1,vzeroes,vmask

vperm r2t2,r2t2,vzeroes,vmask

jct 2,rotor_loop2

* Rotate rotor 3.

vperm r3t1,r3t1,vzeroes,vmask

vperm r3t2,r3t2,vzeroes,vmask

vperm r3t3,r3t3,vzeroes,vmask

jct 3,rotor_loop3

* Rotate rotor 4.

vperm r4t1,r4t1,vzeroes,vmask

vperm r4t2,r4t2,vzeroes,vmask

vperm r4t3,r4t3,vzeroes,vmask

vperm r4t4,r4t4,vzeroes,vmask

jct 4,rotor_loop4

* All combinations tried; we’re done!

10

11 of 11

Lessons Learned

  • SIMD processing:
    • Parallel extraction of sectors on the same radii of a rotor to form addends
    • Parallel addition of the addends for the 12 sectors
    • Parallel comparison for the solution
  • Branchless computing
    • Certain instructions provide true/false indications for each element in the VR compared or searched.
    • VECTOR SELECT (VSEL) can be fed these true/false indications to conditionally load elements … without branching!
  • VECTOR PERMUTE (VPERM)
    • Powerful translation mechanism
    • Used in this problem to form a 12-byte rotator
    • Potentially useful in crypto algorithms (e.g., AES S-box processing).

11