1 of 21

Vector Floating Point Parallel Operation Co-processor

Group 5

Rebecca Chow, Erik VanderWerf, Elliot Edmunds, Xiang Li

2 of 21

Vector Operation Floating Point Co-Processor

  • Co-processor is an offload for the processor
    • Will allow processor
  • Receives command from processor
    • Retrieve a vector from RAM and store it in internal register file
    • Operate on internal register file
    • Store a vector from internal register file to RAM
  • Main components/concepts:
    • AHB interface
    • Internal register file
    • Parallel floating-point operations
      • Each element in a vector of length 1-8 will be operated on in parallel
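
The load / operate / store flow listed above can be pictured with a small behavioral model. This is only a sketch under stated assumptions, not the group's Verilog design: the class name `VectorCoprocessorModel`, the register count of 8, and the word-addressed `dict` standing in for RAM are all hypothetical choices made for illustration. Only the vector length limit of 1-8 comes from the slide.

```python
import operator

MAX_VECTOR_LEN = 8  # from the slide: vectors of length 1-8
NUM_REGISTERS = 8   # assumption: the slides do not state the register count


class VectorCoprocessorModel:
    """Toy behavioral model; RAM is a dict mapping word address -> float."""

    def __init__(self, ram):
        self.ram = ram
        self.regs = [[] for _ in range(NUM_REGISTERS)]

    def load(self, reg, addr, length):
        """Retrieve a vector from RAM and store it in the register file."""
        if not 1 <= length <= MAX_VECTOR_LEN:
            raise ValueError("vector length must be between 1 and 8")
        self.regs[reg] = [self.ram[addr + i] for i in range(length)]

    def operate(self, op, src_a, src_b, dst):
        """Apply op element by element; hardware would do this in parallel."""
        a, b = self.regs[src_a], self.regs[src_b]
        if len(a) != len(b):
            raise ValueError("operand vectors differ in length")
        self.regs[dst] = [op(x, y) for x, y in zip(a, b)]

    def store(self, reg, addr):
        """Store a vector from the register file back to RAM."""
        for i, value in enumerate(self.regs[reg]):
            self.ram[addr + i] = value


# Usage: two 4-element vectors, added and written back at address 400.
ram = {100 + i: float(i) for i in range(4)}
ram.update({200 + i: 10.0 for i in range(4)})
cop = VectorCoprocessorModel(ram)
cop.load(0, 100, 4)
cop.load(1, 200, 4)
cop.operate(operator.add, 0, 1, 2)
cop.store(2, 400)
print([ram[400 + i] for i in range(4)])
```

The Python list comprehension processes elements one after another; in the co-processor each element would have its own arithmetic unit, so the loop here stands in for work done simultaneously.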

3 of 21

System Usage Diagram

  • Co-processor uses the AMBA AHB protocol to communicate with the system (AMBA AHB supports multiple masters, whereas AHB-Lite supports only one master)
  • CPU will write commands into the slave component
  • When needed, our unit will request master access to read from or write to RAM

4 of 21

Architecture Diagram

  • Data coming from processor is sent to different controllers
  • Computation Units contain registers and ALUs
  • IO and CU controllers can operate concurrently

5 of 21

Loading Instructions and Distributing Commands

6 of 21

Processing Commands and Storing Results

7 of 21

Success Criteria Status

| Criteria | Status |
|---|---|
| Demonstrate by simulation of Verilog test benches that the complete design is able to complete vector operation(s) and store the computed result in RAM. | Complete |
| Demonstrate by simulation of Verilog test benches that the complete design is able to add and subtract in parallel. | Complete |
| Demonstrate by simulation of Verilog test benches that the complete design is able to multiply in parallel. | Complete |
| Demonstrate by simulation of Verilog test benches that the complete design is able to load vectors from RAM using the AHB interface. | Complete |
| Demonstrate by simulation of Verilog test benches that the complete design is able to write vectors to RAM using the AHB interface. | Complete |

8 of 21

Distributing Commands

LDD

ADD

SUB

Stored command

9 of 21

Loading values from RAM

Loading values 1.00, 1.04, 1.08, 1.12, 1.16, 1.2, -12.345, and 3.14 into the co-processor with starting address 100.
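
When reading the waveform for this load, it helps to know the bit patterns the values take in RAM. The sketch below converts the eight values to IEEE 754 bit patterns with Python's standard `struct` module. It assumes 32-bit single precision and one value per address starting at 100; the slides do not state the word width or the address granularity, so both are assumptions, and the helper names `float_to_bits` and `bits_to_float` are made up for this example.

```python
import struct


def float_to_bits(value):
    """Return the 32-bit IEEE 754 single-precision pattern of value."""
    return struct.unpack(">I", struct.pack(">f", value))[0]


def bits_to_float(bits):
    """Inverse of float_to_bits: decode a 32-bit pattern into a float."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]


# Values and starting address from the slide.
values = [1.00, 1.04, 1.08, 1.12, 1.16, 1.2, -12.345, 3.14]
START_ADDRESS = 100  # assumption: one 32-bit word per address

ram_image = {START_ADDRESS + i: float_to_bits(v) for i, v in enumerate(values)}

for addr, bits in ram_image.items():
    # Print the hex word next to the value it decodes back to.
    print(f"{addr}: 0x{bits:08X} -> {bits_to_float(bits)!r}")
```

Most of these decimals are not exactly representable in single precision, so the decoded values differ slightly from the decimal inputs (1.0 is exact and encodes as 0x3F800000).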

Load into reg0 a vector of size 8

Values received from RAM

Command from processor received

10 of 21

Operating on results

ADD 0,1 -> 2

SUB 0,1 -> 3

MUL 0,1 -> 4
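
The `OP src,src -> dst` notation used on this slide can be read as "apply OP element-wise to two source registers and write the result to a destination register". A minimal sketch of that reading follows; the `execute` function, the regular expression, and the `dict` register file are hypothetical and only mirror the notation on the slide, not the co-processor's actual command encoding.

```python
import operator
import re

# Mnemonics taken from the slide; the mapping to Python operators is illustrative.
OPS = {"ADD": operator.add, "SUB": operator.sub, "MUL": operator.mul}
COMMAND_RE = re.compile(r"^(ADD|SUB|MUL)\s+(\d+)\s*,\s*(\d+)\s*->\s*(\d+)$")


def execute(command, regs):
    """Parse e.g. 'ADD 0,1 -> 2' and apply it to the register dict in place."""
    match = COMMAND_RE.match(command.strip())
    if match is None:
        raise ValueError(f"unrecognized command: {command!r}")
    name = match.group(1)
    src_a, src_b, dst = (int(g) for g in match.group(2, 3, 4))
    # Element-wise operation; each pair would be computed in parallel in hardware.
    regs[dst] = [OPS[name](x, y) for x, y in zip(regs[src_a], regs[src_b])]


# Usage with small example vectors (not the values from the slides).
regs = {0: [1.0, 2.0, 3.0], 1: [0.5, 0.5, 0.5]}
for command in ("ADD 0,1 -> 2", "SUB 0,1 -> 3", "MUL 0,1 -> 4"):
    execute(command, regs)
print(regs[2], regs[3], regs[4])
```

Python floats are double precision, so results can differ in the last bits from what a narrower hardware format would produce.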

11 of 21

Writing values to RAM

Store a vector of size 8 from reg2

Storing results from addition to RAM with starting address 400.

12 of 21

Layout

IC Layout

Budget area: 15 mm^2

Synthesis area: 23 mm^2

Budget timing: 8.80 ns

Synthesis timing: 9.28 ns

cu_add block delay

Clock rate:

Goal: 100 MHz

Synthesis: 333 MHz

13 of 21

Conclusions

  • Biggest challenges in our design process:
    • Distributor logic that prevents conflicting commands
    • Conflicting clock cycles between blocks
    • Header file issues
    • AHB protocol
    • Debugging the CU: values in IEEE 754 format are difficult to read
  • Improvements to our design given more time:
    • Replace the multi-clock-cycle waits in our computations
      • Use pipelining with multiple registers to improve speed
    • FIFO is currently implemented as a shift register
      • Implement it as a circular buffer to reduce power consumption
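
The circular-buffer idea in the last bullet can be sketched in software as follows. This is an illustration of the general technique, not the group's design: the class name `CircularFifo`, the depth of 4, and the queued mnemonics (`STR` in particular is invented) are assumptions. The point is that entries stay in place and only the read and write pointers advance, whereas a shift-register FIFO moves every stored entry on each shift, which is the usual argument for lower switching activity.

```python
class CircularFifo:
    """Fixed-depth FIFO where only the pointers move, never the stored data."""

    def __init__(self, depth):
        self._slots = [None] * depth
        self._head = 0   # read pointer: next entry to pop
        self._tail = 0   # write pointer: next free slot
        self._count = 0  # number of valid entries

    def push(self, item):
        if self._count == len(self._slots):
            raise OverflowError("FIFO full")
        self._slots[self._tail] = item
        self._tail = (self._tail + 1) % len(self._slots)  # wrap around
        self._count += 1

    def pop(self):
        if self._count == 0:
            raise IndexError("FIFO empty")
        item = self._slots[self._head]
        self._head = (self._head + 1) % len(self._slots)  # wrap around
        self._count -= 1
        return item

    def __len__(self):
        return self._count


# Usage: queue a few command mnemonics and force the write pointer to wrap.
fifo = CircularFifo(depth=4)
fifo.push("LDD")
fifo.push("ADD")
first = fifo.pop()
for name in ("SUB", "MUL", "STR"):
    fifo.push(name)
drained = [fifo.pop() for _ in range(len(fifo))]
print(first, drained)
```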

14 of 21

Questions?

15 of 21

DECODER

16 of 21

CU

17 of 21

AHB Slave

18 of 21

Distributor

19 of 21

IO Controller

20 of 21

CU Controller

21 of 21

FIFO
