1 of 21

Vector Floating Point Parallel Operation Co-processor

Group 5

Rebecca Chow, Erik VanderWerf, Elliot Edmunds, Xiang Li

2 of 21

Vector Operation Floating Point Co-Processor

The co-processor offloads vector floating-point work from the processor

Will allow processor

Receives commands from the processor:

Retrieve a vector from RAM and store it in internal register file
Operate on internal register file
Store a vector from internal register file to RAM

Main components/concepts:

AHB interface
Internal register file
Parallel floating-point operations

Each element in a vector of length 1-8 will be operated on in parallel
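The load / operate / store flow described on this slide can be sketched as a small behavioral model. This is only an illustration, not the RTL design: the class name, method names, and the one-float-per-address RAM are assumptions made for the example.

```python
import operator
from typing import Dict, List

MAX_VECTOR_LEN = 8  # the slides state vector lengths of 1-8

# Hypothetical mnemonics mapped to element-wise Python operators.
OPS = {"ADD": operator.add, "SUB": operator.sub, "MUL": operator.mul}


class CoprocessorModel:
    """Behavioral stand-in for the co-processor (names are illustrative)."""

    def __init__(self, ram: Dict[int, float]) -> None:
        self.ram = ram                          # assumption: one float per address
        self.regs: Dict[int, List[float]] = {}  # internal register file

    def load(self, reg: int, addr: int, length: int) -> None:
        """Retrieve a vector from RAM into the internal register file."""
        if not 1 <= length <= MAX_VECTOR_LEN:
            raise ValueError("vector length must be between 1 and 8")
        self.regs[reg] = [self.ram[addr + i] for i in range(length)]

    def operate(self, op: str, src_a: int, src_b: int, dest: int) -> None:
        """Apply op to every pair of elements; hardware does this in parallel."""
        a, b = self.regs[src_a], self.regs[src_b]
        if len(a) != len(b):
            raise ValueError("operand vectors must have the same length")
        self.regs[dest] = [OPS[op](x, y) for x, y in zip(a, b)]

    def store(self, reg: int, addr: int) -> None:
        """Write a vector from the internal register file back to RAM."""
        for i, value in enumerate(self.regs[reg]):
            self.ram[addr + i] = value
```

A typical sequence would be two `load` calls, one `operate`, and one `store`, which mirrors the command order shown in the later simulation slides.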

3 of 21

System Usage Diagram

Co-processor uses the AMBA AHB protocol to communicate with the system. (AMBA AHB supports multiple masters, whereas AHB-Lite supports only one master.)
CPU will write commands into the slave component.
When needed, our unit will request master access to read from or write to RAM

4 of 21

Architecture Diagram

Data coming from processor is sent to different controllers
Computation Units contain registers and ALUs
IO and CU controllers can operate concurrently

5 of 21

Loading Instructions and Distributing Commands

6 of 21

Processing Commands and Storing Results

7 of 21

Success Criteria Status

Criteria	Status
Demonstrate by simulation of Verilog test benches that the complete design is able to complete vector operation(s) and store the computed result in RAM.	Complete
Demonstrate by simulation of Verilog test benches that the complete design is able to add and subtract in parallel	Complete
Demonstrate by simulation of Verilog test benches that the complete design is able to multiply in parallel	Complete
Demonstrate by simulation of Verilog test benches that the complete design is able to load vectors from RAM using the AHB interface	Complete
Demonstrate by simulation of Verilog test benches that the complete design is able to write vectors to RAM using the AHB interface	Complete

8 of 21

Distributing Commands

LDD

ADD

SUB

Stored command

9 of 21

Loading values from RAM

Loading values 1.00, 1.04, 1.08, 1.12, 1.16, 1.2, -12.345, and 3.14 into the co-processor with starting address 100.

Load into reg0 a vector of size 8

Values received from RAM

Command from processor received
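When reading a load like this in a waveform viewer, the values appear as raw bit patterns rather than decimals. The sketch below converts between the two, assuming the design uses IEEE 754 single precision (32-bit); the slides name IEEE 754 but do not state the width, and the helper names are made up for this example.

```python
import struct


def float_to_bits(value: float) -> int:
    """Return the 32-bit IEEE 754 single-precision pattern of value."""
    return struct.unpack(">I", struct.pack(">f", value))[0]


def bits_to_float(bits: int) -> float:
    """Inverse of float_to_bits: decode a 32-bit pattern to a Python float."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]


if __name__ == "__main__":
    # The eight values loaded into reg0 on this slide.
    values = [1.00, 1.04, 1.08, 1.12, 1.16, 1.2, -12.345, 3.14]
    for v in values:
        print(f"{v:>8} -> 0x{float_to_bits(v):08X}")
```

Comparing the printed hex words against the data bus in simulation is one way to confirm that the vector arrived intact.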

10 of 21

Operating on results

ADD 0,1 -> 2

SUB 0,1 -> 3

MUL 0,1 -> 4
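A reference model is handy for checking results like these lane by lane. The following is a minimal sketch, assuming single-precision lanes; the function names are hypothetical and the operand values in the usage note are illustrative, not taken from the simulation.

```python
import operator
import struct
from typing import List

# Mnemonics as shown on this slide, mapped to Python operators.
OPS = {"ADD": operator.add, "SUB": operator.sub, "MUL": operator.mul}


def to_f32(x: float) -> float:
    """Round a Python float (double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def lane_op(op: str, a: List[float], b: List[float]) -> List[float]:
    """Apply op independently to each pair of lanes, rounding to float32."""
    if len(a) != len(b):
        raise ValueError("operand vectors must have the same length")
    return [to_f32(OPS[op](to_f32(x), to_f32(y))) for x, y in zip(a, b)]
```

For example, with hypothetical register contents `reg0 = [1.0, 2.0]` and `reg1 = [0.5, 0.25]`, `lane_op("ADD", reg0, reg1)` models `ADD 0,1 -> 2` and gives `[1.5, 2.25]`.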

11 of 21

Writing values to RAM

Store a vector of size 8 from reg2

Storing results from addition to RAM with starting address 400.

12 of 21

Layout

IC Layout

Budget area: 15 mm^2

Synthesis area: 23 mm^2

Budget timing: 8.80 ns

Synthesis timing: 9.28 ns

cu_add block delay

Clock rate:

Goal: 100 MHz

Synthesis: 333 MHz
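The delay and clock figures can be related with a little arithmetic. The sketch below assumes, which the slides do not state outright, that the timing numbers are the combinational delay of the cu_add block and that the block is given several clock cycles to settle; under that reading it computes how many cycles a given delay needs at a given clock rate. The function names are hypothetical.

```python
import math


def period_ns(clock_mhz: float) -> float:
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / clock_mhz


def wait_cycles(delay_ns: float, clock_mhz: float) -> int:
    """Smallest whole number of clock periods that covers delay_ns."""
    return math.ceil(delay_ns * clock_mhz / 1000.0)


if __name__ == "__main__":
    for clock in (100.0, 333.0):
        print(f"{clock:.0f} MHz: period {period_ns(clock):.2f} ns, "
              f"cycles for 9.28 ns = {wait_cycles(9.28, clock)}")
```

Under these assumptions a 9.28 ns block fits in one cycle at 100 MHz and needs four cycles at 333 MHz.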

13 of 21

Conclusions

Biggest challenges in our design process:

Distributor logic to prevent conflicting commands
Conflicting clock cycles between blocks
Header file issues
AHB protocol
Debugging the CU: values in IEEE 754 format are difficult to read while debugging
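The slides do not define what makes two commands conflict. One common reading is a register hazard: a new command must wait if it touches a register that a command still in flight is writing, or writes one that is still being read. The sketch below illustrates that check only; the `Command` type and `conflicts` function are hypothetical and are not the distributor's actual logic.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Command:
    """Illustrative command record: which registers it reads and writes."""
    name: str
    reads: FrozenSet[int]
    writes: FrozenSet[int]


def conflicts(new: Command, in_flight: Command) -> bool:
    """True if new must wait for in_flight (assumed register-hazard rule)."""
    read_after_write = bool(new.reads & in_flight.writes)
    write_after_write = bool(new.writes & in_flight.writes)
    write_after_read = bool(new.writes & in_flight.reads)
    return read_after_write or write_after_write or write_after_read


# Example: a load into reg0 is still running when ADD 0,1 -> 2 arrives.
load0 = Command("LOAD reg0", reads=frozenset(), writes=frozenset({0}))
add = Command("ADD 0,1 -> 2", reads=frozenset({0, 1}), writes=frozenset({2}))
store4 = Command("STORE reg4", reads=frozenset({4}), writes=frozenset())
```

Here `conflicts(add, load0)` is true because the add reads reg0 before the load has finished, while `conflicts(store4, add)` is false, so an IO command and a CU command of that kind could run concurrently.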

Improvements to our design given more time:

Computations currently wait multiple clock cycles to complete

Instead, use pipelining with multiple registers to improve speed

The FIFO is currently implemented as a shift register

We should implement it as a circular buffer to reduce power consumption
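The difference can be sketched in software. In a shift-register FIFO every stored entry moves on each pop, while in a circular buffer the entries stay in place and only the read and write pointers advance; fewer storage elements changing per operation is the usual argument for lower dynamic power. The class below is a minimal illustration with hypothetical names, not the design's FIFO.

```python
from typing import Any, List, Optional


class CircularFifo:
    """Fixed-depth FIFO in which data stays put and only pointers move."""

    def __init__(self, depth: int) -> None:
        self.slots: List[Optional[Any]] = [None] * depth
        self.head = 0   # read pointer: index of the oldest entry
        self.tail = 0   # write pointer: index of the next free slot
        self.count = 0  # number of valid entries

    def push(self, item: Any) -> None:
        if self.count == len(self.slots):
            raise OverflowError("FIFO full")
        self.slots[self.tail] = item                    # one slot written
        self.tail = (self.tail + 1) % len(self.slots)   # pointer wraps around
        self.count += 1

    def pop(self) -> Any:
        if self.count == 0:
            raise IndexError("FIFO empty")
        item = self.slots[self.head]                    # no other slot changes
        self.head = (self.head + 1) % len(self.slots)
        self.count -= 1
        return item
```

In hardware the pointers would typically be small counters, so a push or pop updates one storage word and one counter rather than every entry in the queue.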
