Measuring and Evaluating Performance

Chapter 1

Section: 1.6

Prof. Iyad Jafar

https://sites.google.com/view/iyadjafar

iyad.jafar@ju.edu.jo

Outline

  • Introduction
  • Measuring Execution Time
  • Examples
  • Performance, Energy, and Cost Trade-offs
  • Determinants of Performance
  • Make the Common Case Fast

Introduction

Relative Performance
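Relative performance is commonly defined as the inverse of execution time. Example 2 relies on this relation when it compares computers A and B. The LaTeX sketch below states it; X, Y and n are illustrative symbols:

```latex
\[
\text{Performance}_X = \frac{1}{\text{Execution time}_X},
\qquad
\frac{\text{Performance}_X}{\text{Performance}_Y}
  = \frac{\text{Execution time}_Y}{\text{Execution time}_X} = n
\]
```

Read this as "X is n times faster than Y".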

Measuring Execution Time

  • Almost all modern computers are driven by a clock.
  • The clock is a periodic square wave with a fixed, known period (the clock cycle time).

  • Hence, the basic unit for measuring time is the clock cycle:

[Figure: clock signal with one clock cycle marked]

Measuring Execution Time
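Time is counted in clock cycles, so a program's execution time follows from its cycle count and the cycle time. The formulation below is a standard one and matches how the examples compute time:

```latex
\[
\text{Clock rate} = \frac{1}{\text{Clock cycle time}},
\qquad
\text{CPU time} = \text{CPU clock cycles} \times \text{Clock cycle time}
  = \frac{\text{CPU clock cycles}}{\text{Clock rate}}
\]
```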

The Performance Equation
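The examples below apply the performance equation in this standard form. F_k is the fraction of executed instructions that belong to class k, the same quantity written as F in the "CPIk x F" column of Example 4:

```latex
\[
\begin{aligned}
\text{CPU time} &= \text{IC} \times \text{CPI} \times \text{Clock cycle time}
  = \frac{\text{IC} \times \text{CPI}}{\text{Clock rate}} \\
\text{CPI} &= \frac{\sum_{k} \text{CPI}_k \times \text{IC}_k}{\text{IC}}
  = \sum_{k} \text{CPI}_k \times F_k,
  \qquad F_k = \frac{\text{IC}_k}{\text{IC}}
\end{aligned}
\]
```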

Examples

Example 1

A program executes 1000 instructions on a CPU running at 1 GHz. Given the instruction count and CPI of each instruction class in the table below, how long does the program take to execute?

| Instruction Class | Instruction Count | Class CPI |
|-------------------|-------------------|-----------|
| 1                 | 200               | 2         |
| 2                 | 300               | 3         |
| 3                 | 500               | 1         |

Effective CPI = (200 x 2 + 300 x 3 + 500 x 1) / 1000 = 1.8 cycles/inst

Time = 1000 x 1.8 / (1 x 10⁹) = 1.8 µs
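The same arithmetic can be checked with a short Python sketch. It uses only the table values and the 1 GHz clock from the example. The variable names are illustrative:

```python
# Example 1 check: effective CPI and CPU time = total cycles / clock rate.
# (instruction count, CPI) for classes 1-3, copied from the table.
classes = [(200, 2), (300, 3), (500, 1)]
clock_rate_hz = 1e9  # 1 GHz

total_ic = sum(ic for ic, _ in classes)               # 1000 instructions
total_cycles = sum(ic * cpi for ic, cpi in classes)   # sum of IC_k x CPI_k
effective_cpi = total_cycles / total_ic
exec_time_s = total_cycles / clock_rate_hz

print(f"Effective CPI = {effective_cpi:.2f} cycles/inst")
print(f"Time = {exec_time_s * 1e6:.2f} us")
```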

Example 2

Suppose computer A has a clock cycle time of 250 ps and a CPI of 2.0 for some program, while computer B has a clock cycle time of 500 ps and a CPI of 1.2 for the same program. Which computer is faster? Assume both computers use the same ISA and the same compiler.

Since both computers use the same ISA and compiler, ICA = ICB = IC.

TimeA = IC x 2.0 x 250 ps = 500 IC ps

TimeB = IC x 1.2 x 500 ps = 600 IC ps

PerformanceA / PerformanceB = TimeB / TimeA = 600 IC / 500 IC = 1.2

Computer A is 1.2 times faster than B
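The instruction count is identical on both machines, so it cancels in the ratio. The sketch below therefore assumes IC = 1 and compares time per instruction:

```python
# Example 2 check: IC is the same on both machines, so set IC = 1.
IC = 1
time_a_ps = IC * 2.0 * 250   # CPI_A x cycle time_A (ps)
time_b_ps = IC * 1.2 * 500   # CPI_B x cycle time_B (ps)

# Performance is the inverse of time, so Perf_A / Perf_B = Time_B / Time_A.
ratio = time_b_ps / time_a_ps
print(f"Time_A = {time_a_ps:.0f} IC ps, Time_B = {time_b_ps:.0f} IC ps")
print(f"A is {ratio:.2f} times faster than B")
```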

Example 3

A program was compiled using two compilers, C1 and C2, for a computer whose CPI for each instruction class is given in the table. The table also gives the instruction counts produced by each compiler. Which compiler is better, i.e., which one produces the faster program?

| Class | CPI for class | IC by C1 | IC by C2 |
|-------|---------------|----------|----------|
| A     | 1             | 2        | 4        |
| B     | 2             | 1        | 1        |
| C     | 3             | 2        | 1        |
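Both compiled versions run on the same computer at the same clock rate. A minimal sketch can therefore compare total clock cycles, computed as the sum of IC_k x CPI_k. The helper names are illustrative:

```python
# Example 3 check: same computer and clock, so compare total cycles.
cpi = {"A": 1, "B": 2, "C": 3}
ic_c1 = {"A": 2, "B": 1, "C": 2}
ic_c2 = {"A": 4, "B": 1, "C": 1}

def cycles(ic):
    """Total clock cycles = sum over classes of IC_k x CPI_k."""
    return sum(ic[k] * cpi[k] for k in cpi)

def avg_cpi(ic):
    """Effective CPI = total cycles / total instruction count."""
    return cycles(ic) / sum(ic.values())

for name, ic in (("C1", ic_c1), ("C2", ic_c2)):
    print(f"{name}: IC = {sum(ic.values())}, cycles = {cycles(ic)}, CPI = {avg_cpi(ic):.2f}")
```

With these numbers, C2 needs 9 cycles versus 10 for C1, even though it executes more instructions. This is why instruction count alone is not a reliable performance measure.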

 

 

Example 4

A processor with four instruction classes is to be modified, and several approaches are being considered. The instruction mix of the program used to evaluate them is given in the table below. Which approach is better?

    • Original: the unmodified processor.
    • Approach 1: reduces the average load time to 2 cycles.
    • Approach 2: reduces the branch time by 1 cycle.
    • Approach 3: executes two ALU instructions at once.

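| Class         | Frequency (F) | CPI | CPIk x F (Original) | CPIk x F (App1) | CPIk x F (App2) | CPIk x F (App3) |
|---------------|---------------|-----|---------------------|-----------------|-----------------|-----------------|
| ALU           | 50%           | 1   | 0.5                 | 0.5             | 0.5             | 0.25            |
| Load          | 20%           | 5   | 1.0                 | 0.4             | 1.0             | 1.0             |
| Store         | 10%           | 3   | 0.3                 | 0.3             | 0.3             | 0.3             |
| Branch        | 20%           | 2   | 0.4                 | 0.4             | 0.2             | 0.4             |
| Effective CPI |               |     | 2.2                 | 1.6             | 2.0             | 1.95            |
| Speedup       |               |     |                     | 1.375           | 1.10            | 1.128           |

Approach 1 gives the largest speedup, so it is the better choice.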
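The table can be reproduced with a small Python sketch. It assumes the clock rate and instruction count are unchanged, so speedup = original CPI / new CPI. Following the table, Approach 3 is modeled by halving the ALU CPI:

```python
# Example 4 check: effective CPI = sum(F_k x CPI_k); clock and IC assumed unchanged.
freq = {"ALU": 0.5, "Load": 0.2, "Store": 0.1, "Branch": 0.2}
original = {"ALU": 1, "Load": 5, "Store": 3, "Branch": 2}

approaches = {
    "Original": original,
    "App1": {**original, "Load": 2},                         # loads take 2 cycles
    "App2": {**original, "Branch": original["Branch"] - 1},  # branches 1 cycle faster
    "App3": {**original, "ALU": original["ALU"] / 2},        # two ALU ops at once -> half CPI
}

def effective_cpi(cpi):
    return sum(freq[k] * cpi[k] for k in freq)

base = effective_cpi(original)
for name, cpi in approaches.items():
    cpi_eff = effective_cpi(cpi)
    print(f"{name}: CPI = {cpi_eff:.2f}, speedup = {base / cpi_eff:.3f}")
```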

Example 5

A program contains 10⁶ instructions with the following instruction mix:

    • Class A: 10%
    • Class B: 20%
    • Class C: 50%
    • Class D: 20%

This program is executed on two different processors whose CPI values and clock rates (CR) are given in the table. Which processor delivers better performance for this program?

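| Processor | CR (GHz) | CPI A | CPI B | CPI C | CPI D |
|-----------|----------|-------|-------|-------|-------|
| 1         | 1.5      | 1     | 2     | 3     | 4     |
| 2         | 2        | 2     | 2     | 2     | 2     |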
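A minimal Python sketch applies CPU time = IC x CPI / clock rate to each processor, using the instruction mix and table values. P1 and P2 are illustrative labels for processors 1 and 2:

```python
# Example 5 check: CPU time = IC x effective CPI / clock rate.
IC = 10**6
mix = {"A": 0.1, "B": 0.2, "C": 0.5, "D": 0.2}
processors = {
    "P1": {"clock_hz": 1.5e9, "cpi": {"A": 1, "B": 2, "C": 3, "D": 4}},
    "P2": {"clock_hz": 2.0e9, "cpi": {"A": 2, "B": 2, "C": 2, "D": 2}},
}

def exec_time(p):
    """Return (execution time in seconds, effective CPI) for one processor."""
    cpi = sum(mix[k] * p["cpi"][k] for k in mix)
    return IC * cpi / p["clock_hz"], cpi

for name, p in processors.items():
    t, cpi = exec_time(p)
    print(f"{name}: CPI = {cpi:.2f}, time = {t * 1e3:.3f} ms")
```

Under these numbers, processor 2 (CPI 2.0, 1 ms) beats processor 1 (CPI 2.8, about 1.87 ms) despite having uniform CPIs.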

Example 6

The table below shows the instruction mix and CPI values for a program running on a given processor. Suppose the processor is modified so that the CPI of Class-2 instructions is reduced to 2. Should this modification be adopted if it results in a 10% increase in the clock cycle time?

| Class_k | CPI_k | Frequency_k |
|---------|-------|-------------|
| 1       | 2     | 0.3         |
| 2       | 5     | 0.2         |
| 3       | 3     | 0.5         |
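The instruction count is assumed unchanged by the modification, so the decision reduces to comparing effective CPI x clock cycle time. The sketch below normalizes the original cycle time to 1:

```python
# Example 6 check: compare CPI x cycle time before and after the change.
freq = {1: 0.3, 2: 0.2, 3: 0.5}
cpi_old = {1: 2, 2: 5, 3: 3}
cpi_new = {**cpi_old, 2: 2}   # Class-2 CPI reduced to 2

def effective_cpi(cpi):
    return sum(freq[k] * cpi[k] for k in freq)

cycle_old = 1.0   # normalized clock cycle time
cycle_new = 1.1   # 10% longer clock cycle time

time_old = effective_cpi(cpi_old) * cycle_old
time_new = effective_cpi(cpi_new) * cycle_new
print(f"CPI: {effective_cpi(cpi_old):.2f} -> {effective_cpi(cpi_new):.2f}")
print(f"Speedup = {time_old / time_new:.3f}")  # > 1 means the change helps
```

The effective CPI drops from 3.1 to 2.5, and 3.1 / (2.5 x 1.1) is about 1.13. Under these assumptions, the modification is worth adopting.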

Performance, Energy, and Cost Trade-offs

Beyond Execution Time

  • Performance alone is not enough:
    • Execution time captures speed, but ignores energy usage and economic constraints.
  • Energy and power matter:
    • Affect battery life, thermal design, cooling needs, and long-term operating costs.
  • Cost influences feasibility:
    • Hardware price impacts scalability, deployment, and total cost of ownership.
  • Modern CPU selection must balance performance, energy efficiency, and cost.
    • Composite metrics are needed.
    • Metrics such as energy, energy–delay product, and cost–performance better reflect real-world decisions.

Energy and Power
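Energy, power and time are commonly related as shown below. The energy–delay product (EDP) used in Example 7 follows from this relation. The sketch assumes the average power is constant over the run:

```latex
\[
\text{Energy} = \text{Average power} \times \text{Execution time},
\qquad
\text{EDP} = \text{Energy} \times \text{Execution time}
  = \text{Average power} \times \text{Execution time}^2
\]
```

A lower EDP is better: it rewards designs that are both fast and frugal with energy.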

Cost

Overall gain:

Example 7

Computers A and B, which implement the same ISA, were evaluated using a program that has an effective CPI of 2.5 on A and 1.8 on B. Both computers run at the same clock rate; however:

    • Computer B's hardware costs 50% more than computer A's.
    • Computer B consumes 30% more power than computer A.

Determine which computer is better if we consider:

    • Execution time only
    • CTP only
    • EDP only
    • Overall efficiency

Ignore the energy cost to run the program.

Example 7 - Solution

  • Considering execution time only: Computer B is better
  • Considering CTP only: Computer A is better
  • Considering EDP gain only: Computer B is better
  • Overall gain: Computer B is better

Summary

Computer B is faster and more energy-efficient despite higher power consumption, but Computer A provides better cost–performance. When all factors are combined, Computer B offers higher overall efficiency.
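The first three comparisons can be reproduced with a minimal Python sketch, with every quantity normalized to computer A. The sketch makes three assumptions that the slides do not state. Both computers execute the same instruction count, so time is proportional to CPI. CTP is taken as cost x time. EDP is power x time². The overall-gain combination is not reproduced here.

```python
# Example 7 sketch: every quantity is normalized to computer A.
# Assumptions (not stated on the slide): same instruction count, so time is
# proportional to CPI; CTP = cost x time (lower is better);
# EDP = energy x time = power x time^2 (lower is better).
A = {"cpi": 2.5, "cost": 1.0, "power": 1.0}
B = {"cpi": 1.8, "cost": 1.5, "power": 1.3}   # +50% cost, +30% power

def metrics(c):
    time = c["cpi"]                # IC x CPI x cycle time, with IC x cycle time = 1
    energy = c["power"] * time     # energy = power x time
    return {"time": time, "ctp": c["cost"] * time, "edp": energy * time}

ma, mb = metrics(A), metrics(B)
# Gain of B over A for each metric (A's value / B's value): > 1 favors B.
gains = {k: ma[k] / mb[k] for k in ma}

for k, g in gains.items():
    winner = "B" if g > 1 else "A"
    print(f"{k}: gain of B over A = {g:.3f} -> Computer {winner} is better")
```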

Determinants of Performance

| HW or SW Component     | Affects?    | How?                                                                                                      |
|------------------------|-------------|-----------------------------------------------------------------------------------------------------------|
| Algorithm              | IC, CPI     | Determines the number of source program instructions executed; may also favor slower or faster instructions |
| Programming Language   | IC, CPI     | Types of statements the language supports; higher-level abstractions may require indirect calls           |
| Compiler               | IC, CPI     | Decides which machine instructions are used to implement the program                                      |
| ISA                    | IC, CPI, CC | Type and complexity of the instructions supported by the ISA                                              |
| Processor Organization | CPI, CC     | Single-cycle, multi-cycle, pipelined …                                                                    |
| Technology             | CC          | Determines the propagation delay in hardware                                                              |

The performance of a program depends on the algorithm, the language, the compiler, the architecture, and the actual hardware.

Make the Common Case Fast

  • A fundamental guideline in computer architecture and performance engineering.
  • Idea:
    • Not all operations in a computer system occur with the same frequency.
    • Some instructions, paths, or events occur very frequently, while others happen rarely.
    • To improve overall performance, optimize the operations that occur most often
      • Even if it makes the uncommon operations slightly slower or more complex.

Example 8

  • Assume a processor that executes additions (90% of operations) and divisions (10% of operations)
    • Delay of addition is 2 ns
    • Delay of division is 15 ns
  • Two design options
    • Option 1: Improve addition to 1 ns
    • Option 2: Improve division to 10 ns

  • Average time per operation, original = 0.9 x 2 + 0.1 x 15 = 3.3 ns
  • Average time per operation, option 1 = 0.9 x 1 + 0.1 x 15 = 2.4 ns
  • Average time per operation, option 2 = 0.9 x 2 + 0.1 x 10 = 2.8 ns
  • Option 1 is better: a 1 ns gain on the common case (addition) outweighs a 5 ns gain on the rare case (division).
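The weighted averages above can be checked with a short Python sketch. It assumes the 90%/10% split is the fraction of operations, so each design's average time is the frequency-weighted sum of delays:

```python
# Example 8 check: average time per operation = sum(fraction x delay).
mix = {"add": 0.9, "div": 0.1}

def avg_time_ns(delay):
    return sum(mix[op] * delay[op] for op in mix)

designs = {
    "Original": {"add": 2, "div": 15},
    "Option 1": {"add": 1, "div": 15},   # faster addition (common case)
    "Option 2": {"add": 2, "div": 10},   # faster division (rare case)
}
for name, delay in designs.items():
    print(f"{name}: {avg_time_ns(delay):.1f} ns")
```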

Amdahl's Law
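Amdahl's Law is commonly stated as follows. Here f is the fraction of the original execution time affected by an enhancement and s is the speedup of that fraction; both symbols are introduced for this sketch:

```latex
\[
\text{Overall speedup} = \frac{1}{(1 - f) + \dfrac{f}{s}},
\qquad
\lim_{s \to \infty} \text{Overall speedup} = \frac{1}{1 - f}
\]
```

The limit shows that the unenhanced fraction (1 - f) caps the overall gain, no matter how large s becomes.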

Example 9

Suppose 40% of a program can be optimized to run 3× faster. What is the overall speedup according to Amdahl’s Law?

By how much must this fraction be enhanced to achieve an overall speedup of 2?


It is impossible to achieve an overall speedup of 2 if only 40% of the program can be optimized, regardless of how fast that part becomes.

Maximum overall speedup is 1.67!
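Both questions can be worked with a minimal Python sketch of Amdahl's Law, where f is the enhanced fraction and s is its speedup. The helper `required_s` is a hypothetical name. It solves 1 / ((1 - f) + f/s) = target for s and returns None when no finite s works:

```python
# Example 9 sketch: Amdahl's Law with f = enhanced fraction, s = its speedup.
def amdahl(f, s):
    """Overall speedup when fraction f of the time is sped up by factor s."""
    return 1.0 / ((1.0 - f) + f / s)

def required_s(f, target):
    """Speedup needed on fraction f to reach `target`; None if unreachable."""
    remaining = 1.0 / target - (1.0 - f)   # time budget left for the enhanced part
    return f / remaining if remaining > 0 else None

f = 0.4
print(f"Overall speedup with s = 3: {amdahl(f, 3):.2f}")
print(f"s needed for an overall speedup of 2: {required_s(f, 2)}")
print(f"Upper bound as s -> infinity: {1 / (1 - f):.2f}")
```

Reaching 2 would require 0.6 + 0.4/s = 0.5, which no positive s satisfies. This is why `required_s` returns None and the bound stays at 1 / 0.6.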

Exercises

  1. A program is compiled using three different compilers with the following instruction mix and CPI per instruction type. All processors run at 2.5 GHz and execute 10⁹ instructions.
    • Compute the average CPI for each compiler.
    • Calculate the total CPU time.
    • Rank the compilers by performance.

Exercises

  1. A CPU designer can choose between two options:
    • Design A: 3 GHz clock, average CPI = 2.0
    • Design B: 2.5 GHz clock, average CPI = 1.5

You expect to run a workload of 2 billion instructions.

    • Which design gives better execution time?
    • By what percentage is it better?
    • What if Design A’s CPI could be improved by optimizing 25% of instructions to execute with CPI = 1?

Exercises

  1. You run a program on a multicore CPU. 75% of the program can be parallelized. Compute the expected speedup with 1, 2, 4, and 8 cores using Amdahl’s Law.

Suggested Problems

  • Solve problems 1.6, 1.7 and 1.8 from Chapter 1 in the textbook.
