1 of 31

Measuring and Reporting Performance

Chapter 1

1

Prof. Iyad Jafar

https://sites.google.com/view/iyadjafar

iyad.jafar@ju.edu.jo

2 of 31

Outline

  • Quantitative Principles of Computer Design
  • Measuring Execution Time
  • Amdhal’s Law
  • Reliability and Dependability

2

Prof. Iyad Jafar

3 of 31

Quantitative Principles of Computer Design

3

Prof. Iyad Jafar

4 of 31

Principles of Computer Design

  • Taking advantage of parallelism
    • System level 🡪 multiple disks and processors
    • Processor level 🡪 pipelining (ILP)
    • Logic design 🡪 CLA and banked caches
  • Principle of Locality
    • Programs tend to reuse data and instructions they have used recently.
    • Programs tend to use data and instructions that are near to previously accessed data and instructions.
    • Temporal and Spatial!
  • Focus on the Common Case
    • In making a design trade-off, favor the frequent case over the infrequent case
    • The frequent case is often simpler and can be done faster.

4

Prof. Iyad Jafar

5 of 31

Measuring Execution Time

5

Prof. Iyad Jafar

6 of 31

Introduction

  •  

6

 

 

Relative Performance

Prof. Iyad Jafar

7 of 31

Measuring Execution Time

  • Almost all modern computers are based on a clock
  • The clock is a periodic square wave with known period (cycle time)

  • Hence, the base unit in measuring time is the cycle time

7

 

one clock cycle

 

 

Prof. Iyad Jafar

8 of 31

Measuring Execution Time

  •  

8

 

 

 

 

 

 

Prof. Iyad Jafar

9 of 31

The Performance Equation

  •  

9

 

 

 

 

Prof. Iyad Jafar

10 of 31

Example 1

  • In a certain program, 1000 instructions were executed on a CPU running at 1 GHz. If the instruction counts and CPI for each class are given below, how long does it take to execute the program?

10

Instruction Class

Instruction Count

Class CPI

1

200

2

2

300

3

3

500

1

Effective CPI = (200x2+300x3+500x1)/1000 = 1.8

 

 

Time = 1000 x 1.8 / 1×109 = 1.8 us

Prof. Iyad Jafar

11 of 31

Example 2

  • Suppose that computer A has clock cycle of 250 ps and CPI 2.0 for some program, and computer B has clock cycle time of 500 ps and CPI of 1.2 for the same program, then which computer is faster ?

11

TimeA = IC x 2 x 250 ps = 500 IC ps

TimeB = IC x 1.2 x 500 ps = 600 IC ps

PerformanceA TimeB 600 IC

PerformanceB TimeA 500 IC

-------------------- = ---------- = --------- = 1.2

Computer A is 1.2 faster than B

Prof. Iyad Jafar

12 of 31

Example 3

A compiler designer is trying to decide between two code sequences for a computer with CPI for different classes of instructions is given in the table. The two code sequences require the given instruction counts.

Which code will be faster to execute?

12

 

Class

A

B

C

CPI for class

1

2

3

IC in sequence 1

2

1

2

IC in sequence 2

4

1

1

 

Prof. Iyad Jafar

13 of 31

Determinants of Performance

13

 

HW or SW Component

Affects?

How?

Algorithm

IC, CPI

  • Algorithm determines the number of source program instructions executed.
  • The algorithm may also favor slower or faster instructions.

Programming Language

IC, CPI

  • Type of supported statements in language
  • Higher abstraction require indirect calls

Compiler

IC, CPI

  • Compiler decides what instruction to use

ISA

IC, CPI, CC

  • Type and complexity of supported instructions in ISA

Processor Organization

CPI, CC

  • Single-cycle, multi-cycle, pipelined …

Technology

CC

  • Determines the propagation delay in hardware

The performance of a program depends on the algorithm, the language, the compiler, the architecture, and the actual hardware

Prof. Iyad Jafar

14 of 31

Example 4

  • A certain processor has four instruction classes is to be modified using different approaches. The details of the program used in evaluating different approaches are given in the table below. What is CPI for:
    • The original processor
    • Approach 1. A cache is added and it reduces the average load time to 2 cycles.
    • Approach 2. A branch prediction scheme is used and it cuts the branch time by 1 cycle.
    • Approach 3. A second ALU is added to execute two ALU instructions at once.

14

Class

Frequency

CPI

ALU

50%

1

Load

20%

5

Store

10%

3

Branch

20%

2

CPIk x F

0.5

1.0

0.3

0.4

CPIk x F

0.5

0.4

0.3

0.4

CPIk x F

0.5

1.0

0.3

0.2

CPIk x F

0.25

1.0

0.3

0.4

2.2

1.6

2.0

1.95

Effective CPI

Original

App1

App2

App3

Speed up

1.375

1.10

1.128

Prof. Iyad Jafar

15 of 31

Example 5

  • Given a program with 106 instructions with the following mix: 10% class A, 20% class B, 50% class C, and 20% class D. If this program is executed on two different processors with the specifications given in the table, then
    • What is the effective CPI for the program on each processor?
    • Which implementation is faster?
    • What is the speedup?

15

CPI

Processor

CR (GHz)

A

B

C

D

1

1.5

1

2

3

4

2

2

2

2

2

2

Prof. Iyad Jafar

16 of 31

Example 6

The information for some program that is executed on some processor is given in the table.

If the processor is modified such that the CPI for Class 2 instructions is reduced to 2, then would it be beneficial to adopt this modification if this modification requires increasing the clock cycle by 10%?

16

Classi

CPIi

Frequencyi

1

2

0.3

2

5

0.2

3

3

0.5

Prof. Iyad Jafar

17 of 31

Measuring Performance

  • How to evaluate?
    • End user experience?!

  • Using benchmarks!
    • Kernels (small pieces of real applications, e.g. matrix multiply)
    • Toy programs (100-line simple programs, e.g. sorting)
    • Synthetic benchmarks (fake programs that simulate the behavior of real applications, e.g. Dhrystone)
    • Benchmark suites (e.g. SPEC06fp, TPC-C) (Fig. 1.17)

  • The guiding principle of reporting performance measurements should be reproducibility

17

Prof. Iyad Jafar

18 of 31

Measuring Performance

  • Benchmarks have multiple programs with varying execution time!

  • Compute SPECratio
    • To report the performance of X, we use a reference computer, hence spec ratio.

18

Prof. Iyad Jafar

19 of 31

Measuring Performance

19

Prof. Iyad Jafar

20 of 31

Amdhal’s Law

20

Prof. Iyad Jafar

21 of 31

Amdahl's Law

  •  

21

 

 

 

Prof. Iyad Jafar

22 of 31

Amdahl's Law

  •  

22

 

Prof. Iyad Jafar

23 of 31

Example

  • Check other examples in the book

23

Prof. Iyad Jafar

24 of 31

Reliability and Dependability

24

Prof. Iyad Jafar

25 of 31

System Operation

  • Historically, ICs are one of the most reliable computer components
    • Challenge is increasing with 16 nm feature size

  • When a system is operating properly?

  • Systems alternate between two states of service:
    • Service accomplishment
    • Service interruption

  • Measures of Dependability
    • Module reliability
    • Module availability

25

Service accomplishment

Service delivered�as specified

Service interruption

Deviation from�specified service

Failure

Restoration

Prof. Iyad Jafar

26 of 31

Reliability and Dependability

  • Module reliability
    • Reflects continuous service accomplishment from a reference initial instant (time to fail) and is measured by mean time to failure (MTTF)
    • Rate failures (FIT) is the reciprocal of MTTF.
    • Rate of failures is generally reported per billion hours

FIT = 109/MTTF

    • Service interruption is measured as mean time to repair (MTTR).
    • Mean time between failures (MTBF) is simply the sum of MTTF + MTTR.
  • Module availability
    • A measure of the service accomplishment with respect to the alternation between accomplishment and interruption

26

Prof. Iyad Jafar

27 of 31

Example

  • Assume a disk subsystem with the following components and MTTF
    • 10 disks, each rated at 1,000,000-hour MTTF
    • 1 ATA controller, 500,000-hour MTTF
    • 1 power supply, 200,000-hour MTTF
    • 1 fan, 200,000-hour MTTF
    • 1 ATA cable, 1,000,000-hour MTTF

Assume independent failures and exponentially distributed life times

27

If a collection of modules has exponentially distributed lifetimes, the overall failure rate of the collection is the sum of the failure rates of the modules.

Prof. Iyad Jafar

28 of 31

Exercises

28

Prof. Iyad Jafar

29 of 31

Exercises

A program is compiled using three different compilers with the following instruction mix and CPI per instruction type. All processors run at 2.5 GHz and execute 10⁹ instructions.

    • Compute the average CPI for each compiler.
    • Calculate the total CPU time.
    • Rank the compilers by performance.

29

Prof. Iyad Jafar

30 of 31

Exercises

A CPU designer can choose between two options:

    • Design A: 3 GHz clock, average CPI = 2.0
    • Design B: 2.5 GHz clock, average CPI = 1.5

You expect to run a workload of 2 billion instructions.

    • Which design gives better execution time?
    • By what percentage is it better?
    • What if Design A’s CPI could be improved by optimizing 25% of instructions to execute with CPI = 1?

30

Prof. Iyad Jafar

31 of 31

Exercises

  1. Suppose 40% of a program can be optimized to run 3× faster. What is the overall speedup according to Amdahl’s Law?

  • You run a program on a multicore CPU. 75% of the program can be parallelized. Compute the expected speedup with 1, 2, 4, and 8 cores using Amdahl’s Law.

31

Prof. Iyad Jafar