Measuring and Evaluating Performance
Chapter 1
Section: 1.6
Outline
Prof. Iyad Jafar
Introduction
Introduction
Relative Performance
Measuring Execution Time
Measuring Execution Time
[Figure: one clock cycle]
Measuring Execution Time
The Performance Equation
Examples
Example 1
In a certain program, 1000 instructions were executed on a CPU running at 1 GHz. If the instruction counts and CPIs for each class are given below, then how long does it take to execute the program?
Instruction Class | Instruction Count | Class CPI |
1 | 200 | 2 |
2 | 300 | 3 |
3 | 500 | 1 |
Effective CPI = (200 x 2 + 300 x 3 + 500 x 1) / 1000 = 1.8 cycles/instruction
Time = 1000 x 1.8 / (1 x 10^9) = 1.8 µs
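The arithmetic in Example 1 can be checked with a short script. This is a minimal sketch; the function and variable names are illustrative and not part of the slides.

```python
# Example 1 data: (instruction count, CPI) for each instruction class.
classes = [(200, 2), (300, 3), (500, 1)]
clock_rate_hz = 1e9  # 1 GHz


def total_cycles(classes):
    """Total clock cycles: sum of count x CPI over all classes."""
    return sum(count * cpi for count, cpi in classes)


def effective_cpi(classes):
    """Weighted-average CPI: total cycles divided by total instructions."""
    total_instructions = sum(count for count, _ in classes)
    return total_cycles(classes) / total_instructions


def cpu_time_seconds(classes, clock_rate_hz):
    """CPU time = total clock cycles / clock rate."""
    return total_cycles(classes) / clock_rate_hz


cpi = effective_cpi(classes)
time_s = cpu_time_seconds(classes, clock_rate_hz)
print(f"Effective CPI = {cpi:.1f} cycles/instruction")
print(f"Time = {time_s * 1e6:.1f} us")
```

Dividing total cycles by the clock rate is equivalent to multiplying the instruction count by the effective CPI and the clock cycle time.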
Example 2
Suppose that computer A has clock cycle of 250 ps and CPI of 2.0 for some program, while computer B has clock cycle time of 500 ps and CPI of 1.2 for the same program, then which computer is faster? Assume same compiler and same ISA.
TimeA = ICA x 2.0 x 250 ps = 500 x ICA ps
TimeB = ICB x 1.2 x 500 ps = 600 x ICB ps
Because both computers use the same compiler and the same ISA, ICA = ICB = IC, so
PerformanceA / PerformanceB = TimeB / TimeA = (600 x IC) / (500 x IC) = 1.2
Computer A is 1.2 times faster than B.
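Because the instruction count cancels in Example 2, the comparison reduces to the average time per instruction on each machine. The sketch below shows this; the helper name is an illustrative assumption.

```python
def time_per_instruction_ps(cpi, cycle_time_ps):
    """Average time per instruction = CPI x clock cycle time."""
    return cpi * cycle_time_ps


# Example 2 data.
t_a = time_per_instruction_ps(cpi=2.0, cycle_time_ps=250)  # computer A
t_b = time_per_instruction_ps(cpi=1.2, cycle_time_ps=500)  # computer B

# Same compiler and same ISA means the same instruction count on both
# machines, so the performance ratio equals the inverse ratio of these times.
ratio = t_b / t_a
print(f"A: {t_a:.0f} ps/instruction, B: {t_b:.0f} ps/instruction")
print(f"Performance A / Performance B = {ratio:.2f}")
```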
Example 3
A program was compiled using two compilers, C1 and C2, on a computer whose CPI for each instruction class is given in the table. If the instruction counts produced by the two compilers are as given in the table, which compiler is better?
Class | A | B | C |
CPI for class | 1 | 2 | 3 |
IC by C1 | 2 | 1 | 2 |
IC by C2 | 4 | 1 | 1 |
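One way to answer Example 3 is to compare total clock cycles, since both compiled versions run on the same computer at the same clock rate. The sketch below assumes the instruction counts for C1 and C2 are in the same (unstated) units; the names are illustrative.

```python
# Example 3 data: CPI per class and instruction counts per compiler.
CPI = {"A": 1, "B": 2, "C": 3}
IC_C1 = {"A": 2, "B": 1, "C": 2}
IC_C2 = {"A": 4, "B": 1, "C": 1}


def total_cycles(ic_by_class, cpi_by_class):
    """Total clock cycles = sum over classes of IC x CPI."""
    return sum(ic_by_class[c] * cpi_by_class[c] for c in ic_by_class)


def effective_cpi(ic_by_class, cpi_by_class):
    """Effective CPI = total cycles / total instruction count."""
    return total_cycles(ic_by_class, cpi_by_class) / sum(ic_by_class.values())


cycles_c1 = total_cycles(IC_C1, CPI)
cycles_c2 = total_cycles(IC_C2, CPI)
print(f"C1: IC = {sum(IC_C1.values())}, cycles = {cycles_c1}, "
      f"CPI = {effective_cpi(IC_C1, CPI):.2f}")
print(f"C2: IC = {sum(IC_C2.values())}, cycles = {cycles_c2}, "
      f"CPI = {effective_cpi(IC_C2, CPI):.2f}")

# With the same clock rate, fewer total cycles means a shorter execution time.
better = "C1" if cycles_c1 < cycles_c2 else "C2"
print(f"Better compiler: {better}")
```

C2 generates more instructions (6 versus 5) but needs fewer cycles (9 versus 10), so instruction count alone is not a reliable indicator of performance.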
Example 4
A certain processor with four instruction classes is to be modified using different approaches. The instruction mix of the program used to evaluate the approaches is given in the table below. Which approach is better?
Class | Frequency | CPI |
ALU | 50% | 1 |
Load | 20% | 5 |
Store | 10% | 3 |
Branch | 20% | 2 |
CPIk x F for each approach:
Class | Original | App1 | App2 | App3 | All |
ALU | 0.5 | 0.5 | 0.5 | 0.25 | |
Load | 1.0 | 0.4 | 1.0 | 1.0 | |
Store | 0.3 | 0.3 | 0.3 | 0.3 | |
Branch | 0.4 | 0.4 | 0.2 | 0.4 | |
Effective CPI | 2.2 | 1.6 | 2.0 | 1.95 | |
Speed up | | 1.375 | 1.10 | 1.128 | |
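The effective CPI and speedup values for Example 4 can be reproduced as follows. The per-class CPI changes for each approach are an assumption inferred from the CPIk x F products (Load CPI of 2 for App1, Branch CPI of 1 for App2, ALU CPI of 0.5 for App3), and the sketch also assumes that the instruction count and clock cycle time are unchanged, so the speedup is simply the ratio of effective CPIs.

```python
# Example 4 data: instruction mix and original CPI per class.
FREQ = {"ALU": 0.5, "Load": 0.2, "Store": 0.1, "Branch": 0.2}
BASE_CPI = {"ALU": 1, "Load": 5, "Store": 3, "Branch": 2}

# ASSUMPTION: per-class CPI changes inferred from the CPIk x F products.
APPROACHES = {
    "App1": {"Load": 2},
    "App2": {"Branch": 1},
    "App3": {"ALU": 0.5},
}


def effective_cpi(freq, cpi):
    """Effective CPI = sum over classes of frequency x CPI."""
    return sum(freq[k] * cpi[k] for k in freq)


base = effective_cpi(FREQ, BASE_CPI)
print(f"Original: effective CPI = {base:.2f}")

speedups = {}
for name, change in APPROACHES.items():
    new_cpi = effective_cpi(FREQ, {**BASE_CPI, **change})
    # Same instruction count and clock, so speedup = old CPI / new CPI.
    speedups[name] = base / new_cpi
    print(f"{name}: effective CPI = {new_cpi:.2f}, speedup = {speedups[name]:.3f}")
```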
Example 5
A program contains 10^6 instructions with the following instruction mix:
This program is executed on two different processors with CPI values and clock rates that are provided in the table. Which processor delivers better performance for this program?
Processor | Clock Rate (GHz) | CPI A | CPI B | CPI C | CPI D |
1 | 1.5 | 1 | 2 | 3 | 4 |
2 | 2 | 2 | 2 | 2 | 2 |
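The comparison in Example 5 can be set up as below. The instruction mix is not reproduced in this text, so `PLACEHOLDER_MIX` is a purely hypothetical stand-in that must be replaced with the actual mix before drawing any conclusion; the other names are illustrative as well.

```python
INSTRUCTION_COUNT = 10**6

# Clock rates and per-class CPIs from the Example 5 table.
PROCESSORS = {
    1: {"clock_hz": 1.5e9, "cpi": {"A": 1, "B": 2, "C": 3, "D": 4}},
    2: {"clock_hz": 2.0e9, "cpi": {"A": 2, "B": 2, "C": 2, "D": 2}},
}

# HYPOTHETICAL placeholder: replace with the instruction mix of the example.
PLACEHOLDER_MIX = {"A": 0.25, "B": 0.25, "C": 0.25, "D": 0.25}


def cpu_time_seconds(instruction_count, mix, cpi, clock_hz):
    """CPU time = IC x effective CPI / clock rate."""
    effective_cpi = sum(mix[k] * cpi[k] for k in mix)
    return instruction_count * effective_cpi / clock_hz


times = {
    pid: cpu_time_seconds(INSTRUCTION_COUNT, PLACEHOLDER_MIX, p["cpi"], p["clock_hz"])
    for pid, p in PROCESSORS.items()
}
for pid, t in times.items():
    print(f"Processor {pid}: {t * 1e3:.3f} ms")
```

Because every class has a CPI of 2 on Processor 2, its time is 10^6 x 2 / (2 x 10^9) = 1 ms for any mix. Processor 1 is therefore faster only when its effective CPI for the actual mix is below 1.5.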
Example 6
The table below shows the instruction mix and CPI values for a program running on a given processor. Suppose the processor is modified so that the CPI of Class-2 instructions is reduced to 2. Should this modification be adopted if it results in a 10% increase in the clock cycle time?
Classk | CPIk | Frequencyk |
1 | 2 | 0.3 |
2 | 5 | 0.2 |
3 | 3 | 0.5 |
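A possible way to work Example 6 is to compare execution time before and after the change, with the instruction count held fixed. The sketch below assumes the instruction mix itself is unaffected by the modification; names are illustrative.

```python
# Example 6 data: frequency and CPI per instruction class.
FREQ = {1: 0.3, 2: 0.2, 3: 0.5}
CPI_OLD = {1: 2, 2: 5, 3: 3}
CPI_NEW = {**CPI_OLD, 2: 2}   # Class-2 CPI reduced to 2
CYCLE_TIME_SCALE = 1.10       # clock cycle time increases by 10%


def effective_cpi(freq, cpi):
    """Effective CPI = sum over classes of frequency x CPI."""
    return sum(freq[k] * cpi[k] for k in freq)


cpi_old = effective_cpi(FREQ, CPI_OLD)
cpi_new = effective_cpi(FREQ, CPI_NEW)

# Time is proportional to CPI x cycle time when the instruction count is fixed.
speedup = cpi_old / (cpi_new * CYCLE_TIME_SCALE)
print(f"Old CPI = {cpi_old:.2f}, new CPI = {cpi_new:.2f}")
print(f"Speedup with the slower clock = {speedup:.3f}")
print("Adopt the modification" if speedup > 1 else "Do not adopt the modification")
```

Under these assumptions the effective CPI drops from 3.1 to 2.5, which outweighs the 10% longer cycle, so the modification yields a net speedup of about 1.13.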
Performance, Energy, and Cost Trade-offs
Beyond Execution Time
Energy and Power
Cost
Overall gain:
Example 7
Computers A and B that implement the same ISA were evaluated using some program with an effective CPI of 2.5 and 1.8, respectively. The two computers run at the same clock, however:
Determine which computer is better if we consider:
Ignore the energy cost to run the program.
Example 7 - Solution
Computer B is better
Computer A is better
Example 7 - Solution
Computer B is better
Computer B is better
Summary
Computer B is faster and more energy-efficient despite higher power consumption, but Computer A provides better cost–performance. When all factors are combined, Computer B offers higher overall efficiency.
Determinants of Performance
Determinants of Performance
HW or SW Component | Affects? | How? |
Algorithm | IC, CPI | |
Programming Language | IC, CPI | |
Compiler | IC, CPI | |
ISA | IC, CPI, CC | |
Processor Organization | CPI, CC | |
Technology | CC | |
The performance of a program depends on the algorithm, the language, the compiler, the architecture, and the actual hardware.
Make the Common Case Fast
Make the Common Case Fast
Example 8
Amdahl's Law
Amdahl's Law
Example 9
Suppose 40% of a program can be optimized to run 3× faster. What is the overall speedup according to Amdahl’s Law?
By how much should this fraction be enhanced to achieve an overall speedup of 2?
It is impossible to achieve an overall speedup of 2 if only 40% of the program can be optimized, regardless of how fast that part becomes.
Maximum overall speedup is 1.67!
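Both parts of Example 9 follow from Amdahl's Law, overall speedup = 1 / ((1 - f) + f / s), where f is the fraction that can be enhanced and s is the speedup of that fraction. The sketch below uses illustrative function names; `required_enhancement` returns `None` when the target cannot be reached.

```python
def amdahl_speedup(fraction, enhancement):
    """Overall speedup when `fraction` of the time runs `enhancement` times faster."""
    return 1.0 / ((1.0 - fraction) + fraction / enhancement)


def max_speedup(fraction):
    """Upper bound on overall speedup as the enhancement grows without limit."""
    return 1.0 / (1.0 - fraction)


def required_enhancement(fraction, target):
    """Enhancement needed to reach `target` overall speedup, or None if impossible."""
    # Solve target = 1 / ((1 - f) + f / s) for s:  f / s = 1 / target - (1 - f)
    remaining = 1.0 / target - (1.0 - fraction)
    if remaining <= 0:
        return None  # the unenhanced part alone already takes too long
    return fraction / remaining


print(f"Overall speedup (f = 0.4, s = 3): {amdahl_speedup(0.4, 3):.3f}")
print(f"Maximum overall speedup (f = 0.4): {max_speedup(0.4):.2f}")
print(f"Enhancement needed for overall speedup of 2: {required_enhancement(0.4, 2)}")
```

For f = 0.4 and s = 3 the overall speedup is 1 / (0.6 + 0.1333) ≈ 1.36, and the target of 2 exceeds the bound 1 / 0.6 ≈ 1.67, so the last call returns `None`.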
Exercises
Exercises
Exercises
You expect to run a workload of 2 billion instructions.
Exercises
Suggested Problems