1 of 37

Code-generation

Chapter 6

2 of 37

Code generation

  • Syllabus

1.Code generator – introduction

2.Issues in the design of a code generator

3.The Target machine Language

4.Basic blocks and flow graphs

5.Adresseses in the target code

6.Optimization of basic blocks

7.A simple code generator

3 of 37

source front intermediate code intermediate code target

program end code optimizer code generator program

symbol

table

4 of 37

1.Issues in the design of a code generator

1.Input to the code generator

2.Target Program

3.Memory management

4.Instruction selection

5.Register allocation

6.Choices of Evaluation order

5 of 37

1.Input to the code generator

  • It consists of the intermediate representation of the source program produced by the front end and symbol table.
  • Linear representation: postfix notation
  • Three address representation :quadruples, triples, Indirect triples etc.
  • Virtual machine representation: stack machine code
  • Graphical representation: syntax trees ,dags

6 of 37

Input to the code generator

  • Assumption:

1.We assume that prior to code generating the front end has scanned ,parsed, and translated the source program into a reasonably detailed intermediate representation ,

2.Type checking has taken place

3.The code generation phase can therefore be proceed on the assumption that its input is free of errors.

7 of 37

2.Issue:Target program

  • This output may be of different form

a.Absolute machine language: It can be placed in a fixed location

in memory and immediately executed

b. Relocatable machine language : Allows subprograms to be

compiled separately. A set of relocatable object modules can be

linked together and loaded for execution by a linking loader.

c. Assembly language: Generates symbolic instructions and use the

macro facilities of the assembler to help generate code

8 of 37

3rd Issue:Memory Management�

  • Mapping names in the source program to addresses of data objects in runtime memory is done co-operatively by the front end and the code generator.
  • A name in a three address statement refers to a symbol table entry for the name.
  • From the symbol table information a relative address can be determined for the name in a data area for the procedure

9 of 37

4th Issue: Instruction selection

  • The uniformity and completeness of the instruction set are important factors
  • Deciding which machine code sequence is best for a given three address construct .

10 of 37

5th Issue:Register Allocation

  • Instructions involving register operands are usually shorter than those involving operands in memory.
  • The efficient utilization of registers is particularly important in generating good code.
  • The use of register is often subdivided into two subproblems

1.During register allocation we select the set of variables that will reside in register at a point in the program.

2.During a subsequent register assignment phase we pick the specific register that a variable will reside in.

11 of 37

Example 1:

  • Consider the three address instruction

t = a+ b

t = t*c

t = t/d

Optimal machine code sequence

L R1,a R1 contains a

A R1,b R1 contains a+b

M R0,c R0 contains (a+b)*c

D R0,d R0 contains remainder of (a+b)*c/d

ST R1,t R1 contains quotient of (a+b)*c/d and the value of R1 is stored in t.

12 of 37

Example2

  • Consider the three -address code sequence

t = a+b

t=t+c

t = t/d

Optimal machine code sequence

L R0,a R0 contains a

A R0,b Ro contains a+b

A R0,c R0 contains a+b+c

D R0,d R0 contains the remainder of (a+b+c)/d

13 of 37

6th Issue:Choice of Evaluation order.

  • The order in which computations are prepared can affect the efficiency of the target code.
  • Some computations order requires fewer register to hold intermediate results than other.
  • Picking a best order is another difficult problem.

14 of 37

The Target Machine (Language)

  • Familiarity with the target machine and its instruction set is a prerequisite for designing a good code generator.
  • Target machine is a byte addressable machine with four bytes to a word and n-general purpose registers R0,R1,---Rn-1.
  • It has two address instructions of the form
  • op source destination

15 of 37

The Target Machine

  • Here op is an opcode and source and destination are data fields.It has the following op-codes among other

  • MOV ( move source to destination)
  • ADD ( add source to destination)
  • SUB (subtract source from destination)

16 of 37

The address modes together with assembly language form and associated costs are

MODE FORM ADDRESS ADDED COST

absolute M M 1

register R R 0

indexed c(R) c+contents(R) 1

indirect register *R contents(R) 0

indirect indexed *c(R) contents(c+contents(R) 1

17 of 37

Examples:

1.MOV R0,M

2.MOV 4(R0),M :stores contents (4+contents(R0)) into memory location M

3.MOV *4(R0),M store the value contents(contents(4+contents(R0)))

A final address mode allows the source to be a construct:

MODE FORM CONSTANT ADDED COST

literal #c c 1

Thus the instruction

MOV #1,RO loads the constant 1 into register R0.

18 of 37

Instruction Cost

  • It corresponds to the length of (in words ) of the instruction.

  • Address modes involving registers have cost zero.
  • While those with a memory location or literal in them have cost one .

19 of 37

Examples:

1. MOV R0,R1 cost is 1

2.MOV R5,M cost is 2

3.ADD #1, R3 cost is 2

20 of 37

What is the cost?(a=b+c)

  • MOV b ,R0

ADD c,R0

MOV R0,a

21 of 37

Ans

  • 6

22 of 37

What is the cost?

MOV b,a

ADD c,a

23 of 37

Ans

  • 6

24 of 37

What is the cost�?

MOV *R1,*R0

ADD *R2,*R0

Here R0,R1 and R2 contains the address of a,b and c respectively.

25 of 37

Ans

  • 2

26 of 37

What is the cost?

ADD R2,R1

MOV R1,a

here R1 and R2 contains the values of b and c respectively.

27 of 37

Conclusion about Target machine

  • In order to generate good code for this machine we must utilize its addressing capabilities efficiently.

28 of 37

Example 1:

  • Consider the three address instruction

t = a+ b

t = t*c

t = t/d

Optimal machine code sequence

L R1,a R1 contains a

A R1,b R1 contains a+b

M R0,c R0 contains (a+b)*c

D R0,d R0 contains remainder of (a+b)*c/d

ST R1,t R1 contains quotient of (a+b)*c/d and the value of R1 is stored in t.

29 of 37

Example2

  • Consider the three -address code sequence

t = a+b

t=t+c

t = t/d

Optimal machine code sequence

L R0,a R0 contains a

A R0,b Ro contains a+b

A R0,c R0 contains a+b+c

SRDA R0,32 a+b+c is transferred to R1 and R0 is cleared

D R0,d R0 contains the remainder of (a+b+c)/d

ST R1,t R1 contains the quotient of (a+b+c)/d and it is stored in t.

30 of 37

Basic Blocks and Flow graphs

  • Basic Blocks:

A basic block is a sequence of consecutive statements in which flow of control enters at the beginning and leaves at the end without halt or possibility of branching except at the end.

t1 = a*a

t2 = a*b

t3= 2*t2

t4= t1+t3

t5=b*b

t6=t4+t5

31 of 37

Algorithm:Partition into basic blocks

Input : A sequence of three -address statements

Output:A list of basic blocks with each three address statement in exactly one block.

32 of 37

Method

  • 1.We first determine the set of leaders the first statements of basic blocks.The rules we use are the following

i)The first statement is a leader

ii)Any statement that is the target of a conditional or unconditional goto is a leader

iii)Any statement that immediately follows a goto or conditional goto statement is leader.

2.For each leader ,its basic block consists of the leader and all statements up to but not including the next leader or the end of the program.

33 of 37

Example

  • Consider fragment of source code

begin

prod= 0;

i =1;

do begin

prod = prod +a[i]*b[i];

i=i+1;

end

while i<=20

end

34 of 37

Three address code computing dot product

1 prod=0

2 i=1

3 t1=4*i

4. t2=a[t1]

5. t3=4*i

6. t4=b[t3]

7. t5=t2*t4

8 .t6=prod+t5

9. prod=t6

10. t7=i+1

11 i = t7

12 if i<=20 goto(3)

35 of 37

Class work

  • Identify the different blocks in the above three-address code computing dot product.

36 of 37

Ans

  • Two blocks
  • one (1) (2)
  • two (3)……….(12)

37 of 37

prod=0 B1

i=1

t1=4*i

t2=a[t1]

t3=4*i

t4=b[t3]

t5=t2*t4

.t6=prod+t5 B2

prod=t6

t7=I+1

i = t7

if i<=20 goto(3)

Representation of Basic Blocks