Compiler Design
UNIT -I
1
Compiler Design by Dr. S.Siva Nageswara Rao
Introduction
Translator: It is a program that translates one language to another Language. Example: Compiler
Compiler: A compiler is a program that accepts a program written in a High Level Language and produces an object (low-level) program.
Properties of Compiler�1. The compiler itself must be bug-free.�2. It must generate correct machine code�3. The generated machine code must run fast�4. The compiler itself must run fast.�5. The compiler must be portable.�6. It must give good diagnostics and error messages.�7. The generated code must work well with existing debuggers�8. It must have Consistent optimization.
2
Compiler Design by Dr. S.Siva Nageswara Rao
3
Compiler Design by Dr. S.Siva Nageswara Rao
Language Processing System
4
Compiler Design by Dr. S.Siva Nageswara Rao
5
Compiler Design by Dr. S.Siva Nageswara Rao
COMPILER:
ASSEMBLER:
6
Compiler Design by Dr. S.Siva Nageswara Rao
Difference between Compiler & interpreter
7
Compiler Design by Dr. S.Siva Nageswara Rao
LOADER AND LINK-EDITOR:
8
Compiler Design by Dr. S.Siva Nageswara Rao
Phases of Compiler
A compiler operates in phases. A phase is a logically interrelated operation that takes source program in one representation and produces output in another representation.
There are two phases of compilation.
Compilation process is partitioned into no. of sub processes called ‘phases’.
9
Compiler Design by Dr. S.Siva Nageswara Rao
10
Compiler Design by Dr. S.Siva Nageswara Rao
Lexical analysis:
Tokens: It represents a logically cohesive sequence of characters such as keywords, operators, identifiers, special symbols etc.
Example: position=initial + rate*60
Here, posotion,initial,rate,+,*,=,60 are all separate tokens.
11
Compiler Design by Dr. S.Siva Nageswara Rao
Syntax Analysis:-
Example: For position=initial + rate*60, syntax tree is
12
Compiler Design by Dr. S.Siva Nageswara Rao
13
Compiler Design by Dr. S.Siva Nageswara Rao
Semantic analysis :
14
Compiler Design by Dr. S.Siva Nageswara Rao
Intermediate Code Generations:-
Example: t1=t2+t3
Code Optimization:- It is the fifth phase of the compiler.
15
Compiler Design by Dr. S.Siva Nageswara Rao
Optimization involves
16
Compiler Design by Dr. S.Siva Nageswara Rao
Code Generation:
- allocation of register and memory
- generation of correct references
- generation of correct data types
-generation of missing code.
17
Compiler Design by Dr. S.Siva Nageswara Rao
Symbol Table Management :-
Data Structures used on symbol table:
18
Compiler Design by Dr. S.Siva Nageswara Rao
Symbol table example
Operations: Operations on symbol table are categorized in to two types
Block storage Structure: More than one block in the program { } { } { }
Operations: Insert , Lookup, modify and delete
Non Block Storage Structure : Only One block in the entire program{ }
Operations :Insert , Lookup
19
Compiler Design by Dr. S.Siva Nageswara Rao
Error Handlers:-
Example: Int X; , inti x,y: , float m=5..7:
20
Compiler Design by Dr. S.Siva Nageswara Rao
21
Compiler Design by Dr. S.Siva Nageswara Rao
Lexical Analysis
The Role Of The Lexical Analyzer:
22
Compiler Design by Dr. S.Siva Nageswara Rao
Why Lexical Analyzer is separated from Syntax Analyzer?
• To improve the efficiency of the compiler.
• To enhance the computer portability.
23
Compiler Design by Dr. S.Siva Nageswara Rao
Token: A token is a sequence of characters that can de treated as a unit with logical meaning
Example:
Keywords: for, while, do, if
Identifier: Variable name, function name
Operator: +, ++, - ,/ ,
Separators: , . ;
24
Compiler Design by Dr. S.Siva Nageswara Rao
Lexeme: It is a sequence of characters in the source program that is matched by the pattern for a token
Example: “float”, “=”, “222” “;”
Pattern: It is a rule describing all those lexeme that can be represent a particular token in a source language
25
Compiler Design by Dr. S.Siva Nageswara Rao
Attributes for Tokens
i) Constant : value of the constant
ii) Identifiers: pointer to the corresponding symbol table entry.
Error Recovery Strategies In Lexical Analysis:
The following are the error-recovery actions in lexical analysis
1) Deleting an extraneous character.
2) Inserting a missing character.
3) Replacing an incorrect character by a correct character.
4) Transforming two adjacent characters.
5) Panic mode recovery: Deletion of successive characters from the token until error is resolved.
26
Compiler Design by Dr. S.Siva Nageswara Rao
Input Buffering�
27
Compiler Design by Dr. S.Siva Nageswara Rao
How Input Buffering Works
There are typically two types of buffers used:
Single Buffer: A single large block of memory that holds part of the source code.
28
Compiler Design by Dr. S.Siva Nageswara Rao
2. Double Buffer: Two blocks of memory, used alternately, to ensure that while one buffer is being processed, the other can be filled with new characters from the source file
29
Compiler Design by Dr. S.Siva Nageswara Rao
Sentinels
30
Compiler Design by Dr. S.Siva Nageswara Rao
Advantages of Input Buffering
1.Efficiency: By reading large blocks of data at once, input buffering reduces the number of input operations, making the process faster.
2. Reduced Latency: Double buffering ensures that while one buffer is being processed, the other is being filled, reducing waiting time and increasing the overall speed of the lexical analysis.
3. Smooth Processing: The use of sentinels helps in seamless buffer transitions, avoiding constant end-of-buffer checks.
31
Compiler Design by Dr. S.Siva Nageswara Rao
REGULAR EXPRESSIONS
Regular expression is a formula that describes a possible set of string.
Component of regular expression:
32
Compiler Design by Dr. S.Siva Nageswara Rao
Identifier = letter (letter | digit)*
Here are the rules that define the regular expression over alphabet .
If R and S are regular expressions, then
33
Compiler Design by Dr. S.Siva Nageswara Rao
REGULAR DEFINITIONS
Example:
Pascal identifier
34
Compiler Design by Dr. S.Siva Nageswara Rao
Recognition of tokens:
Starting point is the language grammar to understand the tokens
Stmt -> if expr then stmt
| If expr then else stmt
| є
Expr -> term relop term
| term
Term -> id
| number
For relop ,we use the comparison operations of languages like Pascal or SQL where = is “equals” and < > is “not equals” because it presents an interesting structure of lexemes.
35
Compiler Design by Dr. S.Siva Nageswara Rao
Recognition of tokens:
Now, We know how to specify the tokens for one language. But, how do we write a program to recognize them
To handle white spaces:
Ws->(blank | tab| newline)
36
Compiler Design by Dr. S.Siva Nageswara Rao
37
Compiler Design by Dr. S.Siva Nageswara Rao
38
Compiler Design by Dr. S.Siva Nageswara Rao
�TRANSITION DIAGRAM: �
39
Compiler Design by Dr. S.Siva Nageswara Rao
Components of Transition Diagram
40
Compiler Design by Dr. S.Siva Nageswara Rao
41
Compiler Design by Dr. S.Siva Nageswara Rao
42
Compiler Design by Dr. S.Siva Nageswara Rao
43
Compiler Design by Dr. S.Siva Nageswara Rao
Recognition of Reserved Words and Identifiers
There are two ways that we can handle reserved words that look like identifiers: (Method-I)
44
Compiler Design by Dr. S.Siva Nageswara Rao
Method-II
45
Compiler Design by Dr. S.Siva Nageswara Rao
Completion of the Running Example�
Figure: Transition diagram for unsigned numbers
46
Compiler Design by Dr. S.Siva Nageswara Rao
47
Compiler Design by Dr. S.Siva Nageswara Rao
Recognizing Whitespace
The diagram itself is quite simple reflecting the simplicity of the corresponding regular expression.
48
Compiler Design by Dr. S.Siva Nageswara Rao
LEX
First, a specification of a lexical analyzer is prepared by creating a program lex.l in the Lex language.
Then, lex.l is run through the Lex compiler to produce a C program lex.yy.c.
Finally, lex.yy.c is run through the C compiler to produce an object program a.out, which is the lexical analyzer that transforms an input stream into a sequence of tokens.
49
Compiler Design by Dr. S.Siva Nageswara Rao
{ definitions }
%%
{ rules }
%%
{ user subroutines }
50
Compiler Design by Dr. S.Siva Nageswara Rao
YACC-
51
Compiler Design by Dr. S.Siva Nageswara Rao
FINITE AUTOMATA
Types of Finite Automata There are tow types of Finite Automata
Qn – finite set of states Ʃ – finite set of input symbols
δ – transition function that maps state-symbol pairs to set of states q0 – starting state fn – final state
52
Compiler Design by Dr. S.Siva Nageswara Rao
Deterministic Finite Automata DFA is a special case of a NFA in which
i) no state has an ε-transition.
ii) there is at most one transition from each state on any input.
DFA has five tuples denoted by M = {Qd, Ʃ, δ, q0, fd}
Qd – finite set of states Ʃ – finite set of input symbols
δ – transition function that maps state-symbol pairs to set of states
q0 – starting state fd – final state
Construction of DFA from regular expression
The following steps are involved in the construction of DFA from regular expression:
i) Convert RE to NFA using Thomson’s rules
ii) Convert NFA to DFA
iii) Construct minimized DFA
53
Compiler Design by Dr. S.Siva Nageswara Rao