Lexical Analysis
Dr. Noman Islam
Introduction
Role of lexical analyzer
Other tasks of lexical analyzer
a) Scanning consists of the simple processes that do not require tokenization of the input, such as deletion of comments and compaction of consecutive whitespace characters into one.
b) Lexical analysis proper is the more complex portion, where the scanner produces the sequence of tokens as output.
Why a separate phase
Tokens, patterns and lexemes
Tokens example
1. One token for each keyword. The pattern for a keyword is the same as
the keyword itself.
2. Tokens for the1 operators, either individually or in classes such as the token comparison mentioned in Fig. 3.2.
3. One token representing all identifiers.
4. One or more tokens representing constants, such as numbers and literal
5. Tokens for each punctuation symbol, such as left and right parentheses,
comma, and semicolon.
Attributes for tokens
Lexical errors
1. Delete one character from the remaining input.
2. Insert a missing character into the remaining input.
3. Replace a character by another character.
4. Transpose two adjacent characters
Input buffering
Sentinels
String and language
Operations on language
Regular expression
Regular definition
Recognition of tokens
Our goal
Transition diagram
Handling reserved words
Finite Automata
Non deterministic finite automata
Representing NFA
Example of an NFA
Transition table
The string aabb is accepted by the NFA of Fig. 3.24
Deterministic Finite Automata
Simulating a DFA
Operations on NFA states
Subset construction
From regular expression to NFA