Lex & Yacc
Text Book
*Levine, John R., Tony Mason and Doug Brown [1992]. Lex & Yacc. O’Reilly & Associates, Inc. Sebastopol, California.
2
Hathal & Ahmad
*
Compiler
Source program
Target program
3
Hathal & Ahmad
compiler
*
Interpreter
Source program Input
Output
4
Hathal & Ahmad
Interpreter
*
Structure of Compiler
5
Hathal & Ahmad
*
Lexical Analysis
For example :<token-name, attribute-value>
position = initial + rate * 60
token <id, 1>
token <=>
token <id, 2>
token <+>
token <*>
token <60>
<id,l> < = > <id, 2> <+> <id, 3> <*> <60>
6
Hathal & Ahmad
*
Lexical Analysis
7
Hathal & Ahmad
*
Syntax Analysis
8
Hathal & Ahmad
*
Lex and Yacc
9
Hathal & Ahmad
*
Lex and Yacc
%%
.|\n ECHO;
%%
10
Hathal & Ahmad
*
Lex syntax
……..Declaration and Definitions section……
%%
……Rules section……..
%%
……….C code section (subroutines)……..
Echo is an action and predefined macro in lex that writes code matched by the pattern.
11
Hathal & Ahmad
*
Lex syntax
%{
headerfiles, symbolic constants,
Declarations and comments ,
c/c++ function declarations or definitions
%}
Example::
%{
#include <stdio.h>
#define NUMBER 400
#define COMMENT 401
#define TEXT 402
#define COMMAND 403
%}
12
Hathal & Ahmad
*
Lex syntax
P1 {A1}
P2 {A2}
….
Pn {An}
%%
Example::
%%
[\t] ;
[0-9]+ printf(“found an intereger:%s\n”,yytext);
%%
13
Hathal & Ahmad
*
Lex syntax
%%
Void main()
{
yylex();
}
14
Hathal & Ahmad
*
Lex program
%{
/*
* this sample demonstrates (very) simple recognition:
* a verb/not a verb.
*/
%}
%%
[\t ]+ ;
is | am | are | were | was | be | being | been | do | does | did | will | would | should | can | could | has | have | had | go { printf("%s: is a verb\n", yytext); }
[a-zA-Z]+ { printf("%s: is not a verb\n", yytext); }
.|\n { ECHO; /* normal default anyway */ }
%%
main()
{
yylex() ;
}
15
Hathal & Ahmad
*
Lex program con…
Example 1-1
did I have fun?
did: is a verb
I: is not a verb
have: is a verb
fun: is not a verb ?
^D
%
Symbol Table
noun dog cat horse cow
verb chew eat lick
Reserved words
Add_word()
Lookup()_word()
16
Hathal & Ahmad
*
Symbol table example
%{
/*
*
Word recognizer with a symbol table.
*/
enum { LOOKUP =0, /* default - looking rather than defining. */
VERB,
ADJ,
ADV,
NOUN,
REP,
PRON,
CONJ
};
int state;
int add_word(int type, char *word);
int lookup_word(char *word);
%}
17
Hathal & Ahmad
*
Grammar
Set of actions.
Example
noun verb.
noun verb noun
For instance:
subject → noun | pronoun
object → noun
sentence → subject verb object
Parser-lexer communication
It calls the lexer yylex() whenever it needs a token from the input .
The lexer then scans through the input recognizing tokens
Not all tokens are of interest to the parser
# define NOUN 257
# define PRONOUN 258
# define VERB 259
# define ADVERB 260 …….
18
Hathal & Ahmad
*
The parts of speech lexer
19
Hathal & Ahmad
*
Rules section
20
Hathal & Ahmad
*
Running lex and Yacc
21
Hathal & Ahmad
*
Lex vs handwritten lex
22
Hathal & Ahmad
*
Lex vs handwritten lex
23
Hathal & Ahmad
*
Lex
24
Hathal & Ahmad
*
Lex
25
Hathal & Ahmad
*
Lex
Pattern Matching Primitives
26
Hathal & Ahmad
*
Regular Expression
27
Hathal & Ahmad
. | Any single character |
* | Zero or more copoies of preceding expression |
[ ] | Any character within the bracket |
^ | Beginning of line as first character |
$ | End of line as last character |
{ } | How many times previous pattern allowed |
\ | Escape metacharacters |
+ | One or more occurance |
? | Zero or more occurance |
| | Either preceding or following |
“…” | Marks literally |
/ | Preceding expression |
( ) | Groups a series of expression |
*
Example of Regular expression
28
Hathal & Ahmad
*
Lex syntax
……..Declaration and Definitions section……
%%
……Rules section……..
%%
……….C code section (subroutines)……..
29
Hathal & Ahmad
*
Lex syntax
%{
headerfiles,symbolic constants,
Declarations and comments ,
c/c++ function declarations or definitions
%}
Example::
%{
#include <stdio.h>
#define NUMBER 400
#define COMMENT 401
#define TEXT 402
#define COMMAND 403
%}
30
Hathal & Ahmad
*
Lex syntax
P1 {A1}
P2 {A2}
….
Pn {An}
%%
Example::
%%
[\t] ;
[0-9]+ printf(“found an intereger:%s\n”,yytext);
%%
31
Hathal & Ahmad
*
Lex syntax
%%
Void main()
{
yylex();
}
32
Hathal & Ahmad
*
Lex variables
33
Hathal & Ahmad
*
Lex Functions
34
Hathal & Ahmad
*
Lex
Lex predefined variables.
35
Hathal & Ahmad
*
The word counting program
The definition section for our word count example is:
%{
unsigned charCount = 0, wordCount = 0, lineCount = 0;
%}
word [^ \t\n]+
eol \n
Hathal & Ahmad
36
*
The word counting program
sample word count’s rules section:
%%
{word} { wordCount++; charCount += yyleng; }
{eol} { charCount++; lineCount++; }
. charCount++;
Hathal & Ahmad
37
*
The word counting program
%%
main()
{
yylex();
printf("%d %d %d\n",lineCount, wordCount, charCount);
}
Hathal & Ahmad
38
*
The word counting program
main(argc,argv)
int argc;
char **argv;
{
if (argc > 1) {
FILE *file;
file = fopen(argv[1], "r");
if (!file) {
fprintf(stderr,"could not open %s\n",argv[1]);
exit(1);
}
yyin = file;
}
yylex();
printf("%u %u %u\n",charCount, wordCount, lineCount);
return 0;
}
Hathal & Ahmad
39
*
Parsing a command line
%{
unsigned verbose;
char *progName;
%}
%%
-h |
"-?" |
-help { printf("usage is: %s [-help | -h | -? ] [-verbose | -v] "
"[(-file| -f) filename]\n", progName);
}
-v |
-verbose { printf("verbose mode is on\n"); verbose = 1; }
%%
main(argc, argv)
int argc;
char **argv;
{
progName = *argv;
yylex();
}
Hathal & Ahmad
40
*
Parsing a command line
Hathal & Ahmad
41
*
Parsing a command line
Hathal & Ahmad
42
*
Start state
Hathal & Ahmad
43
*
Start state
Example1 :
Hathal & Ahmad
44
%s MAGIC
%%
<MAGIC>.+ { BEGIN 0; printf("Magic:"); ECHO; }
Magic BEGIN MAGIC;
%%
main()
{
yylex();
}
Example 2:
%s MAGIC
%%
magic BEGIN MAGIC;
.+ ECHO;
<MAGIC>.+ { BEGIN 0; printf ("Magic:"); ECHO; }
%%
main()
{
yylex();
}
*
Yacc
45
Hathal & Ahmad
*
Yacc
Grammar
statement NAME = expression
expression NUMBER + NUMBER | NUMBER − NUMBER
Recursive Rules
expression 🡪NUMBER
| expression + NUMBER
| expression − NUMBER
46
Hathal & Ahmad
*
Yacc
47
Hathal & Ahmad
*
Shift/Reduce Parsing
48
Hathal & Ahmad
*
Yacc
49
Hathal & Ahmad
*
What Yacc cannot parse
phrase cart_animal AND CART
| work_animal AND PLOW
cart_animal HORSE | GOAT
work_animal HORSE | OX
If we changed the first rule to this:
phrase cart_animal CART
| work_animal PLOW
50
Hathal & Ahmad
*
Yacc parser
... definitions ...
%%
... rules ...
%%
... subroutines ...
51
Hathal & Ahmad
*
Yacc
%%
52
Hathal & Ahmad
*
Symbol values and Actions
;
53
Hathal & Ahmad
*
The Lexer
54
Hathal & Ahmad
*
Running Simple Parser
55
Hathal & Ahmad
*
Arithmetic Expressions and Ambiguity
56
Hathal & Ahmad
*
Arithmetic Expressions and Ambiguity
57
Hathal & Ahmad
*
Arithmetic Expressions and Ambiguity
58
Hathal & Ahmad
Explicitly by:
%left '+' '-‘
%left '*' '/‘
%nonassoc UMINUS
When not to use prescedence rule??
two situations: in expression grammars, and to resolve the “dangling else” conflict in grammars for if-then-else language con‐ structs.
*
Variables and Typed Tokens
59
Hathal & Ahmad
*
Variables and Typed Tokens
60
Hathal & Ahmad
*
Variables and Typed Tokens
61
Hathal & Ahmad
To define the possible symbol types, in the definition section we add a %union decla‐ ration:
%union { double dval;
int vblno;
}
Here is the y.tab.h generated from this grammar: #define NAME 257
#define NUMBER 258
#define UMINUS 259
typedef union {
double dval;
int vblno;
}YYSTYPE;
extern YYSTYPE yylval;
*
Symbol Table
Hathal & Ahmad
62
*
Symbol Table
Hathal & Ahmad
63
*
Symbol Table
Hathal & Ahmad
64
*
Functions and reserve words
Hathal & Ahmad
65
*
Reserve words in symbol table
struct symtab {
char *name;
double (*funcptr)();
double value;
} symtab[NSYMS];
Hathal & Ahmad
66
*
Reserve words in symbol table
Hathal & Ahmad
67
*
Reserve words in symbol table
Hathal & Ahmad
68
*