Lex & Yacc
Lex
Lex
Lex
Lex
Pattern Matching Primitives
Lex
Lex
……..Definitions section……
%%
……Rules section……..
%%
……….C code section (subroutines)……..
Lex
Lex predefined variables.
Lex
Yacc
Yacc
... definitions ...
%%
... rules ...
%%
... subroutines ...
Yacc
yacc& lex in Together
program -> program expr | ε
expr -> expr + expr | expr - expr | id
Lex file
Yacc file
Linking lex&yacc
lex & yacc
Authors
The Simplest Lex Program
%%
. |\n ECHO;
%%
punctuations or other characters.
Recognizing Words with Lex
Word recognizer:
%{
/* The sample program demonstrates a verb or not a verb
*/
%}
%%
[\t]+ /* ignore whitespace*/;
is |
am |
are |
were |
was |
be |
being |
been |
do |
does |
can |
could |
has |
have |
had |
go { printf ("%s: is a verb\n”, yytext) ; }
[a-zA-Z]+ { printf("%s: is not a verb\n”, yytext); }
. | \n { ECHO; /* normal default anyway */ }
%%
main ( )
{
yylex( );
}
Input: did i have fun?
Output: did:is a verb
i: is not a verb
have: is a verb
fun:is not a verb
?
Symbol Tables
applications.
names used in program in its symbol table.
1.type of symbol
2.variable name or type
3.reserve word
4.constant name
5.declaration scope
1.variable assigned before referenced.
2.variable not declared multiple times.
Example: Lexer with symbol table
% {
/*
* Word recognizer with a symbol table.
*/
%}
Enum {
LOOKUP=0, /* default - looking rather than defining. */
VERB,
ADJ,
ADV,
NOUN,
PREP,
PRON,
CONJ,
};
int state;
int add_word(int type, char *word) ;
int lookup_word (char *word) ;
%}
add_word()->puts new word to symbol table.
lookup_word()->looks up a word which should already be entered.
Grammars
noun verb
noun verb noun
The notation for describing grammar is->
Which means particular set of tokens can be replaced by new symbol .*
subject->noun/pronoun
indicates subject can either be noun or pronoun
object->noun
Sentence->subject verb object
Example: teacher teaches students.
Parser-Lexer Communication
A Yacc Parser(Yet Another Compiler Compiler)
The structure of a yacc parser is similar to that of a lex lexer. First section, the definition section, has a literal code block, enclosed in "%{" and "%}". We use it here for a C comment (as with lex, C comments belong inside C code blocks, at least within the definition section) and a single include file.
Then come definitions of all the tokens we expect to receive from the lexical analyzer.
Universal custom dictates that token names be all uppercase and other names in the parser mostly or entirely lowercase.
The first %% indicates the beginning of the rules section. The second %% indicates the end of the rules and the beginning of the user subroutines section.
The most important subroutine is main() which repeatedly calls yyparse() until the lexer's input file runs out.
The routine yyparse() is the parser generated by yacc, so our main program repeatedly tries to parse sentences until the input runs out.Yacc is a tool for automatic generation of
parser.
The Rules Section
The rules section describes the actual grammar as a set of production rules or simply rules. (Some people also call them productions.) Each rule consists of a single name on the left-hand side of the ":" operator, a list of symbols and action code on the right-hand side, and a semicolon indicating the
end of the rule. By default, the first rule is the highest-level rule.
That is, the parser attempts to find a list of tokens which match this initial rule, or
more commonly, rules found from the initial rule. The expression on the right-hand side of the rule is a list of zero or more names. A typical simple rule has a single symbol on the right-hand side as in the object rule which is defined to be a NOUN. The symbol on the left-hand side of the rule can
then be used like a token in other rules.
Running Lex and Yacc
$ gedit filename –l
type Pgm & save
$ lex filename.l
$ cc lex.yy.c -ll
$ ./a.out
Executing lex and Yacc together
$ gedit filename.l
$ gedit filename.y
$ lex filename.l
$ yacc –d filename.y
$ cc lex.yy.c y.tab.c -ll -ly
$ ./a.out
Lex vs. Hand-written Lexers
to its length.
For example program refer text book.
Regular Expressions