1 of 21

Datalog Parser

Project 2

2 of 21

Checking Syntax

3 of 21

Comments

  • Comments can appear anywhere
  • We want to ignore them
  • Easy solution: Modify Scanner class to not create COMMENT tokens

4 of 21

The Parser Class -- Interclass weaving

  • The Parser needs to use the Tokens from the Scanner
  • Therefore it must #include "Token.h"
  • But Token.h was already defined when processing Scanner

Token.h:13:7: error: redefinition of ‘class Token’

class Token {

^~~~~

  • Solution: #pragma once
    • Put at the top of every *.h file
    • DON'T put in any *.cpp files!
  • (You could also use #ifndef CLASS_NAME #define CLASS_NAME
    • End the *.h with #endif)

5 of 21

The Parser Class

  • Doesn't even need to know the Scanner exists (decomposition)
  • Does need access to the Tokens that come from the Scanner
  • Create a function to start the parsing. This might look like:
    • void parse();

6 of 21

From Grammar to Program

fact -> ID LEFT_PAREN STRING stringList RIGHT_PAREN PERIOD

  • Make a function for each nonterminal in the grammar
    • ( e.g. void fact() )
    • Inside that function
    • For a terminal, check if the next token has the right type
    • For a nonterminal, call the corresponding function
  • Inside the parse() function, call the function for the start symbol

7 of 21

Checking the next Token's type

  • Once you find a Token whose type does not fit the grammar, you must stop
  • Easiest solution: throw the token, and use a try-catch block in the parse() function

parse()

datalogProgram()

scheme()

headPredicate()

idList()

throw currentToken;

try {

...

} catch (Token error) {

...

}

8 of 21

Easy Matching

  • Create a function that checks if the next token has the provided type.
  • If so: advance to the next token
  • Else: throw that token

  • The parse function will catch any thrown tokens
    • If you can get through the try portion, print "Success!"
      • You'll also need to check that the END type is all that's left
    • If you end up in the catch portion, print "Failure!" and the token that was thrown

Failure!

(Q_MARK,"?",10)

9 of 21

When a nonterminal has more than one production

  • Many nonterminals only have one production
  • Others also have a lambda* production
    • * lambda is exactly the same as epsilon
    • schemeList -> scheme schemeList | lambda
  • One nonterminal has several productions

  • When there is more than one option, use an if-else statement
    • Consider the parse table setup
    • The if statement checks if the next token is what that production would start with
    • One if statement for every terminal that the nonterminal can become
  • If lambda is an option, don't use else
  • If lambda is not an option, you must throw the next token in the else statement

10 of 21

Test your syntax-checking code

  • Once you've made a function for every nonterminal of the grammar, and set up the match function and the try-catch block, you are ready to test your program against the test cases
  • Perhaps start off only checking for "Success!" or "Failure!"

11 of 21

Creating Objects

12 of 21

Using the Tokens

  • We want to take the Tokens and create actual structures to hold the data
  • Most of our data is focused on predicate objects
    • Name
    • List of parameters

Schemes:

snap(S,N,A,P)

HasSameAddress(X,Y)

Facts:

snap('12345','C. Brown','12 Apple','555-1234').

snap('33333','Snoopy','12 Apple','555-1234').

Rules:

HasSameAddress(X,Y) :- snap(A,X,B,C),snap(D,Y,B,E).

Queries:

HasSameAddress('Snoopy',Who)?

13 of 21

Predicates

  • Schemes, Facts, and Queries are just predicates
  • Rules are a way of creating relationships between predicates
    • A group of predicates (the body) can create new instances of a predicate (the head)

Rules:

HasSameAddress(X,Y) :- snap(A,X,B,C),snap(D,Y,B,E).

Head predicate

Body predicates

14 of 21

Parameters

Rules:

alpha(X,Y) :- beta(X,C,'Bob'),gamma(P,'Jim',Y).

  • A parameter can be a STRING or an ID
  • STRING and ID are just textual values

15 of 21

DatalogProgram

  • We need:
    • A vector of Predicates for the schemes
    • A vector of Predicates for the facts
    • A vector of Predicates for the queries
    • A vector of Rules for the rules
  • These four vectors need to be returned from the Parser for Lab 3 to use
  • So we need to create a container class called DatalogProgram that holds these four vectors

16 of 21

toString()

  • Create a toString() function for each class (DatalogProgram, Predicate, Rule, and Parameter)
  • These can call each other's toString() functions
  • If you are successful in checking the syntax, then cout the DatalogProgram object.toString()

17 of 21

Domain

  • The strings that appear in the Facts section of the Datalog Program
  • Common pitfall: including strings that appear in the Queries section
  • Go through the parameters in the fact predicates
  • set<string>

Domain(6):

'12 Apple'

'12345'

'33333'

'555-1234'

'C. Brown'

'Snoopy'

18 of 21

Modify the Parser to Populate these Classes

  • Now that you've made structures to hold the file's data, we need to change the Parser to make instances of them. Two good options:
  • Option A: make temporary variables as data members of Parser
  • Option B: make each function for a nonterminal in the grammar return what it is syntactically checking for
    • schemeList() returns vector<Predicate>
    • headPredicate() returns Predicate

19 of 21

Test your datalog-structuring code

  • Once you've modified your program to create the objects, you can test your code against the given test cases to match the exact output

Reminders:

  • No need to use pointers
  • Compile your code on the lab machines using the full compilation command:
    • g++ -g -Wall -Werror -std=c++17 *.cpp -o lab2

20 of 21

Common Pitfalls

  • The order must be: Schemes, then Facts, then Rules, then Queries
  • There must be at least one Scheme and at least one Query
  • There does not have to be any Facts or Rules

datalogProgram -> SCHEMES COLON scheme schemeList

FACTS COLON factList

RULES COLON ruleList

QUERIES COLON query queryList

21 of 21

Where to Start

  1. Create a Parser class
    1. Takes in Token objects
    2. Checks the syntax
    3. Run this to see if you get "Success!" or "Failure!" appropriately for the tests on the website
  2. Create the DatalogProgram, Predicate, Rule, and Parameter classes
    • Make sure you understand what makes up each of these classes
    • Look at the input and output of the examples on the website
  3. Modify your Parser to create instances of the wrapper classes
    • Test your code against the example tests on the website