1 of 63

�Program Synthesis�A Tutorial

Yingfei Xiong

Peking University

ISSTA Summer School 2019

2 of 63

Can grandmas program?

  • The development of programming languages is to raise the level of abstraction

2

Level of

Abstraction

What is the next?

Haskell (1990), Prolog (1972)

Java

C

Assembly

3 of 63

Why cannot?

  • Programming languages come with many guarantees
    • Well-typed programs are guaranteed to compile
    • Compiled programs have clear, well-defined semantics

  • It is difficult to further raise the level of abstraction

3

Program

Compile

Executable

4 of 63

Program Synthesis saves grandmas

  • Generate a program from a specification
    • Specification can be fuzzy
    • Generation is not guaranteed

4

Specification

Synthesize

Program

One of the most central problems in the theory of programming.”

----Amir Pneuli

Turing Award Recipient

The fundamental way to improve software productivity.”

----Jiafu Xu

Founder of Software Research in China

5 of 63

History of Program Synthesis

5

1957

    • Start of program synthesis
    • Circuit synthesis problem by Alonzo Church

Before 2000

    • Deductive Synthesis

After 2000

    • Inductive Synthesis

6 of 63

Application – Data Wrangling

6

7 of 63

Application – Superoptimization

7

i=round(i);

a = 6755399441055744.0;

i=(i+a)-a;

8 of 63

Application – Reducing Duplicated Programming

8

9 of 63

Application – Program Repair

9

Synthesize an expression to replace the buggy one

10 of 63

Application – Testing

10

Synthesize a unit test to cover a path

11 of 63

Application – Analysis

11

Synthesize a strategy for a class of problems

SMT Solver

Apply Tactic 1

If formula is long

Apply Tactic 2

Else

Apply Tactic 3

Strategies

12 of 63

Defining Program Synthesis

12

Classic Synthesis

    • Input:
      • A specification
    • Output: A program that
      • meets the specification

Program Optimization

    • Input:
      • A specification
      • A cost function
    • Output: A program that
      • meets the specification, and
      • maximizes the cost function

Program Estimation

    • Input:
      • A specification
      • A dataset for target distribution
    • Output: A program that
      • meets the specification and
      • maximizes the probability represented by the dataset

Test Generation

Superoptimization

Program Repair

13 of 63

This Lecture

13

Classic Synthesis

    • Problem Definition
    • Enumerative
    • Presentation-based
    • Constraint-based

Program Estimation

    • Problem Definition
    • Estimating Probabilities
    • Locating the most-likely one

14 of 63

SyGuS: Syntax-Guided Synthesis

  •  

14

15 of 63

Example: max

  • Grammar:

  • Specification:

  • Expected answer:ite (x <= y) y x

15

16 of 63

SyGuS format: Synth-Lib

  • Synth-Lib uses a format similar to SMT-Lib
    • http://sygus.seas.upenn.edu/files/SyGuS-IF.pdf

16

(set-logic LIA)

(synth-fun max2 ((x Int) (y Int)) Int

((Start Int (x y

(+ Start Start)

(ite StartBool Start Start)))……))

(declare-var x Int)

(declare-var y Int)

(constraint (>= (max2 x y) x))

……

(check-synth)

17 of 63

Program Synthesis as a Search Problem

17

Generate

Verify

Q1: How to generate the next program to be verified?

Q2: How to verify the correctness?

a program

correct

incorrect

18 of 63

Q1: How to verify correctness?

  •  

18

Fast

Slow

Can we combine

the two?

19 of 63

CEGIS: Counter-Example Guided Inductive Synthesis

  • Constraint solvers give counter-examples
  • Save counter-examples as tests
  • First use tests to validate programs

19

Generate

Verify

Program

Counter-

Example

No Counter-

Example

20 of 63

Q2: How to generate the next program to be verified?

  • Enumerative – exhaustive search
  • Representation-based – manipulate sets of programs instead of single programs
  • Constraint-based – convert to an SMT problem

20

21 of 63

Top-Down Enumeration

  • Expand according to the grammar
    • Expr
    • x, y, Expr+Expr, if(BoolExpr, Expr, Expr)
    • y, Expr+Expr, if(BoolExpr, Expr, Expr)
    • Expr+Expr, if(BoolExpr, Expr, Expr)
    • x+Expr, y+Expr, Expr+Expr+Expr, if(BoolExpr, Expr, Expr)+Expr, if(BoolExpr, Expr, Expr)

21

22 of 63

Bottom-Up Enumeration

  • Combine expressions from small to big
    • size=1
      • x, y
    • size=2
    • size=3
      • x+y
    • size=4
    • size=5
      • x+(x+y), (x+y)+y
    • size=6
      • if(x<=y, x, y), …

22

23 of 63

Optimization

  • Discard a partial program early

  • Pruning
    • None of the expansions could satisfy the specification
    • Ite BoolExpr x x

  • Equivalence reduction
    • Equivalent to a previous program
    • Expr+x, x+Expr

23

24 of 63

Pruning

  • Generate constraints from the partial program

  • Generate constraints from each test

24

Ite BoolExpr x x

(declare-fun boolExpr () Int)

(declare-fun max2 ((x Int) (y Int)) Int

(ite boolExpr x x))

max2(1,2)=2

(assert (= (max2 1 2) 2))

(check-sat)

Needs to balance between the benefit and the cost.

25 of 63

Equivalence reduction: How to determine equivalence?

  •  

25

26 of 63

How to generate the next program to be verified?

  • Enumerative – exhaustive search
  • Representation-based – manipulate sets of programs instead of single programs
  • Constraint-based – convert to an SMT problem

26

27 of 63

Representation-based

  • Enumerative approaches manipulates single programs
    • Inefficient: too many in number
  • Can we manipulate sets of programs? e.g.
    • Find a set that satisfies a specification
    • Intersects sets for a conjunction of specifications
    • Combine sets with program constructs to satisfy more complex specifications
  • Representation-based
    • Use data structures to represent such a set
    • E.g. Grammars, Automata, Logic Formulas

27

28 of 63

FlashMeta: Basic Idea

  • Grammar is a representation of sets
    • Size of a grammar = O(log(#Represented Program))
  • The original grammar is too coarse-grained

  • Idea: Annotate a non-terminal with a synthesis goal
    • [2]Expr – expressions that evaluates to 2

28

29 of 63

FlashMeta: Single Test

  •  

29

30 of 63

Intersection of grammars

  •  

30

31 of 63

FlashMeta: Multiple Tests

  • Produce a grammar for each test
  • Intersects the grammars

31

32 of 63

FlashMeta: Discussion

  •  

32

33 of 63

How to generate the next program to be verified?

  • Enumerative – exhaustive search
  • Representation-based – manipulate sets of programs instead of single programs
  • Constraint-based – convert to an SMT problem

33

34 of 63

Component-Based Program Synthesis

34

+

-

if

x

y

+

-

o1

o2

o3

o4

o5

o6

o7

i11

i12

i21

i22

i31

i32

i41

i42

i51

i53

i52

1

2

3

4

5

6

7

8

9

 

Components

Connection

Points

 

35 of 63

Generate constraints

  •  

35

 

36 of 63

This Lecture

36

Classic Synthesis

    • Problem Definition
    • Enumerative
    • Presentation-based
    • Constraint-based

Program Estimation

    • Problem Definition
    • Estimating Probabilities
    • Locating the most-likely one

37 of 63

Program Estimation

  •  

37

38 of 63

Program Estimation as an Search Problem

38

Generate

Estimate

Program

Probability

Enough?

Program with the highest probability

 

39 of 63

Learning to synthesis (L2S)

  • A general framework to address program estimation

  • Combining four tools
    • Rewriting rules: defining a search problem
    • Constraint solving: pruning off invalid choices in each step
    • Machine-learned models: estimating the probabilities of choices in each step
    • Search algorithms: solving the search problem

40 of 63

Example: Condition Completion

  • Given a program without a conditional expression, completing the condition

  • Useful in program repair
    • Many bugs are caused by incorrect conditions
    • Existing work could localize the faulty condition
    • Can we generate a correct condition to replace the incorrect one?

40

 

Space of Conditions

41 of 63

Q3: Estimating the Probability

  • Idea: Using machine learning
    • To train over a set of programs and their contexts
  • Problem: machine learning usually works for classification problems
    • where the number of classes are usually small
  • Idea: turn the generation problem into a set of classification problem along the grammar

41

42 of 63

Decomposing Generation

  • In each step, we estimate the probabilities of the rules to expand the left-most non-terminal
    • A classification problem

42

E

E

>12

hours

E

E

>12

E

Expand E with E -> E “> 12”

Expand E with

E -> “hours”

43 of 63

Probability of the program

  •  

43

E

E

>12

…;if(

) throw new ArgException();

 

 

 

 

44 of 63

Training models

  • Train a model for each non-terminal
    • to classify rules expanding this non-terminal
  • Training set preparation
    • The original training set:
      • A set of programs
      • Their contexts
    • Decomposing the training set:
      • Parse the programs
      • Extract the rules chosen for each non-terminal

44

45 of 63

Feature Engineering

  •  

45

E

E

>12

…;if(

) throw new ArgException();

 

 

 

 

46 of 63

Can we use a different expansion order?

46

  • Top-down

  • Bottom-up

E

E

>12

hours

hours

E

hours

The order may greatly affect the performance of L2S.

E

E

>12

hours

E

E

>12

E

47 of 63

Annotations

  •  

48 of 63

From Grammar to Rewriting Rules

Grammar

Top-down Rules

Bottom-up Rules

Creation Rules

// starting from the root

// starting from a middle node

// starting from a leaf

Ending Rule

49 of 63

Example

  • Top-down

  • Bottom-up

E

E

>12

hours

E

 

>12

 

 

E

>12

hours

 

 

hours

 

 

 

 

 

E

E

>12

hours

 

 

50 of 63

Unambiguity

  •  

50

51 of 63

Q4: How to find the most probable program?

  •  

51

 

 

>12

hours

 

 

>0

value

 

 

 

0.6 * 0.2

= 0.12

0.3 * 0.8

= 0.24

52 of 63

Use Metaheuristic Search

  • Beam Search:
    • Keep n most probable partial programs
    • Expand the programs to get new programs
  • Genetic Search:
    • Keep n most probably complete programs
    • Mutate the programs to get new programs

52

53 of 63

Applications

  • Application 1:
    • Repairing Conditional Expressions
  • Application 2:
    • Generating Code from Natural Language Expression

53

54 of 63

Repairing Conditional Expressions

  • Condition bugs are common

  • Steps:
    1. Localize a buggy if condition with SBFL and predicate switching
    2. Synthesize an if condition to replace the buggy one
    3. Validate the new program with tests

Missing boundary checks

- if (hours >= 24)

+ if (hours > 24)

withinOneDay=true;

Conditions too weak or too strong

lcm = Math.abs(a+b);

+ if (lcm == Integer.MIN_Value)

+ throw new ArithmeticException();

55 of 63

L2S Configuration

  • Rewriting rules
    • Bottom-up
    • Estimate the leftmost variable first
  • Machine learning
    • Xgboost
    • Manually designed features
  • Constraints
    • Type constraints & size constraints
  • Search algorithm
    • Beam search

55

56 of 63

Results

56

Also repaired 8 unique bugs that have never been repaired by any approach.

Benchmark: Defects4J

57 of 63

Generating Code from Natural Language Expression

  • Can we generate code automatically to avoid repetitive coding?
  • Existing approaches use RNN to translate natural language descriptions to programs
    • Long dependency problem: work poorly on long programs

57

58 of 63

L2S Configuration

  • Rewriting rules
    • Top-down
  • Machine learning
    • A CNN-based network
  • Constraints
    • Size constraints
  • Search algorithm
    • Beam search

58

59 of 63

A CNN-based Network Architecture

59

60 of 63

Results

60

Benchmark: HearthStone

61 of 63

Newest Results

  • Replacing CNN with Transformer
    • Transformer: a new neural architecture at 2017
    • The flexibility of L2S allows to easily utilize new models

61

62 of 63

Future Learning

  • Surveys:
    • Sumit Gulwani, Oleksandr Polozov, Rishabh Singh: Program Synthesis. Foundations and Trends in Programming Languages 4(1-2): 1-119 (2017)
    • Rajeev Alur, Rastislav Bodík, et al.: Syntax-guided synthesis. FMCAD 2013: 1-8
  • Tools:
    • sygus.org – the SyGuS competition, a good place to look at
    • Some tools we recently used
      • EUSolver
      • CVC4
      • Second-Order Solver
  • Course:
    • Program Synthesis by Nadia Polikarpova@UCSD
    • https://github.com/nadia-polikarpova/cse291-program-synthesis/

62

63 of 63

Reference

  • Enumerative
    • Sumit Gulwani, Oleksandr Polozov, Rishabh Singh: Program Synthesis. Foundations and Trends in Programming Languages 4(1-2): 1-119 (2017)
    • Rajeev Alur, Rastislav Bodík, et al.: Syntax-guided synthesis. FMCAD 2013: 1-8
  • FlashMeta
    • Oleksandr Polozov, Sumit Gulwani: FlashMeta: a framework for inductive program synthesis. OOPSLA 2015: 107-126
  • Componen-Based Program Synthesis
    • Susmit Jha, Sumit Gulwani, Sanjit A. Seshia, Ashish Tiwari: Oracle-guided component-based program synthesis. ICSE (1) 2010: 215-224
  • L2S
    • Yingfei Xiong, Bo Wang, et al.: Learning to Synthesize. GI'18: Genetic Improvment Workshop, May 2018

63