1 of 28

SAT-based Analysis of

Large Real-world Feature Models is Easy

July 23, 2015

SPLC 2015, Nashville, TN

Jia Liang, Vijay Ganesh

Krzysztof Czarnecki, Venkatesh Raman

2 of 28

Feature Models

  • Popular in product lines
  • Want to know…
    • … what is a valid configuration?
    • … what is a valid configuration with features A, B, C and without features X, Y, Z?
    • … what is a valid configuration where X is optimal?

2

3 of 28

SAT-based Analysis

  • Feature models are easy to convert to SAT
    • Including cross-tree constraints

  • Fast in practice
    • But Boolean satisfiability is NP-complete…
    • Real-world feature models can have 62000+ Boolean variables and 270000+ clauses

3

4 of 28

Feature model to CNF and Back

4

Feature Model

(6000+ features)

Propositional formula

CNF

(62000+ variables)

(270000+ clauses)

Variable Assignment

Product

Tseitin transformation

CDCL

SAT Solver

a or ¬b or c

b or ¬c or d

5 of 28

Problem Statement

Q: Why is CDCL SAT-based analysis so effective in analyzing large real-world feature models?

5

6 of 28

Modern CDCL Features?

Q: Is the SAT-based effectiveness on analyzing real-world feature models attributable to modern CDCL features?

“Empirical study of the anatomy of modern SAT solvers”,

Katebi et al., SAT 2011

6

7 of 28

Modern SAT Solvers

Empirical Study of the Anatomy of Modern SAT

Solvers (2011) identifies 4 main features:

  • Conflict-driven clause learning
  • Random restarts
  • 2-watched literals (data-structure)
  • Conflict-based adaptive branching

7

8 of 28

Without Modern Features

8

9 of 28

Few Backtracks

Q: Why do feature models cause very backtracks in CDCL solvers?

9

10 of 28

Variable Branching

10

a

You are here

false

true

exit

exit

11 of 28

Unrestricted Variable

Either extension (true or false) leads to a satisfying assignment.

11

b

false

true

exit

exit

12 of 28

Restricted Variable

One extension (true xor false) leads to a satisfying assignment.

12

exit

exit

a

false

true

13 of 28

Unsatisfying Assignment

Partial assignment is unsatisfiable.

13

c

false

true

exit

exit

14 of 28

Linux Model Variables

14

15 of 28

Lots of exits

15

16 of 28

Hard Feature Models?

Q: Are SAT problems derived from realistic feature models always tractable?

“Automated generation of computationally hard feature models using evolutionary algorithms”,

Segura et al., Expert Systems with Applications 2013

“SAT-based analysis of feature models is easy”,

Mendonca et al., SPLC 2009

16

17 of 28

Always easy?

… No, if CTC allowed for disjunctions of length 3

17

HARD CNF

  • a or ¬b or c
  • b or ¬c or d
  • ¬a or b or ¬d

a

b

c

d

HARD CTC

  • a or ¬b or c
  • b or ¬c or d
  • ¬a or b or ¬d

18 of 28

How much is easy/hard?

Q: How much of feature models can be simplified in polynomial time?

18

19 of 28

How much is hard?

19

Feature Model

(62000+

Boolean variables)

Worst-case

Exp-time

Worst-case

Poly-time

Simp Model

(~50

Boolean variables)

20 of 28

Simplification

  • Input: a CNF formula
  • Output: a smaller but equisatisfiable (simplified) CNF formula
  • We only consider worst-case poly-time simplifications.

20

21 of 28

Equivalent Variable Substitution

  • xy
  • yx

Replace with x and y with a fresh variable z

21

22 of 28

Boolean Constraint Propagation

  • x
  • x or y
  • ¬x or z

Set x=true.

  • x
  • x or y
  • z

22

23 of 28

Simplifications

  • Equivalent variable substitution
  • Subsumption
  • Self-Subsuming Resolution
  • Variable elimination
  • Asymmetric Branching
  • RCheck
  • Boolean Constraint Propagation

23

24 of 28

Post Simplification

24

25 of 28

Horn

25

  • Random 3CNF
  • 200 variables
  • 850 clauses

26 of 28

Treewidth Complexity

Q: Does treewidth characterize the complexity of real-world feature models?

“Measuring the Structural Complexity of Feature Models”,

Pohl et al., ASE 2013

26

27 of 28

Tree-width

27

Spearman’s rank correlation coefficient between SAT solving time and lower/upper bound of treewidth: 0.257

28 of 28

Summary

  • Unrestricted variables aplenty
  • Lots of solutions
  • Modern SAT solver features unnecessary
  • Feature models mostly poly-time solvable
  • Can construct “realistic” hard feature models

QUESTIONS?

28