CirFix: Automated Hardware Repair and its�Real-World Applications
Priscila Santiesteban, Yu Huang, Westley Weimer, Hammad Ahmad
Preliminary Exam Presentation
January 20th, 2023
Title Slide 2
Debugging = Expensive & Time-Consuming
2
“Debugging, on average, has grown to consume more than 60% of today’s ASIC and SoC verification effort.”
-Harry Foster, Mentor Graphics Corporation
What is a Good Solution to Debugging?
3
1. Fix Bugs
2. Localize bugs
3. But also be liked and useful to users!
(1) Motivation for Automated Fixing�Automatic Program Repair
4
Definition: Automatic Program Repair (or APR) is a technique that identifies a patch for a given bug
(1) Motivation for Automated Fixing�Automatic Program Repair
5
Definition: Automatic Program Repair (or APR) is a technique that identifies a patch for a given bug
Faulty program w/ bug(s)
Test suite w/ at least one failing test
Fault localizing
Patch
Validation
Repaired program
No Repairs Found
OR
“Generate and Validate”
Evaluating
(1) Motivation for Automated Fixing�Automatic Program Repair
6
Definition: Automatic Program Repair (or APR) is a technique that identifies a patch for a given bug
(1) Motivation for Automated Fixing�Automatic Program Repair
7
Definition: Automatic Program Repair (or APR) is a technique that identifies a patch for a given bug
(1) Motivation for Automated Fixing�Automatic Program Repair
8
Definition: Automatic Program Repair (or APR) is a technique that identifies a patch for a given bug
(1) Motivation for Automated Fixing�Automatic Program Repair
9
Definition: Automatic Program Repair (or APR) is a technique that identifies a patch for a given bug
However, APR does not work for hardware as it currently stands
What is a Good Solution to Debugging?
10
1. Fix Bugs
2. Localize bugs
3. But also be liked and useful to programmers!
(2) Motivation for Localizing Bugs�Fault Localization
11
Definition: Fault Localization (FL) is a technique used to help determine which lines in the code could be responsible for a given bug
(2) Motivation for Localizing Bugs�Fault Localization
12
Definition: Fault Localization (FL) is a technique used to help determine which lines in the code could be responsible for a given bug
(2) Motivation for Localizing Bugs�Fault Localization
13
Definition: Fault Localization (FL) is a technique used to help determine which lines in the code could be responsible for a given bug
(2) Motivation for Localizing Bugs�Fault Localization
14
Definition: Fault Localization (FL) is a technique used to help determine which lines in the code could be responsible for a given bug
However, FL does not work for hardware as it currently stands
What is a Good Solution to Debugging?
15
1. Fix Bugs
2. Localize bugs
3. But also be liked and useful to users!
(3) Motivation for Human Relevance�Users Subjective Judgment
16
Summary of Motivation
A good solution may help….
17
1. Fix Bugs
2. Localize bugs
3. But also be liked and useful for users!
What are Some Desired Properties?
Issue – APR does not work for circuits, but we need to alter it to guide the search
Issue – Existing methods cannot handle parallelism and rely on knowing which lines were executed
Issue - Automated test case approaches do not tell us how humans feel
Different tools are useful in many contexts
18
Circuit Design Example:�Gate Level Modeling in Verilog
19
Circuit Design Example:�Gate Level Modeling in Verilog
20
Circuit Design Example:�Gate Level Modeling in Verilog
21
Proposed Solution
22
Introducing CirFix: High-level
23
CirFix: A hardware-design focused automated repair algorithm based on genetic programming (GP)
Introducing CirFix: High-level
24
CirFix: A hardware-design focused automated repair algorithm based on genetic programming (GP)
Fitness Function in APR
25
Test suite with two failing tests
tc0: pass tc4: fail
tc1: pass tc5: pass
tc2: fail tc6: pass
tc3: pass tc7: pass
Fitness = 0.75
(6 passing, 2 failing tests)
Fitness Function in APR
Compiler version N-2017.12-SP2-1_Full64; Runtime version N-2017.12-SP2-1_Full64;
time, clk, reset, enable, count_out, overflow_out
0, 0, 0, 0, x, x
5, 1, 0, 0, x, x
...
250, 0, 0, 1, 5, 1
255, 1, 0, 1, 5, 1
256, 1, 0, 1, 6, 1
$finish called from file "first_counter_tb_t3.v", line 70.
$finish at simulation time 258
26
Test suite with two failing tests
tc0: pass tc4: fail
tc1: pass tc5: pass
tc2: fail tc6: pass
tc3: pass tc7: pass
Fitness = 0.75
(6 passing, 2 failing tests)
Fitness = ???
Fitness Function Modified for Hardware
27
Introducing CirFix: High-level
28
CirFix: A hardware-design focused automated repair algorithm based on genetic programming (GP)
Fault localization for Hardware
29
AST for circuit design, simulation output, circuit oracle
Comparison of output wire values between simulation and oracle to get identifier names with output mismatch
Fixed point analysis of assignments to output wires and registers
Set of implicated AST nodes
Fixed Point Analysis of Assignments
30
Returns a set of wire/register names that have output mismatch
Two ways to be implicated:
(e.g., count <= 1’b1;)
(e.g., if(reset==1’b1) count <= 1’b0;)
Fault Localization Algorithm (Fixed Point Analysis)
More on Methodology in the Paper!
31
Introducing CirFix: High-level
32
CirFix: A hardware-design focused automated repair algorithm based on genetic programming (GP)
Experimental Setup
RQ #1. What fraction of defect scenarios can CirFix repair?
RQ #2. Does the fault localization scale?
RQ #3.a. Does the fault localization algorithm improve designer’s objective performance?
RQ #3.b. In what context do designers find the fault localization helpfull?
33
Experimental Setup
Research Questions of Focus:
Can we fix bugs?
Can we find bugs?
Do designers like it?
Experimental Setup
RQ #1. What fraction of defect scenarios can CirFix repair?
RQ #2. Does the fault localization scale?
RQ #3.a. Does the fault localization algorithm improve designer’s objective performance?
RQ #3.b. In what context do designers find the fault localization helpfull?
34
Experimental Setup
Research Questions of Focus:
Can we fix bugs?
Can we find bugs?
Do designers like it?
35
A defect scenario consists of :
Benchmark Suite of Defect Scenario
Benchmark Suite of Defect Scenario
36
A defect scenario consists of :
Benchmark suite: Hardware Projects
37
Project | Description | LOC |
decoder_3_to_8 | 3-to-8 decoder | 25 |
counter | 4-bit counter with an overflow bit | 56 |
flip_flop | T-flip flop | 16 |
fsm_full | Finite state machine | 115 |
lshift_reg | 8-bit left shift register | 30 |
mux_4_1 | 4-to-1 multiplexer | 19 |
i2c | Two-wire, bidirectional serial bus for data exchange | 2018 |
sha3 | Cryptographic hash function | 499 |
tate_pairing | Core for running the Tate bilinear pairing algorithm for elliptic curves | 2206 |
reed_solomon_decoder | Core for Reed-Solomon error correction | 4366 |
sdram_controller | Synchronous DRAM (SDRAM) memory controller | 420 |
Introductory-level VLSI course
Projects
OpenCores projects
Open-source GitHub project
Benchmark Suite Defect Seeding
38
Recruited three hardware experts to seed defects into circuits
Two categories of defects
32 defect scenarios in benchmark suite
RQ1: What fraction of defects can CirFix Repair?
39
Result: CirFix found 21/32 (65.6%) plausible repairs
16/32 (50.0%) correct repairs (via manual inspection)
Experimental Setup
RQ #1. What fraction of defect scenarios can CirFix repair
RQ #2. Does the fault localization scale?
RQ #3.a. Does the fault localization algorithm improve designer’s objective performance?
RQ #3.b. In what context do designers find the fault localization helpfull?
40
Experimental Setup
Research Questions of Focus:
Can we fix bugs?
Can we find bugs?
Do designers like it?
Why do we care about the scalability of the fault localization?
One Threat to Validity in Our CirFix’s Experiments:
41
Why do we care about the scalability of the fault localization?
One Threat to Validity in Our CirFix’s Experiments:
42
Pointer/Alias Analysis
Why do we care about the scalability of the fault localization?
One Threat to Validity in Our CirFix’s Experiments:
43
Pointer/Alias Analysis
Review on CirFix’s Fault Localization
44
Fixed Point Analysis Assignment
Returns a set of wire/register names that have output mismatch
Two ways to be implicated:
Example Initial Mismatch List :
{wire A, wire B}
Example Final Mismatch List :
{wire A, wire B, wire C, register A}
Introducing Noise to the Fault Localization
45
Example Initial Mismatch List (R = 0) :
{wire A, wire B}
Introducing Noise to the Fault Localization
46
NEW Example Initial Mismatch List (R >0):
{wire A, wire B,
wire C, register f}
Introducing Noise to the Fault Localization
47
'F', 'reset’}
NEW Example Initial Mismatch List (R >0):
{wire A, wire B,
wire C, register f}
Example Final Mismatch List (R = 0) :
{wire A, wire B, wire C, register A}
Introducing Noise to the Fault Localization
48
'F', 'reset’}
NEW Example Final Mismatch List (R >0) :
{wire A, wire B, wire C, register A, register f}
NEW Example Initial Mismatch List (R >0) :
{wire A, wire B,
wire C, register f}
RQ2: Does CirFix Fault Localization Scale?
49
Answer: CirFix may scale to larger real-world projects
Why?
21/21 (100%) plausible repairs at 25% additional noise
19/21 (90.4%) plausible repairs at 50% additional noise
20/21 (95.2%) plausible repairs at 75% additional noise
Experimental Setup
RQ #1. What fraction of defect scenarios can CirFix repair?
RQ #2. Does the fault localization scale?
RQ #3.a. Does the fault localization algorithm improve designers’ objective performance?
RQ #3.b. In what context do designers find the fault localization helpfull?
50
Experimental Setup
Research Questions of Focus:
Can we fix bugs?
Can we find bugs?
Do designers like it?
Human Study Experiment Setup
51
A debugging task consists of :
UM IRB- HUM00199335
Experimental Design: Debugging task
52
Defect Scenario Snippet
Design Description
Debugging Hints
(Full, Partial, None)
UM IRB- HUM00199335
Experimental Design: Debugging task
53
Defect Scenario Snippet
Design Description
Debugging Hints
(Full, Partial, None)
UM IRB- HUM00199335
Experimental Design: Debugging task
54
Defect Scenario Snippet
Design Description
Debugging Hints
(Full, Partial, None)
UM IRB- HUM00199335
Experimental Design: Survey Questions
55
What line(s) in the circuit design are responsible for the bug?
How would you fix the bug? Indicate the line number you are changing
How accurate did you find the debugging information for this bug?
How useful did you find the debugging information for this bug?
Objective Performance
(Precision, Recall, F-Score)
Subjective Judgment
(Likert Scale Ratings)
UM IRB- HUM00199335
Experimental Design: Recruitment
Recruited participants (n=33) with the following characteristics:
(less than a month – 2+ years, median = 4 months – 1year )
56
UM IRB- HUM00199335
RQ3.a: Does the FL algorithm improve designer’s objective performance?�
57
Answer: CirFix’s FL produced no significant improvement in designer’s objective performance
Why?
RQ3.b: In what context do designers find the FL helpful?
58
Answer: Designers rate CirFix’s FL helpful for debugging multi-line defects in pedagogical contexts
Why?
Conclusion
Problem: Debugging is Expensive and Time-Consuming
59
Conclusion
Problem: Debugging is Expensive and Time-Consuming
60
A good solutions consists of:
Conclusion
Problem: Debugging is Expensive and Time-Consuming
61
A good solutions consists of:
State of the art in Software:
Conclusion
Problem: Debugging is Expensive and Time-Consuming
62
A good solutions consists of:
Issues with State of the art:
Not Applicable to Hardware
Conclusion
Problem: Debugging is Expensive and Time-Consuming
63
A good solutions consists of:
Proposed Solution:
Summary of Results
Does our solution have the properties of a good solution?
64
CirFix: Automated Hardware Repair and its Real-World Applications
Presented by: Priscila Santiesteban
Bonus Slides
Fitness Function: Comparison
Bit-level comparison of instrumented wires and registers against an oracle to assess functional correctness of candidate repairs:
where and correspond to the bth bit for time step ci in the simulated circuit and the oracle respectively
Oracle: a developer-provided information for circuit behavior
66
Simulation output
Oracle
Bit values of 0 or 1
Fitness Function: Comparison
Bit-level comparison of instrumented wires and registers against an oracle to assess functional correctness of candidate repairs
Oracle: a developer-provided information for circuit behavior
67
Oracle time, output A, output B 5 , x , x 15 , 0 , 0 25 , 0 , 1 35 , 1 , 0 45 , 1 , 1 55 , 0 , 0 | Simulation Result time, output A, output B 5 , x , x 15 , x , x 25 , 0 , 1 35 , 1 , 0 45 , 1 , 1 55 , 1 , 1 |
Fitness Score (sum/total_possible)
4/4 = 1.0 |
Fitness Function: Comparison
Fitness Function: Comparison
Bit-level comparison of instrumented wires and registers against an oracle to assess functional correctness of candidate repairs
Oracle: a developer-provided information for circuit behavior
68
Oracle time, output A, output B 5 , x , x 15 , 0 , 0 25 , 0 , 1 35 , 1 , 0 45 , 1 , 1 55 , 0 , 0 | Simulation Result time, output A, output B 5 , x , x 15 , x , x 25 , 0 , 1 35 , 1 , 0 45 , 1 , 1 55 , 1 , 1 |
Fitness Score (sum/total_possible)
4/4 = 1.0 0/4 = 0.0 |
Fitness Function: Comparison
Fitness Function: Comparison
Bit-level comparison of instrumented wires and registers against an oracle to assess functional correctness of candidate repairs
Oracle: a developer-provided information for circuit behavior
69
Oracle time, output A, output B 5 , x , x 15 , 0 , 0 25 , 0 , 1 35 , 1 , 0 45 , 1 , 1 55 , 0 , 0 | Simulation Result time, output A, output B 5 , x , x 15 , x , x 25 , 0 , 1 35 , 1 , 0 45 , 1 , 1 55 , 1 , 1 |
Fitness Score (sum/total_possible)
4/4 = 1.0 0/8 = 0.0 2/10 = 0.2 |
Fitness Function: Comparison
Fitness Function: Comparison
Bit-level comparison of instrumented wires and registers against an oracle to assess functional correctness of candidate repairs
Oracle: a developer-provided information for circuit behavior
70
Oracle time, output A, output B 5 , x , x 15 , 0 , 0 25 , 0 , 1 35 , 1 , 0 45 , 1 , 1 55 , 0 , 0 | Simulation Result time, output A, output B 5 , x , x 15 , x , x 25 , 0 , 1 35 , 1 , 0 45 , 1 , 1 55 , 1 , 1 |
Fitness Score (sum/total_possible)
4/4 = 1.0 0/8 = 0.0 2/10 = 0.2 4/12 = 0.33 |
Fitness Function: Comparison
Fitness Function: Comparison
Bit-level comparison of instrumented wires and registers against an oracle to assess functional correctness of candidate repairs
Oracle: a developer-provided information for circuit behavior
71
Oracle time, output A, output B 5 , x , x 15 , 0 , 0 25 , 0 , 1 35 , 1 , 0 45 , 1 , 1 55 , 0 , 0 | Simulation Result time, output A, output B 5 , x , x 15 , x , x 25 , 0 , 1 35 , 1 , 0 45 , 1 , 1 55 , 1 , 1 |
Fitness Score (sum/total_possible)
4/4 = 1.0 0/8 = 0.0 2/10 = 0.2 4/12 = 0.33 6/14 = 0.43 |
Fitness Function: Comparison
Fitness Function: Comparison
Bit-level comparison of instrumented wires and registers against an oracle to assess functional correctness of candidate repairs
Oracle: a developer-provided information for circuit behavior
72
Oracle time, output A, output B 5 , x , x 15 , 0 , 0 25 , 0 , 1 35 , 1 , 0 45 , 1 , 1 55 , 0 , 0 | Simulation Result time, output A, output B 5 , x , x 15 , x , x 25 , 0 , 1 35 , 1 , 0 45 , 1 , 1 55 , 1 , 1 |
Fitness Score (sum/total_possible)
4/4 = 1.0 0/8 = 0.0 2/10 = 0.2 4/12 = 0.33 6/14 = 0.43 4/16 = 0.25 |
Fitness Function: Comparison
Implication for Fault Localization
A node in the AST is implicated when:
73
Repair Operators
74
1
0
1
1
0
0
1
1
0
0
1
1
0
1
0
1
1
0
0
0
1
0
1
1
0
1
1
1
0
0
1
1
0
1
0
1
1
0
0
1
Parents
Children
Mutation
1
0
1
1
0
1
1
1
0
1
1
0
1
0
1
1
0
0
0
Crossover
1
Repair Operators
75
Fix Patterns
Pre-identified templates that can be applied to aspects of design code to fix a defect
76
Defect Category | Fix Pattern Description |
Incorrect conditional | Negate the condition of a conditional block (e.g., reset == 1’b1 to reset != 1’b1) |
Incorrect sensitivity list | Trigger an always block on either the falling edge of a signal (e.g., always@(negedge clk)), or the rising edge of a signal (e.g., always@(posedge clk)), or when a signal is level (e.g., always@(clk)), or on any change to a variable within the block (e.g., always@*) |
Incorrect assignment | Change a blocking assignment to a non-blocking one (e.g., overflow = 1'b1; to overflow <= 1'b1;) and vice versa |
Off-by-one error | Increment or decrement the value of an identifier by 1 (e.g., counter <= #1 counter + 2; to counter <= #1 (counter + 2) - 1;) |
Fix Localization
Guidelines for “how to perform mutations to code”
Reduces the number of mutations resulting in syntactically invalid code
77
Insertion | Replacement |
Program statements (e.g., assignments, if-statements) are the only sources of inserted code Such statements are inserted into begin … end blocks only | An element in the design code replaced by another element of the same type (e.g. reset == 1’b1 to enable == 1’b1), or by an element sharing the same abstract type (e.g., counter + 1 to counter - 1) |
Abstract Syntax Trees (AST)
AST: a program represented as a tree data structure
78
block
counter_out
if-then-else
equals
reset
1’b1
assignment
4’b0000
if-then-else
#1
overflow_out
assignment
4’b1
#1
65
66
67
68
69
70
71
72
73
74
75
76
77
78
CirFix fitness function is highly effective at capturing incremental changes to design code to guide the search for repairs
79
Reported by original, off-the-shelf testbench
Reported by instrumented testbench and fitness function
RQ3. How effective is CirFix’s Fitness Function?
Generate and Validate: Genetic Algorithms
Goal: Evolve populations of individuals to meet a certain goal
80
Population initialization
Fitness calculation
Selection
Reproduction
Termination criteria met?
Return best individual
Y
N
Selection: Fitness-based Tournaments
The process of choosing a parent for reproduction to produce a child for the next generation
81
Fitness Value
Candidate
Select K candidates at random
Pick the best as parent
82
Software Programs vs. Hardware Descriptions
83
Software Program | Hardware Description |
Support design concepts like expressions, assignments, control structures, etc. | |
Compiled to executable binaries | Simulated using software for testing, synthesized into actual hardware for production |
Typically based around a serial execution model (i.e., one line of code executes before the next) | Inherently parallel in nature (think: parts of hardware can operate simultaneously) |
Use test suites to evaluate functional correctness, where individual test cases may pass or fail | Use testbenches to instantiate a DUT and manipulate input signals to simulate behavior and read output |
Fitness Function: Comparison
Bit-level comparison of instrumented wires and registers against an oracle to assess functional correctness of candidate repairs:
where and correspond to the bth bit for time step ci in the simulated circuit and the oracle respectively
Oracle: a developer-provided information for circuit behavior
84
Simulation output
Oracle
Bit values of 0 or 1