1 of 17

CircuitVQA: A VQA Dataset for Electrical Circuit Images

Rahul Mehta¹, Bhavyajeet Singh¹,²

Vasudeva Varma¹, Manish Gupta¹,²

¹IIIT, ²Microsoft

manishg.iitb@gmail.com


07-Jun-24

2 of 17

What is VQA for electrical circuit images, and why do this?

  • Teaching tool or quiz generator for students who are learning about electrical circuits.
    • Provide feedback and hints to help students solve circuit problems.
  • Design assistant or verification tool for engineers who are creating or modifying electrical circuits.
    • Suggest improvements or optimizations for the circuit design.
  • Debugging or diagnosis tool for technicians who are repairing or testing electrical circuits.
    • Identify faults or errors in circuit functionality or performance.
  • Accessibility tool for visually impaired users who want to interact with or learn about electrical circuits.
  • Analysis tool for researchers who want to study or compare electrical circuits.


3 of 17

Related Work

  • VQA for Science
    • VQA over scientific images such as diagrams, graphs, charts, and illustrations.
    • ScienceQA (on science lectures)
    • AI2D (on diagrams)
    • ChartQA (on chart summaries)
    • FigureQA (on scientific-style figures from five classes: line plots, dot-line plots, vertical and horizontal bar graphs, and pie charts)
    • DVQA (on barcharts)
    • PlotQA (on plots)
    • LeafQA (on figures/charts)
    • BizGraphQA (on graph-structured diagrams from business domains)
  • ML for electrical circuits
    • Solve EDA (electronic design automation) tasks and electronic circuit design.
    • Recognition of hand-drawn electrical and electronic circuit components
    • Fault diagnosis of analog circuits.
  • Hallucination in VLMs
    • Contradictions between the visual input (taken as ‘fact’) and the text output of a VLM.
    • CHAIR and POPE: metrics for object hallucinations.
    • No such hallucination metric exists for VQA.


4 of 17

CircuitVQA Dataset Curation

  • Source: Roboflow and Kaggle
  • Derived from Handwritten Circuit Diagram Images (CGHD) dataset.
  • Images, human-annotated bounding boxes, and the corresponding component classes (e.g., resistor, ammeter).
  • Unified dataset of 5725 images, of which 3175 are hand-drawn and 2550 are schematic.


5 of 17

Generation of Question Templates

  • 5 question types: Simple Counting, Spatial Counting, Position-Based, Value-Based, and Junction-Based.
  • For each type, obtain question templates using ChatGPT.
  • Instantiate questions using these templates, and image metadata like the associated components and their bounding boxes.


6 of 17

Generation of Question Answer Pairs

  • Simple Counting Questions
    • Ask for the count of each component type in the image.
    • ChatGPT: “Paraphrase the following text in 20 ways - How many X does the circuit have?”
    • Replace X with the actual component name from metadata.
    • Metadata contains component names and their counts.
    • Tests object recognition and counting skills (a minimal instantiation sketch follows this list).

  • Spatial Counting Questions
    • Ask how many components of a certain type are connected directly to the left, right, top or bottom of the given component.
    • For datasets D1, D2, D3 and D5, the ChatGPT prompt is: “How many Y are connected directly to the ⟨direction⟩ of X?”
    • For D4 (digital gates): “How many gates are providing an input to X?”, “How many gates are connected to the right of X?”, “How many Y gates are connected to the right of X?”, and “How many Y gates are connected to the left of X?”
    • Replace X and Y with the actual component name
    • Answers obtained via human annotation.
    • Tests object detection and localization skills.
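A minimal sketch (not the authors’ released code) of how simple-counting QA pairs could be instantiated from the templates and per-image metadata; the “class” field name and the extra paraphrases are illustrative assumptions, not the dataset’s actual schema.

```python
# Minimal sketch: instantiate simple-counting QA pairs from paraphrased templates
# and per-image metadata.
from collections import Counter

TEMPLATES = [
    "How many {X} does the circuit have?",
    "Count the number of {X} present in the circuit.",   # assumed paraphrase
    "What is the total number of {X} in this diagram?",  # assumed paraphrase
]

def simple_counting_qa(metadata):
    """metadata: list of annotated components, each a dict with a 'class' key."""
    counts = Counter(item["class"] for item in metadata)
    qa_pairs = []
    for component, count in counts.items():
        for template in TEMPLATES:
            qa_pairs.append({"question": template.format(X=component),
                             "answer": str(count)})
    return qa_pairs

# Example: two resistors and one ammeter annotated in the image.
print(simple_counting_qa([{"class": "resistor"},
                          {"class": "resistor"},
                          {"class": "ammeter"}])[0])
# -> {'question': 'How many resistor does the circuit have?', 'answer': '2'}
```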


7 of 17

Generation of Question Answer Pairs

  • Value Based Questions
    • Ask for the value associated with a particular electrical component.
    • ChatGPT: “Paraphrase the following sentence in 20 ways. What is the reading on X?”
    • Provide a list of all values as the answer.
    • Tests optical character recognition, object recognition, and the ability to link text labels with components.
    • Manual answer labelling.
  • Junction based Questions
    • Ask whether a component exists between two junctions.
    • Datasets D2, D3 and D5 also have labeled bounding boxes for junctions.
    • ChatGPT: “Paraphrase the following text in 20 ways - Does a X exist between junction Y and junction Z?”
    • For a positive answer (i.e., answer=“yes”), we need valid triples of component X, junction Y and junction Z.
      • Choose random junction Y.
      • Choose the junction Z that is closest to Y.
      • Positive: choose component X whose summed distance to junctions Y and Z is smaller than to any other pair of junctions.
      • Negative: randomly sample a component X’ of a different type from X from the image metadata.
    • Tests object detection, localization and spatial reasoning skills (a triple-selection sketch follows this list).
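A rough sketch of the positive/negative selection described above; the geometry helpers, field names, and the simplification of the “compared to any other pair of junctions” check are my own assumptions, not the authors’ implementation.

```python
# Sketch of (positive, negative) sampling for junction-based questions.
# Assumes each component/junction is a dict with "class" and "bbox" = (x, y, w, h);
# these field names are illustrative.
import math
import random

def center(item):
    x, y, w, h = item["bbox"]
    return (x + w / 2, y + h / 2)

def dist(a, b):
    return math.dist(center(a), center(b))

def positive_triple(components, junctions):
    """Pick (X, Y, Z) so that 'Does a X exist between junction Y and junction Z?' is 'yes'."""
    y = random.choice(junctions)                          # random junction Y
    z = min((j for j in junctions if j is not y),
            key=lambda j: dist(j, y))                     # junction Z closest to Y
    x = min(components,
            key=lambda c: dist(c, y) + dist(c, z))        # X with minimal summed distance to Y and Z
    return x, y, z

def negative_component(components, x):
    """Randomly sample a component of a different type than X for a 'no' answer."""
    candidates = [c for c in components if c["class"] != x["class"]]
    return random.choice(candidates) if candidates else None
```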


8 of 17

Generation of Question Answer Pairs

  • Position based Questions
    • Ask which component is at the left-most, right-most, top-most or bottom-most position in the image.
    • ChatGPT: “Paraphrase the following in 20 ways - Which is the Xmost circuit symbol?”
    • Replace X with one of left, right, top or bottom.
    • Answer: use the components’ bounding boxes to decide the left-most, right-most, top-most or bottom-most component; questions without a unique answer are eliminated (see the sketch after this list).
    • Tests object detection and localization skills.
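A minimal sketch of deriving the answer from ⟨x, y, w, h⟩ bounding boxes, assuming the origin is at the top-left; field names are illustrative assumptions.

```python
# Minimal sketch: pick the left-/right-/top-/bottom-most component from bounding boxes.
def extreme_component(metadata, direction):
    key = {
        "left":   lambda c: c["bbox"][0],                      # smallest x
        "right":  lambda c: -(c["bbox"][0] + c["bbox"][2]),    # largest x + w
        "top":    lambda c: c["bbox"][1],                      # smallest y
        "bottom": lambda c: -(c["bbox"][1] + c["bbox"][3]),    # largest y + h
    }[direction]
    ranked = sorted(metadata, key=key)
    # Drop the question when the extreme component is not unique.
    if len(ranked) > 1 and key(ranked[0]) == key(ranked[1]):
        return None
    return ranked[0]["class"]

# e.g. extreme_component([{"class": "resistor", "bbox": (5, 40, 30, 10)},
#                         {"class": "ammeter",  "bbox": (80, 40, 30, 10)}], "left")
# -> "resistor"
```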


9 of 17

CircuitVQA Dataset Analysis

  • Components like “resistor”, “gnd”, “and gate”, “nand gate”, and “inductor” are the most frequent in value-based questions.
  • Although many questions have an answer count of 1, ∼52% of questions have an answer count greater than 1.


[Figure] Frequency distribution of value-based questions across component names.

[Figure] Frequency distribution of count-based questions.

10 of 17

Methods for CircuitVQA

  • Generative models: BLIP, GIT, Pix2Struct
  • Instruction-tuned models: LLaVA, InstructBLIP, GPT-4V
  • Fine-tuning uses the standard language-modelling loss (a sketch follows below).
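A hedged sketch of one fine-tuning step for BLIP with the language-modelling loss via Hugging Face transformers; the checkpoint name and the single-example update loop are assumptions, not the authors’ exact training setup.

```python
# Hedged sketch: one fine-tuning step for BLIP on an (image, question, answer) triple.
import torch
from transformers import BlipProcessor, BlipForQuestionAnswering

processor = BlipProcessor.from_pretrained("Salesforce/blip-vqa-base")
model = BlipForQuestionAnswering.from_pretrained("Salesforce/blip-vqa-base")
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

def training_step(image, question, answer):
    inputs = processor(images=image, text=question, return_tensors="pt")
    labels = processor(text=answer, return_tensors="pt").input_ids
    outputs = model(**inputs, labels=labels)   # cross-entropy LM loss over answer tokens
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return outputs.loss.item()
```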


11 of 17

Input Representations

  • Base: Image and text as input.
  • OCR text: Google Vision API
    • Question [OCR] OCR output
  • OCR-Post
    • Keep only the numbers and units typically expected in electrical measurements.
    • Retain OCR output tokens that contain any of [‘Ω’, ‘H’, ‘A’, ‘F’, ‘V’, ‘W’, ‘k’, ‘K’, ‘.’, ‘κ’, ‘M’], a combination of these symbols with a digit, or only digits (a filtering sketch follows this list).
  • Bounding box information
    • Use human-annotated bounding boxes from the metadata to fine-tune YOLOv8.
    • Precision of 78.1, recall of 63.9, mAP50 of 69.8 and mAP(50-95) of 51.3.
    • Classes: background, acv, ammeter, and, antenna, arr, block, capacitor, capacitor-unpolarized, capacitor.adjustable, capacitor-polarized, crossover, crystal, current-source, diac, diode, diode.light emitting, diode.thyrector, fuse, gnd, diode.zener, inductor, inductor.coupled, inductor.ferrite, inductor2, integrated circuit, integrated circuit.ne555, integrated circuit.voltage regulator, junction, lamp, magnetic, mechanical, microphone, motor, multi-cell-battery, nand, nor, not, operational amplifier, operational amplifier.schmitt trigger, optical, optocoupler, or, probe, probe.current, relay, resistor, probe.voltage, resistor.adjustable, resistor.photo, single-cell-battery, socket, speaker, switch, terminal, text, thyristor, transformer, transistor, transistor-photo, transistor.bjt, transistor.fet, triac, unknown, varistor, voltage-ac, voltage-dc, voltage-dc ac, voltage.battery, voltmeter, vss, xnor, and xor.
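A minimal sketch of the OCR-Post filter; this is my own approximation of the retention rule above (keeping tokens that are purely digits or that combine a digit with one of the listed unit/value symbols), not the authors’ exact code.

```python
# Minimal sketch of the OCR-Post filter over OCR output tokens.
UNIT_CHARS = set("ΩHAFVWkK.κM")

def ocr_post(tokens):
    kept = []
    for tok in tokens:
        has_digit = any(ch.isdigit() for ch in tok)
        has_unit_char = any(ch in UNIT_CHARS for ch in tok)
        if tok.isdigit() or (has_digit and has_unit_char):
            kept.append(tok)
    return kept

# e.g. ocr_post(["10kΩ", "R1", "battery", "5V", "220"]) -> ["10kΩ", "5V", "220"]
```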


12 of 17

Input Representations

  • BBox: for each detected component, pass the component name along with its bounding box ⟨x, y, w, h⟩.
  • BBox+Segment: assign each bounding box to one of 9 segments: “upper left”, “upper middle”, “upper right”, “left”, “middle”, “right”, “lower left”, “lower middle”, “lower right” (a segment-assignment sketch follows this list).
    • Pass the component name, ⟨x, y, w, h⟩, and the segment name.

  • Visual description of components
    • ChatGPT: “Describe the electrical component ⟨component⟩ in 50 words”
    • Desc: pass the question, a [DESC] separator, and the description of the relevant circuit component.
    • Example: “Capacitor: Symbolized by two parallel lines with a gap, it stores and releases electrical energy, acting as a temporary energy reservoir in a circuit.”
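A minimal sketch of the BBox+Segment mapping, assigning a bounding-box centre to one of the 9 named segments on a 3×3 grid; the grid split at thirds of the image is my assumption, while the segment names come from the slide.

```python
# Minimal sketch: map a bounding-box centre to one of 9 image segments.
SEGMENTS = [["upper left", "upper middle", "upper right"],
            ["left",       "middle",       "right"],
            ["lower left", "lower middle", "lower right"]]

def segment_of(bbox, img_w, img_h):
    x, y, w, h = bbox
    cx, cy = x + w / 2, y + h / 2
    col = min(int(3 * cx / img_w), 2)   # 0, 1 or 2 from left to right
    row = min(int(3 * cy / img_h), 2)   # 0, 1 or 2 from top to bottom
    return SEGMENTS[row][col]

# e.g. segment_of((10, 10, 30, 20), img_w=300, img_h=300) -> "upper left"
```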


13 of 17

Input Prompt Templates for Instruction-based Models


14 of 17

Results

  • Hallucination in VQA
    • HVQA: average of three scores (see the formula after this list)
      • HVQA-count (captures over-counting of existing objects): for simple counting, spatial counting and value based questions.
      • HVQA-in-domain (captures predictions of non-existing in-domain objects)
      • HVQA-out-domain (captures predictions with out-of-domain objects)
    • For position-based questions, HVQA-in-domain and HVQA-out-domain both apply.
  • BLIP provides the best accuracy, while LLaVA and GPT-4V provide the lowest hallucination scores.
  • Best: fine-tuned BLIP (accuracy = 91.7) when paired with prompts containing a visual description of the component (BLIP-Desc).
    • It also hallucinates less on in-domain objects compared to its fine-tuned counterparts GIT and Pix2Struct.
  • Fine-tuning broadly ensures that the models (BLIP, GIT and Pix2Struct) do not hallucinate out-of-domain objects.
  • LLaVA predicts out-of-domain objects like ‘circle’, ‘square’, ‘A’, ‘B’, ‘D’, ‘F’, ‘triangle’, ‘carlin’, ‘nano’, ‘peizo-keeper’, ‘trigger’, ‘Snake’, ‘Snake Detector’.
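As stated above, the aggregate score is the mean of the component scores; for question types where all three apply (notation mine, following the score names on this slide):

```latex
\mathrm{HVQA} = \tfrac{1}{3}\left(\mathrm{HVQA}_{\text{count}} + \mathrm{HVQA}_{\text{in-domain}} + \mathrm{HVQA}_{\text{out-domain}}\right)
```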


[Table] Results per question type for the Desc variants of the models on the CircuitVQA test set.

[Table] Hallucination scores. A = count, B = in-domain, C = out-domain.

15 of 17

Examples of Predictions from our best model


16 of 17

Examples of error cases from our best model

  • Manual analysis of 100 error cases; 20 for each question type
  • Value-based questions
    • 4: incorrect units
    • 5: both units and values were wrong
    • 11: incorrect values
  • Junction-based questions
    • 12: images with ≥40 junctions
    • 8: images with <40 junctions
  • Position-based questions
    • 9: predicted component was physically the second closest to the correct answer component
    • 11: predictions were far from the actual answer.
  • Simple counting questions
    • 11: over-counting errors, all within a range of 1 to 5
    • 9: under-counting errors
  • Spatial counting questions
    • 4: over-counting errors
    • 16: under-counting errors


17 of 17

Summary

  • New problem: VQA for electrical circuit images.
  • New dataset: CircuitVQA with five question types.
  • Extensive evaluation of several SOTA VLMs.
  • Input representations: OCR text, bounding boxes and detailed description of relevant circuit components.
  • BLIP with component descriptions provides the highest VQA accuracy across most question types and one of the lowest hallucination scores.
