1 of 17

CircuitVQA: A VQA Dataset for Electrical Circuit Images

Rahul Mehta¹, Bhavyajeet Singh¹,²

Vasudeva Varma¹, Manish Gupta¹,²

¹IIIT, ²Microsoft

manishg.iitb@gmail.com


07-Jun-24

2 of 17

What is VQA for electrical circuit images, and why do this?

  • Teaching tool or quiz generator for students who are learning about electrical circuits.
    • Provide feedback and hints to help students solve circuit problems.
  • Design assistant or verification tool for engineers who are creating or modifying electrical circuits.
    • Suggest improvements or optimizations for the circuit design.
  • Debugging or diagnosis tool for technicians who are repairing or testing electrical circuits.
    • Identify faults or errors in circuit functionality or performance.
  • Accessibility tool for visually impaired users who want to interact with or learn about electrical circuits.
  • Analysis tool for researchers who want to study or compare electrical circuits.


3 of 17

Related Work

  • VQA for Science
    • VQA over scientific images such as diagrams, graphs, charts, and illustrations.
    • ScienceQA (on science lectures)
    • AI2D (on diagrams)
    • ChartQA (on chart summaries)
    • FigureQA (on scientific-style figures from five classes: line plots, dot-line plots, vertical and horizontal bar graphs, and pie charts)
    • DVQA (on barcharts)
    • PlotQA (on plots)
    • LeafQA (on figures/charts)
    • BizGraphQA (on graph-structured diagrams from business domains)
  • ML for electrical circuits
    • Solve EDA (electronic design automation) tasks and electronic circuit design.
    • Recognition of hand-drawn electrical and electronic circuit components
    • Fault diagnosis of analog circuits.
  • Hallucination in VLMs
    • Contradictions between the visual input (taken as ‘fact’) and the text output of a VLM.
    • CHAIR and POPE: metrics for object hallucinations.
    • No such hallucination metric exists for VQA.


4 of 17

CircuitVQA Dataset Curation

  • Source: Roboflow and Kaggle
  • Derived from Handwritten Circuit Diagram Images (CGHD) dataset.
  • Images, human-annotated bounding boxes, and the corresponding component classes (e.g., resistor, ammeter).
  • Unified dataset of 5725 images, of which 3175 are hand-drawn and 2550 are schematic.


5 of 17

Generation of Question Templates

  • 5 question types: Simple Counting, Spatial Counting, Position-Based, Value-Based, and Junction-Based.
  • For each type, obtain question templates using ChatGPT.
  • Instantiate questions using these templates, and image metadata like the associated components and their bounding boxes.


6 of 17

Generation of Question Answer Pairs

  • Simple Counting Questions
    • Ask for the count of each component type in the image.
    • ChatGPT: “Paraphrase the following text in 20 ways - How many X does the circuit have?”
    • Replace X with the actual component name from metadata.
    • Metadata contains component names and their counts.
    • Tests object recognition and counting skills (a minimal instantiation sketch follows this list).

  • Spatial Counting Questions
    • Ask how many components of a certain type are connected directly to the left, right, top or bottom of the given component.
    • For datasets D1, D2, D3 and D5, the ChatGPT prompt is: “How many Y are connected directly to the ⟨direction⟩ of X?”
    • For D4 (digital gates): “How many gates are providing an input to X?”, “How many gates are connected to the right of X?”, “How many Y gates are connected to the right of X?”, and “How many Y gates are connected to the left of X?”
    • Replace X and Y with the actual component name
    • Answers obtained via human annotation.
    • Tests object detection and localization skills.
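A minimal sketch (not the authors’ released code) of how simple-counting QA pairs could be instantiated from the templates and per-image metadata; the “class” field name and the extra paraphrases are illustrative assumptions, not the dataset’s actual schema.

```python
# Minimal sketch: instantiate simple-counting QA pairs from paraphrased templates
# and per-image metadata.
from collections import Counter

TEMPLATES = [
    "How many {X} does the circuit have?",
    "Count the number of {X} present in the circuit.",   # assumed paraphrase
    "What is the total number of {X} in this diagram?",  # assumed paraphrase
]

def simple_counting_qa(metadata):
    """metadata: list of annotated components, each a dict with a 'class' key."""
    counts = Counter(item["class"] for item in metadata)
    qa_pairs = []
    for component, count in counts.items():
        for template in TEMPLATES:
            qa_pairs.append({"question": template.format(X=component),
                             "answer": str(count)})
    return qa_pairs

# Example: two resistors and one ammeter annotated in the image.
print(simple_counting_qa([{"class": "resistor"},
                          {"class": "resistor"},
                          {"class": "ammeter"}])[0])
# -> {'question': 'How many resistor does the circuit have?', 'answer': '2'}
```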


7 of 17

Generation of Question Answer Pairs

  • Value Based Questions
    • Ask for the value associated with a particular electrical component.
    • ChatGPT: “Paraphrase the following sentence in 20 ways. What is the reading on X?”
    • Provide a list of all values as the answer.
    • Tests optical character recognition, object recognition, and the ability to link text labels with components.
    • Manual answer labelling.
  • Junction based Questions
    • Ask whether a component exists between two junctions.
    • Datasets D2, D3 and D5 also have labeled bounding boxes for junctions.
    • ChatGPT: “Paraphrase the following text in 20 ways - Does a X exist between junction Y and junction Z?”
    • For a positive answer (i.e., answer=“yes”), we need valid triples of component X, junction Y and junction Z.
      • Choose random junction Y.
      • Choose the junction Z that is closest to Y.
      • Positive: choose component X whose summed distance to junctions Y and Z is smaller than to any other pair of junctions.
      • Negative: randomly sample a component X’ of a different type from X from the image metadata.
    • Tests object detection, localization and spatial reasoning skills (a triple-selection sketch follows this list).
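A rough sketch of the positive/negative selection described above; the geometry helpers, field names, and the simplification of the “compared to any other pair of junctions” check are my own assumptions, not the authors’ implementation.

```python
# Sketch of (positive, negative) sampling for junction-based questions.
# Assumes each component/junction is a dict with "class" and "bbox" = (x, y, w, h);
# these field names are illustrative.
import math
import random

def center(item):
    x, y, w, h = item["bbox"]
    return (x + w / 2, y + h / 2)

def dist(a, b):
    return math.dist(center(a), center(b))

def positive_triple(components, junctions):
    """Pick (X, Y, Z) so that 'Does a X exist between junction Y and junction Z?' is 'yes'."""
    y = random.choice(junctions)                          # random junction Y
    z = min((j for j in junctions if j is not y),
            key=lambda j: dist(j, y))                     # junction Z closest to Y
    x = min(components,
            key=lambda c: dist(c, y) + dist(c, z))        # X with minimal summed distance to Y and Z
    return x, y, z

def negative_component(components, x):
    """Randomly sample a component of a different type than X for a 'no' answer."""
    candidates = [c for c in components if c["class"] != x["class"]]
    return random.choice(candidates) if candidates else None
```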


8 of 17

Generation of Question Answer Pairs

  • Position based Questions
    • Ask which component is at the left-most, right-most, top-most or bottom-most position in the image.
    • ChatGPT: “Paraphrase the following in 20 ways - Which is the Xmost circuit symbol?”
    • Replace X with one of left, right, top or bottom.
    • Answer: use the components’ bounding boxes to decide the left-most, right-most, top-most or bottom-most component; questions without a unique answer are eliminated (see the sketch after this list).
    • Tests object detection and localization skills.
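A minimal sketch of deriving the answer from ⟨x, y, w, h⟩ bounding boxes, assuming the origin is at the top-left; field names are illustrative assumptions.

```python
# Minimal sketch: pick the left-/right-/top-/bottom-most component from bounding boxes.
def extreme_component(metadata, direction):
    key = {
        "left":   lambda c: c["bbox"][0],                      # smallest x
        "right":  lambda c: -(c["bbox"][0] + c["bbox"][2]),    # largest x + w
        "top":    lambda c: c["bbox"][1],                      # smallest y
        "bottom": lambda c: -(c["bbox"][1] + c["bbox"][3]),    # largest y + h
    }[direction]
    ranked = sorted(metadata, key=key)
    # Drop the question when the extreme component is not unique.
    if len(ranked) > 1 and key(ranked[0]) == key(ranked[1]):
        return None
    return ranked[0]["class"]

# e.g. extreme_component([{"class": "resistor", "bbox": (5, 40, 30, 10)},
#                         {"class": "ammeter",  "bbox": (80, 40, 30, 10)}], "left")
# -> "resistor"
```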


9 of 17

CircuitVQA Dataset Analysis

  • Components like “resistor”, “gnd”, “and gate”, “nand gate”, and “inductor” are the most frequent in value-based questions.
  • Although many questions have an answer count of 1, ∼52% of questions have an answer count greater than 1.


[Figure] Frequency distribution of value-based questions across component names.

[Figure] Frequency distribution of count-based questions.

10 of 17

Methods for CircuitVQA

  • Generative models: BLIP, GIT, Pix2Struct
  • Instruction-tuned models: LLaVA, InstructBLIP, GPT-4V
  • Fine-tuning uses the standard language-modelling loss (a sketch follows below).
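A hedged sketch of one fine-tuning step for BLIP with the language-modelling loss via Hugging Face transformers; the checkpoint name and the single-example update loop are assumptions, not the authors’ exact training setup.

```python
# Hedged sketch: one fine-tuning step for BLIP on an (image, question, answer) triple.
import torch
from transformers import BlipProcessor, BlipForQuestionAnswering

processor = BlipProcessor.from_pretrained("Salesforce/blip-vqa-base")
model = BlipForQuestionAnswering.from_pretrained("Salesforce/blip-vqa-base")
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

def training_step(image, question, answer):
    inputs = processor(images=image, text=question, return_tensors="pt")
    labels = processor(text=answer, return_tensors="pt").input_ids
    outputs = model(**inputs, labels=labels)   # cross-entropy LM loss over answer tokens
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return outputs.loss.item()
```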


11 of 17

Input Representations

  • Base: Image and text as input.
  • OCR text: Google Vision API
    • Question [OCR] OCR output
  • OCR-Post
    • Keep only the numbers and units typically expected in electrical measurements.
    • Retain OCR output tokens that contain any of [‘Ω’, ‘H’, ‘A’, ‘F’, ‘V’, ‘W’, ‘k’, ‘K’, ‘.’, ‘κ’, ‘M’], a combination of these symbols with a digit, or only digits (a filtering sketch follows this list).
  • Bounding box information
    • Use human-annotated bounding boxes from the metadata to fine-tune YOLOv8.
    • Precision of 78.1, recall of 63.9, mAP50 of 69.8 and mAP(50-95) of 51.3.
    • Classes: background, acv, ammeter, and, antenna, arr, block, capacitor, capacitor-unpolarized, capacitor.adjustable, capacitor-polarized, crossover, crystal, current-source, diac, diode, diode.light emitting, diode.thyrector, fuse, gnd, diode.zener, inductor, inductor.coupled, inductor.ferrite, inductor2, integrated circuit, integrated circuit.ne555, integrated circuit.voltage regulator, junction, lamp, magnetic, mechanical, microphone, motor, multi-cell-battery, nand, nor, not, operational amplifier, operational amplifier.schmitt trigger, optical, optocoupler, or, probe, probe.current, relay, resistor, probe.voltage, resistor.adjustable, resistor.photo, single-cell-battery, socket, speaker, switch, terminal, text, thyristor, transformer, transistor, transistor-photo, transistor.bjt, transistor.fet, triac, unknown, varistor, voltage-ac, voltage-dc, voltage-dc ac, voltage.battery, voltmeter, vss, xnor, and xor.
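A minimal sketch of the OCR-Post filter; this is my own approximation of the retention rule above (keeping tokens that are purely digits or that combine a digit with one of the listed unit/value symbols), not the authors’ exact code.

```python
# Minimal sketch of the OCR-Post filter over OCR output tokens.
UNIT_CHARS = set("ΩHAFVWkK.κM")

def ocr_post(tokens):
    kept = []
    for tok in tokens:
        has_digit = any(ch.isdigit() for ch in tok)
        has_unit_char = any(ch in UNIT_CHARS for ch in tok)
        if tok.isdigit() or (has_digit and has_unit_char):
            kept.append(tok)
    return kept

# e.g. ocr_post(["10kΩ", "R1", "battery", "5V", "220"]) -> ["10kΩ", "5V", "220"]
```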


12 of 17

Input Representations

  • BBox: for each detected component, pass the component name along with its bounding box ⟨x, y, w, h⟩.
  • BBox+Segment: assign each bounding box to one of 9 segments: “upper left”, “upper middle”, “upper right”, “left”, “middle”, “right”, “lower left”, “lower middle”, “lower right” (a segment-assignment sketch follows this list).
    • Pass the component name, ⟨x, y, w, h⟩, and the segment name.

  • Visual description of components
    • ChatGPT: “Describe the electrical component ⟨component⟩ in 50 words”
    • Desc: pass the question, a [DESC] separator, and the description of the relevant circuit component.
    • Example: “Capacitor: Symbolized by two parallel lines with a gap, it stores and releases electrical energy, acting as a temporary energy reservoir in a circuit.”
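A minimal sketch of the BBox+Segment mapping, assigning a bounding-box centre to one of the 9 named segments on a 3×3 grid; the grid split at thirds of the image is my assumption, while the segment names come from the slide.

```python
# Minimal sketch: map a bounding-box centre to one of 9 image segments.
SEGMENTS = [["upper left", "upper middle", "upper right"],
            ["left",       "middle",       "right"],
            ["lower left", "lower middle", "lower right"]]

def segment_of(bbox, img_w, img_h):
    x, y, w, h = bbox
    cx, cy = x + w / 2, y + h / 2
    col = min(int(3 * cx / img_w), 2)   # 0, 1 or 2 from left to right
    row = min(int(3 * cy / img_h), 2)   # 0, 1 or 2 from top to bottom
    return SEGMENTS[row][col]

# e.g. segment_of((10, 10, 30, 20), img_w=300, img_h=300) -> "upper left"
```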


13 of 17

Input Prompt Templates for Instruction-based Models


14 of 17

Results

  • Hallucination in VQA
    • HVQA: average of three scores (see the formula after this list)
      • HVQA-count (captures over-counting of existing objects): for simple counting, spatial counting and value based questions.
      • HVQA-in-domain (captures predictions of non-existing in-domain objects)
      • HVQA-out-domain (captures predictions with out-of-domain objects)
    • For position-based questions, HVQA-in-domain and HVQA-out-domain both apply.
  • BLIP provides the best accuracy, while LLaVA and GPT-4V provide the lowest hallucination scores.
  • Best: fine-tuned BLIP (accuracy = 91.7) when paired with prompts containing a visual description of the component (BLIP-Desc).
    • It also hallucinates less on in-domain objects compared to its fine-tuned counterparts GIT and Pix2Struct.
  • Fine-tuning broadly ensures that the models (BLIP, GIT and Pix2Struct) do not hallucinate out-of-domain objects.
  • LLaVA predicts out-of-domain objects like ‘circle’, ‘square’, ‘A’, ‘B’, ‘D’, ‘F’, ‘triangle’, ‘carlin’, ‘nano’, ‘peizo-keeper’, ‘trigger’, ‘Snake’, ‘Snake Detector’.
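As stated above, the aggregate score is the mean of the component scores; for question types where all three apply (notation mine, following the score names on this slide):

```latex
\mathrm{HVQA} = \tfrac{1}{3}\left(\mathrm{HVQA}_{\text{count}} + \mathrm{HVQA}_{\text{in-domain}} + \mathrm{HVQA}_{\text{out-domain}}\right)
```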


[Table] Results per question type for the Desc variants of the models on the CircuitVQA test set.

[Table] Hallucination scores. A = count, B = in-domain, C = out-domain.

15 of 17

Examples of Predictions from our best model


16 of 17

Examples of error cases from our best model

  • Manual analysis of 100 error cases; 20 for each question type
  • Value-based questions
    • 4: incorrect units
    • 5: both units and values were wrong
    • 11: incorrect values
  • Junction-based questions
    • 12: images with ≥40 junctions
    • 8: images with <40 junctions
  • Position-based questions
    • 9: predicted component was physically the second closest to the correct answer component
    • 11: predictions were far from the actual answer.
  • Simple counting questions
    • 11: over-counting errors, all within a range of 1 to 5
    • 9: under-counting errors
  • Spatial counting questions
    • 4: over-counting errors
    • 16: under-counting errors


17 of 17

Summary

  • New problem: VQA for electrical circuit images.
  • New dataset: CircuitVQA with five question types.
  • Extensive evaluation of several SOTA VLMs.
  • Input representations: OCR text, bounding boxes and detailed description of relevant circuit components.
  • BLIP with component descriptions provides the highest VQA accuracy across most question types and one of the lowest hallucination scores.
