1 of 12

FIND: Toward Multimodal Financial Reasoning and Question Answering for Indic Languages

Sarmistha Das¹*, Vaibhav Vishal¹*, Syed Ibrahim Ahmad¹*

Sriparna Saha¹, Manish Gupta²

¹IIT Patna, India  ²Microsoft, India

sarmistha1515@gmail.com, vvaibhav728@gmail.com, syediahmad0@gmail.com

sriparna@iitp.ac.in, gmanish@microsoft.com

2 of 12

Why is financial numerical reasoning challenging?

  • Precise mathematical computation
  • Faithful application of domain knowledge
  • Robust reasoning over hybrid multimodal contexts such as tables, charts, and textual descriptions
  • Vision-language models (VLMs) over-rely on textual cues while underutilizing visual financial signals
  • Existing benchmarks largely focus on English and lack coverage of Indic languages

Sarmistha Das, Vaibhav Vishal, Syed Ibrahim Ahmad, Sriparna Saha, Manish Gupta. FIND: Toward Multimodal Financial Reasoning and Question Answering for Indic Languages. ACL 2026.

3 of 12

What is FinVQA?

  • A benchmark for evaluating financial numerical and multimodal reasoning in multilingual Indic contexts.
  • Languages: English (en), Hindi (hi), Bengali (bn), Marathi (mr), Gujarati (gu), Tamil (ta)
  • 18,900 samples across 14 domains.
    • 15,114 text-only and 3,786 image-based.
  • Difficulty levels: easy (6,234), moderate (6,384), hard (6,282).
  • 4 question formats: multiple choice, fill-in-the-blank, table matching, and true/false.
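The counts above can be cross-checked with a few lines of arithmetic. The sketch below copies the figures from this slide; the per-language numbers assume the samples are split evenly across the six languages, which is an assumption of this example. The helper names are hypothetical.

```python
# Counts copied from the slide; helper names are illustrative only.
TOTAL = 18_900
BY_MODALITY = {"text_only": 15_114, "image_based": 3_786}
BY_DIFFICULTY = {"easy": 6_234, "moderate": 6_384, "hard": 6_282}
LANGUAGES = ["en", "hi", "bn", "mr", "gu", "ta"]


def check_partition(parts, total):
    """Return True if the parts add up exactly to the total."""
    return sum(parts.values()) == total


def per_language(count, n_languages):
    """Split a count evenly across languages (assumes an even split)."""
    quotient, remainder = divmod(count, n_languages)
    if remainder:
        raise ValueError("count is not evenly divisible across languages")
    return quotient


print("modality split consistent:", check_partition(BY_MODALITY, TOTAL))
print("difficulty split consistent:", check_partition(BY_DIFFICULTY, TOTAL))
print("samples per language:", per_language(TOTAL, len(LANGUAGES)))
for name, count in BY_MODALITY.items():
    print(name, "per language:", per_language(count, len(LANGUAGES)))
```

Both partitions sum to 18,900, and an even split gives 3,150 samples per language.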

4 of 12

How was FinVQA curated?

  • National Council of Educational Research and Training (NCERT) textbooks
    • Accountancy, Business Studies, and Economics for Classes 9–12.
  • ICMAI–CMA program
  • 2,519 text-only and 631 image-based English (en) samples.
  • GPT-4o translates the English samples into hi, mr, ta, gu, and bn.
    • Translation quality is checked using cosine similarity over multilingual MiniLM-L12-v2 embeddings.
  • Back-translation to enhance linguistic diversity.
  • OCR-based text replacement using PaddleOCR in NotoSans-{IndicLanguage} font.
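A minimal sketch of the cosine-similarity quality check mentioned above. It assumes the source and translated sentences have already been embedded (for example with a multilingual MiniLM-L12-v2 encoder) and operates on plain lists of floats. The function names and the 0.8 threshold are hypothetical; the slide does not state the cutoff that was used.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # treat an all-zero embedding as dissimilar
    return dot / (norm_u * norm_v)


def filter_translations(pairs, threshold=0.8):
    """Split (sample_id, source_emb, translation_emb) triples by similarity.

    The threshold is an illustrative value, not one taken from the slides.
    """
    kept, flagged = [], []
    for sample_id, source_emb, translation_emb in pairs:
        score = cosine_similarity(source_emb, translation_emb)
        bucket = kept if score >= threshold else flagged
        bucket.append((sample_id, score))
    return kept, flagged


# Toy embeddings, for illustration only.
toy_pairs = [
    ("q1", [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]),  # same direction
    ("q2", [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]),  # orthogonal
]
kept, flagged = filter_translations(toy_pairs)
print("kept:", [sample_id for sample_id, _ in kept])
print("flagged for review:", [sample_id for sample_id, _ in flagged])
```

Flagged samples would then be candidates for re-translation or manual review, depending on the curation policy.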

5 of 12

Learning Framework

  • Sample: question, image, options, correct option.
  • Zero-shot prompting: formatting instability, verbosity, and language inconsistency
  • Constrained decoding
    • Single token for en, hi, and mr
    • Up to three tokens for bn, gu, and ta
  • Supervised fine-tuning (SFT)
    • LoRA on the query, key, and value (QKV) projections in self- and cross-attention layers.
    • Rank r = 8, α = 32.
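The constrained decoding step can be pictured as masking each generation step to tokens that continue a valid answer label, with a budget of one token (en, hi, mr) or up to three tokens (bn, gu, ta). The sketch below is one possible formulation, not the authors' implementation: `next_token_logprobs` stands in for a real model call, the toy token strings are placeholders, and it assumes no label is a prefix of another.

```python
from typing import Callable, Dict, Optional, Set, Tuple

Label = Tuple[str, ...]  # one answer label as a sequence of tokens


def constrained_decode(
    next_token_logprobs: Callable[[Label], Dict[str, float]],
    allowed_labels: Set[Label],
    max_new_tokens: int,
) -> Optional[Label]:
    """Greedy decoding restricted to tokens that extend an allowed label.

    Returns the completed label, or None if none fits in the token budget.
    Assumes no allowed label is a strict prefix of another.
    """
    prefix: Label = ()
    for _ in range(max_new_tokens):
        n = len(prefix)
        # Tokens that keep the prefix on a path toward some allowed label.
        valid_next = sorted(
            {label[n] for label in allowed_labels
             if len(label) > n and label[:n] == prefix}
        )
        if not valid_next:
            break
        logprobs = next_token_logprobs(prefix)
        best = max(valid_next, key=lambda tok: logprobs.get(tok, float("-inf")))
        prefix = prefix + (best,)
        if prefix in allowed_labels:
            return prefix
    return None


def toy_single(prefix: Label) -> Dict[str, float]:
    """Hypothetical model that prefers a verbose opening over an option label."""
    return {"The": -0.1, "B": -1.0, "A": -2.0}


def toy_multi(prefix: Label) -> Dict[str, float]:
    """Hypothetical model for labels that span several tokens."""
    table = {
        (): {"p": -0.5, "q": -0.2},
        ("q",): {"r": -0.3, "zzz": -0.01},
        ("q", "r"): {"s": -0.1},
    }
    return table.get(prefix, {})


single = constrained_decode(toy_single, {("A",), ("B",), ("C",), ("D",)}, 1)
multi = constrained_decode(toy_multi, {("p",), ("q", "r", "s")}, 3)
print(single, multi)
```

In the single-token toy case the unconstrained top choice would be the verbose token "The", while the mask forces a valid option label. When a label needs more tokens than the budget allows, the function returns None rather than a partial label.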

6 of 12

Model-wise Accuracy for text-only data

  • Overall performance
    • Smaller models (≤4B parameters) show weak multilingual generalization.
    • SFT improves performance considerably.
    • Large models achieve the highest performance.

  • SFT outperforms constrained decoding.
  • Language-specific performance
    • Highest accuracies: en, hi, and bn
    • Largest gains from constrained decoding and SFT: mr and gu
    • Most challenging: ta
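For reference, per-language accuracy of the kind compared above can be computed from (language, predicted option, gold option) records. This is a generic sketch; the record layout and the toy records are hypothetical and do not reproduce any reported result.

```python
from collections import defaultdict


def accuracy_by_language(records):
    """Map each language code to the fraction of exactly matching answers."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for language, predicted, gold in records:
        total[language] += 1
        correct[language] += int(predicted == gold)
    return {language: correct[language] / total[language] for language in total}


# Toy records for illustration only; not taken from the benchmark.
toy_records = [
    ("en", "A", "A"),
    ("en", "C", "C"),
    ("ta", "B", "B"),
    ("ta", "D", "A"),
]
scores = accuracy_by_language(toy_records)
for language, score in sorted(scores.items()):
    print(language, score)
```

Exact-match scoring like this only works when the model output is a clean option label, which is one reason output formatting matters for the zero-shot setting.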

7 of 12

Model-wise Accuracy for multimodal data

  • Constrained decoding and SFT improve performance over zero-shot for Indic languages.
  • Large models achieve the highest performance.
  • Highest accuracy: en

8 of 12

Human Evaluation

  • Financial Domain Understanding and Problem Interpretation are strongly capacity-dependent.

  • Syntactic compliance is achievable at low compute, but high-fidelity financial reasoning requires large-scale models.

9 of 12

Case study

  • Smaller models exhibit:
    • Weak visual grounding
    • Inaccurate extraction of data points
    • Misunderstanding of core task objectives
    • Logical disconnects between intermediate reasoning traces and final answer selections

10 of 12

Case study

11 of 12

Case study

12 of 12

Summary

  • FinVQA
    • Financial numerical and multimodal reasoning in six languages: English plus five Indic languages
    • en, hi, bn, mr, gu, and ta
    • 18,900 samples across 14 domains.
    • 4 question formats: multiple choice, fill-in-the-blank, table matching, and true/false.
  • Benchmarking
    • SFT + constraint-aware decoding
    • Larger VLMs outperform smaller models.
    • English (en) scores highest; Tamil (ta) is the most challenging.

  • Thanks!
