1 of 17

Multimodal Needle in a Haystack: Benchmarking Long-Context Capability of Multimodal Large Language Models

Hengyi Wang, Haizhou Shi, Shiwei Tan, Weiyi Qin, Wenyuan Wang, Tunyu Zhang, Akshay Nambi, Tanuja Ganu, Hao Wang

06/27/24

2 of 17

CONTENTS

1. Introduction

2. MMNeedle Benchmark

3. Experiments

4. Conclusion

3 of 17

1. Introduction

4 of 17

Needle-in-a-Haystack Test

Input: July 2010

What hard liquor, cigarettes, heroin, and crack have in common is that they're all more concentrated forms of less addictive predecessors. Most if not all the things we describe as addictive are. And the scary thing is, the process that created them is accelerating.

We wouldn't want to stop it. It's the same process that cures diseases: technological progress. The best thing to do in San Francisco is eat a sandwich and sit in Dolores Park on a sunny day. Technological progress means making things do more of what we want. When the thing we want is something we want to want, we consider technological progress good. If some new technique makes solar cells x% more efficient, that seems strictly better. When progress concentrates something we don't want to want—when it transforms opium into heroin—it seems bad. But it's the same process at work.

Question: What is the best thing to do in San Francisco?
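The test above can be sketched in a few lines of Python: place a needle sentence at a chosen depth in the haystack, build the prompt, and check whether the model's answer recovers the needle. This is a minimal illustration, not the implementation from [1]; the function names, the sentence-level insertion, and the keyword-based scoring are all assumptions made for this sketch.

```python
from typing import List


def insert_needle(sentences: List[str], needle: str, depth: float) -> str:
    """Insert `needle` into the haystack at a relative depth (0.0 = start, 1.0 = end).

    Hypothetical helper: insertion happens at sentence boundaries only.
    """
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0.0 and 1.0")
    position = round(depth * len(sentences))
    return " ".join(sentences[:position] + [needle] + sentences[position:])


def build_prompt(context: str, question: str) -> str:
    """Format the context and question in the Input/Question layout of the slide."""
    return f"Input: {context}\n\nQuestion: {question}"


def is_retrieved(answer: str, keywords: List[str]) -> bool:
    """Assumed scoring rule: every keyword must appear in the answer (case-insensitive)."""
    lowered = answer.lower()
    return all(keyword.lower() in lowered for keyword in keywords)


haystack = [
    "It's the same process that cures diseases: technological progress.",
    "Technological progress means making things do more of what we want.",
]
needle = (
    "The best thing to do in San Francisco is eat a sandwich "
    "and sit in Dolores Park on a sunny day."
)
context = insert_needle(haystack, needle, depth=0.5)
prompt = build_prompt(context, "What is the best thing to do in San Francisco?")
```

Sweeping `depth` and the haystack length gives a grid of retrieval results per position and context size.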

5 of 17

Needle-in-a-Haystack Test

6 of 17

Challenges in Multimodal LLMs

  • Insufficient image context length
      • Costly annotations on sub-image objects
      • Our solution – Image Stitching
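Image stitching can be illustrated as follows: several small sub-images are tiled into one larger grid image, so a single input image carries many candidate sub-images. The sketch below is an assumption-laden illustration, not the benchmark's code: images are plain nested lists of pixel values to keep the example self-contained (a real pipeline would use an image library), and `stitch_grid` and `cell_of` are hypothetical names.

```python
from typing import List, Tuple

Image = List[List[int]]  # rows of pixel values; a stand-in for a real image type


def stitch_grid(sub_images: List[Image], n: int) -> Image:
    """Tile n*n equally sized sub-images (given in row-major order) into one image."""
    if len(sub_images) != n * n:
        raise ValueError(f"expected {n * n} sub-images, got {len(sub_images)}")
    height = len(sub_images[0])
    stitched: Image = []
    for grid_row in range(n):
        tiles = sub_images[grid_row * n:(grid_row + 1) * n]
        for y in range(height):
            # Concatenate row y of every tile in this grid row, left to right.
            stitched_row: List[int] = []
            for tile in tiles:
                stitched_row.extend(tile[y])
            stitched.append(stitched_row)
    return stitched


def cell_of(index: int, n: int) -> Tuple[int, int]:
    """Map a row-major sub-image index to its (row, column) cell in the grid."""
    return divmod(index, n)


def solid(value: int) -> Image:
    """Build a 2x2 sub-image filled with one pixel value (toy data for the example)."""
    return [[value, value], [value, value]]


grid = stitch_grid([solid(0), solid(1), solid(2), solid(3)], n=2)
```

Because the position of each sub-image in the grid is known at construction time, the location of a needle comes for free, without annotating objects inside the image.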

7 of 17

2. MMNeedle Benchmark

8 of 17

Key Components

  • Needle Sub-Image
  • Haystack Image Inputs
  • Text Inputs (Instructions and Caption)
  • LLM Outputs
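One way to picture how these components fit together is a small record per test sample plus a check on the model's output. Everything here is a hypothetical sketch: the field names, the (image index, row, column) location format, and the exact-match rule are assumptions for illustration, not the benchmark's actual schema.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Assumed location format: (image index, row, column) of the needle sub-image.
Location = Tuple[int, int, int]


@dataclass
class Sample:
    """One hypothetical test sample combining the components listed above."""
    caption: str                         # text input describing the needle sub-image
    num_images: int                      # number of haystack images shown to the model
    grid_size: int                       # each haystack image holds grid_size x grid_size sub-images
    needle_location: Optional[Location]  # None for a negative sample (needle absent)


def is_correct(sample: Sample, prediction: Optional[Location]) -> bool:
    """Exact-match check; on a negative sample only a 'not present' answer (None) counts."""
    return prediction == sample.needle_location


positive = Sample("a dog on a beach", num_images=10, grid_size=2, needle_location=(3, 0, 1))
negative = Sample("a dog on a beach", num_images=10, grid_size=2, needle_location=None)
```

Under this sketch, a model that reports a location for a negative sample is counted as wrong, which is how a hallucinated needle would show up in the scores.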

9 of 17

3. Experiments

10 of 17

Long-Context Capability

11 of 17

Hallucination on Negative Examples

12 of 17

Multi-Needle Evaluation

13 of 17

Statistical Significance

14 of 17

Effect of Context Length

15 of 17

4. Conclusion

16 of 17

Links and Resources

17 of 17

References

[1] G. Kamradt. Needle in a haystack - pressure testing LLMs. https://github.com/gkamradt/LLMTest_NeedleInAHaystack, 2023.

[2] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C. Lawrence Zitnick. Microsoft COCO: Common objects in context. In Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part V 13, pages 740–755. Springer, 2014.