Comment here or email jingyuk@cs.cmu.edu to submit your work! Please attach an example of what the data looks like.
Starting in September 2024, we require submissions to include raw trajectories (per-step observation and predicted action), to enhance transparency on the leaderboard.
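As a rough illustration of what a raw trajectory with per-step observations and predicted actions could look like, here is a minimal sketch that serializes one task's steps as a single JSON line. The field names (`task_id`, `step`, `observation`, `predicted_action`), the helper functions, and the example action strings are all hypothetical; the leaderboard does not prescribe this schema, so check with the maintainers for the expected format.

```python
import json
from dataclasses import dataclass, asdict
from typing import List


@dataclass
class Step:
    # Hypothetical field names; not an official submission schema.
    step: int
    observation: str       # e.g. accessibility-tree text or a screenshot path
    predicted_action: str  # the action string the agent emitted at this step


def dump_trajectory(task_id: str, steps: List[Step]) -> str:
    """Serialize one task's trajectory as a single JSON line."""
    record = {"task_id": task_id, "steps": [asdict(s) for s in steps]}
    return json.dumps(record, ensure_ascii=False)


def load_trajectory(line: str) -> List[Step]:
    """Parse a JSON line back into a list of Step objects."""
    record = json.loads(line)
    return [Step(**s) for s in record["steps"]]


# Illustrative values only; the task id and actions are made up.
example = dump_trajectory(
    "example_task_0",
    [
        Step(0, "screenshots/step_0.png", "click [12]"),
        Step(1, "screenshots/step_1.png", "stop [done]"),
    ],
)
```

Writing one such line per task to a `.jsonl` file keeps each trajectory independently parseable, which makes spot-checking individual tasks straightforward.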
| Release Date | Model Type | Model | Inputs | Success Rate (%) | Result Source |
|---|---|---|---|---|---|
| | Human Baseline | | Image | 88.70 | VisualWebArena |
| 07/2025 | Multimodal (SoM) | Gemini 2.5 Flash | SoM | 54.0 | SGV |
| 10/2025 | Multimodal (SoM) | GPT-5 | SoM | 52.9 | WALT |
| 06/2026 | Caption-augmented | GLM-4-32B | Structured DOM Representation | 48.7 | WebChallenger |
| 09/2025 | Multimodal (SoM) | GPT-5 | SoM + Caption + Image | 36.5 | AWorld |
| 04/2026 | Multimodal | A3-Qwen3.5-9B | AXTree + Image | 33.7 | Agent-as-Annotators |
| 10/2024 | Multimodal (SoM) | GPT-4o + R-MCTS | SoM + Caption + Image | 33.7 | ExACT |
| 06/2024 | Multimodal (SoM) | GPT-4o + Search | SoM + Caption + Image | 26.4 | Tree Search for LM Agents |
| 11/2024 | Multimodal (SoM) | GPT-4o + WebDreamer | SoM + Caption + Image | 23.6 | WebDreamer |
| 06/2024 | Multimodal (SoM) | GPT-4o + ICAL | SoM + Caption + Image | 23.4 | ICAL |
| 06/2024 | Multimodal (SoM) | GPT-4o | SoM + Caption + Image | 19.78 | VisualWebArena |
| 06/2024 | Caption-augmented | Llama-3-70B + Search | AxTree + Caption | 16.7 | Tree Search for LM Agents |
| 01/2024 | Multimodal (SoM) | GPT-4V | SoM + Caption + Image | 16.37 | VisualWebArena |
| 01/2024 | Multimodal | GPT-4V | AXTree + Caption + Image | 15.05 | VisualWebArena |
| 01/2024 | Caption-augmented | GPT-4 + BLIP-2-T5XL | AxTree + Caption | 12.75 | VisualWebArena |
| 06/2024 | Multimodal (SoM) | Gemini-Pro-1.5 | SoM + Caption + Image | 11.98 | VisualWebArena |
| 05/2025 | Image-only | ViGoRL-7B | Image | 11.2 | ViGoRL |
| 06/2024 | Caption-augmented | Llama-3-70B-Instruct + BLIP-2-T5XL | AxTree + Caption | 9.78 | VisualWebArena |
| 01/2025 | Multimodal (SoM) | Qwen2-VL + ICAL | SoM + Caption + Image | 8.2 | ICAL |
| 01/2024 | Text-only | GPT-4 | AXTree | 7.25 | VisualWebArena |
| 06/2024 | Multimodal (SoM) | Gemini-Flash-1.5 | SoM + Caption + Image | 6.59 | VisualWebArena |
| 05/2025 | Image-only | ViGoRL-3B | Image | 6.4 | ViGoRL |
| 01/2024 | Multimodal | Gemini-Pro | AXTree + Caption + Image | 6.04 | VisualWebArena |
| 01/2024 | Multimodal (SoM) | Gemini-Pro | SoM + Caption + Image | 5.71 | VisualWebArena |
| 05/2025 | Image-only | Qwen-VL-7B | Image | 5.5 | ViGoRL |
| 01/2024 | Caption-augmented | Gemini-Pro + BLIP-2-T5XL | AxTree + Caption | 3.85 | VisualWebArena |
| 01/2024 | Caption-augmented | GPT-3.5 + BLIP-2-T5XL | AxTree + Caption | 2.97 | VisualWebArena |
| 01/2025 | Multimodal (SoM) | Qwen2-VL | SoM + Caption + Image | 2.9 | ICAL |
| 01/2024 | Caption-augmented | GPT-3.5 + LLaVa-7B | AxTree + Caption | 2.75 | VisualWebArena |
| 01/2024 | Text-only | GPT-3.5 | AXTree | 2.20 | VisualWebArena |
| 01/2024 | Text-only | Gemini-Pro | AXTree | 2.20 | VisualWebArena |
| 01/2024 | Caption-augmented | Mixtral-8x7b + BLIP-2-T5XL | AxTree + Caption | 1.87 | VisualWebArena |
| 01/2024 | Text-only | Mixtral-8x7B | AXTree | 1.76 | VisualWebArena |
| 01/2024 | Text-only | Llama-2-70B | AXTree | 1.10 | VisualWebArena |
| 01/2024 | Multimodal (SoM) | IDEFICS-80B-Instruct | SoM + Caption + Image | 0.99 | VisualWebArena |
| 01/2024 | Multimodal | IDEFICS-80B-Instruct | AXTree + Caption + Image | 0.77 | VisualWebArena |
| 01/2024 | Caption-augmented | Llama-2-70B + BLIP-2-T5XL | AxTree + Caption | 0.66 | VisualWebArena |
| 01/2024 | Multimodal (SoM) | CogVLM | SoM + Caption + Image | 0.33 | VisualWebArena |
| 01/2024 | Multimodal | CogVLM | AXTree + Caption + Image | 0.33 | VisualWebArena |