Row | Release Date | Model Type | Model | Inputs | Success Rate (%) | Result Source |
---|---|---|---|---|---|---|
2 | | | Human Baseline | Image | 88.70 | VisualWebArena |
3 | 10/2024 | Multimodal (SoM) | GPT-4o + R-MCTS | SoM + Caption + Image | 33.7 | ExACT |
4 | 06/2024 | Multimodal (SoM) | GPT-4o + Search | SoM + Caption + Image | 26.4 | Tree Search for LM Agents |
5 | 11/2024 | Multimodal (SoM) | GPT-4o + WebDreamer | SoM + Caption + Image | 23.6 | WebDreamer |
6 | 06/2024 | Multimodal (SoM) | GPT-4o + ICAL | SoM + Caption + Image | 23.4 | ICAL |
7 | 06/2024 | Multimodal (SoM) | GPT-4o | SoM + Caption + Image | 19.78 | VisualWebArena |
8 | 06/2024 | Caption-augmented | Llama-3-70B + Search | AxTree + Caption | 16.7 | Tree Search for LM Agents |
9 | 01/2024 | Multimodal (SoM) | GPT-4V | SoM + Caption + Image | 16.37 | VisualWebArena |
10 | 01/2024 | Multimodal | GPT-4V | AxTree + Caption + Image | 15.05 | VisualWebArena |
11 | 01/2024 | Caption-augmented | GPT-4 + BLIP-2-T5XL | AxTree + Caption | 12.75 | VisualWebArena |
12 | 06/2024 | Multimodal (SoM) | Gemini-Pro-1.5 | SoM + Caption + Image | 11.98 | VisualWebArena |
13 | 05/2025 | Image-only | ViGoRL-7B | Image | 11.2 | ViGoRL |
14 | 06/2024 | Caption-augmented | Llama-3-70B-Instruct + BLIP-2-T5XL | AxTree + Caption | 9.78 | VisualWebArena |
15 | 01/2025 | Multimodal (SoM) | Qwen2-VL + ICAL | SoM + Caption + Image | 8.2 | ICAL |
16 | 01/2024 | Text-only | GPT-4 | AxTree | 7.25 | VisualWebArena |
17 | 06/2024 | Multimodal (SoM) | Gemini-Flash-1.5 | SoM + Caption + Image | 6.59 | VisualWebArena |
18 | 05/2025 | Image-only | ViGoRL-3B | Image | 6.4 | ViGoRL |
19 | 01/2024 | Multimodal | Gemini-Pro | AxTree + Caption + Image | 6.04 | VisualWebArena |
20 | 01/2024 | Multimodal (SoM) | Gemini-Pro | SoM + Caption + Image | 5.71 | VisualWebArena |
21 | 05/2025 | Image-only | Qwen-VL-7B | Image | 5.5 | ViGoRL |
22 | 01/2024 | Caption-augmented | Gemini-Pro + BLIP-2-T5XL | AxTree + Caption | 3.85 | VisualWebArena |
23 | 01/2024 | Caption-augmented | GPT-3.5 + BLIP-2-T5XL | AxTree + Caption | 2.97 | VisualWebArena |
24 | 01/2025 | Multimodal (SoM) | Qwen2-VL | SoM + Caption + Image | 2.9 | ICAL |
25 | 01/2024 | Caption-augmented | GPT-3.5 + LLaVA-7B | AxTree + Caption | 2.75 | VisualWebArena |
26 | 01/2024 | Text-only | GPT-3.5 | AxTree | 2.20 | VisualWebArena |
27 | 01/2024 | Text-only | Gemini-Pro | AxTree | 2.20 | VisualWebArena |
28 | 01/2024 | Caption-augmented | Mixtral-8x7B + BLIP-2-T5XL | AxTree + Caption | 1.87 | VisualWebArena |
29 | 01/2024 | Text-only | Mixtral-8x7B | AxTree | 1.76 | VisualWebArena |
30 | 01/2024 | Text-only | Llama-2-70B | AxTree | 1.10 | VisualWebArena |
31 | 01/2024 | Multimodal (SoM) | IDEFICS-80B-Instruct | SoM + Caption + Image | 0.99 | VisualWebArena |
32 | 01/2024 | Multimodal | IDEFICS-80B-Instruct | AxTree + Caption + Image | 0.77 | VisualWebArena |
33 | 01/2024 | Caption-augmented | Llama-2-70B + BLIP-2-T5XL | AxTree + Caption | 0.66 | VisualWebArena |
34 | 01/2024 | Multimodal (SoM) | CogVLM | SoM + Caption + Image | 0.33 | VisualWebArena |
35 | 01/2024 | Multimodal | CogVLM | AxTree + Caption + Image | 0.33 | VisualWebArena |