⚠️ Community-submitted results. No independent verification is performed. Scores are self-reported and may be inaccurate. Use at your own risk.
Rank | Release Date | Result Source | Model Type | Open? | Model Size | Model | Screen Representation | Success Rate (pass@1) | Number of trials | Success Rate (pass@k) | Trajectory submissions | Note
1 | 10/2025 | AGI-0 | AI agent | | - | AGI-0 | Screenshot | 97.4 | 1 | n/a
10/14/2025. Changed from model --> agent, based on description in blog post. https://www.theagi.company/blog/android-world
2 | 10/2025 | askui AndroidVisionAgent | AI agent | | - | askui AndroidVisionAgent, Claude 4.5 Sonnet + Claude 4.0 Sonnet | Screenshot | 94.8 | 1 | n/a
3 | 10/2025 | DroidRun | AI agent | | - | GPT5, Gemini 2.5 Pro | Screenshot + A11y tree | 91.4 | 1
Trajectories
[10/6/2025]: 78.4 --> 91.4; details here
[09/19/2025]: Updated score from 63.0 --> 78.4. For this run, we used GPT-5 for reasoning in combination with Gemini-2.5-Pro for acting, while continuing to rely on a hybrid method of the Accessibility (A11y) tree plus corresponding screenshots to provide both structural and visual context to the agent.

3 | 9/2025 | mobile-use | AI agent | | - | Llama 4-scout, Gemini 2.5 pro, GPT-5 nano | Screenshot + A11y tree | 91.4 | 1
https://minitap.ai/benchmark
[12/16/2025] 84.5% -> 91.4%
[10/1/2025]: 77.6% -> 84.5%
[09/12/2025]: Updated score from 74.1% --> 77.6%
[08/19/2025]: Initial trajectory labeling issue (MarkorCreateNoteFromClipboard) has been corrected by the authors. They provide Discord support for setup issues.
[08/18/2025]: Repo has been reported broken by some users (not independently verified). Some trajectories (e.g., MarkorCreateNoteFromClipboard, BrowserMaze) are labeled as successful by the authors but contain incorrect actions
5 | 12/2025 | Agent-Visco | AI agent | | - | - | Screenshot | 88.8 | 1
https://work.aliyun.com/alimail/openLinks/downloadMimeMetaDiskBigAttach?id=netdiskid%3Av001%3Afile%3ADzzzzzzNqZy%3BG8EhE7UV7U9nk2t2O6opysa%2BgRf5Zm6eTqXqtVApm1nDD8W%2BxnLAgdDsJsOSlxfXTx44NU7qRM%2BkJMuIA0smltxGWspXGX2Ox228yqp07rNX%2FIm2QehcGJBgHwUs0f81cz2sX76WNxvxnhpZnqmxiYxHNHRd2hybrPLi2DRQ8bM5ORhI%2BtGG%2B7n%2BcxdY1Ji0
Research paper and code will be open-sourced in early months.
6 | 10/2025 | Surfer 2 | AI agent | | - | o3 + holo1.5-72b | Screenshot | 87.1 | 1 | pass@3 93.1
https://hcompai.github.io/android-world-traces/
Agent is based on https://github.com/hcompai/surfer-h-cli - same architecture, but with modifications to prompt and action space
7 | 10/2025 | gbox.ai | AI agent | | - | Sonnet 4.5 + Sonnet 4 | Screenshot | 86.2 | 1
https://github.com/babelcloud/android_world_benchmark/tree/main/GBOX/trajectory
GBOX uses Claude Code as the agent with the GBOX MCP. Link to report: https://github.com/babelcloud/android_world_benchmark/blob/main/GBOX/report.md
8 | 8/2025 | AutoGLM-Mobile | Model | | 9B | AutoGLM-Mobile | Screenshot + A11y tree | 80.2 | 1
autoglm-mobile-aw-ckpt.zip
[9/26/2025]: Updated score from 75.8 --> 80.2. This is an updated version of AutoGLM-Mobile. In this version, we added more pretraining data and trained the VLM at full image resolution.
9 | 9/2025 | LX-GUIAgent | AI agent | | - | LX-GUIAgent | Screenshot + A11y tree | 79.3 | 1
androidworld-trajectories.zip
[9/23/2025]: Updated score from 75.0 --> 79.3. Added trajectories.
10 | 12/2025 | AgentProg | AI agent | | - | Gemini-2.5-Pro+UI-TARS-1.5 | Screenshot | 78.0 | 1
AgentProg_AndroidWorld.zip
Paper: https://arxiv.org/pdf/2512.10371; Code: https://github.com/MobileLLM/AgentProg
11 | 8/2025 | Finalrun | AI agent | | - | GPT-5 | Screenshot + A11y tree | 76.7 | 1
run_20250829T064517298482.zip
The technical doc/approach can be found here: https://github.com/final-run/finalrun-android-world-benchmark
11 | 9/2025 | K²-Agent | AI agent | | 72B + 7B | Qwen2.5-VL-72B + Qwen2.5-VL-7B | Screenshot | 76.7 | 1
https://github.com/k2-agent/k2-agent/blob/main/androidworld-trajectories.zip
K²-Agent is a hierarchical approach that self-evolves a Qwen2.5-VL-72B for high-level planning and post-trains a Qwen2.5-VL-7B for low-level execution.
Our GitHub repository, which contains a detailed description of our approach, can be found here: https://github.com/k2-agent/k2-agent.
11 | 1/2026 | MAI-UI | Model | | 235B | MAI-UI-235B-A22B | Screenshot | 76.7
14 | 9/2025 | MobileUse-v2 | AI agent | | 32B | Hammer-UI-32B | Screenshot | 75.0 | 1
MobileUse-v2-aw-ckpt.zip
Applies further post-training on top of the GUI-Owl-32b model and optimizes the memory and knowledge module of the MobileUse framework.
15 | 8/2025 | Mobile-Agent-v3 | AI agent | | 32B | GUI-Owl-32B | Screenshot | 73.3 | 1
https://github.com/X-PLUG/MobileAgent/tree/main
15 | 1/2026 | MAI-UI | Model | | 32B | MAI-UI-32B | Screenshot | 73.3
17 | 1/2026 | MAI-UI | Model | | 8B | MAI-UI-8B | Screenshot | 70.7
18 | 10/2025 | Gemini 2.5 Computer Use | Model | | - | Gemini 2.5 Computer Use | Screenshot | 69.7 | 1
19 | 6/2025 | JT-GUIAgent-V2 | AI agent | | - | JT-GUIAgent-V2 | Screenshot | 67.2 | 1
20 | 8/2025 | GUI-Owl-7B | Model | | 7B | GUI-Owl-7B | Screenshot | 66.4 | 1
https://github.com/X-PLUG/MobileAgent/tree/main
21 | 8/2025 | UI-Venus | Model | | 72B | UI-Venus-Navi-72B | Screenshot | 65.9 | 1
https://github.com/inclusionAI/UI-Venus/blob/main/vis_androidworld/UI-Venus-androidworld.zip
https://huggingface.co/inclusionAI/UI-Venus-Navi-72B
22 | 07/2025 | MobileUse | AI agent | | 72B | Qwen2.5-VL-72B | Screenshot | 62.9 | 1
23 | 05/2025 | Seed1.5-VL | Model | | 20B | Seed1.5-VL | Screenshot + A11y tree | 62.1 | 1
24 | 6/2025 | JT-GUIAgent-V1 | AI agent | | - | JT-GUIAgent-V1 | Screenshot | 60.0 | 1
25 | 3/2025 | V-Droid Paper | AI agent | | 8B | V-Droid (Llama8B) | A11y tree | 59.5 | 1 | - | -
Training data consists of apps and tasks from the AndroidWorld benchmark. Code.
26 | 4/2025 | Agent S2 | AI agent | | - | Agent S2 | Screenshot | 54.3 | 1 | -
27 | 8/2025 | UI-Venus | Model | | 7B | Venus-Navi-7B | Screenshot | 49.1 | 1
https://github.com/inclusionAI/UI-Venus/blob/main/vis_androidworld/UI-Venus-androidworld.zip
https://huggingface.co/inclusionAI/UI-Venus-Navi-7B
27 | 1/2026 | MAI-UI | Model | | 2B | MAI-UI-2B | Screenshot | 49.1
29 | 05/2025 | GUI-Explorer | AI agent | | - | GPT-4o | Screenshot + A11y tree | 47.4 | 1
30 | 4/2025 | AndroidGen | AI agent | | - | GPT-4o | A11y tree | 46.8 | 1
31 | 1/2025 | UI-TARS | Model | | 72B | UI-TARS | Screenshot | 46.6 | 1 | -
32 | 12/2024 | Aria-UI | Model | | - | GPT-4o + Aria-UI | Screenshot | 44.8 | 1 | - | -
33 | 4/2025 | ScaleTrack | Model | | 8B | ScaleTrack-7B | A11y tree | 44.0 | 1
33 | 1/2025 | UGround | Model | | - | GPT-4o + UGround | Screenshot | 44.0 | 1 | -
Trajectories
Code for reproduction
35 | 6/2025 | Mirage-1 | AI agent | | - | GPT-4o | Screenshot | 42.2 | 1
With Mirage-1-O; uses OS-Atlas grounder
36 | 12/2024 | Ponder & Press | AI agent | | - | GPT-4o | Screenshot | 34.5 | 1 | - | -
Code is not yet open-sourced
37 | 05/2024 | AndroidWorld | AI agent | | - | GPT-4 Turbo | A11y tree | 30.6 | 1 | - | -
38 | 6/2025 | GUI-Critic-R1 | Model | | 7B | Qwen-2.5-VL-7B | Screenshot + A11y tree | 27.6 | 1
38 | 05/2024 | EcoAgent | AI agent | | - | GPT-4o, OS-Atlas-Pro 4B, Qwen2-VL-2B-Instruct | Screenshot | 27.6 | 1
40 | 1/2025 | InfiGUIAgent | Model | | 2B | Qwen2-VL-2B (fine-tuned) | Screenshot | 9.0 | 1 | - | -
 | 10/2024 | OSCAR | AI agent | | - | GPT-4o | Screenshot | | 1 | 61.6 (k=4) | -
Code will be open-sourced upon publication.
Human Performance
05/2024 | AndroidWorld | - | Human | 80.0 | 3
Comment here or email crawles@gmail.com to submit your work! Please include how the data entry should look.
Definitions
Model
A relatively simple prompt involving one LLM/VLM call
AI agent
A multi-agent architecture involving several LLM calls and a protocol to coordinate the various agents, or an LLM wrapped in an advanced agent with memory, subgoal planning, etc.
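
The Success Rate (pass@1) and Success Rate (pass@k) columns are not defined in this sheet. The sketch below shows one common reading, offered as an assumption rather than the leaderboard's official scoring: pass@1 is the mean per-trial success rate across tasks, and pass@k counts a task as solved if any of its first k trials succeeds. The task names and outcomes are hypothetical and exist only for illustration.

```python
from typing import Dict, List


def pass_at_1(results: Dict[str, List[bool]]) -> float:
    """Mean per-trial success rate across tasks, as a percentage (assumed definition)."""
    per_task = [sum(trials) / len(trials) for trials in results.values()]
    return 100.0 * sum(per_task) / len(per_task)


def pass_at_k(results: Dict[str, List[bool]], k: int) -> float:
    """Percentage of tasks solved by at least one of the first k trials (assumed definition)."""
    solved = [any(trials[:k]) for trials in results.values()]
    return 100.0 * sum(solved) / len(solved)


# Hypothetical outcomes: four tasks, three trials each.
results = {
    "task_a": [True, True, True],
    "task_b": [False, True, False],
    "task_c": [False, False, False],
    "task_d": [True, False, True],
}

print(round(pass_at_1(results), 1))
print(round(pass_at_k(results, k=3), 1))
```

Under this reading, a pass@k score can never be lower than the corresponding single-trial score, which is why entries that report both list the pass@k value as the higher one.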