1 of 9

FRAMEWORK VARIATIONS

MICRO-BENCHMARKING FRAMEWORK FOR WEB AGENTS

2 of 9

REASONS TO PURSUE THIS

Conclusion: No collision. Our work would be novel and micro-diagnostic, and would likely complement macro-benchmarks.

3 of 9

[Figure: failure analysis flow. Labels: failure?, deterministic algorithm, AI-powered algorithm]

WHAT DO WE PROPOSE?

We assess web agent and LLM configurations on how well they detect and interact with UI elements, and we determine the cause of each failure using a novel hybrid approach. An element-agnostic, rule-based deterministic algorithm runs first. If it cannot classify the failure, the system falls back to an AI-powered root cause algorithm, which also adds a new rule when the failure does not fit any existing category (pipeline image above).

[Figure: system overview. Labels: Liteagent, Webagent, Websites, Logs, Hybrid failure cause analysis pipeline, Dashboard]
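
The fallback logic described on this slide can be sketched as follows. This is a minimal illustration, not the project's implementation: the rule table, the category names, the keyword-matching strategy, and `stub_ai_classify` (a stand-in for a real LLM call) are all assumptions made for the example.

```python
from typing import Callable, Optional

# Hypothetical rule table: failure category -> substring expected in the log.
RULES: dict[str, str] = {
    "element_not_found": "no such element",
    "timeout": "timed out",
}


def deterministic_classify(log: str, rules: dict[str, str]) -> Optional[str]:
    """Return the first category whose keyword occurs in the log, else None."""
    text = log.lower()
    for category, keyword in rules.items():
        if keyword in text:
            return category
    return None


def stub_ai_classify(log: str) -> tuple[str, str]:
    """Stand-in for an LLM call (assumption); returns (category, keyword)."""
    return "hidden_dropdown", "not visible"


def analyze(
    log: str,
    rules: dict[str, str],
    ai_classify: Callable[[str], tuple[str, str]] = stub_ai_classify,
) -> tuple[str, str]:
    """Classify a failure log; returns (category, which stage produced it)."""
    category = deterministic_classify(log, rules)
    if category is not None:
        return category, "deterministic"
    # Deterministic stage found nothing: fall back to the AI-powered stage.
    category, keyword = ai_classify(log)
    if category not in rules:
        # New category: store a rule so the next occurrence is caught
        # by the deterministic stage.
        rules[category] = keyword.lower()
    return category, "ai"
```

With this sketch, the first log containing "not visible" is classified by the AI stage, and a later log with the same phrase is caught by the deterministic stage because the rule was added in between.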

4 of 9

EXPERIMENTAL DESIGN

    • We have chosen 2 webpages/elements from WebVoyager's most used websites (an Amazon-inspired navigation menu and the arXiv advanced search)
    • We have recreated each of these elements using 5 different frameworks: vanillaJS, TailwindCSS, SvelteJS, Angular, Bootstrap (the choice of frameworks was not random; it was made after studying modern websites and their JavaScript and CSS styling)
    • We have selected 2 LLM configurations (GPT-4.1 and Gemini-2.0-Flash)
    • We have selected 4 open-source agents (Skyvern, Browser Use, WebArena, and Agent-E), all in a VISION-DISABLED SETTING
    • Consistency iterations = 3
    • Total runs: 2 elements x 5 frameworks x 2 LLMs x 4 agents x 3 iterations = 240
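
As a quick check of the run count, the full experiment matrix can be enumerated as below. This is only a sketch: the element identifiers and the dictionary layout are hypothetical, while the framework, LLM, and agent names are taken from the list above.

```python
from itertools import product

# Element identifiers are hypothetical labels for the two recreated pages.
ELEMENTS = ["navigation_menu", "arxiv_advanced_search"]
FRAMEWORKS = ["vanillaJS", "TailwindCSS", "SvelteJS", "Angular", "Bootstrap"]
LLMS = ["GPT-4.1", "Gemini-2.0-Flash"]
AGENTS = ["Skyvern", "Browser Use", "WebArena", "Agent-E"]
ITERATIONS = 3  # consistency iterations per configuration

# One entry per (element, framework, LLM, agent, iteration) combination.
runs = [
    {"element": e, "framework": f, "llm": m, "agent": a, "iteration": i}
    for e, f, m, a in product(ELEMENTS, FRAMEWORKS, LLMS, AGENTS)
    for i in range(1, ITERATIONS + 1)
]

print(len(runs))  # 240
```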

5 of 9

ARXIV WEBSITE’S ADVANCED SEARCH

Task: Give me the 2nd most recent paper by author Alice in the fields of CS and Statistics from the year 2025

6 of 9

MULTICASCADE DROPDOWN

Task: From all products, find the men's Nike Air Max with the lowest price, add it to the cart, and buy it

7 of 9

FAILURE CAUSE ANALYSIS BUCKET

8 of 9

[Figure: hybrid failure cause analysis pipeline. Labels: failure?, deterministic algorithm, AI-powered algorithm, Dashboard]

HYBRID APPROACH PIPELINE AND DEMO

Deterministic algorithm

AI-powered analysis

    • Currently supports multiple LLMs
    • Additional feature: new scenarios are stored in a SQL database for future use
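
Storing new scenarios could look roughly like the sketch below. It assumes SQLite via Python's standard `sqlite3` module; the table name, columns, and helper functions are hypothetical and not taken from the project's repositories.

```python
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the scenario store; schema is a hypothetical example."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS scenarios ("
        " category TEXT PRIMARY KEY,"
        " keyword TEXT NOT NULL,"
        " source_model TEXT NOT NULL)"
    )
    return conn


def save_scenario(
    conn: sqlite3.Connection, category: str, keyword: str, source_model: str
) -> bool:
    """Store a new scenario; returns False if the category already exists."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT OR IGNORE INTO scenarios VALUES (?, ?, ?)",
            (category, keyword, source_model),
        )
    return cur.rowcount == 1


def load_scenarios(conn: sqlite3.Connection) -> dict[str, str]:
    """Return stored scenarios as a category -> keyword mapping."""
    return dict(conn.execute("SELECT category, keyword FROM scenarios"))
```

Keying the table on the category means a scenario is stored only once, regardless of which LLM proposed it; whether that matches the intended behavior is a design choice left open here.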

9 of 9

RESULTS AND WRITING

Visuals: https://claude.ai/public/artifacts/ad0f5fa1-b872-4f0f-8013-75fe5566d74b

Doc: https://docs.google.com/document/d/18gTZntVpGzIVV5Ne-1wn8DB-Q2oiignv-ogUFn64EJQ/edit?usp=sharing

Overleaf:

GitHub repositories:

https://github.com/bulia-keshav/Element-variation-websites

https://github.com/hassan-byt0/Webagent_RCA_pipelines