FRAMEWORK VARIATIONS
MICRO-BENCHMARKING FRAMEWORK FOR WEB AGENTS
REASONS TO PURSUE THIS
Conclusion: No collision with existing work. Our work would be novel and micro-diagnostic, and would likely complement existing macro-benchmarks rather than compete with them.
Figure: hybrid pipeline flow (failure? → Det algorithm → AI-powered Algorithm)
WHAT DO WE PROPOSE?
We assess web agent and LLM configurations on how well they detect and interact with UI elements, and we identify the cause of each failure with a novel hybrid approach. A deterministic, rule-based algorithm that is agnostic to element type runs first. If it cannot classify the failure, the system falls back to an AI-powered root cause algorithm, which also adds new rules whenever a failure does not fit any existing category (see the pipeline image above).
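The rule-first, AI-fallback control flow described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's implementation: the `Rule` and `HybridClassifier` names, the regex-over-log-text matching, and the category labels are all hypothetical, and the AI step is a stub callable standing in for an LLM call that returns a category plus a suggested rule pattern.

```python
import re
from dataclasses import dataclass
from typing import Callable

@dataclass
class Rule:
    """Hypothetical deterministic rule: a regex over the failure log mapped to a category."""
    category: str
    pattern: str

    def matches(self, log: str) -> bool:
        # Element-agnostic: the rule inspects log text, not a specific element type
        return re.search(self.pattern, log, flags=re.IGNORECASE) is not None

# Assumed AI fallback signature: log text -> (category, suggested regex for a new rule)
AIClassifier = Callable[[str], tuple[str, str]]

@dataclass
class HybridClassifier:
    rules: list[Rule]
    ai_fallback: AIClassifier

    def classify(self, log: str) -> tuple[str, str]:
        """Return (category, source), where source is 'rule' or 'ai'."""
        for rule in self.rules:
            if rule.matches(log):
                return rule.category, "rule"
        category, pattern = self.ai_fallback(log)
        # Only learn a rule when the AI reports a category no existing rule covers
        if all(r.category != category for r in self.rules):
            self.rules.append(Rule(category, pattern))
        return category, "ai"

def stub_ai(log: str) -> tuple[str, str]:
    # Placeholder for an LLM root cause call; always returns the same hypothetical answer
    return "dropdown_option_not_visible", r"option .* not visible"

clf = HybridClassifier(
    rules=[Rule("element_not_found", r"no such element|element not found"),
           Rule("timeout", r"timed? ?out")],
    ai_fallback=stub_ai,
)
print(clf.classify("Error: element not found: #search-btn"))        # ('element_not_found', 'rule')
print(clf.classify("option 'Statistics' not visible in dropdown"))  # ('dropdown_option_not_visible', 'ai')
print(clf.classify("option 'Men' not visible after hover"))         # ('dropdown_option_not_visible', 'rule')
```

The third call shows the intended effect of rule learning: a failure that first needed the AI step is afterwards caught deterministically.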
Figure: system overview (Liteagent, Webagent, Websites, Logs, Hybrid failure cause analysis pipeline, Dashboard)
EXPERIMENTAL DESIGN
ARXIV WEBSITE’S ADVANCED SEARCH
Task: Give me the second most recent paper by author Alice in the fields of CS and Statistics from the year 2025.
MULTI-LEVEL CASCADING DROPDOWN
Task: From all products, find the men's Nike Air Max with the lowest price, add it to the cart, and buy it.
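Both tasks have a single deterministic expected answer, so a run can be scored against a simple oracle. The sketch below is hypothetical: the `Paper` and `Product` records, the `"cs"`/`"stat"` labels, and the choice to require a paper to be listed in both fields (rather than either) are assumptions, since the task wording leaves that open.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class Paper:
    title: str
    authors: list[str]
    categories: set[str]  # hypothetical labels such as "cs", "stat"
    submitted: date

@dataclass
class Product:
    name: str
    gender: str
    price: float

def arxiv_expected(papers: list[Paper], author: str = "Alice",
                   fields: frozenset = frozenset({"cs", "stat"}),
                   year: int = 2025) -> Optional[Paper]:
    """Second most recent paper by `author` in `year`.

    Assumption: the paper must be listed in ALL of `fields`.
    """
    hits = [p for p in papers
            if author in p.authors and fields <= p.categories and p.submitted.year == year]
    hits.sort(key=lambda p: p.submitted, reverse=True)  # newest first
    return hits[1] if len(hits) >= 2 else None

def cheapest_mens_air_max(products: list[Product]) -> Optional[Product]:
    """Lowest-priced men's Nike Air Max, or None if no product matches."""
    matches = [p for p in products
               if p.gender == "men" and "nike air max" in p.name.lower()]
    return min(matches, key=lambda p: p.price, default=None)

papers = [
    Paper("P1", ["Alice", "Bob"], {"cs", "stat"}, date(2025, 3, 1)),
    Paper("P2", ["Alice"], {"cs", "stat"}, date(2025, 6, 1)),
    Paper("P3", ["Alice"], {"cs"}, date(2025, 9, 1)),           # not in Statistics
    Paper("P4", ["Alice"], {"cs", "stat"}, date(2024, 12, 1)),  # wrong year
]
print(arxiv_expected(papers).title)  # P1

products = [
    Product("Nike Air Max 90", "men", 120.0),
    Product("Nike Air Max 270", "men", 99.5),
    Product("Nike Air Max 90", "women", 80.0),
    Product("Adidas Ultraboost", "men", 70.0),
]
print(cheapest_mens_air_max(products).name)  # Nike Air Max 270
```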
FAILURE CAUSE ANALYSIS BUCKET
Figure: hybrid failure cause analysis pipeline (failure?, Det algorithm, AI-powered Algorithm, Dashboard)
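Once each failed run has a category, the dashboard essentially needs counts per failure bucket for each agent and LLM configuration. The sketch below assumes, hypothetically, that runs arrive as `(config_name, category)` pairs with `None` marking a successful run; the configuration names are placeholders, not tools evaluated in this work.

```python
from collections import Counter, defaultdict
from typing import Iterable, Optional

def bucket_failures(runs: Iterable[tuple[str, Optional[str]]]) -> dict[str, Counter]:
    """Count runs per failure bucket for each agent-LLM configuration.

    A category of None is counted under the "success" bucket.
    """
    buckets: dict[str, Counter] = defaultdict(Counter)
    for config, category in runs:
        buckets[config][category if category is not None else "success"] += 1
    return dict(buckets)

def success_rate(counts: Counter) -> float:
    """Fraction of runs in the "success" bucket (0.0 if there are no runs)."""
    total = sum(counts.values())
    return counts["success"] / total if total else 0.0

runs = [
    ("agentA+llm1", None),
    ("agentA+llm1", "element_not_found"),
    ("agentA+llm1", "element_not_found"),
    ("agentB+llm2", None),
    ("agentB+llm2", "timeout"),
]
table = bucket_failures(runs)
for config, counts in table.items():
    print(config, dict(counts), round(success_rate(counts), 2))
```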
HYBRID APPROACH PIPELINE AND DEMO
Deterministic algorithm
AI-powered analysis
RESULTS AND WRITING
Visuals: https://claude.ai/public/artifacts/ad0f5fa1-b872-4f0f-8013-75fe5566d74b
Doc: https://docs.google.com/document/d/18gTZntVpGzIVV5Ne-1wn8DB-Q2oiignv-ogUFn64EJQ/edit?usp=sharing
Overleaf:
GitHub repositories:
https://github.com/bulia-keshav/Element-variation-websites
https://github.com/hassan-byt0/Webagent_RCA_pipelines