OpenMCP: A Reproducible Benchmarking Harness for Evaluating Computer-Use Agents on Chameleon Cloud
Agustin Leon & Anup Raj Niroula
New York University (NYU)
Agenda
(1) Brief intro | (2) Existing benchmarks | (3) OpenMCP architecture | (4) Running experiments | (5) Results | (6) Demo
MCP adoption has grown rapidly
Source: Pulse MCP, https://www.pulsemcp.com/statistics
Why this topic?
What MCP benchmarks are out there?
| Framework    | GUI Support | Deterministic Evaluation | Local Model Support | Portable Infrastructure | Infrastructure Telemetry |
| MCPBench     | ✗           | ✗                        | ✗                   | ✗                       | ✗                        |
| LiveMCPBench | ✗           | ✗                        | ✗                   | ✗                       | ✗                        |
| MCP Universe | ✗           | ✓                        | ✓                   | ✗                       | ✗                        |
| MCPWorld     | ✓           | ✓                        | ✗                   | ✗                       | ✗                        |
| OpenMCP      | ✓           | ✓                        | ✓                   | ✓                       | ✓                        |
An overview of OpenMCP’s architecture
Running experiments on OpenMCP
1) Possible research questions
2) Experiment configuration
3) Experiment execution
4) Output artifacts
One config file sets everything up
Some of the data and results we collected
Time for a quick demo
The role of Chameleon in all of this
Chameleon was core to the project
Challenges?
Summary
Link to Trovi artifact
Agustin Leon, Anup Raj Niroula, and Fraida Fund. 2026. OpenMCP: an open-source self-hosted benchmarking harness for MCP-enabled computer use agents. In The 6th Workshop on Machine Learning and Systems (EuroMLSys ’26), April 27–30, 2026, Edinburgh, Scotland, UK.