WebVoyager Benchmark Results

Browserable - Open-source, state-of-the-art browser agent

Browserable achieved 90.4% on the WebVoyager benchmark, measured across 567 of the benchmark's web tasks. This is the best result among the web agents we compare against below.

Browserable is fully open source and self-hostable. You can check out the GitHub repo here.

We ran the benchmark evaluation with Gemini 2.0 Flash as the primary LLM, and with GPT-4o and Claude 3.5 Sonnet as backup LLMs to fall back on when requests were rate-limited.
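To make the primary/backup setup concrete, here is a minimal sketch of one way such a fallback chain could work. It is not Browserable's actual implementation: `RateLimitError`, `complete_with_fallback`, the `fake_*` stand-in functions, and the provider labels are all hypothetical names chosen for illustration, and the stand-ins replace real API calls so the example runs offline.

```python
from typing import Callable, Sequence


class RateLimitError(Exception):
    """Hypothetical error a provider wrapper raises when it is rate-limited."""


def complete_with_fallback(
    prompt: str,
    providers: Sequence[tuple[str, Callable[[str], str]]],
) -> tuple[str, str]:
    """Try each provider in order, moving on only when one is rate-limited.

    Returns (provider_label, completion).
    """
    for label, call in providers:
        try:
            return label, call(prompt)
        except RateLimitError:
            continue  # this provider is throttled, try the next one
    raise RuntimeError("all providers are rate-limited")


# Stand-in callables so the sketch runs without network access or API keys.
def fake_gemini(prompt: str) -> str:
    raise RateLimitError("primary is throttled")


def fake_gpt4o(prompt: str) -> str:
    return f"[gpt-4o] {prompt}"


def fake_claude(prompt: str) -> str:
    return f"[claude] {prompt}"


# Illustrative labels only: primary first, then the backups.
PROVIDERS = [
    ("gemini-2.0-flash", fake_gemini),
    ("gpt-4o", fake_gpt4o),
    ("claude-3.5-sonnet", fake_claude),
]

used, text = complete_with_fallback("Find the cheapest flight", PROVIDERS)
print(used)  # gpt-4o, because the simulated primary is rate-limited
```

Keeping the providers in an ordered list means the primary is always tried first, and a backup is only used for a request when every provider ahead of it is throttled.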

WebVoyager scores by agent:

  • WebVoyager: 50%
  • Computer Use: 52%
  • Runner H 0.1: 67%
  • Operator: 87%
  • Browserable: 90.4%

Benchmark Details

We removed 56 tasks from the original task list for the following reasons:

  • Some tasks are outdated or no longer valid because of website updates.
  • Some tasks cannot be completed by a bare-bones browser agent without human intervention (e.g. logging in to a website).

Wherever possible, we updated tasks to keep them relevant to the current state of the web (e.g. updating the date in a flight booking task).

Coming Soon

  • Shareable links for individual runs
  • Full list of results

The total LLM cost of running the 567 tasks was ~70 USD.
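As a rough back-of-the-envelope check, those two figures imply an average LLM spend of about 0.12 USD per task. The snippet below only divides the numbers quoted in this post; the variable names are illustrative, and since the total is approximate, so is the result.

```python
# Figures quoted in this post; the total cost is approximate ("~70 USD").
TOTAL_COST_USD = 70.0
TASKS_RUN = 567

cost_per_task = TOTAL_COST_USD / TASKS_RUN
print(f"~{cost_per_task:.2f} USD per task")  # ~0.12 USD per task
```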

What is WebVoyager?

The WebVoyager benchmark is a standardized evaluation suite designed to measure the real-world browsing capabilities of autonomous agents. It tests how well an agent can complete tasks on live websites such as Amazon, Apple, and Google Flights without human assistance: navigating a site, filling in forms, clicking buttons, and extracting information.
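A headline score on a benchmark like this is a success rate: the share of tasks the agent completed. The sketch below shows that aggregation, overall and per website, assuming each task has already been judged as pass or fail (how that judgment is made is outside this sketch). The function name and the sample outcomes are made up for illustration and are not Browserable's actual results.

```python
from collections import defaultdict


def success_rates(results):
    """Aggregate (website, succeeded) pairs into percentages.

    Returns (overall_percent, {website: percent}).
    """
    totals = defaultdict(int)
    wins = defaultdict(int)
    for site, ok in results:
        totals[site] += 1
        wins[site] += int(ok)
    if not totals:
        raise ValueError("no task results to aggregate")
    per_site = {site: 100.0 * wins[site] / totals[site] for site in totals}
    overall = 100.0 * sum(wins.values()) / sum(totals.values())
    return overall, per_site


# Made-up outcomes purely for illustration.
sample = [
    ("Amazon", True),
    ("Amazon", False),
    ("Apple", True),
    ("Google Flights", True),
]

overall, per_site = success_rates(sample)
print(f"{overall:.1f}%")   # 75.0%
print(per_site["Amazon"])  # 50.0
```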