WebVoyager Benchmark Results

Browserable scored 90.4% on the WebVoyager benchmark, measured across 567 of the benchmark's web tasks. This is best-in-class performance among web agents.

Browserable is fully open source and self-hostable. You can check out the GitHub repo here.

We ran the benchmark evaluation with Gemini 2.0 Flash as the primary LLM, and with GPT-4o and Claude 3.5 Sonnet as backup LLMs to handle rate limiting.
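One way to picture a primary-plus-backup arrangement is a simple ordered fallback: try the primary model, and move to the next provider only when a call is rate limited. The sketch below is illustrative only and is not taken from the Browserable codebase. `RateLimitError`, `call_with_fallback`, and the stub providers are hypothetical names standing in for real API clients.

```python
from typing import Callable, Sequence


class RateLimitError(Exception):
    """Hypothetical error a provider wrapper raises when it is rate limited."""


def call_with_fallback(
    prompt: str,
    providers: Sequence[tuple[str, Callable[[str], str]]],
) -> tuple[str, str]:
    """Try each (name, call) pair in order.

    Returns (provider_name, response) from the first provider that succeeds.
    """
    for name, call in providers:
        try:
            return name, call(prompt)
        except RateLimitError:
            continue  # this provider is throttled, so try the next one
    raise RuntimeError("all providers are rate limited")


# Stub providers standing in for real LLM clients (assumption, for illustration).
def primary(prompt: str) -> str:
    raise RateLimitError("primary model throttled")


def backup(prompt: str) -> str:
    return f"backup answer to: {prompt}"


used, answer = call_with_fallback(
    "Find the cheapest flight",
    [("primary", primary), ("backup", backup)],
)
print(used, "->", answer)
```

In a real setup the provider list would follow the order described above (primary first, then the backups), and each callable would wrap the corresponding vendor SDK and translate its rate-limit error into a common exception type.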

WebVoyager scores by agent:

WebVoyager: 50%
Computer Use: 52%
Runner H 0.1: 67%
Operator: 87%
Browserable: 90.4%

Benchmark Details

We removed 56 tasks from the original task list for the following reasons:

Some tasks are outdated or no longer valid because of website updates.
Some tasks cannot be completed by a bare-bones browser agent without human intervention (for example, logging in to a website).

Wherever possible, we updated tasks to reflect the current state of the web (for example, updating the date in a flight-booking task).

Coming Soon

Shareable links for individual runs
Full list of results

The total LLM cost of running the 567 tasks was ~70 USD.
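For a rough sense of scale, the figures above work out to about 12 cents of LLM spend per task. The short calculation below is a back-of-the-envelope sketch; it treats the approximate 70 USD total as exact and assumes the cost is spread evenly across tasks, which real runs will not match exactly.

```python
# Figures quoted in the text; the total is approximate (~70 USD).
total_cost_usd = 70.0
num_tasks = 567

# Average LLM cost per task, assuming cost is spread evenly across tasks.
cost_per_task = total_cost_usd / num_tasks
print(f"{cost_per_task:.2f} USD per task")
```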

What is WebVoyager?

The WebVoyager benchmark is a standardized evaluation suite designed to measure the real-world browsing capabilities of autonomous agents. It tests how well an agent can complete tasks in real browser environments (navigating a website, filling in forms, clicking buttons, and extracting information) without human assistance, on live websites such as Amazon, Apple, and Google Flights.