[Add] browser-use and main.py
parent 08e64bdf45
commit 96914d44ac
221 changed files with 30952 additions and 1 deletion
browser-use/docs/development/evaluations.mdx (normal file, 48 lines)
@@ -0,0 +1,48 @@
---
title: "Evaluations"
description: "Test the Browser Use agent on standardized benchmarks"
icon: "chart-bar"
---

## Prerequisites

Browser Use evaluates against proprietary, private test sets. These must never be committed to GitHub and can only be fetched through an authorized API request.
Accessing these test sets requires an approved Browser Use account.
There are currently no publicly available test sets, but some may be released in the future.

## Get an API Access Key

First, navigate to https://browser-use.tools and log in with an authorized Browser Use account.

Then click the "Account" button at the top right of the page, and click the "Cycle New Key" button on the account page.

Copy the resulting URL and secret key into your `.env` file. It should look like this:

```bash .env
EVALUATION_TOOL_URL= ...
EVALUATION_TOOL_SECRET_KEY= ...
```
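
Before starting a run, it can help to confirm that both variables are actually visible to the process. The following is a minimal sketch, assuming the values have already been exported or loaded from `.env` by whatever mechanism your setup uses. The helper name `missing_eval_settings` is hypothetical and is not part of Browser Use; only the two variable names come from the `.env` example above.

```python
import os

# Variable names taken from the `.env` example; nothing else is assumed
# about how the evaluation script consumes them.
REQUIRED_VARS = ("EVALUATION_TOOL_URL", "EVALUATION_TOOL_SECRET_KEY")


def missing_eval_settings(env=None):
    """Return the required variable names that are unset or blank.

    `env` defaults to the current process environment; pass a plain dict
    to check a different source of settings.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]
```

Calling `missing_eval_settings()` with no arguments checks the current environment; an empty list means both settings are present.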

## Running Evaluations

First, make sure your copy of `eval/service.py` is up to date.

Then run the file:

```bash
python eval/service.py
```

## Configuring Evaluations

You can modify the evaluation by passing flags to the evaluation script. For instance:

```bash
python eval/service.py --parallel_runs 5 --parallel_evaluations 5 --max-steps 25 --start 0 --end 100 --model gpt-4o
```
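
If you launch runs from a script rather than by hand, the same command can be assembled from a mapping of flags to values. This is a sketch only: `build_eval_command` is a hypothetical helper, and the flag spellings are copied verbatim from the example command above rather than from the script's argument parser, so check them against `eval/service.py` before relying on them.

```python
import shlex


def build_eval_command(options, script="eval/service.py"):
    """Assemble the evaluation command as an argument list.

    `options` maps each flag (spelled exactly as on the command line)
    to its value; values are converted to strings.
    """
    args = ["python", script]
    for flag, value in options.items():
        args += [flag, str(value)]
    return args


# Values mirror the example command; adjust them to your own run.
options = {
    "--parallel_runs": 5,
    "--parallel_evaluations": 5,
    "--max-steps": 25,
    "--start": 0,
    "--end": 100,
    "--model": "gpt-4o",
}

command = build_eval_command(options)
print(shlex.join(command))  # shell-ready form of the argument list
```

The argument list can be passed directly to `subprocess.run(command, check=True)`, which avoids shell quoting issues.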

The evaluations webpage has a GUI for generating these commands. To use it, navigate to https://browser-use.tools/dashboard.

Then click the "New Eval Run" button in the left panel. This opens an interface with selectors, inputs, sliders, and switches.

Enter your desired configuration into the interface and copy the resulting Python command at the bottom. Then run this command as before.