mirror of
https://github.com/j93es/browser-use-oauth.git
synced 2026-06-04 06:21:52 +09:00
Add comprehensive documentation for Browser Use features
- Introduced custom output format instructions with example code. - Detailed connection methods for launching and connecting to browsers, including local and remote options. - Provided guidelines for handling sensitive data securely, including best practices and examples. - Documented supported LangChain chat models with setup instructions and environment variable requirements. - Added instructions for customizing the system prompt to control agent behavior.
This commit is contained in:
parent
34ee66b4e8
commit
638a3d47ce
10 changed files with 3056 additions and 0 deletions
414
.github/instructions/real-browser.instructions.md
vendored
Normal file
414
.github/instructions/real-browser.instructions.md
vendored
Normal file
|
|
@ -0,0 +1,414 @@
|
|||
---
|
||||
description: "Connect to a remote browser or launch a new local browser."
|
||||
applyTo: '**'
|
||||
---
|
||||
|
||||
## Overview
|
||||
|
||||
Browser Use supports a wide variety of ways to launch or connect to a browser:
|
||||
|
||||
- Launch a new local browser using playwright/patchright chromium (the default)
|
||||
- Connect to a remote browser using CDP or WSS
|
||||
- Use an existing playwright `Page`, `Browser`, or `BrowserContext` object
|
||||
- Connect to a local browser already running using `browser_pid`
|
||||
|
||||
<Tip>
|
||||
Don't want to manage your own browser infrastructure? Try [☁️ Browser Use Cloud](https://browser-use.com) ➡️
|
||||
|
||||
We provide automatic CAPTCHA solving, proxies, human-in-the-loop automation, and more!
|
||||
</Tip>
|
||||
|
||||
## Connection Methods
|
||||
|
||||
### Method A: Launch a New Local Browser (Default)
|
||||
|
||||
Launch a local browser using built-in default (playwright `chromium`) or a provided `executable_path`:
|
||||
|
||||
```python
|
||||
from browser_use import Agent, BrowserSession
|
||||
|
||||
# If no executable_path provided, uses Playwright/Patchright's built-in Chromium
|
||||
browser_session = BrowserSession(
|
||||
# Path to a specific Chromium-based executable (optional)
|
||||
executable_path='/Applications/Google Chrome.app/Contents/MacOS/Google Chrome', # macOS
|
||||
# For Windows: 'C:\\Program Files\\Google\\Chrome\\Application\\chrome.exe'
|
||||
# For Linux: '/usr/bin/google-chrome'
|
||||
|
||||
# Use a specific data directory on disk (optional, set to None for incognito)
|
||||
user_data_dir='~/.config/browseruse/profiles/default', # this is the default
|
||||
# ... any other BrowserProfile or playwright launch_persistnet_context config...
|
||||
# headless=False,
|
||||
)
|
||||
|
||||
agent = Agent(
|
||||
task="Your task here",
|
||||
llm=llm,
|
||||
browser_session=browser_session,
|
||||
)
|
||||
```
|
||||
|
||||
We support most `chromium`-based browsers in `executable_path`, including [Brave](https://github.com/browser-use/browser-use/tree/main/examples/browser/stealth.py), [patchright chromium](https://github.com/Kaliiiiiiiiii-Vinyzu/patchright), [rebrowser](https://rebrowser.net/), Edge, and more. See [`examples/browser/stealth.py`](https://github.com/browser-use/browser-use/tree/main/examples/browser) for more. We do not support Firefox or Safari at the moment.
|
||||
|
||||
<Warning>
|
||||
[As of Chrome v136](https://github.com/browser-use/browser-use/issues/1520), driving browsers with the default profile is [no longer supported](https://developer.chrome.com/blog/remote-debugging-port) for security reasons. Browser-Use has transitioned to creating a new dedicated profile for agents in: `~/.config/browseruse/profiles/default`. You can [open this profile](https://superuser.com/questions/377186/how-do-i-start-chrome-using-a-specified-user-profile) and log into everything you need your agent to have access to, and it will persist over time.
|
||||
</Warning>
|
||||
|
||||
### Method B: Connect Using Existing Playwright Objects
|
||||
|
||||
Pass existing Playwright `Page`, `BrowserContext`, `Browser`, and/or `playwright` API object to `BrowserSession(...)`:
|
||||
|
||||
```python
|
||||
from browser_use import Agent, BrowserSession
|
||||
from playwright.async_api import async_playwright
|
||||
# from patchright.async_api import async_playwright # stealth alternative
|
||||
|
||||
async with async_playwright() as playwright:
|
||||
browser = await playwright.chromium.launch()
|
||||
context = await browser.new_context()
|
||||
page = await context.new_page()
|
||||
|
||||
browser_session = BrowserSession(
|
||||
page=page,
|
||||
# browser_context=context, # all these are supported
|
||||
# browser=browser,
|
||||
# playwright=playwright,
|
||||
)
|
||||
|
||||
agent = Agent(
|
||||
task="Your task here",
|
||||
llm=llm,
|
||||
browser_session=browser_session,
|
||||
)
|
||||
```
|
||||
|
||||
You can also pass `page` directly to `Agent(...)` as a shortcut.
|
||||
|
||||
```python
|
||||
agent = Agent(
|
||||
task="Your task here",
|
||||
llm=llm,
|
||||
page=page,
|
||||
)
|
||||
```
|
||||
|
||||
### Method C: Connect to Local Browser Using Browser PID
|
||||
|
||||
Connect to a browser with open `--remote-debugging-port`:
|
||||
|
||||
```python
|
||||
from browser_use import Agent, BrowserSession
|
||||
|
||||
# First, start Chrome with remote debugging:
|
||||
# /Applications/Google Chrome.app/Contents/MacOS/Google Chrome --remote-debugging-port=9242
|
||||
|
||||
# Then connect using the process ID
|
||||
browser_session = BrowserSession(browser_pid=12345) # Replace with actual Chrome PID
|
||||
|
||||
agent = Agent(
|
||||
task="Your task here",
|
||||
llm=llm,
|
||||
browser_session=browser_session,
|
||||
)
|
||||
```
|
||||
|
||||
### Method D: Connect to remote Playwright Node.js Browser Server via WSS URL
|
||||
|
||||
Connect to Playwright Node.js server providers:
|
||||
|
||||
```python
|
||||
from browser_use import Agent, BrowserSession
|
||||
|
||||
# Connect to a playwright server
|
||||
browser_session = BrowserSession(wss_url="wss://your-playwright-server.com/ws")
|
||||
|
||||
agent = Agent(
|
||||
task="Your task here",
|
||||
llm=llm,
|
||||
browser_session=browser_session,
|
||||
)
|
||||
```
|
||||
|
||||
### Method E: Connect to Remote Browser via CDP URL
|
||||
|
||||
Connect to any remote Chromium-based browser:
|
||||
|
||||
```python
|
||||
from browser_use import Agent, BrowserSession
|
||||
|
||||
# Connect to Chrome via CDP
|
||||
browser_session = BrowserSession(cdp_url="http://localhost:9222")
|
||||
|
||||
agent = Agent(
|
||||
task="Your task here",
|
||||
llm=llm,
|
||||
browser_session=browser_session,
|
||||
)
|
||||
```
|
||||
|
||||
|
||||
|
||||
## Security Considerations
|
||||
|
||||
<Warning>
|
||||
When using any browser profile, the agent will have access to:
|
||||
- All its logged-in sessions and cookies
|
||||
- Saved passwords (if autofill is enabled)
|
||||
- Browser history and bookmarks
|
||||
- Extensions and their data
|
||||
|
||||
Always review the task you're giving to the agent and ensure it aligns with your security requirements!
|
||||
Use `Agent(sensitive_data={'https://auth.example.com': {x_key: value}})` for any secrets, and restrict the browser with `BrowserSession(allowed_domains=['https://*.example.com'])`.
|
||||
</Warning>
|
||||
|
||||
## Best Practices
|
||||
|
||||
1. **Use isolated profiles**: Create separate Chrome profiles for different agents to limit scope of risk:
|
||||
```python
|
||||
browser_session = BrowserSession(
|
||||
user_data_dir='~/.config/browseruse/profiles/banking',
|
||||
# profile_directory='Default'
|
||||
)
|
||||
```
|
||||
|
||||
2. **Limit domain access**: Restrict which sites the agent can visit:
|
||||
```python
|
||||
browser_session = BrowserSession(
|
||||
allowed_domains=['example.com', 'http*://*.github.com'],
|
||||
)
|
||||
```
|
||||
|
||||
3. **Enable `keep_alive=True`** If you want to use a single `BrowserSession` with more than one agent:
|
||||
```python
|
||||
browser_session = BrowserSession(
|
||||
keep_alive=True,
|
||||
...
|
||||
)
|
||||
await browser_session.start() # start the session yourself before passing to Agent
|
||||
...
|
||||
agent = Agent(..., browser_session=browser_session)
|
||||
await agent.run()
|
||||
...
|
||||
await browser_session.kill() # end the session yourself, shortcut for keep_alive=False + .stop()
|
||||
```
|
||||
|
||||
## Re-Using a Browser
|
||||
|
||||
A `BrowserSession` starts when the browser is launched/connected, and ends when the browser process exits/disconnects. A session internally manages a single live playwright browser context, and is normally auto-closed by the agent when its task is complete (*if* the agent started the session itself). If you pass an existing `BrowserSession` into an Agent, or if you set `BrowserSession(keep_alive=True)`, the session will not be closed and can be re-used between agents.
|
||||
|
||||
Browser Use provides a number of ways to re-use profiles, sessions, and other configuration across multiple agents.
|
||||
|
||||
- ✅ sequential agents can re-use a single `user_data_dir` in new `BrowserSession`s
|
||||
- ✅ sequential agents can re-use a single `BrowserSession` without closing it
|
||||
- ❌ parallel agents cannot run separate `BrowserSession`s using the same `user_data_dir`
|
||||
- ✅ parallel agents can run separate `BrowserSession`s using the same `storage_state`
|
||||
- ✅ parallel agents can share a single `BrowserSession`, working in different tabs
|
||||
- ⚠️ parallel agents can share a single `BrowserSession`, working in the same tab
|
||||
|
||||
<Important>
|
||||
Multiple `BrowserSession`s (aka chrome processes) cannot share the same `user_data_dir` at the same time, but they can share a `storage_state` file or `BrowserProfile` config.
|
||||
</Important>
|
||||
|
||||
### Sequential Agents, Same Profile, Different Browser
|
||||
|
||||
If you are only running one agent & browser at a time, they can re-use the same `user_data_dir` sequentially.
|
||||
|
||||
```python
|
||||
from browser_use import Agent, BrowserSession
|
||||
from langchain_openai import ChatOpenAI
|
||||
|
||||
reused_profile = BrowserProfile(user_data_dir='~/.config/browseruse/profiles/default')
|
||||
|
||||
agent1 = Agent(
|
||||
task="The first task...",
|
||||
llm=ChatOpenAI(model="gpt-4o-mini"),
|
||||
browser_profile=reused_profile, # pass the profile in, it will auto-create a session
|
||||
)
|
||||
await agent1.run()
|
||||
|
||||
agent2 = Agent(
|
||||
task="The second task...",
|
||||
llm=ChatOpenAI(model="gpt-4o-mini"),
|
||||
browser_profile=reused_profile, # agent will auto-create its own new session
|
||||
)
|
||||
await agent2.run()
|
||||
```
|
||||
|
||||
> Make sure to never mix different browser versions or `executable_path`s with the same `user_data_dir`. Once run with a newer browser version, some migrations are applied to the dir and older browsers wont be able to read it.
|
||||
|
||||
### Sequential Agents, Same Profile, Same Browser
|
||||
|
||||
If you are only running one agent at a time, they can re-use the same active `BrowserSession` and avoid having to relaunch chrome.
|
||||
Each agent will start off looking at the same tab the last agent ended off on.
|
||||
|
||||
```python
|
||||
from browser_use import Agent, BrowserSession
|
||||
from langchain_openai import ChatOpenAI
|
||||
|
||||
reused_session = BrowserSession(
|
||||
user_data_dir='~/.config/browseruse/profiles/default',
|
||||
keep_alive=True, # dont close browser after 1st agent.run() ends
|
||||
)
|
||||
await reused_session.start() # when keep_alive=True, session must be started manually
|
||||
|
||||
agent1 = Agent(
|
||||
task="The first task...",
|
||||
llm=ChatOpenAI(model="gpt-4o-mini"),
|
||||
browser_session=reused_session,
|
||||
)
|
||||
await agent1.run()
|
||||
|
||||
agent2 = Agent(
|
||||
task="The second task...",
|
||||
llm=ChatOpenAI(model="gpt-4o-mini"),
|
||||
browser_session=reused_session, # re-use the same session
|
||||
)
|
||||
await agent2.run()
|
||||
|
||||
await reused_session.close()
|
||||
```
|
||||
|
||||
### Parallel Agents, Same Browser, Multiple Tabs
|
||||
|
||||
```python
|
||||
from browser_use import Agent, BrowserSession
|
||||
from langchain_openai import ChatOpenAI
|
||||
|
||||
shared_browser = BrowserSession(
|
||||
storage_state='/tmp/cookies.json',
|
||||
user_data_dir=None,
|
||||
keep_alive=True,
|
||||
headless=True,
|
||||
)
|
||||
await shared_browser.start() # when keep_alive=True, you must start the session yourself
|
||||
|
||||
agent1 = Agent(
|
||||
task="The first task...",
|
||||
llm=ChatOpenAI(model="gpt-4o-mini"),
|
||||
browser_session=shared_browser, # pass the session in
|
||||
)
|
||||
agent2 = Agent(
|
||||
task="The second task...",
|
||||
llm=ChatOpenAI(model="gpt-4o-mini"),
|
||||
browser_session=shared_browser, # re-use the same session
|
||||
)
|
||||
await asyncio.gather(agent1.run(), agent2.run()) # run in parallel
|
||||
|
||||
await shared_browser.close()
|
||||
```
|
||||
|
||||
### Parallel Agents, Same Browser, Same Tab
|
||||
|
||||
<Warning>
|
||||
⚠️ This mode is not recommended. Agents are not yet optimized to share the same tab in the same browser, they may interfere with each other or cause errors.
|
||||
</Warning>
|
||||
|
||||
|
||||
```python
|
||||
from browser_use import Agent, BrowserSession
|
||||
from langchain_openai import ChatOpenAI
|
||||
from playwright.async_api import async_playwright
|
||||
|
||||
playwright = await async_playwright().start()
|
||||
browser = await playwright.chromium.launch(headless=True)
|
||||
context = await browser.new_context()
|
||||
shared_page = await context.new_page()
|
||||
await shared_page.goto('https://example.com', wait_until='domcontentloaded')
|
||||
|
||||
shared_session = BrowserSession(page=shared_page, keep_alive=True)
|
||||
await shared_session.start()
|
||||
|
||||
agent1 = Agent(
|
||||
task="Fill out the form in section A...",
|
||||
llm=ChatOpenAI(model="gpt-4o-mini"),
|
||||
browser_session=shared_session
|
||||
)
|
||||
agent2 = Agent(
|
||||
task="Fill out the form in section B...",
|
||||
llm=ChatOpenAI(model="gpt-4o-mini"),
|
||||
browser_session=shared_session,
|
||||
)
|
||||
await asyncio.gather(agent1.run(), agent2.run()) # run in parallel
|
||||
|
||||
await shared_session.kill()
|
||||
```
|
||||
|
||||
### Parallel Agents, Same Profile, Different Browsers
|
||||
|
||||
<Tip>
|
||||
This mode is the recommended default.
|
||||
</Tip>
|
||||
|
||||
To share a single set of configuration or cookies, but still have agents working in their own browser sessions (potentially in parallel), use our provided `BrowserProfile` object.
|
||||
|
||||
The recommended way to re-use cookies and localStorage state between separate parallel sessions is to use the [`storage_state`](https://docs.browser-use.com/customize/browser-settings#storage-state) option.
|
||||
|
||||
```bash
|
||||
# open a browser to log into sites you want the Agent to have access to
|
||||
playwright open https://example.com/ --save-storage=/tmp/auth.json
|
||||
playwright open https://example.com/ --load-storage=/tmp/auth.json
|
||||
```
|
||||
|
||||
```python
|
||||
from browser_use.browser import BrowserProfile, BrowserSession
|
||||
|
||||
shared_profile = BrowserProfile(
|
||||
headless=True,
|
||||
user_data_dir=None, # use dedicated tmp user_data_dir per session
|
||||
storage_state='/tmp/auth.json', # load/save cookies to/from json file
|
||||
keep_alive=True, # don't close the browser after the agent finishes
|
||||
)
|
||||
|
||||
window1 = BrowserSession(browser_profile=profile_a)
|
||||
await window1.start()
|
||||
agent1 = Agent(browser_session=window1)
|
||||
|
||||
window2 = BrowserSession(browser_profile=profile_a)
|
||||
await window2.start()
|
||||
agent2 = Agent(browser_session=window2)
|
||||
|
||||
await asyncio.gather(agent1.run(), agent2.run()) # run in parallel
|
||||
await window1.save_storage_state() # write storage state (cookies, localStorage, etc.) to auth.json
|
||||
await window2.save_storage_state() # you must decide when to save manually
|
||||
|
||||
# can also reload the cookies from the file into the active session if they change
|
||||
await window1.load_storage_state()
|
||||
await window1.close()
|
||||
await window2.close()
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### Chrome Won't Connect
|
||||
|
||||
If you're having trouble connecting:
|
||||
|
||||
1. **Close all Chrome instances** before trying to launch with a custom profile
|
||||
2. **Check if Chrome is running with debugging port**:
|
||||
```bash
|
||||
ps aux | grep chrome | grep remote-debugging-port
|
||||
```
|
||||
3. **Verify the executable path** is correct for your system
|
||||
4. **Check profile permissions** - ensure your user has read/write access
|
||||
|
||||
### Profile Lock Issues
|
||||
|
||||
If you get a "profile is already in use" error:
|
||||
|
||||
1. Close all Chrome instances
|
||||
2. The profile will automatically be unlocked when BrowserSession starts
|
||||
3. Alternatively, manually delete the `SingletonLock` file in the profile directory
|
||||
|
||||
<Note>
|
||||
For more configuration options, see the [Browser Settings](/customize/browser-settings) documentation.
|
||||
</Note>
|
||||
|
||||
### Profile Version Issues
|
||||
|
||||
The browser version you run must always be equal to or greater than the version used to create the `user_data_dir`.
|
||||
If you see errors like `Failed to parse Extensions` when launching, you're likely attempting to run an older browser with an incompatible `user_data_dir` that's already been migrated to a newer Chrome version.
|
||||
|
||||
Playwright ships a version of chromium that's newer than the default stable Google Chrome release channel, so this can happen if you try to use
|
||||
a profile created by the default playwright chromium (e.g. `user_data_dir='~/.config/browseruse/profiles/default'`) with an older
|
||||
local browser like `executable_path='/Applications/Google Chrome.app/Contents/MacOS/Google Chrome'`.
|
||||
Loading…
Add table
Add a link
Reference in a new issue