Add comprehensive documentation for Browser Use features

- Introduced custom output format instructions with example code. - Detailed connection methods for launching and connecting to browsers, including local and remote options. - Provided guidelines for handling sensitive data securely, including best practices and examples. - Documented supported LangChain chat models with setup instructions and environment variable requirements. - Added instructions for customizing the system prompt to control agent behavior.
2026-06-04 06:21:52 +09:00 · 2025-06-21 16:18:16 +09:00 · 2025-06-21 16:18:16 +09:00 · 638a3d47ce
commit 638a3d47ce
parent 34ee66b4e8
10 changed files with 3056 additions and 0 deletions
--- a/.github/instructions/real-browser.instructions.md
+++ b/.github/instructions/real-browser.instructions.md
@ -0,0 +1,414 @@
+---
+description: "Connect to a remote browser or launch a new local browser."
+applyTo: '**'
+---
+
+## Overview
+
+Browser Use supports a wide variety of ways to launch or connect to a browser:
+
+- Launch a new local browser using playwright/patchright chromium (the default)
+- Connect to a remote browser using CDP or WSS
+- Use an existing playwright `Page`, `Browser`, or `BrowserContext` object
+- Connect to a local browser already running using `browser_pid`
+
+<Tip>
+Don't want to manage your own browser infrastructure? Try [☁️ Browser Use Cloud](https://browser-use.com) ➡️  
+
+We provide automatic CAPTCHA solving, proxies, human-in-the-loop automation, and more!
+</Tip>
+
+## Connection Methods
+
+### Method A: Launch a New Local Browser (Default)
+
+Launch a local browser using built-in default (playwright `chromium`) or a provided `executable_path`:
+
+```python
+from browser_use import Agent, BrowserSession
+
+# If no executable_path provided, uses Playwright/Patchright's built-in Chromium
+browser_session = BrowserSession(
+    # Path to a specific Chromium-based executable (optional)
+    executable_path='/Applications/Google Chrome.app/Contents/MacOS/Google Chrome',  # macOS
+    # For Windows: 'C:\\Program Files\\Google\\Chrome\\Application\\chrome.exe'
+    # For Linux: '/usr/bin/google-chrome'
+    
+    # Use a specific data directory on disk (optional, set to None for incognito)
+    user_data_dir='~/.config/browseruse/profiles/default',   # this is the default
+    # ... any other BrowserProfile or playwright launch_persistnet_context config...
+    # headless=False,
+)
+
+agent = Agent(
+    task="Your task here",
+    llm=llm,
+    browser_session=browser_session,
+)
+```
+
+We support most `chromium`-based browsers in `executable_path`, including [Brave](https://github.com/browser-use/browser-use/tree/main/examples/browser/stealth.py), [patchright chromium](https://github.com/Kaliiiiiiiiii-Vinyzu/patchright), [rebrowser](https://rebrowser.net/), Edge, and more. See [`examples/browser/stealth.py`](https://github.com/browser-use/browser-use/tree/main/examples/browser) for more. We do not support Firefox or Safari at the moment.
+
+<Warning>
+    [As of Chrome v136](https://github.com/browser-use/browser-use/issues/1520), driving browsers with the default profile is [no longer supported](https://developer.chrome.com/blog/remote-debugging-port) for security reasons. Browser-Use has transitioned to creating a new dedicated profile for agents in: `~/.config/browseruse/profiles/default`. You can [open this profile](https://superuser.com/questions/377186/how-do-i-start-chrome-using-a-specified-user-profile) and log into everything you need your agent to have access to, and it will persist over time.
+</Warning>
+
+### Method B: Connect Using Existing Playwright Objects
+
+Pass existing Playwright `Page`, `BrowserContext`, `Browser`, and/or `playwright` API object to `BrowserSession(...)`:
+
+```python
+from browser_use import Agent, BrowserSession
+from playwright.async_api import async_playwright
+# from patchright.async_api import async_playwright   # stealth alternative
+
+async with async_playwright() as playwright:
+    browser = await playwright.chromium.launch()
+    context = await browser.new_context()
+    page = await context.new_page()
+    
+    browser_session = BrowserSession(
+        page=page,
+        # browser_context=context,  # all these are supported
+        # browser=browser,
+        # playwright=playwright,
+    )
+
+    agent = Agent(
+        task="Your task here",
+        llm=llm,
+        browser_session=browser_session,
+    )
+```
+
+You can also pass `page` directly to `Agent(...)` as a shortcut.
+
+```python
+agent = Agent(
+    task="Your task here",
+    llm=llm,
+    page=page,
+)
+```
+
+### Method C: Connect to Local Browser Using Browser PID
+
+Connect to a browser with open `--remote-debugging-port`:
+
+```python
+from browser_use import Agent, BrowserSession
+
+# First, start Chrome with remote debugging:
+# /Applications/Google Chrome.app/Contents/MacOS/Google Chrome --remote-debugging-port=9242
+
+# Then connect using the process ID
+browser_session = BrowserSession(browser_pid=12345)  # Replace with actual Chrome PID
+
+agent = Agent(
+    task="Your task here",
+    llm=llm,
+    browser_session=browser_session,
+)
+```
+
+### Method D: Connect to remote Playwright Node.js Browser Server via WSS URL
+
+Connect to Playwright Node.js server providers:
+
+```python
+from browser_use import Agent, BrowserSession
+
+# Connect to a playwright server
+browser_session = BrowserSession(wss_url="wss://your-playwright-server.com/ws")
+
+agent = Agent(
+    task="Your task here",
+    llm=llm,
+    browser_session=browser_session,
+)
+```
+
+### Method E: Connect to Remote Browser via CDP URL
+
+Connect to any remote Chromium-based browser:
+
+```python
+from browser_use import Agent, BrowserSession
+
+# Connect to Chrome via CDP
+browser_session = BrowserSession(cdp_url="http://localhost:9222")
+
+agent = Agent(
+    task="Your task here",
+    llm=llm,
+    browser_session=browser_session,
+)
+```
+
+
+
+## Security Considerations
+
+<Warning>
+  When using any browser profile, the agent will have access to:
+  - All its logged-in sessions and cookies
+  - Saved passwords (if autofill is enabled)
+  - Browser history and bookmarks
+  - Extensions and their data
+  
+  Always review the task you're giving to the agent and ensure it aligns with your security requirements!
+  Use `Agent(sensitive_data={'https://auth.example.com': {x_key: value}})` for any secrets, and restrict the browser with `BrowserSession(allowed_domains=['https://*.example.com'])`.
+</Warning>
+
+## Best Practices
+
+1. **Use isolated profiles**: Create separate Chrome profiles for different agents to limit scope of risk:
+   ```python
+   browser_session = BrowserSession(
+       user_data_dir='~/.config/browseruse/profiles/banking',
+       # profile_directory='Default'
+   )
+   ```
+
+2. **Limit domain access**: Restrict which sites the agent can visit:
+   ```python
+   browser_session = BrowserSession(
+       allowed_domains=['example.com', 'http*://*.github.com'],
+   )
+   ```
+
+3. **Enable `keep_alive=True`** If you want to use a single `BrowserSession` with more than one agent:
+   ```python
+   browser_session = BrowserSession(
+       keep_alive=True,
+       ...
+   )
+   await browser_session.start()  # start the session yourself before passing to Agent
+   ...
+   agent = Agent(..., browser_session=browser_session)
+   await agent.run()
+   ...
+   await browser_session.kill()   # end the session yourself, shortcut for keep_alive=False + .stop()
+   ```
+
+## Re-Using a Browser
+
+A `BrowserSession` starts when the browser is launched/connected, and ends when the browser process exits/disconnects. A session internally manages a single live playwright browser context, and is normally auto-closed by the agent when its task is complete (*if* the agent started the session itself). If you pass an existing `BrowserSession` into an Agent, or if you set `BrowserSession(keep_alive=True)`, the session will not be closed and can be re-used between agents.
+
+Browser Use provides a number of ways to re-use profiles, sessions, and other configuration across multiple agents.
+
+- ✅ sequential agents can re-use a single `user_data_dir` in new `BrowserSession`s
+- ✅ sequential agents can re-use a single `BrowserSession` without closing it
+- ❌ parallel agents cannot run separate `BrowserSession`s using the same `user_data_dir`
+- ✅ parallel agents can run separate `BrowserSession`s using the same `storage_state`
+- ✅ parallel agents can share a single `BrowserSession`, working in different tabs
+- ⚠️ parallel agents can share a single `BrowserSession`, working in the same tab
+
+<Important>
+Multiple `BrowserSession`s (aka chrome processes) cannot share the same `user_data_dir` at the same time, but they can share a `storage_state` file or `BrowserProfile` config.
+</Important>
+
+### Sequential Agents, Same Profile, Different Browser
+
+If you are only running one agent & browser at a time, they can re-use the same `user_data_dir` sequentially.
+
+```python
+from browser_use import Agent, BrowserSession
+from langchain_openai import ChatOpenAI
+
+reused_profile = BrowserProfile(user_data_dir='~/.config/browseruse/profiles/default')
+
+agent1 = Agent(
+    task="The first task...",
+    llm=ChatOpenAI(model="gpt-4o-mini"),
+    browser_profile=reused_profile,    # pass the profile in, it will auto-create a session
+)
+await agent1.run()
+
+agent2 = Agent(
+    task="The second task...",
+    llm=ChatOpenAI(model="gpt-4o-mini"),
+    browser_profile=reused_profile,    # agent will auto-create its own new session
+)
+await agent2.run()
+```
+
+> Make sure to never mix different browser versions or `executable_path`s with the same `user_data_dir`. Once run with a newer browser version, some migrations are applied to the dir and older browsers wont be able to read it.
+
+### Sequential Agents, Same Profile, Same Browser
+
+If you are only running one agent at a time, they can re-use the same active `BrowserSession` and avoid having to relaunch chrome.
+Each agent will start off looking at the same tab the last agent ended off on.
+
+```python
+from browser_use import Agent, BrowserSession
+from langchain_openai import ChatOpenAI
+
+reused_session = BrowserSession(
+    user_data_dir='~/.config/browseruse/profiles/default',
+    keep_alive=True,  # dont close browser after 1st agent.run() ends
+)
+await reused_session.start()   # when keep_alive=True, session must be started manually
+
+agent1 = Agent(
+    task="The first task...",
+    llm=ChatOpenAI(model="gpt-4o-mini"),
+    browser_session=reused_session,
+)
+await agent1.run()
+
+agent2 = Agent(
+    task="The second task...",
+    llm=ChatOpenAI(model="gpt-4o-mini"),
+    browser_session=reused_session,      # re-use the same session
+)
+await agent2.run()
+
+await reused_session.close()
+```
+
+### Parallel Agents, Same Browser, Multiple Tabs
+
+```python
+from browser_use import Agent, BrowserSession
+from langchain_openai import ChatOpenAI
+
+shared_browser = BrowserSession(
+    storage_state='/tmp/cookies.json',
+    user_data_dir=None,
+    keep_alive=True,
+    headless=True,
+)
+await shared_browser.start()   # when keep_alive=True, you must start the session yourself
+
+agent1 = Agent(
+    task="The first task...",
+    llm=ChatOpenAI(model="gpt-4o-mini"),
+    browser_session=shared_browser,              # pass the session in
+)
+agent2 = Agent(
+    task="The second task...",
+    llm=ChatOpenAI(model="gpt-4o-mini"),
+    browser_session=shared_browser,              # re-use the same session
+)
+await asyncio.gather(agent1.run(), agent2.run()) # run in parallel
+
+await shared_browser.close()
+```
+
+### Parallel Agents, Same Browser, Same Tab
+
+<Warning>
+⚠️ This mode is not recommended. Agents are not yet optimized to share the same tab in the same browser, they may interfere with each other or cause errors.
+</Warning>
+
+
+```python
+from browser_use import Agent, BrowserSession
+from langchain_openai import ChatOpenAI
+from playwright.async_api import async_playwright
+
+playwright = await async_playwright().start()
+browser = await playwright.chromium.launch(headless=True)
+context = await browser.new_context()
+shared_page = await context.new_page()
+await shared_page.goto('https://example.com', wait_until='domcontentloaded')
+
+shared_session = BrowserSession(page=shared_page, keep_alive=True)
+await shared_session.start()
+
+agent1 = Agent(
+    task="Fill out the form in section A...",
+    llm=ChatOpenAI(model="gpt-4o-mini"),
+    browser_session=shared_session
+)
+agent2 = Agent(
+    task="Fill out the form in section B...",
+    llm=ChatOpenAI(model="gpt-4o-mini"),
+    browser_session=shared_session,
+)
+await asyncio.gather(agent1.run(), agent2.run()) # run in parallel
+
+await shared_session.kill()
+```
+
+### Parallel Agents, Same Profile, Different Browsers
+
+<Tip>
+This mode is the recommended default.
+</Tip>
+
+To share a single set of configuration or cookies, but still have agents working in their own browser sessions (potentially in parallel), use our provided `BrowserProfile` object.
+
+The recommended way to re-use cookies and localStorage state between separate parallel sessions is to use the [`storage_state`](https://docs.browser-use.com/customize/browser-settings#storage-state) option.
+
+```bash
+# open a browser to log into sites you want the Agent to have access to
+playwright open https://example.com/ --save-storage=/tmp/auth.json
+playwright open https://example.com/ --load-storage=/tmp/auth.json
+```
+
+```python
+from browser_use.browser import BrowserProfile, BrowserSession
+
+shared_profile = BrowserProfile(
+    headless=True,
+    user_data_dir=None,               # use dedicated tmp user_data_dir per session
+    storage_state='/tmp/auth.json',   # load/save cookies to/from json file
+    keep_alive=True,                  # don't close the browser after the agent finishes
+)
+
+window1 = BrowserSession(browser_profile=profile_a)
+await window1.start()
+agent1 = Agent(browser_session=window1)
+
+window2 = BrowserSession(browser_profile=profile_a)
+await window2.start()
+agent2 = Agent(browser_session=window2)
+
+await asyncio.gather(agent1.run(), agent2.run())  # run in parallel
+await window1.save_storage_state()  # write storage state (cookies, localStorage, etc.) to auth.json
+await window2.save_storage_state()  # you must decide when to save manually
+
+# can also reload the cookies from the file into the active session if they change
+await window1.load_storage_state()
+await window1.close()
+await window2.close()
+```
+
+---
+
+## Troubleshooting
+
+### Chrome Won't Connect
+
+If you're having trouble connecting:
+
+1. **Close all Chrome instances** before trying to launch with a custom profile
+2. **Check if Chrome is running with debugging port**:
+   ```bash
+   ps aux | grep chrome | grep remote-debugging-port
+   ```
+3. **Verify the executable path** is correct for your system
+4. **Check profile permissions** - ensure your user has read/write access
+
+### Profile Lock Issues
+
+If you get a "profile is already in use" error:
+
+1. Close all Chrome instances
+2. The profile will automatically be unlocked when BrowserSession starts
+3. Alternatively, manually delete the `SingletonLock` file in the profile directory
+
+<Note>
+  For more configuration options, see the [Browser Settings](/customize/browser-settings) documentation.
+</Note>
+
+### Profile Version Issues
+
+The browser version you run must always be equal to or greater than the version used to create the `user_data_dir`.
+If you see errors like `Failed to parse Extensions` when launching, you're likely attempting to run an older browser with an incompatible `user_data_dir` that's already been migrated to a newer Chrome version.
+
+Playwright ships a version of chromium that's newer than the default stable Google Chrome release channel, so this can happen if you try to use
+a profile created by the default playwright chromium (e.g. `user_data_dir='~/.config/browseruse/profiles/default'`) with an older
+local browser like `executable_path='/Applications/Google Chrome.app/Contents/MacOS/Google Chrome'`.