Add comprehensive documentation for Browser Use features

- Introduced custom output format instructions with example code. - Detailed connection methods for launching and connecting to browsers, including local and remote options. - Provided guidelines for handling sensitive data securely, including best practices and examples. - Documented supported LangChain chat models with setup instructions and environment variable requirements. - Added instructions for customizing the system prompt to control agent behavior.
2026-06-04 06:21:52 +09:00 · 2025-06-21 16:18:16 +09:00 · 2025-06-21 16:18:16 +09:00 · 638a3d47ce
commit 638a3d47ce
parent 34ee66b4e8
10 changed files with 3056 additions and 0 deletions
--- a/.github/instructions/custom-functions.instructions.md
+++ b/.github/instructions/custom-functions.instructions.md
@ -0,0 +1,249 @@
+---
+description: "Extend default agent and write custom action functions to do certain tasks"
+applyTo: '**'
+---
+
+Custom actions are functions *you* provide, that are added to our [default actions](https://github.com/browser-use/browser-use/blob/main/browser_use/controller/service.py) the agent can use to accomplish tasks.
+Action functions can request [arbitrary parameters](#action-parameters-via-pydantic-model) that the LLM has to come up with + a fixed set of [framework-provided arguments](#framework-provided-parameters) for browser APIs / `Agent(context=...)` / etc.
+
+<Note>
+  Our default set of actions is already quite powerful, the built-in `Controller` provides basics like `open_tab`, `scroll_down`, `extract_content`, [and more](https://github.com/browser-use/browser-use/blob/main/browser_use/controller/service.py).
+</Note>
+
+It's easy to add your own actions to implement additional custom behaviors, integrations with other apps, or performance optimizations.
+
+For examples of custom actions (e.g. uploading files, asking a human-in-the-loop for help, drawing a polygon with the mouse, and more), see [examples/custom-functions](https://github.com/browser-use/browser-use/tree/main/examples/custom-functions).
+
+
+## Action Function Registration
+
+To register your own custom functions (which can be `sync` or `async`), decorate them with the `@controller.action(...)` decorator. This saves them into the `controller.registry`.
+
+```python
+from browser_use import Controller, ActionResult
+
+controller = Controller()
+
+@controller.action('Ask human for help with a question', domains=['example.com'])   # pass allowed_domains= or page_filter= to limit actions to certain pages
+def ask_human(question: str) -> ActionResult:
+    answer = input(f'{question} > ')
+    return ActionResult(extracted_content=f'The human responded with: {answer}', include_in_memory=True)
+```
+
+```python
+# Then pass your controller to the agent to use it
+agent = Agent(
+    task='...',
+    llm=llm,
+    controller=controller,
+)
+```
+
+<Note>
+  Keep your action function names and descriptions short and concise:
+  - The LLM chooses between actions to run solely based on the function name and description
+  - The LLM decides how to fill action params based on their names, type hints, & defaults
+</Note>
+
+---
+
+## Action Parameters
+
+Browser Use supports two patterns for defining action parameters: normal function arguments, or a Pydantic model.
+
+### Function Arguments
+
+For simple actions that don't need default values, you can define the action parameters directly as arguments to the function. This one takes a single string argument, `css_selector`.
+When the LLM calls an action, it sees its argument names & types, and will provide values that fit.
+
+```python
+@controller.action('Click element')
+def click_element(css_selector: str, page: Page) -> ActionResult:
+    # css_selector is an action param the LLM must provide when calling
+    # page is a special framework-provided param to access the browser APIs (see below)
+    await page.locator(css_selector).click()
+    return ActionResult(extracted_content=f"Clicked element {css_selector}")
+```
+
+### Pydantic Model
+
+You can define a pydantic model for the parameters your action expects by setting a `@controller.action(..., param_model=MyParams)`.
+This allows you to use optional parameters, default values, `Annotated[...]` types with custom validation, field descriptions, and other features offered by pydantic.
+
+When the agent calls calls your agent function, an instance of your model with the values filled by the LLM will be passed as the argument named `params` to your action function.
+
+Using a pydantic model is helpful because it allows more flexibility and power to enforce the schema of the values the LLM should provide.
+The LLM gets the entire pydantic JSON schema for your `param_model`, it will see the function name & description + individual field names, types, descriptions, and default values.
+
+
+```python
+from typing import Annotated
+from pydantic import BaseModel, AfterValidator
+from browser_use import ActionResult
+
+class MyParams(BaseModel):
+    field1: int
+    field2: str = 'default value'
+    field3: Annotated[str, AfterValidator(lambda s: s.lower())]  # example: enforce always lowercase
+    field4: str = Field(default='abc', description='Detailed description for the LLM')
+
+@controller.action('My action', param_model=MyParams)
+def my_action(params: MyParams, page: Page) -> ActionResult:
+    await page.keyboard.type(params.field2)
+    return ActionResult(extracted_content=f"Inputted {params} on {page.url}")
+```
+
+Any special framework-provided arguments (e.g. `page`) will be passed as separate positional arguments after `params`.
+
+<Important>
+To use a `BaseModel` the arg *must* be called `params`. Action function args are matched and filled like named arguments; arg order doesn't matter but names and types do.
+</Important>
+
+### Framework-Provided Parameters
+
+These special action parameters are injected by the `Controller` and are passed as extra args to any actions that expect them.
+
+For example, actions that need to run playwright code to interact with the browser should take the argument `page` or `browser_session`.
+
+- `page: Page` - The current Playwright page (shortcut for `browser_session.get_current_page()`)
+- `browser_session: BrowserSession` - The current browser session (and playwright context via `browser_session.browser_context`)
+- `context: AgentContext` - Any optional top-level context object passed to the Agent, e.g. `Agent(context=user_provided_obj)`
+- `page_extraction_llm: BaseChatModel` - LLM instance used for page content extraction
+- `available_file_paths: list[str]` - List of available file paths for upload / processing
+- `has_sensitive_data: bool` - Whether the action content contains sensitive data markers (check this to avoid logging sensitive data to terminal by accident)
+
+#### Example: Action uses the current `page`
+
+```python
+from playwright.async_api import Page
+from browser_use import Controller, ActionResult
+
+controller = Controller()
+
+@controller.action('Type keyboard input into a page')
+async def input_text_into_page(text: str, page: Page) -> ActionResult:
+    await page.keyboard.type(text)
+    return ActionResult(extracted_content='Website opened')
+```
+
+#### Example: Action uses the `browser_context`
+
+```python
+from browser_use import BrowserSession, Controller, ActionResult
+
+controller = Controller()
+
+@controller.action('Open website')
+async def open_website(url: str, browser_session: BrowserSession) -> ActionResult:
+    # find matching existing tab by looking through all pages in playwright browser_context
+    all_tabs = await browser_session.browser_context.pages
+    for tab in all_tabs:
+        if tab.url == url:
+            await tab.bring_to_foreground()
+            return ActionResult(extracted_content=f'Switched to tab with url {url}')
+    # otherwise, create a new tab
+    new_tab = await browser_session.browser_context.new_page()
+    await new_tab.goto(url)
+    return ActionResult(extracted_content=f'Opened new tab with url {url}')
+```
+
+
+---
+
+
+## Important Rules
+
+1. **Return an [`ActionResult`](https://github.com/search?q=repo%3Abrowser-use%2Fbrowser-use+%22class+ActionResult%28BaseModel%29%22&type=code)**: All actions should return an `ActionResult | str | None`. The stringified version of the result is passed back to the LLM, and optionally persisted in the long-term memory when `ActionResult(..., include_in_memory=True)`.
+2. **Type hints on arguments are required**: They are used to verify that action params don't conflict with special arguments injected by the controller (e.g. `page`)
+3. **Actions functions called directly must be passed kwargs**: When calling actions from other actions or python code, you must **pass all parameters as kwargs only**, even though the actions are usually defined using positional args (for the same reasons as [pluggy](https://pluggy.readthedocs.io/en/stable/index.html#calling-hooks)).
+    Action arguments are always matched by name and type, **not** positional order, so this helps prevent ambiguity / reordering issues while keeping action signatures short.
+    ```python
+    @controller.action('Fill in the country form field')
+    def input_country_field(country: str, page: Page) -> ActionResult:
+        await some_action(123, page=page)                                # ❌ not allowed: positional args, use kwarg syntax when calling
+        await some_action(abc=123, page=page)                            # ✅ allowed: action params & special kwargs
+        await some_other_action(params=OtherAction(abc=123), page=page)  # ✅ allowed: params=model & special kwargs
+    ```
+
+```python
+# Using Pydantic Model to define action params (recommended)
+class PinCodeParams(BaseModel):
+    code: int
+    retries: int = 3                                               # ✅ supports optional/defaults
+
+@controller.action('...', param_model=PinCodeParams)
+async def input_pin_code(params: PinCodeParams, page: Page): ...   # ✅ special params at the end
+
+# Using function arguments to define action params
+async def input_pin_code(code: int, retries: int, page: Page): ... # ✅ params first, special params second, no defaults
+async def input_pin_code(code: int, retries: int=3): ...           # ✅ defaults ok only if no special params needed
+async def input_pin_code(code: int, retries: int=3, page: Page): ... # ❌ Python SyntaxError! not allowed
+```
+
+
+---
+
+
+## Reusing Custom Actions Across Agents
+
+You can use the same controller for multiple agents.
+
+```python
+controller = Controller()
+
+# ... register actions to the controller
+
+agent = Agent(
+    task="Go to website X and find the latest news",
+    llm=llm,
+    controller=controller
+)
+
+# Run the agent
+await agent.run()
+
+agent2 = Agent(
+    task="Go to website Y and find the latest news",
+    llm=llm,
+    controller=controller
+)
+
+await agent2.run()
+```
+
+<Note>
+  The controller is stateless and can be used to register multiple actions and
+  multiple agents.
+</Note>
+
+
+
+## Exclude functions
+
+If you want to exclude some registered actions and make them unavailable to the agent, you can do:
+```python
+controller = Controller(exclude_actions=['open_tab', 'search_google'])
+agent = Agent(controller=controller, ...)
+```
+
+
+If you want actions to only be available on certain pages, and to not tell the LLM about them on other pages,
+ you can use the `allowed_domains` and `page_filter`:
+
+```python
+from pydantic import BaseModel
+from browser_use import Controller, ActionResult
+
+controller = Controller()
+
+async def is_ai_allowed(page: Page):
+    if api.some_service.check_url(page.url):
+        logger.warning('Allowing AI agent to visit url:', page.url)
+        return True
+    return False
+
+@controller.action('Fill out secret_form', allowed_domains=['https://*.example.com'], page_filter=is_ai_allowed)
+def fill_out_form(...) -> ActionResult:
+    ... will only be runnable by LLM on pages that match https://*.example.com *AND* where is_ai_allowed(page) returns True
+
+```