mirror of https://github.com/j93es/browser-use-oauth.git (synced 2026-06-04 06:21:52 +09:00)
Add comprehensive documentation for Browser Use features
- Introduced custom output format instructions with example code.
- Detailed connection methods for launching and connecting to browsers, including local and remote options.
- Provided guidelines for handling sensitive data securely, including best practices and examples.
- Documented supported LangChain chat models with setup instructions and environment variable requirements.
- Added instructions for customizing the system prompt to control agent behavior.
parent 34ee66b4e8
commit 638a3d47ce
10 changed files with 3056 additions and 0 deletions

.github/instructions/custom-functions.instructions.md (249 lines added, vendored, normal file)

@@ -0,0 +1,249 @@
---
description: "Extend default agent and write custom action functions to do certain tasks"
applyTo: '**'
---

Custom actions are functions *you* provide that are added to our [default actions](https://github.com/browser-use/browser-use/blob/main/browser_use/controller/service.py) for the agent to use when accomplishing tasks.
Action functions can request [arbitrary parameters](#action-parameters) that the LLM has to come up with, plus a fixed set of [framework-provided arguments](#framework-provided-parameters) for browser APIs, `Agent(context=...)`, etc.

<Note>
Our default set of actions is already quite powerful: the built-in `Controller` provides basics like `open_tab`, `scroll_down`, `extract_content`, [and more](https://github.com/browser-use/browser-use/blob/main/browser_use/controller/service.py).
</Note>

It's easy to add your own actions to implement additional custom behaviors, integrations with other apps, or performance optimizations.

For examples of custom actions (e.g. uploading files, asking a human-in-the-loop for help, drawing a polygon with the mouse, and more), see [examples/custom-functions](https://github.com/browser-use/browser-use/tree/main/examples/custom-functions).

## Action Function Registration

To register your own custom functions (which can be `sync` or `async`), decorate them with the `@controller.action(...)` decorator. This saves them into the `controller.registry`.

```python
from browser_use import Controller, ActionResult

controller = Controller()

# pass allowed_domains= or page_filter= to limit actions to certain pages
@controller.action('Ask human for help with a question', allowed_domains=['example.com'])
def ask_human(question: str) -> ActionResult:
    answer = input(f'{question} > ')
    return ActionResult(extracted_content=f'The human responded with: {answer}', include_in_memory=True)
```

```python
from browser_use import Agent

# Then pass your controller to the agent to use it
# (llm is your chat model instance, created elsewhere)
agent = Agent(
    task='...',
    llm=llm,
    controller=controller,
)
```

<Note>
Keep your action function names and descriptions short and concise:
- The LLM chooses between actions to run solely based on the function name and description
- The LLM decides how to fill action params based on their names, type hints, & defaults
</Note>
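
To make the registration step concrete, here is a minimal standalone sketch of the decorator-registry pattern. It is not Browser Use's implementation: `ToyController`, its `registry` dict, and `describe_actions` are hypothetical names that only illustrate how a decorator can store a function together with the description an LLM would later be shown.

```python
from typing import Callable


class ToyController:
    """Hypothetical stand-in for a controller; not the Browser Use class."""

    def __init__(self) -> None:
        # maps action name -> (description, function)
        self.registry: dict[str, tuple[str, Callable]] = {}

    def action(self, description: str) -> Callable[[Callable], Callable]:
        def decorator(func: Callable) -> Callable:
            # the function name doubles as the action name
            self.registry[func.__name__] = (description, func)
            return func  # return the function unchanged so it stays callable
        return decorator


toy = ToyController()


@toy.action('Add two numbers')
def add(a: int, b: int) -> str:
    return f'The sum is {a + b}'


def describe_actions(ctrl: ToyController) -> list[str]:
    """Render the name/description pairs an LLM prompt might list."""
    return [f'{name}: {desc}' for name, (desc, _) in ctrl.registry.items()]
```

Because the name and description are all that such a listing contains, a vague description leaves the model little to choose by, which is why the note above recommends short, specific wording.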

---

## Action Parameters

Browser Use supports two patterns for defining action parameters: normal function arguments, or a Pydantic model.

### Function Arguments

For simple actions that don't need default values, you can define the action parameters directly as arguments to the function. The example below takes a single string argument, `css_selector`.
When the LLM calls an action, it sees the action's argument names & types, and will provide values that fit.

```python
from playwright.async_api import Page

@controller.action('Click element')
async def click_element(css_selector: str, page: Page) -> ActionResult:
    # css_selector is an action param the LLM must provide when calling
    # page is a special framework-provided param to access the browser APIs (see below)
    await page.locator(css_selector).click()
    return ActionResult(extracted_content=f"Clicked element {css_selector}")
```
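
As a rough illustration of what "the LLM sees argument names & types" can mean in practice, the sketch below derives a name-to-type mapping from a function signature using only the standard library. `describe_action_params` and `SPECIAL_PARAM_NAMES` are hypothetical helpers written for this example; how Browser Use builds its actual schema may differ.

```python
import inspect
from typing import get_type_hints

# Names treated as framework-provided in this sketch (taken from the list in this document).
SPECIAL_PARAM_NAMES = {
    'page', 'browser_session', 'context',
    'page_extraction_llm', 'available_file_paths', 'has_sensitive_data',
}


def describe_action_params(func) -> dict[str, str]:
    """Return {param_name: type_name} for the params the LLM would have to fill."""
    hints = get_type_hints(func)
    described = {}
    for name in inspect.signature(func).parameters:
        if name in SPECIAL_PARAM_NAMES:
            continue  # injected by the framework, not filled by the LLM
        hint = hints.get(name)
        if hint is None:
            raise TypeError(f'parameter {name!r} needs a type hint')
        described[name] = getattr(hint, '__name__', str(hint))
    return described


# `page: object` stands in for a Playwright Page so the sketch has no dependencies.
def click_element(css_selector: str, page: object) -> str:
    return f'Clicked element {css_selector}'
```

Calling `describe_action_params(click_element)` yields only `css_selector`, since `page` is filtered out as a framework-provided name.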

### Pydantic Model

You can define a Pydantic model for the parameters your action expects by passing it to the decorator: `@controller.action(..., param_model=MyParams)`.
This allows you to use optional parameters, default values, `Annotated[...]` types with custom validation, field descriptions, and other features offered by Pydantic.

When the agent calls your action function, an instance of your model with the values filled in by the LLM is passed as the argument named `params`.

Using a Pydantic model is helpful because it gives you more flexibility and power to enforce the schema of the values the LLM should provide.
The LLM gets the entire Pydantic JSON schema for your `param_model`: it sees the function name & description plus the individual field names, types, descriptions, and default values.

```python
from typing import Annotated
from playwright.async_api import Page
from pydantic import AfterValidator, BaseModel, Field
from browser_use import ActionResult

class MyParams(BaseModel):
    field1: int
    field2: str = 'default value'
    field3: Annotated[str, AfterValidator(lambda s: s.lower())]  # example: enforce always lowercase
    field4: str = Field(default='abc', description='Detailed description for the LLM')

@controller.action('My action', param_model=MyParams)
async def my_action(params: MyParams, page: Page) -> ActionResult:
    await page.keyboard.type(params.field2)
    return ActionResult(extracted_content=f"Inputted {params} on {page.url}")
```

Any special framework-provided arguments (e.g. `page`) are declared as separate arguments after `params`.

<Important>
To use a `BaseModel`, the arg *must* be called `params`. Action function args are matched and filled like named arguments; arg order doesn't matter but names and types do.
</Important>
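
The "matched by name, not by order" behavior can be sketched with `inspect.signature`. This is an illustrative stand-in, not the controller's real dispatch code: `call_with_named_args`, `greet`, and `greet_reordered` are hypothetical, and the `special` dict plays the role of the values the framework would inject.

```python
import inspect


def call_with_named_args(func, llm_args: dict, special_args: dict):
    """Call func with only the keyword arguments its signature asks for."""
    wanted = inspect.signature(func).parameters
    kwargs = {}
    for name, param in wanted.items():
        if name in llm_args:
            kwargs[name] = llm_args[name]
        elif name in special_args:
            kwargs[name] = special_args[name]
        elif param.default is inspect.Parameter.empty:
            raise TypeError(f'missing value for parameter {name!r}')
    return func(**kwargs)


def greet(page: str, name: str) -> str:
    return f'{name} is on {page}'


def greet_reordered(name: str, page: str) -> str:
    return f'{name} is on {page}'


# Values a framework might make available; unused ones are simply not passed.
special = {'page': 'https://example.com', 'browser_session': object()}
```

Both `greet` and `greet_reordered` receive the same values when called through `call_with_named_args`, because each argument is looked up by its name rather than its position.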

### Framework-Provided Parameters

These special action parameters are injected by the `Controller` and are passed as extra args to any actions that expect them.

For example, actions that need to run Playwright code to interact with the browser should take the argument `page` or `browser_session`.

- `page: Page` - The current Playwright page (shortcut for `browser_session.get_current_page()`)
- `browser_session: BrowserSession` - The current browser session (and Playwright context via `browser_session.browser_context`)
- `context: AgentContext` - Any optional top-level context object passed to the Agent, e.g. `Agent(context=user_provided_obj)`
- `page_extraction_llm: BaseChatModel` - LLM instance used for page content extraction
- `available_file_paths: list[str]` - List of available file paths for upload / processing
- `has_sensitive_data: bool` - Whether the action content contains sensitive data markers (check this to avoid logging sensitive data to terminal by accident)
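
One way an action might act on the `has_sensitive_data` flag is to mask text before it reaches the log. The helpers below are a hypothetical sketch using only the standard `logging` module; `safe_log_text` and `log_typed_text` are not part of Browser Use, and the flag is passed in by hand here rather than injected.

```python
import logging

logger = logging.getLogger('custom_actions')


def safe_log_text(text: str, has_sensitive_data: bool) -> str:
    """Return a log-safe version of text: masked when sensitive data may be present."""
    if has_sensitive_data:
        return f'<redacted {len(text)} chars>'
    return text


def log_typed_text(text: str, has_sensitive_data: bool) -> None:
    # In a real action, has_sensitive_data would be the framework-provided flag.
    logger.info('Typing into page: %s', safe_log_text(text, has_sensitive_data))
```

Keeping the masking in one small function means every log call in an action goes through the same check.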

#### Example: Action uses the current `page`

```python
from playwright.async_api import Page
from browser_use import Controller, ActionResult

controller = Controller()

@controller.action('Type keyboard input into a page')
async def input_text_into_page(text: str, page: Page) -> ActionResult:
    await page.keyboard.type(text)
    return ActionResult(extracted_content=f'Typed {text!r} into the page')
```

#### Example: Action uses the `browser_context`

```python
from browser_use import BrowserSession, Controller, ActionResult

controller = Controller()

@controller.action('Open website')
async def open_website(url: str, browser_session: BrowserSession) -> ActionResult:
    # find matching existing tab by looking through all pages in playwright browser_context
    all_tabs = browser_session.browser_context.pages  # `pages` is a property, so no await
    for tab in all_tabs:
        if tab.url == url:
            await tab.bring_to_front()
            return ActionResult(extracted_content=f'Switched to tab with url {url}')
    # otherwise, create a new tab
    new_tab = await browser_session.browser_context.new_page()
    await new_tab.goto(url)
    return ActionResult(extracted_content=f'Opened new tab with url {url}')
```

---

## Important Rules

1. **Return an [`ActionResult`](https://github.com/search?q=repo%3Abrowser-use%2Fbrowser-use+%22class+ActionResult%28BaseModel%29%22&type=code)**: All actions should return an `ActionResult | str | None`. The stringified version of the result is passed back to the LLM, and optionally persisted in the long-term memory when `ActionResult(..., include_in_memory=True)`.
2. **Type hints on arguments are required**: They are used to verify that action params don't conflict with special arguments injected by the controller (e.g. `page`).
3. **Action functions called directly must be passed kwargs**: When calling actions from other actions or Python code, you must **pass all parameters as kwargs only**, even though the actions are usually defined using positional args (for the same reasons as [pluggy](https://pluggy.readthedocs.io/en/stable/index.html#calling-hooks)).
   Action arguments are always matched by name and type, **not** positional order, so this helps prevent ambiguity / reordering issues while keeping action signatures short.

   ```python
   @controller.action('Fill in the country form field')
   async def input_country_field(country: str, page: Page) -> ActionResult:
       await some_action(123, page=page)  # ❌ not allowed: positional args, use kwarg syntax when calling
       await some_action(abc=123, page=page)  # ✅ allowed: action params & special kwargs
       await some_other_action(params=OtherAction(abc=123), page=page)  # ✅ allowed: params=model & special kwargs
   ```

```python
# Using Pydantic Model to define action params (recommended)
class PinCodeParams(BaseModel):
    code: int
    retries: int = 3  # ✅ supports optional/defaults

@controller.action('...', param_model=PinCodeParams)
async def input_pin_code(params: PinCodeParams, page: Page): ...  # ✅ special params at the end

# Using function arguments to define action params
async def input_pin_code(code: int, retries: int, page: Page): ...  # ✅ params first, special params second, no defaults
async def input_pin_code(code: int, retries: int=3): ...  # ✅ defaults ok only if no special params needed
async def input_pin_code(code: int, retries: int=3, page: Page): ...  # ❌ Python SyntaxError! not allowed
```
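
Two of the rules above can be checked with plain Python. The sketch below first uses `compile()` to confirm that a parameter without a default cannot follow one with a default, then shows a hypothetical `kwargs_only` wrapper that rejects positional calls. Neither helper comes from Browser Use; `page: object` and `page: str` stand in for a real Playwright page.

```python
import functools


def compiles(source: str) -> bool:
    """Return True if the source is syntactically valid Python."""
    try:
        compile(source, '<signature-check>', 'exec')
        return True
    except SyntaxError:
        return False


# A param with a default may not be followed by a param without one.
BAD = 'async def input_pin_code(code: int, retries: int = 3, page: object): ...'
GOOD = 'async def input_pin_code(code: int, retries: int, page: object): ...'


def kwargs_only(func):
    """Reject positional calls so arguments are always matched by name."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        if args:
            raise TypeError(f'{func.__name__}() must be called with keyword arguments only')
        return func(**kwargs)
    return wrapper


@kwargs_only
def some_action(abc: int, page: str) -> str:
    return f'{abc} on {page}'
```

With this wrapper, `some_action(abc=123, page='p')` succeeds while `some_action(123, page='p')` raises `TypeError`, mirroring the ✅ / ❌ cases listed above.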

---

## Reusing Custom Actions Across Agents

You can use the same controller for multiple agents.

```python
controller = Controller()

# ... register actions to the controller

agent = Agent(
    task="Go to website X and find the latest news",
    llm=llm,
    controller=controller
)

# Run the agent
await agent.run()

agent2 = Agent(
    task="Go to website Y and find the latest news",
    llm=llm,
    controller=controller
)

await agent2.run()
```

<Note>
The controller is stateless, so you can register multiple actions on it and share it across multiple agents.
</Note>

## Exclude Functions

If you want to exclude some registered actions and make them unavailable to the agent, pass `exclude_actions` to the `Controller`:

```python
controller = Controller(exclude_actions=['open_tab', 'search_google'])
agent = Agent(task='...', llm=llm, controller=controller)
```

If you want actions to be available only on certain pages, and hidden from the LLM on other pages, use the `allowed_domains` and `page_filter` parameters:

```python
import logging

from playwright.async_api import Page
from browser_use import Controller, ActionResult

logger = logging.getLogger(__name__)
controller = Controller()

async def is_ai_allowed(page: Page):
    # api.some_service is a placeholder for your own URL-checking service
    if api.some_service.check_url(page.url):
        logger.warning('Allowing AI agent to visit url: %s', page.url)
        return True
    return False

@controller.action('Fill out secret_form', allowed_domains=['https://*.example.com'], page_filter=is_ai_allowed)
def fill_out_form(...) -> ActionResult:
    # ... will only be runnable by LLM on pages that match https://*.example.com *AND* where is_ai_allowed(page) returns True
    ...
```
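
To show how a glob-style domain pattern and a page filter can combine with AND semantics, here is a standalone sketch built on `fnmatch` and `urllib.parse`. The matching rules are an assumption made for illustration, and Browser Use's own matcher may treat schemes, bare domains, or ports differently. `url_matches_pattern` and `action_is_available` are hypothetical names, and the filter here receives a URL string rather than a `Page`.

```python
from fnmatch import fnmatch
from urllib.parse import urlparse


def url_matches_pattern(url: str, pattern: str) -> bool:
    """Check a URL against a pattern such as 'https://*.example.com' or 'example.com'."""
    parsed = urlparse(url)
    if '://' in pattern:
        scheme, host_pattern = pattern.split('://', 1)
        if parsed.scheme != scheme:
            return False
    else:
        host_pattern = pattern  # no scheme given: accept any scheme
    return fnmatch(parsed.hostname or '', host_pattern)


def action_is_available(url: str, allowed_domains: list[str], page_filter=None) -> bool:
    """Offer an action only if a domain pattern matches AND the filter passes."""
    if allowed_domains and not any(url_matches_pattern(url, p) for p in allowed_domains):
        return False
    return bool(page_filter(url)) if page_filter else True
```

In this sketch `'https://*.example.com'` matches `https://app.example.com/form` but not `https://example.com` or `http://app.example.com`, so check the library's actual behavior before relying on a pattern for access control.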