refactor: resolve code readability issues

암냥 2025-07-02 19:10:58 +09:00
commit 3199a53a44
52 changed files with 389 additions and 3246 deletions


@ -1,345 +0,0 @@
---
description: "Learn how to configure the agent"
applyTo: '**'
---
## Overview
The `Agent` class is the core component of Browser Use that handles browser automation. Here are the main configuration options you can use when initializing an agent.
## Basic Settings
```python
from browser_use import Agent
from langchain_openai import ChatOpenAI
agent = Agent(
task="Search for latest news about AI",
llm=ChatOpenAI(model="gpt-4o"),
)
```
### Required Parameters
- `task`: The instruction for the agent to execute
- `llm`: A LangChain chat model instance. See <a href="/customize/supported-models">LangChain Models</a> for supported models.
## Agent Behavior
Control how the agent operates:
```python
agent = Agent(
task="your task",
llm=llm,
controller=custom_controller, # For custom tool calling
use_vision=True, # Enable vision capabilities
save_conversation_path="logs/conversation" # Save chat logs
)
```
### Behavior Parameters
- `controller`: Registry of functions the agent can call. Defaults to base Controller. See <a href="/customize/custom-functions">Custom Functions</a> for details.
- `use_vision`: Enable/disable vision capabilities. Defaults to `True`.
- When enabled, the model processes visual information from web pages
- Disable to reduce costs or use models without vision support
- For GPT-4o, image processing costs approximately 800-1000 tokens (~$0.002 USD) per image (but this depends on the defined screen size)
- `save_conversation_path`: Path to save the complete conversation history. Useful for debugging.
- `override_system_message`: Completely replace the default system prompt with a custom one.
- `extend_system_message`: Add additional instructions to the default system prompt.
<Note>
Vision capabilities are recommended for better web interaction understanding,
but can be disabled to reduce costs or when using models without vision
support.
</Note>
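As a rough worked example of the cost figure above, the sketch below multiplies the per-image estimate by the number of steps. It assumes one screenshot is sent per step and takes the ~$0.002 figure at face value; `estimate_vision_cost` is a hypothetical helper, not part of Browser Use.

```python
def estimate_vision_cost(steps: int, usd_per_image: float = 0.002) -> float:
    """Approximate screenshot cost, assuming one image is sent per step."""
    return steps * usd_per_image


# A run that uses a full budget of 100 steps
print(f"${estimate_vision_cost(100):.2f}")
```

Actual costs depend on the configured screen size and the model's current pricing.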
### Reuse Existing Browser Context
By default, browser-use launches its own built-in browser using playwright chromium.
You can also connect to a remote browser or pass any of the following
existing playwright objects to the Agent: `page`, `browser_context`, `browser`, `browser_session`, or `browser_profile`.
These all get passed down to create a `BrowserSession` for the `Agent`:
```python
agent = Agent(
task='book a flight to fiji',
llm=llm,
browser_profile=browser_profile, # use this profile to create a BrowserSession
browser_session=BrowserSession( # use an existing BrowserSession
cdp_url=..., # remote CDP browser to connect to
# or
wss_url=..., # remote wss playwright server provider
# or
browser_pid=..., # pid of a locally running browser process to attach to
# or
executable_path=..., # provide a custom chrome binary path
# or
channel=..., # specify chrome, chromium, ms-edge, etc.
# or
page=page, # use an existing playwright Page object
# or
browser_context=browser_context, # use an existing playwright BrowserContext object
# or
browser=browser, # use an existing playwright Browser object
),
)
```
For example, to connect to an existing browser over CDP you could do:
```python
agent = Agent(
...
browser_session=BrowserSession(cdp_url='http://localhost:9222'),
)
```
Or, to attach to a locally running Chrome instance by its process ID:
```python
agent = Agent(
...
browser_session=BrowserSession(browser_pid=1234),
)
```
See <a href="/customize/real-browser">Connect to your Browser</a> for more info.
<Note>
You can reuse the same `BrowserSession` after an agent has completed running. If you do nothing, the
browser will be automatically closed on `run()` completion only if it was launched by us.
</Note>
## Running the Agent
The agent is executed using the async `run()` method:
- `max_steps` (default: `100`)
Maximum number of steps the agent can take during execution. This prevents infinite loops and helps control execution time.
## Agent History
The method returns an `AgentHistoryList` object containing the complete execution history. This history is invaluable for debugging, analysis, and creating reproducible scripts.
```python
# Example of accessing history
history = await agent.run()
# Access (some) useful information
history.urls() # List of visited URLs
history.screenshots() # List of screenshot paths
history.action_names() # Names of executed actions
history.extracted_content() # Content extracted during execution
history.errors() # Any errors that occurred
history.model_actions() # All actions with their parameters
```
The `AgentHistoryList` provides many helper methods to analyze the execution:
- `final_result()`: Get the final extracted content
- `is_done()`: Check if the agent completed successfully
- `has_errors()`: Check if any errors occurred
- `model_thoughts()`: Get the agent's reasoning process
- `action_results()`: Get results of all actions
<Note>
For a complete list of helper methods and detailed history analysis
capabilities, refer to the [AgentHistoryList source
code](https://github.com/browser-use/browser-use/blob/main/browser_use/agent/views.py#L111).
</Note>
## Run initial actions without LLM
With [this example](https://github.com/browser-use/browser-use/blob/main/examples/features/initial_actions.py) you can run initial actions without the LLM.
Specify the action as a dictionary where the key is the action name and the value is the action parameters. You can find all our actions in the [Controller](https://github.com/browser-use/browser-use/blob/main/browser_use/controller/service.py) source code.
```python
initial_actions = [
{'open_tab': {'url': 'https://www.google.com'}},
{'open_tab': {'url': 'https://en.wikipedia.org/wiki/Randomness'}},
{'scroll_down': {'amount': 1000}},
]
agent = Agent(
task='What theories are displayed on the page?',
initial_actions=initial_actions,
llm=llm,
)
```
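The action format described above (one action name mapped to its parameters) can be unpacked as in this minimal sketch. `split_action` is a hypothetical helper written for illustration; it is not how Browser Use parses actions internally.

```python
initial_actions = [
    {'open_tab': {'url': 'https://www.google.com'}},
    {'scroll_down': {'amount': 1000}},
]


def split_action(action: dict) -> tuple[str, dict]:
    """Return (action_name, params) for a single-key action dictionary."""
    if len(action) != 1:
        raise ValueError(f'expected exactly one action name, got {list(action)}')
    (name, params), = action.items()
    return name, params


for action in initial_actions:
    name, params = split_action(action)
    print(name, params)
```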
## Run with message context
You can configure the agent and provide a separate message to help the LLM understand the task better.
```python
from langchain_openai import ChatOpenAI
agent = Agent(
task="your task",
message_context="Additional information about the task",
llm=ChatOpenAI(model='gpt-4o'),
)
```
## Run with planner model
You can configure the agent to use a separate planner model for high-level task planning:
```python
from langchain_openai import ChatOpenAI
# Initialize models
llm = ChatOpenAI(model='gpt-4o')
planner_llm = ChatOpenAI(model='o3-mini')
agent = Agent(
task="your task",
llm=llm,
planner_llm=planner_llm, # Separate model for planning
use_vision_for_planner=False, # Disable vision for planner
planner_interval=4 # Plan every 4 steps
)
```
### Planner Parameters
- `planner_llm`: A LangChain chat model instance used for high-level task planning. Can be a smaller/cheaper model than the main LLM.
- `use_vision_for_planner`: Enable/disable vision capabilities for the planner model. Defaults to `True`.
- `planner_interval`: Number of steps between planning phases. Defaults to `1`.
Using a separate planner model can help:
- Reduce costs by using a smaller model for high-level planning
- Improve task decomposition and strategic thinking
- Better handle complex, multi-step tasks
<Note>
The planner model is optional. If not specified, the agent will not use the planner model.
</Note>
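To make `planner_interval` concrete, the sketch below lists the steps on which a planner call would happen. It assumes planning runs on every step whose 0-based index is a multiple of the interval; the real scheduling inside the agent may differ, and `planning_steps` is a hypothetical helper.

```python
def planning_steps(total_steps: int, planner_interval: int = 1) -> list[int]:
    """Step indices (0-based) on which a planner call would be made."""
    return [step for step in range(total_steps) if step % planner_interval == 0]


print(planning_steps(10, planner_interval=4))  # [0, 4, 8]
```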
### Optional Parameters
- `message_context`: Additional information about the task to help the LLM understand the task better.
- `initial_actions`: List of initial actions to run before the main task.
- `max_actions_per_step`: Maximum number of actions to run in a step. Defaults to `10`.
- `max_failures`: Maximum number of failures before giving up. Defaults to `3`.
- `retry_delay`: Time to wait between retries in seconds when rate limited. Defaults to `10`.
- `generate_gif`: Enable/disable GIF generation. Defaults to `False`. Set to `True` or a string path to save the GIF.
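The interaction between `max_failures` and `retry_delay` can be pictured with a plain retry loop. This is only a sketch under assumptions: `run_with_retries` is hypothetical, `RuntimeError` stands in for a rate-limit error, and the agent's actual failure handling may count or reset failures differently.

```python
import time


def run_with_retries(step, max_failures=3, retry_delay=10.0, sleep=time.sleep):
    """Call step() until it succeeds or has failed max_failures times."""
    failures = 0
    while True:
        try:
            return step()
        except RuntimeError:  # stand-in for a rate-limit error
            failures += 1
            if failures >= max_failures:
                raise
            sleep(retry_delay)


attempts = []


def flaky_step():
    """Fail twice, then succeed."""
    attempts.append(1)
    if len(attempts) < 3:
        raise RuntimeError('rate limited')
    return 'done'


waits = []
result = run_with_retries(flaky_step, sleep=waits.append)  # record delays instead of sleeping
print(result, waits)
```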
## Memory Management
Browser Use includes a procedural memory system using [Mem0](https://mem0.ai) that automatically summarizes the agent's conversation history at regular intervals to optimize context window usage during long tasks.
```python
from browser_use.agent.memory import MemoryConfig
agent = Agent(
task="your task",
llm=llm,
enable_memory=True,
memory_config=MemoryConfig( # Ensure llm_instance is passed if not using default LLM config
llm_instance=llm, # Important: Pass the agent's LLM instance here
agent_id="my_custom_agent",
memory_interval=15
)
)
```
### Memory Parameters
- `enable_memory`: Enable/disable the procedural memory system. Defaults to `True`.
- `memory_config`: A `MemoryConfig` Pydantic model instance (required if `enable_memory` is `True`). Dictionary format is not supported.
### Using MemoryConfig
You must configure the memory system using the `MemoryConfig` Pydantic model for a type-safe approach:
```python
from browser_use.agent.memory import MemoryConfig
from langchain_openai import ChatOpenAI # Assuming llm is an instance of ChatOpenAI
llm_for_agent = ChatOpenAI(model="gpt-4o")
agent = Agent(
task=task_description,
llm=llm_for_agent,
enable_memory=True, # This is True by default
memory_config=MemoryConfig(
llm_instance=llm_for_agent, # Pass the LLM instance for Mem0
agent_id="my_agent",
memory_interval=15, # Summarize every 15 steps
embedder_provider="openai",
embedder_model="text-embedding-3-large",
embedder_dims=1536,
# --- Vector Store Customization ---
vector_store_provider="qdrant", # e.g., Qdrant, Pinecone, Chroma, etc.
vector_store_collection_name="my_browser_use_memories", # Optional: custom collection name
vector_store_config_override={ # Provider-specific config
"host": "localhost",
"port": 6333
# Add other Qdrant specific configs here if needed, e.g., api_key for cloud
}
)
)
```
The `MemoryConfig` model provides these configuration options:
#### Memory Settings
- `agent_id`: Unique identifier for the agent (default: `"browser_use_agent"`). Essential for persistent memory sessions if using a persistent vector store.
- `memory_interval`: Number of steps between memory summarization (default: `10`)
#### LLM Settings (for Mem0's internal operations)
- `llm_instance`: The LangChain `BaseChatModel` instance that Mem0 will use for its internal summarization and processing. You must pass the same LLM instance used by the main agent, or another compatible one, here.
#### Embedder Settings
- `embedder_provider`: Provider for embeddings (`'openai'`, `'gemini'`, `'ollama'`, or `'huggingface'`)
- `embedder_model`: Model name for the embedder
- `embedder_dims`: Dimensions for the embeddings
#### Vector Store Settings
- `vector_store_provider`: Choose the vector store backend. Supported options include:
`'faiss'` (default), `'qdrant'`, `'pinecone'`, `'supabase'`, `'elasticsearch'`, `'chroma'`, `'weaviate'`, `'milvus'`, `'pgvector'`, `'upstash_vector'`, `'vertex_ai_vector_search'`, `'azure_ai_search'`, `'lancedb'`, `'mongodb'`, `'redis'`, `'memory'` (in-memory, non-persistent).
- `vector_store_collection_name`: (Optional) Specify a custom name for the collection or index in your vector store. If not provided, a default name is generated (especially for local stores like FAISS/Chroma) or used by Mem0.
- `vector_store_base_path`: Path for local vector stores like FAISS or Chroma (e.g., `/tmp/mem0`). Default is `/tmp/mem0`.
- `vector_store_config_override`: (Optional) A dictionary to provide or override specific configuration parameters required by Mem0 for the chosen `vector_store_provider`. This is where you'd put connection details like `host`, `port`, `api_key`, `url`, `environment`, etc., for cloud-based or server-based vector stores.
The model automatically sets appropriate defaults based on the LLM being used:
- For `ChatOpenAI`: Uses OpenAI's `text-embedding-3-small` embeddings
- For `ChatGoogleGenerativeAI`: Uses Gemini's `models/text-embedding-004` embeddings
- For `ChatOllama`: Uses Ollama's `nomic-embed-text` embeddings
- Default: Uses Hugging Face's `all-MiniLM-L6-v2` embeddings
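The defaults listed above amount to a lookup keyed on the LLM class. The sketch below expresses that as a dictionary; the names `DEFAULT_EMBEDDERS` and `default_embedder` are hypothetical, and the stand-in `ChatOllama` class exists only so the example runs without LangChain installed.

```python
DEFAULT_EMBEDDERS = {
    'ChatOpenAI': ('openai', 'text-embedding-3-small'),
    'ChatGoogleGenerativeAI': ('gemini', 'models/text-embedding-004'),
    'ChatOllama': ('ollama', 'nomic-embed-text'),
}
FALLBACK_EMBEDDER = ('huggingface', 'all-MiniLM-L6-v2')


def default_embedder(llm: object) -> tuple[str, str]:
    """Pick (provider, model) from the LLM's class name, per the list above."""
    return DEFAULT_EMBEDDERS.get(type(llm).__name__, FALLBACK_EMBEDDER)


class ChatOllama:  # stand-in class so the sketch runs without LangChain
    pass


print(default_embedder(ChatOllama()))
```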
<Note>
**Important:**
- Always pass a properly constructed `MemoryConfig` object to the `memory_config` parameter.
- Ensure the `llm_instance` is provided to `MemoryConfig` so Mem0 can perform its operations.
- For persistent memory across agent runs or for shared memory, choose a scalable vector store provider (like Qdrant, Pinecone, etc.) and configure it correctly using `vector_store_provider` and `vector_store_config_override`. The default 'faiss' provider stores data locally in `vector_store_base_path`.
</Note>
### How Memory Works
When enabled, the agent periodically compresses its conversation history into concise summaries:
1. Every `memory_interval` steps, the agent reviews its recent interactions.
2. It uses Mem0 (configured with your chosen LLM and vector store) to create a procedural memory summary.
3. The original messages in the agent's active context are replaced with this summary, reducing token usage.
4. This process helps maintain important context while freeing up the context window for new information.
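The steps above can be sketched with a toy history list. This is a simplified illustration, not the Mem0-backed implementation: `compress_history` is hypothetical, and the default `summarize` callable is a placeholder for the LLM-generated summary.

```python
def compress_history(messages, step, memory_interval=10,
                     summarize=lambda msgs: f'[summary of {len(msgs)} messages]'):
    """Replace the accumulated messages with one summary every memory_interval steps."""
    if step > 0 and step % memory_interval == 0:
        return [summarize(messages)]
    return messages


history = []
for step in range(1, 21):
    history.append(f'step {step}')
    history = compress_history(history, step)

print(history)
```

After 20 steps the active context holds a single summary entry instead of 20 raw messages, which is where the token savings come from.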
### Disabling Memory
If you want to disable the memory system (for debugging or for shorter tasks), set `enable_memory` to `False`:
```python
agent = Agent(
task="your task",
llm=llm,
enable_memory=False
)
```
<Note>
Disabling memory may be useful for debugging or short tasks, but for longer
tasks, it can lead to context window overflow as the conversation history
grows. The memory system helps maintain performance during extended sessions.
</Note>


@ -1,968 +0,0 @@
---
description: "Launch or connect to an existing browser and configure it to your needs."
applyTo: '**'
---
Browser Use uses [playwright](https://playwright.dev/python/docs/api/class-browsertype#browser-type-launch-persistent-context) (or [patchright](https://github.com/Kaliiiiiiiiii-Vinyzu/patchright)) to manage its connection with a real browser.
---
**To launch or connect to a browser**, pass any playwright / browser-use configuration arguments you want to `BrowserSession(...)`:
```python
from browser_use import BrowserSession, Agent
browser_session = BrowserSession(
headless=True,
viewport={'width': 964, 'height': 647},
user_data_dir='~/.config/browseruse/profiles/default',
)
agent = Agent('fill out the form on this page', browser_session=browser_session)
```
<Note>
The new `BrowserSession` & `BrowserProfile` accept all the same arguments that Playwright's [`launch_persistent_context(...)`](https://playwright.dev/python/docs/api/class-browsertype#browser-type-launch-persistent-context) takes, giving you full control over browser settings at launch. (see below for the full list)
</Note>
---
## `BrowserSession`
- 🎭 `BrowserSession(**params)` is Browser Use's object that tracks a playwright connection to a running browser. It sets up:
- the `playwright` library, `browser` and/or `browser_context`, and `page` objects and tracks which tabs the agent & human are focused on
- methods to interact with the browser window, apply config needed by the Agent, and run the `DOMService` for element detection
- it can take a `browser_profile=BrowserProfile(...)` template containing some config defaults, and `**kwargs` session-specific config overrides
### Browser Connection Parameters
Provide any one of these options to connect to an existing browser. These options are session-specific and cannot be stored in a `BrowserProfile(...)` template.
#### `wss_url`
```python
wss_url: str | None = None
```
WSS URL of the playwright-protocol browser server to connect to. See here for [WSS connection instructions](https://docs.browser-use.com/customize/real-browser#method-d%3A-connect-to-remote-playwright-node-js-browser-server-via-wss-url).
#### `cdp_url`
```python
cdp_url: str | None = None
```
CDP URL of the browser to connect to (e.g. `http://localhost:9222`). See here for [CDP connection instructions](https://docs.browser-use.com/customize/real-browser#method-e%3A-connect-to-remote-browser-via-cdp-url).
#### `browser_pid`
```python
browser_pid: int | None = None
```
PID of a running chromium-based browser process to connect to on localhost. See here for [connection via pid](https://docs.browser-use.com/customize/real-browser#method-c%3A-connect-to-local-browser-using-browser-pid) instructions.
<Note>
For web scraping tasks on sites that restrict automated access, we recommend
using [our cloud](https://browser-use.com) or an external browser provider for better reliability.
See the [Connect to your Browser](real-browser) guide for detailed connection instructions.
</Note>
### Session-Specific Parameters
#### `browser_profile`
```python
browser_profile: BrowserProfile = BrowserProfile()
```
Optional `BrowserProfile` template containing default config to use for the `BrowserSession`. (see below for more info)
#### `playwright`
```python
playwright: Playwright | None = None
```
Optional playwright or patchright API client handle to use, the result of `(await async_playwright().start())` or `(await async_patchright().start())`, which spawns a node.js child subprocess that relays commands to the browser over CDP.
See here for [more detailed usage instructions](https://docs.browser-use.com/customize/real-browser#method-b%3A-connect-using-existing-playwright-objects).
#### `browser`
```python
browser: Browser | None = None
```
Playwright Browser object to use (optional). See here for [more detailed usage instructions](https://docs.browser-use.com/customize/real-browser#method-b%3A-connect-using-existing-playwright-objects).
#### `browser_context`
```python
browser_context: BrowserContext | None = None
```
Playwright BrowserContext object to use (optional). See here for [more detailed usage instructions](https://docs.browser-use.com/customize/real-browser#method-b%3A-connect-using-existing-playwright-objects).
#### `page` *aka* `agent_current_page`
<a name="page"></a><a name="agent-current-page"></a>
```python
page: Page | None = None
```
Foreground Page that the agent is focused on, can also be passed as `page=...` as a shortcut. See here for [more detailed usage instructions](https://docs.browser-use.com/customize/real-browser#method-b%3A-connect-using-existing-playwright-objects).
#### `human_current_page`
```python
human_current_page: Page | None = None
```
Foreground Page that the human is focused on to start, not necessary to set manually.
#### `initialized`
```python
initialized: bool = False
```
Marks the `BrowserSession` as already initialized and skips launch/connection (not recommended).
#### `**kwargs`
`BrowserSession` can also accept *all* of the parameters [below](#browserprofile).
(the parameters *above* this point are specific to `BrowserSession` and cannot be stored in a `BrowserProfile` template)
Extra `**kwargs` passed to `BrowserSession(...)` act as session-specific overrides to the `BrowserProfile(...)` template.
```python
base_iphone13 = BrowserProfile(
storage_state='/tmp/auth.json', # share cookies between parallel browsers
**playwright.devices['iPhone 13'],
timezone_id='UTC',
)
usa_phone = BrowserSession(
browser_profile=base_iphone13,
timezone_id='America/New_York', # kwargs override values in base_iphone13
)
eu_phone = BrowserSession(
browser_profile=base_iphone13,
timezone_id='Europe/Paris',
)
usa_agent = Agent(task="show me today's schedule...", browser_session=usa_phone)
eu_agent = Agent(task="show me today's schedule...", browser_session=eu_phone)
await asyncio.gather(usa_agent.run(), eu_agent.run())
```
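The override behaviour can be modelled with plain dictionaries: the profile supplies defaults, and session keyword arguments win on conflict. `resolve_config` is a hypothetical helper for illustration; the real classes are pydantic models and do more validation.

```python
def resolve_config(profile: dict, **overrides) -> dict:
    """Session config = profile defaults, with keyword overrides taking precedence."""
    return {**profile, **overrides}


base = {'timezone_id': 'UTC', 'storage_state': '/tmp/auth.json'}
usa = resolve_config(base, timezone_id='America/New_York')

print(usa['timezone_id'], base['timezone_id'])
```

The template itself is left unchanged, so it can be reused for any number of sessions.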
---
## `BrowserProfile`
A `BrowserProfile` is a 📋 config template for a 🎭 `BrowserSession(...)`.
It's basically just a typed + validated version of a `dict` to hold config.
When you find yourself storing or re-using many browser configs, you can upgrade from:
```diff
- config = {key: val, key: val, ...}
- BrowserSession(**config)
```
To this instead:
```diff
+ config = BrowserProfile(key=val, key=val, ...)
+ BrowserSession(browser_profile=config)
```
<Tip>
You never *need* to use a `BrowserProfile`; you can always pass config parameters directly to `BrowserSession`:
```python
session = BrowserSession(headless=True, storage_state='auth.json', viewport={...}, ...)
```
</Tip>
`BrowserProfile` is optional, but it provides a number of benefits over a normal `dict` for holding config:
- has type hints and pydantic field descriptions that show up in your IDE
- validates config at runtime quickly without having to start a browser
- provides helper methods to autodetect screen size, set up local paths, save/load config as json, and more...
<Tip>
`BrowserProfile`s are designed to easily be given 🆔 `uuid`s and put in a database + made editable by users.
`BrowserSession`s get their own 🆔 `uuid`s and can be linked by 🖇 foreign key to whatever `BrowserProfile`s they use.
This cleanly separates the per-connection rows from the bulky re-usable config and avoids wasting space in your db.
This is useful because a user may only have 2 or 3 profiles, but they could have 100k+ sessions within a few months.
</Tip>
`BrowserProfile` and `BrowserSession` can both take any of the:
- [Playwright parameters](#playwright)
- [Browser-Use parameters](#browser-use-parameters) (extra options we provide on top of `playwright`)
The only parameters `BrowserProfile` can NOT take are the session-specific connection parameters and live playwright objects:
`cdp_url`, `wss_url`, `browser_pid`, `page`, `browser`, `browser_context`, `playwright`, etc.
### Basic Example
```python
from browser_use import BrowserSession
from browser_use.browser import BrowserProfile
profile = BrowserProfile(
stealth=True,
storage_state='/tmp/google_docs_cookies.json',
allowed_domains=['docs.google.com', 'https://accounts.google.com'],
viewport={'width': 396, 'height': 774},
# ... playwright args / browser-use config args ...
)
phone1 = BrowserSession(browser_profile=profile, device_scale_factor=1)
phone2 = BrowserSession(browser_profile=profile, device_scale_factor=2)
phone3 = BrowserSession(browser_profile=profile, device_scale_factor=3)
```
### Browser-Use Parameters
These parameters control Browser Use-specific features, and are outside the standard playwright set. They can be passed to `BrowserSession(...)` and/or stored in a `BrowserProfile` template.
#### `keep_alive`
```python
keep_alive: bool | None = None
```
If `True`, the browser won't be closed after the first `agent.run()` ends. Useful for running multiple tasks with the same browser instance. If this is left as `None` and the Agent launched its own browser, the default is to close the browser after the agent completes. If the agent connected to an existing browser, it will leave it open.
#### `stealth`
```python
stealth: bool = False
```
Set to `True` to use [`patchright`](https://github.com/Kaliiiiiiiiii-Vinyzu/patchright) to avoid bot-blocking. (Might cause issues with some sites, requires manual testing.)
<a name="restrict-urls"></a>
#### `allowed_domains`
```python
allowed_domains: list[str] | None = None
```
List of allowed domains for navigation. If None, all domains are allowed.
Example: `['google.com', '*.wikipedia.org']` - Here the agent will only be able to access `google.com` exactly and `wikipedia.org` + `*.wikipedia.org`.
Glob patterns are supported:
- `['example.com']` ✅ will match only `https://example.com/*` exactly, subdomains will not be allowed.
It is always most secure to explicitly list every domain you want to grant access to, including schemes, e.g.
`['https://google.com', 'http*://www.google.com', 'https://myaccount.google.com', 'https://mail.google.com', 'https://docs.google.com']`
- `['*.example.com']` ⚠️ **CAUTION** this will match `https://example.com` and *all* its subdomains.
Make sure *all* the subdomains are safe for the agent! `abc.example.com`, `def.example.com`, ..., `useruploads.example.com`, `admin.example.com`
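The matching rules above can be approximated with the standard library's `fnmatch`. This is a minimal sketch, not Browser Use's actual matcher: `is_url_allowed` is a hypothetical helper, and it only handles patterns without a scheme.

```python
from fnmatch import fnmatch
from urllib.parse import urlparse


def is_url_allowed(url: str, allowed_domains: list[str]) -> bool:
    """Check a URL's host against the allowed patterns (scheme-less patterns only)."""
    host = urlparse(url).hostname or ''
    for pattern in allowed_domains:
        if pattern.startswith('*.') and host == pattern[2:]:
            return True  # '*.example.com' also covers the bare domain
        if fnmatch(host, pattern):
            return True
    return False


print(is_url_allowed('https://sub.example.com/', ['example.com']))
print(is_url_allowed('https://admin.example.com/', ['*.example.com']))
```

The second call illustrates why wildcard patterns deserve caution: every subdomain, including ones like `admin.example.com`, becomes reachable.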
#### `disable_security`
```python
disable_security: bool = False
```
Completely disables all basic browser security features. Allows interacting across cross-site iframe boundaries, but at a serious cost to security.
<Warning>
This option is very INSECURE and is only for niche use cases. DO NOT LET YOUR AGENT visit untrusted URLs or give it real cookies when `disable_security=True`.
Visiting a single malicious site in this mode can trivially compromise *all* the cookies in the browser profile in under 1 second.
</Warning>
#### `deterministic_rendering`
```python
deterministic_rendering: bool = False
```
Attempts to force more deterministic rendering for consistent screenshots across different host operating systems and hardware.
Disables OS-specific font hints, aliasing, GPU-accelerated rendering, normalizes DPI, and sets a specific JS random seed to try to avoid nondeterministic JS.
<Warning>
This flag is for niche use cases (e.g. screenshot diffing) where pixel-perfect rendering across different server operating systems is more important than stability.
It makes the agent more likely to be blocked as a bot and occasionally triggers glitchy behavior in Chrome. It's not recommended unless you know you need it.
</Warning>
#### `highlight_elements`
```python
highlight_elements: bool = True
```
Highlight interactive elements on the screen with colorful bounding boxes.
#### `viewport_expansion`
```python
viewport_expansion: int = 500
```
Viewport expansion in pixels. With this you can control how much of the page is included in the context of the LLM:
- `-1`: All elements from the entire page will be included, regardless of visibility (highest token usage but most complete).
- `0`: Only elements which are currently visible in the viewport will be included.
- `500` (default): Elements in the viewport plus an additional 500 pixels in each direction will be included, providing a balance between context and token usage.
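The three settings can be read as a bounds check on each element's position. The sketch below considers only the vertical axis and is an assumption about the geometry, not the `DOMService` implementation; `is_included` and its parameters are hypothetical.

```python
def is_included(top: float, bottom: float, scroll_y: float, viewport_height: float,
                viewport_expansion: int = 500) -> bool:
    """Vertical-only check: does the element overlap the expanded viewport?"""
    if viewport_expansion == -1:
        return True  # include everything on the page
    upper_bound = scroll_y - viewport_expansion
    lower_bound = scroll_y + viewport_height + viewport_expansion
    return bottom >= upper_bound and top <= lower_bound


# An element 1000px down the page, with a 720px-tall viewport scrolled to the top
print(is_included(1000, 1050, scroll_y=0, viewport_height=720))
print(is_included(1000, 1050, scroll_y=0, viewport_height=720, viewport_expansion=0))
```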
#### `include_dynamic_attributes`
```python
include_dynamic_attributes: bool = True
```
Include dynamic attributes in selectors for better element targeting.
#### `minimum_wait_page_load_time`
```python
minimum_wait_page_load_time: float = 0.25
```
Minimum time to wait before capturing page state for LLM input.
#### `wait_for_network_idle_page_load_time`
```python
wait_for_network_idle_page_load_time: float = 0.5
```
Time to wait for network activity to cease. Increase to 3-5s for slower websites. This tracks essential content loading, not dynamic elements like videos.
#### `maximum_wait_page_load_time`
```python
maximum_wait_page_load_time: float = 5.0
```
Maximum time to wait for page load before proceeding.
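One way to picture how the three page-load settings combine is the sketch below. It assumes the page state is captured once the network has been idle for the configured time, never earlier than the minimum and never later than the maximum; `capture_delay` is a hypothetical helper and the real wait logic may be more involved.

```python
def capture_delay(last_request_finished_at: float, minimum: float = 0.25,
                  network_idle: float = 0.5, maximum: float = 5.0) -> float:
    """Seconds after navigation at which page state would be captured."""
    idle_reached_at = last_request_finished_at + network_idle
    return min(maximum, max(minimum, idle_reached_at))


print(capture_delay(1.0))   # network goes quiet after 1s
print(capture_delay(10.0))  # slow page, capped by the maximum
```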
#### `wait_between_actions`
```python
wait_between_actions: float = 0.5
```
Time to wait between agent actions.
#### `cookies_file`
```python
cookies_file: str | None = None
```
JSON file path to save cookies to.
<Warning>
This option is DEPRECATED. Use [`storage_state`](#storage-state) instead; it's the standard playwright format and also supports `localStorage` and `indexedDB`!
The library will automatically save a new `storage_state.json` next to any `cookies_file` path you provide. Just use `storage_state='path/to/storage_state.json'` to switch to the new format:
`cookies_file.json`: `[{cookie}, {cookie}, {cookie}]`
⬇️
`storage_state.json`: `{"cookies": [{cookie}, {cookie}, {cookie}], "origins": [... optional localStorage state ...]}`
Or run `playwright open https://example.com/ --save-storage=storage_state.json` and log into any sites you need to generate a fresh storage state file.
</Warning>
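If you prefer to convert an old cookies file by hand, the format change is small enough to script. The sketch below wraps the cookie list in the `storage_state` layout with an empty `origins` list; `cookies_to_storage_state` is a hypothetical helper, and the sample cookie is made up for the demo.

```python
import json
import tempfile
from pathlib import Path


def cookies_to_storage_state(cookies_path: str, storage_state_path: str) -> dict:
    """Wrap a legacy cookie list in the storage_state layout and write it out."""
    cookies = json.loads(Path(cookies_path).read_text())
    state = {'cookies': cookies, 'origins': []}  # no localStorage entries to carry over
    Path(storage_state_path).write_text(json.dumps(state, indent=2))
    return state


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / 'cookies.json'
    sample = [{'name': 'sid', 'value': 'abc', 'domain': 'example.com', 'path': '/'}]
    src.write_text(json.dumps(sample))
    state = cookies_to_storage_state(str(src), str(Path(tmp) / 'storage_state.json'))

print(sorted(state))
```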
#### `profile_directory`
```python
profile_directory: str = 'Default'
```
Chrome profile subdirectory name inside of your `user_data_dir` (e.g. `Default`, `Profile 1`, `Work`, etc.).
No need to set this unless you have multiple profiles set up in a single `user_data_dir` and need to use a specific one.
#### `window_position`
```python
window_position: dict | None = {"width": 0, "height": 0}
```
Window position from top-left.
---
<a name="playwright-parameters"></a><a name="playwright"></a>
### Playwright Launch Options
All the parameters below are standard playwright parameters and can be passed to both `BrowserSession` and `BrowserProfile`.
They are defined in `browser_use/browser/profile.py`. See here for the [official Playwright documentation](https://playwright.dev/python/docs/api/class-browsertype#browser-type-launch-persistent-context) for all of these options.
#### `headless`
```python
headless: bool | None = None
```
Runs the browser without a visible UI. If None, auto-detects based on display availability. If you set `headless=False` on a server with no monitor attached, the browser will fail to launch (use `xvfb` + vnc to give a headless server a virtual display you can remote control).
`headless=False` is recommended for maximum stealth and is required for human-in-the-loop workflows.
#### `channel`
```python
channel: BrowserChannel = 'chromium'
```
Browser channel: `'chromium'` (default when `stealth=False`), `'chrome'` (default when `stealth=True`), `'chrome-beta'`, `'chrome-dev'`, `'chrome-canary'`, `'msedge'`, `'msedge-beta'`, `'msedge-dev'`, `'msedge-canary'`
Don't worry, other chromium-based browsers not in this list (e.g. `brave`) are still supported if you provide your own [`executable_path`](#executable_path); just set `channel` to `'chromium'` for those.
#### `executable_path`
```python
executable_path: str | Path | None = None
```
Path to browser executable for custom installations.
#### `user_data_dir`
```python
user_data_dir: str | Path | None = '~/.config/browseruse/profiles/default'
```
Directory for browser profile data. Set to `None` to use an ephemeral temporary profile (aka incognito mode).
Multiple running browsers **cannot share a single `user_data_dir` at the same time**. You must set it to `None` or
provide a unique `user_data_dir` per-session if you plan to run multiple browsers.
The browser version run must always be equal to or greater than the version used to create the `user_data_dir`.
If you see errors like `Failed to parse Extensions` or similar and failures when launching, you're attempting to run an older browser with an incompatible `user_data_dir` that's already been migrated to a newer schema version.
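Because concurrent browsers cannot share a profile directory, one simple approach is to create a fresh directory per session. The sketch below uses `tempfile.mkdtemp` for that; `make_user_data_dirs` and the directory prefix are hypothetical, and each resulting path would be passed as `user_data_dir` to its own `BrowserSession`.

```python
import shutil
import tempfile
from pathlib import Path


def make_user_data_dirs(count: int) -> list[Path]:
    """Create one fresh, empty profile directory per planned browser session."""
    return [Path(tempfile.mkdtemp(prefix=f'browseruse-profile-{i}-')) for i in range(count)]


dirs = make_user_data_dirs(2)
print(dirs[0] != dirs[1])  # each session gets its own directory

for d in dirs:  # clean up the demo directories
    shutil.rmtree(d)
```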
#### `args`
```python
args: list[str] = []
```
Additional command-line arguments to pass to the browser. See here for the [full list of available chrome launch options](https://peter.sh/experiments/chromium-command-line-switches/).
#### `ignore_default_args`
```python
ignore_default_args: list[str] | bool = ['--enable-automation', '--disable-extensions']
```
List of default CLI args to stop playwright from including when launching chrome. Set it to `True` to disable *all* default options (not recommended).
#### `env`
```python
env: dict[str, str] = {}
```
Extra environment variables to set when launching the browser, e.g. `{'DISPLAY': ':1'}` to use a specific X11 display.
#### `chromium_sandbox`
```python
chromium_sandbox: bool = not IN_DOCKER
```
Whether to enable Chromium sandboxing (recommended for security). Should always be `False` when running inside Docker
because Docker provides its own sandboxing that can conflict with Chrome's.
#### `devtools`
```python
devtools: bool = False
```
Whether to open DevTools panel automatically (only works when `headless=False`).
#### `slow_mo`
```python
slow_mo: float = 0
```
Slow down actions by this many milliseconds.
#### `timeout`
```python
timeout: float = 30000
```
Default timeout in milliseconds for connecting to a remote browser.
#### `accept_downloads`
```python
accept_downloads: bool = True
```
Whether to automatically accept all downloads.
#### `proxy`
```python
proxy: dict | None = None
```
Proxy settings. Example: `{"server": "http://proxy.com:8080", "username": "user", "password": "pass"}`.
#### `permissions`
```python
permissions: list[str] = ['clipboard-read', 'clipboard-write', 'notifications']
```
Browser permissions to grant. See here for the [full list of available permissions](https://playwright.dev/python/docs/api/class-browsercontext#browser-context-grant-permissions).
#### `storage_state`
```python
storage_state: str | Path | dict | None = None
```
Browser storage state (cookies, localStorage). Can be file path or dict. See here for the [Playwright `storage_state` documentation](https://playwright.dev/python/docs/api/class-browsercontext#browser-context-storage-state) on how to use it.
This option is only applied when launching a new browser using the default builtin playwright chromium and `user_data_dir=None` is set.
```bash
# to create a storage state file, run the following and log into the sites you need once the browser opens:
playwright open https://example.com/ --save-storage=./storage_state.json
# then setup a BrowserSession with storage_state='./storage_state.json' and user_data_dir=None to use it
```
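Before reusing a saved storage state, it can help to check which sites it actually holds cookies for. The sketch below assumes the usual Playwright layout of a top-level `cookies` list whose entries carry a `domain` key; the helper `cookie_domains` and the sample data are illustrative only:

```python
import json


def cookie_domains(storage_state: dict) -> list:
    """Return the sorted, de-duplicated cookie domains in a storage-state dict."""
    cookies = storage_state.get("cookies", [])
    # Leading dots ('.example.com') only mark sub-domain scope, so strip them.
    return sorted({c["domain"].lstrip(".") for c in cookies if c.get("domain")})


# Illustrative data only; a real file would come from `--save-storage`.
example_state = json.loads("""
{
  "cookies": [
    {"name": "sid", "value": "abc", "domain": ".example.com", "path": "/"},
    {"name": "theme", "value": "dark", "domain": "example.com", "path": "/"},
    {"name": "lang", "value": "en", "domain": "docs.example.org", "path": "/"}
  ],
  "origins": []
}
""")
print(cookie_domains(example_state))
```

For a real file, load it with `json.loads(Path('./storage_state.json').read_text())` and pass the result to the same helper.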
### Playwright Timing Settings
These control how the browser waits for CDP API calls to complete and pages to load.
#### `default_timeout`
```python
default_timeout: float | None = None
```
Default timeout for Playwright operations in milliseconds.
#### `default_navigation_timeout`
```python
default_navigation_timeout: float | None = None
```
Default timeout for page navigation in milliseconds.
### Playwright Viewport Options
Configure browser window size, viewport, and display properties:
#### `user_agent`
```python
user_agent: str | None = None
```
Specific user agent to use in this context.
#### `is_mobile`
```python
is_mobile: bool = False
```
Whether the meta viewport tag is taken into account and touch events are enabled.
#### `has_touch`
```python
has_touch: bool = False
```
Specifies if viewport supports touch events.
#### `geolocation`
```python
geolocation: dict | None = None
```
Geolocation coordinates. Example: `{"latitude": 59.95, "longitude": 30.31667}`
#### `locale`
```python
locale: str | None = None
```
Specify user locale, for example en-GB, de-DE, etc. Locale will affect the navigator.language value, Accept-Language request header value as well as number and date formatting rules.
#### `timezone_id`
```python
timezone_id: str | None = None
```
Timezone identifier (e.g., 'America/New_York').
#### `window_size`
```python
window_size: dict | None = None
```
Browser window size for headful mode. Example: `{"width": 1920, "height": 1080}`
#### `viewport`
```python
viewport: dict | None = None
```
Viewport size with `width` and `height`. Example: `{"width": 1280, "height": 720}`
#### `no_viewport`
```python
no_viewport: bool | None = not headless
```
Disable fixed viewport. Content will resize with window.
*Tip:* don't use this parameter, it's a playwright standard parameter but it's redundant and only serves to override the `viewport` setting above.
A viewport is *always* used in headless mode regardless of this setting, and is *never* used in headful mode unless you pass `viewport={width, height}` explicitly.
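The rule in the tip can be written down as a small decision function. This is a sketch of the behavior as described here, not Browser Use's actual code; `effective_viewport` and its `headless_default` argument are hypothetical names:

```python
from typing import Optional


def effective_viewport(headless: bool, viewport: Optional[dict], headless_default: dict) -> Optional[dict]:
    """Return the viewport that applies under the rule described above."""
    if viewport is not None:
        return viewport  # an explicit viewport is honored in either mode
    if headless:
        return headless_default  # headless mode always has some viewport
    return None  # headful without an explicit viewport: content follows the window


fallback = {"width": 1280, "height": 720}
print(effective_viewport(True, None, fallback))
print(effective_viewport(False, None, fallback))
```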
#### `device_scale_factor`
```python
device_scale_factor: float | None = None
```
Device scale factor (DPI). Useful for high-resolution screenshots (set it to 2).
#### `screen`
```python
screen: dict | None = None
```
Screen size available to browser. Auto-detected if not specified.
#### `color_scheme`
```python
color_scheme: ColorScheme = 'light'
```
Preferred color scheme: `'light'`, `'dark'`, `'no-preference'`
#### `contrast`
```python
contrast: Contrast = 'no-preference'
```
Contrast preference: `'no-preference'`, `'more'`, `'null'`
#### `reduced_motion`
```python
reduced_motion: ReducedMotion = 'no-preference'
```
Reduced motion preference: `'reduce'`, `'no-preference'`, `'null'`
#### `forced_colors`
```python
forced_colors: ForcedColors = 'none'
```
Forced colors mode: `'active'`, `'none'`, `'null'`
#### `**playwright.devices[...]`
Playwright provides launch & context arg presets to [emulate common device fingerprints](https://playwright.dev/python/docs/emulation).
```python
BrowserProfile(
    ...
    **playwright.devices['iPhone 13'],  # playwright = await async_playwright().start()
)
```
Because `BrowserSession` and `BrowserProfile` take all the standard playwright args, we are able to support these device presets as well.
### Playwright Security Options
> See `allowed_domains` above too!
#### `offline`
```python
offline: bool = False
```
Emulate network being offline.
#### `http_credentials`
```python
http_credentials: dict | None = None
```
Credentials for HTTP authentication.
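Playwright expects these credentials as a dict with `username` and `password` keys and handles the authentication exchange itself. For orientation, the sketch below shows what the same credentials amount to under the HTTP Basic scheme; it assumes the server uses Basic auth, and `basic_auth_header` is an illustrative helper, not a library function:

```python
import base64


def basic_auth_header(credentials: dict) -> str:
    """Build the Authorization header value for HTTP Basic authentication."""
    token = f"{credentials['username']}:{credentials['password']}".encode("utf-8")
    return "Basic " + base64.b64encode(token).decode("ascii")


http_credentials = {"username": "user", "password": "pass"}
print(basic_auth_header(http_credentials))
```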
#### `extra_http_headers`
```python
extra_http_headers: dict[str, str] = {}
```
Additional HTTP headers to be sent with every request.
#### `ignore_https_errors`
```python
ignore_https_errors: bool = False
```
Whether to ignore HTTPS errors when sending network requests.
#### `bypass_csp`
```python
bypass_csp: bool = False
```
Toggles bypassing Content-Security-Policy.
#### `java_script_enabled`
```python
java_script_enabled: bool = True
```
Whether or not to enable JavaScript in the context.
#### `service_workers`
```python
service_workers: ServiceWorkers = 'allow'
```
Whether to allow sites to register Service workers: `'allow'`, `'block'`
#### `base_url`
```python
base_url: str | None = None
```
Base URL to be used in `page.goto()` and similar operations.
#### `strict_selectors`
```python
strict_selectors: bool = False
```
If true, selector passed to Playwright methods will throw if more than one element matches.
#### `client_certificates`
```python
client_certificates: list[ClientCertificate] = []
```
Client certificates to be used with requests.
### Playwright Recording Options
Note: Browser Use also provides some of our own recording-related options not listed below (see above).
#### `record_video_dir`
<a name="record-video-dir"></a>
<a name="save-recording-path"></a>
```python
record_video_dir: str | Path | None = None
```
Directory to save `.webm` video recordings. [Playwright Docs: `record_video_dir`](https://playwright.dev/python/docs/api/class-browsertype#browser-type-launch-persistent-context-option-record-video-dir)
<Note>
This parameter also has an alias `save_recording_path` for backwards compatibility with past versions, but we recommend using the standard Playwright name `record_video_dir` going forward.
</Note>
#### `record_video_size`
```python
record_video_size: dict | None = None
```
Video size. Example: `{"width": 1280, "height": 720}`. [Playwright Docs: `record_video_size`](https://playwright.dev/python/docs/api/class-browsertype#browser-type-launch-persistent-context-option-record-video-size)
#### `record_har_path`
<a name="record-har-path"></a>
<a name="save-har-path"></a>
```python
record_har_path: str | Path | None = None
```
Path to save `.har` network trace files. [Playwright Docs: `record_har_path`](https://playwright.dev/python/docs/api/class-browsertype#browser-type-launch-persistent-context-option-record-har-path)
<Note>
This parameter also has an alias `save_har_path` for backwards compatibility with past versions, but we recommend using the standard Playwright name `record_har_path` going forward.
</Note>
#### `record_har_content`
```python
record_har_content: RecordHarContent = 'embed'
```
How to persist HAR content: `'omit'`, `'embed'`, `'attach'`
#### `record_har_mode`
```python
record_har_mode: RecordHarMode = 'full'
```
HAR recording mode: `'full'`, `'minimal'`
#### `record_har_omit_content`
```python
record_har_omit_content: bool = False
```
Whether to omit request content from the HAR.
#### `record_har_url_filter`
```python
record_har_url_filter: str | Pattern | None = None
```
URL filter for HAR recording.
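A recorded `.har` file is plain JSON, so it can be inspected with the standard library. The sketch below assumes the standard HAR layout (`log.entries[].response.status`); `summarize_har` and the sample data are illustrative and not part of Browser Use or Playwright:

```python
from collections import Counter


def summarize_har(har: dict) -> Counter:
    """Count recorded requests per HTTP status code."""
    entries = har.get("log", {}).get("entries", [])
    return Counter(entry["response"]["status"] for entry in entries)


# Hand-written sample in HAR shape; a real file would be loaded with json.load().
sample_har = {
    "log": {
        "entries": [
            {"request": {"url": "https://example.com/"}, "response": {"status": 200}},
            {"request": {"url": "https://example.com/app.js"}, "response": {"status": 200}},
            {"request": {"url": "https://example.com/missing"}, "response": {"status": 404}},
        ]
    }
}
print(summarize_har(sample_har))
```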
#### `downloads_path`
```python
downloads_path: str | Path | None = '~/.config/browseruse/downloads'
```
(aliases: `downloads_dir`, `save_downloads_path`)
Local filesystem directory to save browser file downloads to.
#### `traces_dir`
<a name="traces-dir"></a>
<a name="trace-path"></a>
```python
traces_dir: str | Path | None = None
```
Directory to save all-in-one trace files. Files are automatically named as `{traces_dir}/{context_id}.zip`. [Playwright Docs: `traces_dir`](https://playwright.dev/python/docs/api/class-browsertype#browser-type-launch-persistent-context-option-traces-dir)
<Note>
This parameter also has an alias `trace_path` for backwards compatibility with past versions, but we recommend using the standard Playwright name `traces_dir` going forward.
</Note>
#### `handle_sighup`
```python
handle_sighup: bool = True
```
Whether playwright should swallow SIGHUP signals and kill the browser.
#### `handle_sigint`
```python
handle_sigint: bool = False
```
Whether playwright should swallow SIGINT signals and kill the browser.
#### `handle_sigterm`
```python
handle_sigterm: bool = False
```
Whether playwright should swallow SIGTERM signals and kill the browser.
---
## Full Example
```python
from browser_use import BrowserSession, BrowserProfile, Agent
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o")

browser_profile = BrowserProfile(
    headless=False,
    storage_state="path/to/storage_state.json",
    wait_for_network_idle_page_load_time=3.0,
    viewport={"width": 1280, "height": 1100},
    locale='en-US',
    user_agent='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.102 Safari/537.36',
    highlight_elements=True,
    viewport_expansion=500,
    allowed_domains=['*.google.com', 'http*://*.wikipedia.org'],
    user_data_dir=None,
)

browser_session = BrowserSession(
    browser_profile=browser_profile,
    headless=True,  # extra kwargs to the session override the defaults in the profile
)

# you can drive a session without the agent / reuse it between agents
# (the top-level awaits below assume an async context, e.g. a notebook or an async main())
await browser_session.start()
page = await browser_session.get_current_page()
await page.goto('https://example.com/first/page')

async def run_search():
    agent = Agent(
        task='Your task',
        llm=llm,
        page=page,  # optional: pass a specific playwright page to start on
        browser_session=browser_session,  # optional: pass an existing browser session to an agent
    )
    await agent.run()
```
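The `allowed_domains` entries in the example are glob-style patterns. As a rough illustration of how such patterns select URLs, the sketch below uses generic `fnmatch` globbing; it is not Browser Use's own matcher, which may treat edge cases such as the bare domain differently, and `url_matches` is a hypothetical helper:

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit


def url_matches(url: str, patterns: list) -> bool:
    """Return True if the URL matches any glob pattern in the list."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    origin = f"{parts.scheme}://{host}"
    for pattern in patterns:
        # Patterns with a scheme are compared against scheme://host,
        # bare patterns against the hostname only.
        target = origin if "://" in pattern else host
        if fnmatchcase(target, pattern.lower()):
            return True
    return False


patterns = ['*.google.com', 'http*://*.wikipedia.org']
print(url_matches('https://mail.google.com/inbox', patterns))
print(url_matches('https://example.com/', patterns))
```

Under plain globbing, `*.google.com` does not match the bare host `google.com`, so list it explicitly if it should be allowed.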
---
## Summary
- **BrowserSession** (defined in `browser_use/browser/session.py`) handles the live browser connection and runtime state
- **BrowserProfile** (defined in `browser_use/browser/profile.py`) is a template that can store default config parameters for a `BrowserSession(...)`
Configuration parameters defined in either scope are consumed by the following calls, depending on whether we are connecting to or launching a browser:
- `BrowserConnectArgs` - args for `playwright.BrowserType.connect_over_cdp(...)`
- `BrowserLaunchArgs` - args for `playwright.BrowserType.launch(...)`
- `BrowserNewContextArgs` - args for `playwright.Browser.new_context(...)`
- `BrowserLaunchPersistentContextArgs` - args for `playwright.BrowserType.launch_persistent_context(...)`
- Browser Use's own internal methods
For more details on Playwright's browser context options, see their [launch args documentation](https://playwright.dev/python/docs/api/class-browsertype#browser-type-launch-persistent-context).
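The relationship between the two classes (a profile holds defaults, and keyword arguments given to the session override them) can be pictured with a plain dataclass. This is a conceptual sketch only; `ProfileSketch` and `resolve` are hypothetical and do not mirror the real `BrowserProfile` fields or merging code:

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ProfileSketch:
    """Stand-in for a profile: a reusable bundle of default settings."""
    headless: bool = False
    locale: str = "en-US"


def resolve(profile: ProfileSketch, **overrides) -> ProfileSketch:
    """Return a copy of the profile with session-level overrides applied."""
    return replace(profile, **overrides)


shared_profile = ProfileSketch(headless=False)
session_config = resolve(shared_profile, headless=True)
print(shared_profile, session_config)
```

The template itself is left untouched, which is what makes it safe to reuse across sessions.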
---

View file

@ -1,82 +0,0 @@
---
applyTo: '**'
---
## 🧠 General Guidelines for Contributing to `browser-use`
**Browser-Use** is an AI agent that autonomously interacts with the web. It takes a user-defined task, navigates web pages using Chromium via Playwright, processes HTML, and repeatedly queries a language model (like `gpt-4o`) to decide the next action—until the task is completed.
### 🗂️ File Documentation
When you create a **new file**:
* **For humans**: At the top of the file, include a docstring in natural language explaining:
* What this file does.
* How it fits into the browser-use system.
* If it introduces a new abstraction or replaces an old one.
* **For LLMs/AI**: Include structured metadata using standardized comments such as:
```python
# @file purpose: Defines <purpose>
```
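Structured comments of this kind are easy to read back programmatically. A minimal sketch, assuming the exact `# @file purpose:` prefix shown above; the helper `file_purpose` is hypothetical and not part of the project's tooling:

```python
import re
from typing import Optional

PURPOSE_RE = re.compile(r"^#\s*@file purpose:\s*(.+)$", re.MULTILINE)


def file_purpose(source: str) -> Optional[str]:
    """Return the text after '# @file purpose:' or None if the tag is absent."""
    match = PURPOSE_RE.search(source)
    return match.group(1).strip() if match else None


sample_source = '"""Example module."""\n# @file purpose: Defines the retry helper\nimport time\n'
print(file_purpose(sample_source))
```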
---
### 🧰 Development Rules
* ✅ **Always use [`uv`](https://github.com/astral-sh/uv) instead of `pip`**
For deterministic and fast dependency installs.
```bash
uv venv --python 3.11
source .venv/bin/activate
uv sync
```
* ✅ **Use real model names**
Do **not** replace `gpt-4o` with `gpt-4`. The model `gpt-4o` is a distinct, supported release.
* ✅ **Type-safe coding**
Use **Pydantic v2 models** for all internal action schemas, task inputs/outputs, and controller I/O. This ensures robust validation and LLM-call integrity.
---
## ⚙️ Adding New Actions
To add a new action that your browser agent can execute:
```python
from playwright.async_api import Page
from browser_use import Controller, ActionResult

controller = Controller()

@controller.registry.action("Search the web for a specific query")
async def search_web(query: str, page: Page):
    # Implement your logic here, e.g., query a search engine and return results
    result = ...
    return ActionResult(extracted_content=result, include_in_memory=True)
```
### Notes:
* Use descriptive names and docstrings for each action.
* Prefer returning `ActionResult` with structured content to help the agent reason better.
---
## 🧠 Creating and Running an Agent
To define a task and run a browser-use agent:
```python
from browser_use import Agent
from langchain_openai import ChatOpenAI
task = "Find the CEO of OpenAI and return their name"
model = ChatOpenAI(model="gpt-4o")
agent = Agent(task=task, llm=model, controller=controller)
history = await agent.run()
```

View file

@ -1,249 +0,0 @@
---
description: "Extend default agent and write custom action functions to do certain tasks"
applyTo: '**'
---
Custom actions are functions *you* provide, that are added to our [default actions](https://github.com/browser-use/browser-use/blob/main/browser_use/controller/service.py) the agent can use to accomplish tasks.
Action functions can request [arbitrary parameters](#action-parameters-via-pydantic-model) that the LLM has to come up with + a fixed set of [framework-provided arguments](#framework-provided-parameters) for browser APIs / `Agent(context=...)` / etc.
<Note>
Our default set of actions is already quite powerful: the built-in `Controller` provides basics like `open_tab`, `scroll_down`, `extract_content`, [and more](https://github.com/browser-use/browser-use/blob/main/browser_use/controller/service.py).
</Note>
It's easy to add your own actions to implement additional custom behaviors, integrations with other apps, or performance optimizations.
For examples of custom actions (e.g. uploading files, asking a human-in-the-loop for help, drawing a polygon with the mouse, and more), see [examples/custom-functions](https://github.com/browser-use/browser-use/tree/main/examples/custom-functions).
## Action Function Registration
To register your own custom functions (which can be `sync` or `async`), decorate them with the `@controller.action(...)` decorator. This saves them into the `controller.registry`.
```python
from browser_use import Controller, ActionResult
controller = Controller()
@controller.action('Ask human for help with a question', domains=['example.com']) # pass allowed_domains= or page_filter= to limit actions to certain pages
def ask_human(question: str) -> ActionResult:
    answer = input(f'{question} > ')
    return ActionResult(extracted_content=f'The human responded with: {answer}', include_in_memory=True)
```
```python
# Then pass your controller to the agent to use it
agent = Agent(
task='...',
llm=llm,
controller=controller,
)
```
<Note>
Keep your action function names and descriptions short and concise:
- The LLM chooses between actions to run solely based on the function name and description
- The LLM decides how to fill action params based on their names, type hints, & defaults
</Note>
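To see why names and descriptions matter so much, it helps to picture what a registration decorator keeps. The toy registry below is a sketch of the general pattern only, not Browser Use's implementation; `RegistrySketch` and its methods are hypothetical:

```python
from typing import Callable


class RegistrySketch:
    """Toy stand-in for an action registry."""

    def __init__(self) -> None:
        self.actions = {}

    def action(self, description: str) -> Callable:
        def decorator(func: Callable) -> Callable:
            # Store the function under its name, together with its description.
            self.actions[func.__name__] = {"description": description, "func": func}
            return func
        return decorator

    def describe(self) -> list:
        """Return the 'name: description' lines a model could choose from."""
        return [f"{name}: {meta['description']}" for name, meta in self.actions.items()]


registry = RegistrySketch()


@registry.action('Ask human for help with a question')
def ask_human(question: str) -> str:
    return f'asked: {question}'


print(registry.describe())
```

Everything the model can go on is in those short strings, which is why vague names or long descriptions make action selection less reliable.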
---
## Action Parameters
Browser Use supports two patterns for defining action parameters: normal function arguments, or a Pydantic model.
### Function Arguments
For simple actions that don't need default values, you can define the action parameters directly as arguments to the function. This one takes a single string argument, `css_selector`.
When the LLM calls an action, it sees its argument names & types, and will provide values that fit.
```python
@controller.action('Click element')
async def click_element(css_selector: str, page: Page) -> ActionResult:
    # css_selector is an action param the LLM must provide when calling
    # page is a special framework-provided param to access the browser APIs (see below)
    await page.locator(css_selector).click()
    return ActionResult(extracted_content=f"Clicked element {css_selector}")
```
### Pydantic Model
You can define a pydantic model for the parameters your action expects by setting a `@controller.action(..., param_model=MyParams)`.
This allows you to use optional parameters, default values, `Annotated[...]` types with custom validation, field descriptions, and other features offered by pydantic.
When the agent calls your action function, an instance of your model with the values filled by the LLM will be passed as the argument named `params` to your action function.
Using a pydantic model is helpful because it allows more flexibility and power to enforce the schema of the values the LLM should provide.
The LLM gets the entire pydantic JSON schema for your `param_model`, it will see the function name & description + individual field names, types, descriptions, and default values.
```python
from typing import Annotated
from playwright.async_api import Page
from pydantic import BaseModel, AfterValidator, Field
from browser_use import ActionResult

class MyParams(BaseModel):
    field1: int
    field2: str = 'default value'
    field3: Annotated[str, AfterValidator(lambda s: s.lower())]  # example: enforce always lowercase
    field4: str = Field(default='abc', description='Detailed description for the LLM')

@controller.action('My action', param_model=MyParams)
async def my_action(params: MyParams, page: Page) -> ActionResult:
    await page.keyboard.type(params.field2)
    return ActionResult(extracted_content=f"Inputted {params} on {page.url}")
```
Any special framework-provided arguments (e.g. `page`) will be passed as separate positional arguments after `params`.
<Important>
To use a `BaseModel` the arg *must* be called `params`. Action function args are matched and filled like named arguments; arg order doesn't matter but names and types do.
</Important>
### Framework-Provided Parameters
These special action parameters are injected by the `Controller` and are passed as extra args to any actions that expect them.
For example, actions that need to run playwright code to interact with the browser should take the argument `page` or `browser_session`.
- `page: Page` - The current Playwright page (shortcut for `browser_session.get_current_page()`)
- `browser_session: BrowserSession` - The current browser session (and playwright context via `browser_session.browser_context`)
- `context: AgentContext` - Any optional top-level context object passed to the Agent, e.g. `Agent(context=user_provided_obj)`
- `page_extraction_llm: BaseChatModel` - LLM instance used for page content extraction
- `available_file_paths: list[str]` - List of available file paths for upload / processing
- `has_sensitive_data: bool` - Whether the action content contains sensitive data markers (check this to avoid logging sensitive data to terminal by accident)
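Injection by parameter name can be sketched with `inspect.signature`: look at which argument names a function declares and supply the special ones it asks for. This illustrates the idea only and is not the `Controller`'s actual code; `call_with_injection` and the string stand-ins for `page` and `browser_session` are hypothetical:

```python
import inspect


def call_with_injection(func, llm_params: dict, special: dict):
    """Call func with LLM-provided params plus any special params it declares."""
    declared = inspect.signature(func).parameters
    kwargs = dict(llm_params)
    for name in declared:
        if name in special:
            kwargs[name] = special[name]  # inject only what the function asks for
    return func(**kwargs)


def click_element(css_selector: str, page: str) -> str:
    return f"{page} clicked {css_selector}"


available = {"page": "<current page>", "browser_session": "<session>"}
print(call_with_injection(click_element, {"css_selector": "#submit"}, available))
```

Because matching is by name, `click_element` never receives `browser_session`, which it did not declare.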
#### Example: Action uses the current `page`
```python
from playwright.async_api import Page
from browser_use import Controller, ActionResult
controller = Controller()
@controller.action('Type keyboard input into a page')
async def input_text_into_page(text: str, page: Page) -> ActionResult:
    await page.keyboard.type(text)
    return ActionResult(extracted_content='Typed text into page')
```
#### Example: Action uses the `browser_context`
```python
from browser_use import BrowserSession, Controller, ActionResult
controller = Controller()
@controller.action('Open website')
async def open_website(url: str, browser_session: BrowserSession) -> ActionResult:
    # find matching existing tab by looking through all pages in playwright browser_context
    all_tabs = browser_session.browser_context.pages  # `pages` is a property, not a coroutine
    for tab in all_tabs:
        if tab.url == url:
            await tab.bring_to_front()
            return ActionResult(extracted_content=f'Switched to tab with url {url}')
    # otherwise, create a new tab
    new_tab = await browser_session.browser_context.new_page()
    await new_tab.goto(url)
    return ActionResult(extracted_content=f'Opened new tab with url {url}')
```
---
## Important Rules
1. **Return an [`ActionResult`](https://github.com/search?q=repo%3Abrowser-use%2Fbrowser-use+%22class+ActionResult%28BaseModel%29%22&type=code)**: All actions should return an `ActionResult | str | None`. The stringified version of the result is passed back to the LLM, and optionally persisted in the long-term memory when `ActionResult(..., include_in_memory=True)`.
2. **Type hints on arguments are required**: They are used to verify that action params don't conflict with special arguments injected by the controller (e.g. `page`)
3. **Action functions called directly must be passed kwargs**: When calling actions from other actions or python code, you must **pass all parameters as kwargs only**, even though the actions are usually defined using positional args (for the same reasons as [pluggy](https://pluggy.readthedocs.io/en/stable/index.html#calling-hooks)).
Action arguments are always matched by name and type, **not** positional order, so this helps prevent ambiguity / reordering issues while keeping action signatures short.
```python
@controller.action('Fill in the country form field')
async def input_country_field(country: str, page: Page) -> ActionResult:
    await some_action(123, page=page)  # ❌ not allowed: positional args, use kwarg syntax when calling
    await some_action(abc=123, page=page)  # ✅ allowed: action params & special kwargs
    await some_other_action(params=OtherAction(abc=123), page=page)  # ✅ allowed: params=model & special kwargs
```
```python
# Using Pydantic Model to define action params (recommended)
class PinCodeParams(BaseModel):
    code: int
    retries: int = 3  # ✅ supports optional/defaults
@controller.action('...', param_model=PinCodeParams)
async def input_pin_code(params: PinCodeParams, page: Page): ... # ✅ special params at the end
# Using function arguments to define action params
async def input_pin_code(code: int, retries: int, page: Page): ... # ✅ params first, special params second, no defaults
async def input_pin_code(code: int, retries: int=3): ... # ✅ defaults ok only if no special params needed
async def input_pin_code(code: int, retries: int=3, page: Page): ... # ❌ Python SyntaxError! not allowed
```
---
## Reusing Custom Actions Across Agents
You can use the same controller for multiple agents.
```python
controller = Controller()
# ... register actions to the controller
agent = Agent(
task="Go to website X and find the latest news",
llm=llm,
controller=controller
)
# Run the agent
await agent.run()
agent2 = Agent(
task="Go to website Y and find the latest news",
llm=llm,
controller=controller
)
await agent2.run()
```
<Note>
The controller is stateless, so the same instance can hold many registered actions and
be shared by multiple agents.
</Note>
## Exclude functions
If you want to exclude some registered actions and make them unavailable to the agent, you can do:
```python
controller = Controller(exclude_actions=['open_tab', 'search_google'])
agent = Agent(controller=controller, ...)
```
If you want actions to only be available on certain pages, and to not tell the LLM about them on other pages,
you can use the `allowed_domains` and `page_filter`:
```python
import logging
from playwright.async_api import Page
from browser_use import Controller, ActionResult

logger = logging.getLogger(__name__)
controller = Controller()

async def is_ai_allowed(page: Page):
    # `api.some_service.check_url` is a placeholder for your own allow-list check
    if api.some_service.check_url(page.url):
        logger.warning('Allowing AI agent to visit url: %s', page.url)
        return True
    return False

@controller.action('Fill out secret_form', allowed_domains=['https://*.example.com'], page_filter=is_ai_allowed)
async def fill_out_form(page: Page) -> ActionResult:
    # only runnable by the LLM on pages that match https://*.example.com *AND* where is_ai_allowed(page) returns True
    ...
```

View file

@ -1,381 +0,0 @@
---
description: "Customize agent behavior with lifecycle hooks"
applyTo: '**'
---
Browser-Use provides lifecycle hooks that allow you to execute custom code at specific points during the agent's execution.
Hook functions can be used to read and modify agent state while running, implement custom logic, change configuration, or integrate the Agent with external applications.
## Available Hooks
Currently, Browser-Use provides the following hooks:
| Hook | Description | When it's called |
| ---- | ----------- | ---------------- |
| `on_step_start` | Executed at the beginning of each agent step | Before the agent processes the current state and decides on the next action |
| `on_step_end` | Executed at the end of each agent step | After the agent has executed all the actions for the current step, before it starts the next step |
```python
await agent.run(on_step_start=..., on_step_end=...)
```
Each hook should be an `async` callable function that accepts the `agent` instance as its only parameter.
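The calling order of the two hooks can be pictured with a stripped-down step loop. This is a conceptual sketch only, not the real `Agent.run()`; the dict used as the "agent" and the function `run_steps` are hypothetical stand-ins:

```python
import asyncio


async def run_steps(agent, max_steps, on_step_start=None, on_step_end=None):
    """Toy loop: call the start hook, do one step, call the end hook."""
    for step in range(max_steps):
        if on_step_start is not None:
            await on_step_start(agent)
        agent["log"].append(f"step {step}")  # placeholder for the real work of a step
        if on_step_end is not None:
            await on_step_end(agent)


async def before(agent):
    agent["log"].append("start hook")


async def after(agent):
    agent["log"].append("end hook")


demo_agent = {"log": []}
asyncio.run(run_steps(demo_agent, 2, on_step_start=before, on_step_end=after))
print(demo_agent["log"])
```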
### Basic Example
```python
from pathlib import Path

from browser_use import Agent
from langchain_openai import ChatOpenAI


async def my_step_hook(agent: Agent):
    # inside a hook you can access all the state and methods under the Agent object:
    #   agent.settings, agent.state, agent.task
    #   agent.controller, agent.llm, agent.browser_session
    #   agent.pause(), agent.resume(), agent.add_new_task(...), etc.

    # You also have direct access to the playwright Page and Browser Context
    page = await agent.browser_session.get_current_page()
    #   https://playwright.dev/python/docs/api/class-page

    current_url = page.url
    visit_log = agent.state.history.urls()
    previous_url = visit_log[-2] if len(visit_log) >= 2 else None
    print(f"Agent was last on URL: {previous_url} and is now on {current_url}")

    # Example: listen for events on the page, interact with the DOM, run JS directly, etc.
    page.on('domcontentloaded', lambda _page: print('page navigated to a new url...'))  # page.on() is not awaited
    await page.locator("css=form > input[type=submit]").click()
    await page.evaluate('() => alert(1)')
    await page.context.new_page()  # open a new tab in the same browser context
    await agent.browser_session.browser_context.add_init_script('/* some JS to run on every page */')

    # Example: monitor or intercept all network requests
    async def handle_request(route):
        # Print, modify, block, etc. do anything to the requests here
        #   https://playwright.dev/python/docs/network#handle-requests
        print(route.request, route.request.headers)
        await route.continue_(headers=route.request.headers)

    await page.route("**/*", handle_request)

    # Example: pause agent execution and resume it based on some custom code
    if '/completed' in current_url:
        agent.pause()
        Path('result.txt').write_text(await page.content())
        input('Saved "completed" page content to result.txt, press [Enter] to resume...')
        agent.resume()


agent = Agent(
    task="Search for the latest news about AI",
    llm=ChatOpenAI(model="gpt-4o"),
)
await agent.run(
    on_step_start=my_step_hook,
    # on_step_end=...
    max_steps=10,
)
```
## Data Available in Hooks
When working with agent hooks, you have access to the entire `Agent` instance. Here are some useful data points you can access:
- `agent.task` lets you see what the main task is, `agent.add_new_task(...)` lets you queue up a new one
- `agent.controller` gives access to the `Controller()` object and `Registry()` containing the available actions
- `agent.controller.registry.execute_action('click_element_by_index', {'index': 123}, browser_session=agent.browser_session)`
- `agent.context` lets you access any user-provided context object passed in to `Agent(context=...)`
- `agent.sensitive_data` contains the sensitive data dict, which can be updated in-place to add/remove/modify items
- `agent.settings` contains all the configuration options passed to the `Agent(...)` at init time
- `agent.llm` gives direct access to the main LLM object (e.g. `ChatOpenAI`)
- `agent.state` gives access to lots of internal state, including agent thoughts, outputs, actions, etc.
- `agent.state.history.model_thoughts()`: Reasoning from Browser Use's model.
- `agent.state.history.model_outputs()`: Raw outputs from Browser Use's model.
- `agent.state.history.model_actions()`: Actions taken by the agent
- `agent.state.history.extracted_content()`: Content extracted from web pages
- `agent.state.history.urls()`: URLs visited by the agent
- `agent.browser_session` gives direct access to the `BrowserSession()` and playwright objects
- `agent.browser_session.get_current_page()`: Get the current playwright `Page` object the agent is focused on
- `agent.browser_session.browser_context`: Get the current playwright `BrowserContext` object
- `agent.browser_session.browser_context.pages`: Get all the tabs currently open in the context
- `agent.browser_session.get_page_html()`: Current page HTML
- `agent.browser_session.take_screenshot()`: Screenshot of the current page
## Tips for Using Hooks
- **Avoid blocking operations**: Since hooks run in the same execution thread as the agent, try to keep them efficient or use asynchronous patterns.
- **Handle exceptions**: Make sure your hook functions handle exceptions gracefully to prevent interrupting the agent's main flow.
- **Use custom actions instead**: hooks are fairly advanced, most things can be implemented with [custom action functions](/customize/custom-functions) instead
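The first two tips can be combined in a small wrapper: push blocking work onto a worker thread and log failures instead of letting them propagate. This is one possible pattern, sketched with the standard library only; `safe_hook`, `slow_blocking_save`, and the dict standing in for the agent are hypothetical names:

```python
import asyncio
import functools
import logging
import time

logger = logging.getLogger("hooks")


def safe_hook(hook):
    """Wrap an async hook so that its exceptions are logged, not raised."""
    @functools.wraps(hook)
    async def wrapper(agent):
        try:
            await hook(agent)
        except Exception:
            logger.exception("hook %s failed; continuing", hook.__name__)
    return wrapper


def slow_blocking_save(data: str) -> int:
    time.sleep(0.01)  # stands in for blocking I/O such as a synchronous HTTP call
    return len(data)


@safe_hook
async def record_hook(agent):
    # asyncio.to_thread keeps the event loop free while the blocking call runs
    agent["saved"] = await asyncio.to_thread(slow_blocking_save, agent["data"])


@safe_hook
async def broken_hook(agent):
    raise RuntimeError("boom")


demo = {"data": "abc"}
asyncio.run(record_hook(demo))
asyncio.run(broken_hook(demo))  # logged, but does not raise
print(demo["saved"])
```

Swallowing every exception is a deliberate trade-off here; a hook whose failure should stop the run ought to re-raise instead.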
---
## Complex Example: Agent Activity Recording System
This comprehensive example demonstrates a complete implementation for recording and saving Browser-Use agent activity, consisting of both server and client components.
### Setup Instructions
To use this example, you'll need to:
1. Set up the required dependencies:
```bash
pip install fastapi uvicorn prettyprinter pyobjtojson python-dotenv requests browser-use langchain-openai
```
2. Create two separate Python files:
- `api.py` - The FastAPI server component
- `client.py` - The Browser-Use agent with recording hook
3. Run both components:
- Start the API server first: `python api.py`
- Then run the client: `python client.py`
### Server Component (api.py)
The server component handles receiving and storing the agent's activity data:
```python
#!/usr/bin/env python3
#
# FastAPI API to record and save Browser-Use activity data.
# Save this code to api.py and run with `python api.py`
#
import json
import base64
from pathlib import Path
from fastapi import FastAPI, Request
import prettyprinter
import uvicorn
prettyprinter.install_extras()
# Utility function to save screenshots
def b64_to_png(b64_string: str, output_file):
    """
    Convert a Base64-encoded string to a PNG file.

    :param b64_string: A string containing Base64-encoded data
    :param output_file: The path to the output PNG file
    """
    with open(output_file, "wb") as f:
        f.write(base64.b64decode(b64_string))


# Initialize FastAPI app
app = FastAPI()


@app.post("/post_agent_history_step")
async def post_agent_history_step(request: Request):
    data = await request.json()
    prettyprinter.cpprint(data)

    # Ensure the "recordings" folder exists using pathlib
    recordings_folder = Path("recordings")
    recordings_folder.mkdir(exist_ok=True)

    # Determine the next file number by examining existing .json files
    existing_numbers = []
    for item in recordings_folder.iterdir():
        if item.is_file() and item.suffix == ".json":
            try:
                file_num = int(item.stem)
                existing_numbers.append(file_num)
            except ValueError:
                # In case the file name isn't just a number
                pass

    if existing_numbers:
        next_number = max(existing_numbers) + 1
    else:
        next_number = 1

    # Construct the file path
    file_path = recordings_folder / f"{next_number}.json"

    # Save the JSON data to the file
    with file_path.open("w") as f:
        json.dump(data, f, indent=2)

    # Optionally save screenshot if needed
    # if "website_screenshot" in data and data["website_screenshot"]:
    #     screenshot_folder = Path("screenshots")
    #     screenshot_folder.mkdir(exist_ok=True)
    #     b64_to_png(data["website_screenshot"], screenshot_folder / f"{next_number}.png")

    return {"status": "ok", "message": f"Saved to {file_path}"}


if __name__ == "__main__":
    print("Starting Browser-Use recording API on http://0.0.0.0:9000")
    uvicorn.run(app, host="0.0.0.0", port=9000)
```
### Client Component (client.py)
The client component runs the Browser-Use agent with a recording hook:
```python
#!/usr/bin/env python3
#
# Client to record and save Browser-Use activity.
# Save this code to client.py and run with `python client.py`
#
import asyncio
import requests
from dotenv import load_dotenv
from pyobjtojson import obj_to_json
from langchain_openai import ChatOpenAI
from browser_use import Agent
# Load environment variables (for API keys)
load_dotenv()


def send_agent_history_step(data):
    """Send the agent step data to the recording API"""
    url = "http://127.0.0.1:9000/post_agent_history_step"
    response = requests.post(url, json=data)
    return response.json()


async def record_activity(agent_obj):
    """Hook function that captures and records agent activity at each step"""
    website_html = None
    website_screenshot = None
    urls_json_last_elem = None
    model_thoughts_last_elem = None
    model_outputs_json_last_elem = None
    model_actions_json_last_elem = None
    extracted_content_json_last_elem = None

    print('--- ON_STEP_START HOOK ---')

    # Capture current page state
    website_html = await agent_obj.browser_session.get_page_html()
    website_screenshot = await agent_obj.browser_session.take_screenshot()

    # Make sure we have state history
    if hasattr(agent_obj, "state"):
        history = agent_obj.state.history
    else:
        history = None
        print("Warning: Agent has no state history")
        return

    # Process model thoughts
    model_thoughts = obj_to_json(
        obj=history.model_thoughts(),
        check_circular=False
    )
    if len(model_thoughts) > 0:
        model_thoughts_last_elem = model_thoughts[-1]

    # Process model outputs
    model_outputs = agent_obj.state.history.model_outputs()
    model_outputs_json = obj_to_json(
        obj=model_outputs,
        check_circular=False
    )
    if len(model_outputs_json) > 0:
        model_outputs_json_last_elem = model_outputs_json[-1]

    # Process model actions
    model_actions = agent_obj.state.history.model_actions()
    model_actions_json = obj_to_json(
        obj=model_actions,
        check_circular=False
    )
    if len(model_actions_json) > 0:
        model_actions_json_last_elem = model_actions_json[-1]

    # Process extracted content
    extracted_content = agent_obj.state.history.extracted_content()
    extracted_content_json = obj_to_json(
        obj=extracted_content,
        check_circular=False
    )
    if len(extracted_content_json) > 0:
        extracted_content_json_last_elem = extracted_content_json[-1]

    # Process URLs
    urls = agent_obj.state.history.urls()
    urls_json = obj_to_json(
        obj=urls,
        check_circular=False
    )
    if len(urls_json) > 0:
        urls_json_last_elem = urls_json[-1]

    # Create a summary of all data for this step
    model_step_summary = {
        "website_html": website_html,
        "website_screenshot": website_screenshot,
        "url": urls_json_last_elem,
        "model_thoughts": model_thoughts_last_elem,
        "model_outputs": model_outputs_json_last_elem,
        "model_actions": model_actions_json_last_elem,
        "extracted_content": extracted_content_json_last_elem
    }
    print("--- MODEL STEP SUMMARY ---")
    print(f"URL: {urls_json_last_elem}")

    # Send data to the API
    result = send_agent_history_step(data=model_step_summary)
    print(f"Recording API response: {result}")


async def run_agent():
    """Run the Browser-Use agent with the recording hook"""
    agent = Agent(
        task="Compare the price of gpt-4o and DeepSeek-V3",
        llm=ChatOpenAI(model="gpt-4o"),
    )
    try:
        print("Starting Browser-Use agent with recording hook")
        await agent.run(
            on_step_start=record_activity,
            max_steps=30
        )
    except Exception as e:
        print(f"Error running agent: {e}")


if __name__ == "__main__":
    # Check if API is running
    try:
        requests.get("http://127.0.0.1:9000")
        print("Recording API is available")
    except requests.exceptions.RequestException:
        # Catch only network-level failures instead of using a bare except
        print("Warning: Recording API may not be running. Start api.py first.")
    # Run the agent
    asyncio.run(run_agent())
```
Contribution by Carlos A. Planchón.
### Working with the Recorded Data
After running the agent, you'll find the recorded data in the `recordings` directory. Here's how you can use this data:
1. **View recorded sessions**: Each JSON file contains a snapshot of agent activity for one step
2. **Extract screenshots**: You can modify the API to save screenshots separately
3. **Analyze agent behavior**: Use the recorded data to study how the agent navigates websites
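
To make the points above concrete, here is a minimal sketch for reading the recordings back in step order. It assumes the files are named `1.json`, `2.json`, and so on, as the API above writes them, and that each file holds the summary dictionary sent by the client. The helper names `load_recordings` and `summarize` are illustrative and not part of Browser Use.

```python
import json
from pathlib import Path


def load_recordings(folder="recordings"):
    """Return the recorded steps in numeric file order (1.json, 2.json, ...)."""
    steps = []
    # Only consider files named "<number>.json", the naming the API uses
    numbered = [p for p in Path(folder).glob("*.json") if p.stem.isdigit()]
    for path in sorted(numbered, key=lambda p: int(p.stem)):
        with path.open() as f:
            steps.append(json.load(f))
    return steps


def summarize(steps):
    """Build one short line per step from the fields the client sends."""
    lines = []
    for number, step in enumerate(steps, start=1):
        url = step.get("url")
        has_screenshot = bool(step.get("website_screenshot"))
        lines.append(f"step {number}: url={url} screenshot={has_screenshot}")
    return lines
```

Sorting by the integer value of the file stem matters here: a plain string sort would place `10.json` before `2.json`.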
### Extending the Example
You can extend this recording system in several ways:
1. **Save screenshots separately**: Uncomment the screenshot saving code in the API
2. **Add a web dashboard**: Create a simple web interface to view recorded sessions
3. **Add session IDs**: Modify the API to group steps by agent session
4. **Add filtering**: Implement filters to record only specific types of actions
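
As one possible starting point for the filtering idea, the sketch below decides whether a step summary should be sent to the API at all. It assumes that `model_actions` in the summary is a dictionary keyed by action name; that shape, the helper name `should_record`, and the action names used in the usage note are assumptions for illustration, so check what your version of the library actually emits.

```python
def should_record(step_summary, allowed_actions):
    """Return True if the step's last action is one of `allowed_actions`.

    Assumes step_summary["model_actions"] is a dict keyed by action name,
    e.g. {"click_element": {...}}; adjust to the shape your version emits.
    """
    actions = step_summary.get("model_actions")
    if not isinstance(actions, dict):
        # Nothing recorded yet (for example on the very first step)
        return False
    return any(name in allowed_actions for name in actions)
```

In the client hook, this could guard the upload, for example `if should_record(model_step_summary, {"click_element"}): send_agent_history_step(data=model_step_summary)`.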


@ -1,49 +0,0 @@
---
description: "The default is text. But you can define a structured output format to make post-processing easier."
applyTo: '**'
---
## Custom output format
With [this example](https://github.com/browser-use/browser-use/blob/main/examples/features/custom_output.py) you can define what output format the agent should return to you.
```python
import asyncio
from typing import List

from langchain_openai import ChatOpenAI
from pydantic import BaseModel

from browser_use import Agent, Controller


# Define the output format as a Pydantic model
class Post(BaseModel):
    post_title: str
    post_url: str
    num_comments: int
    hours_since_post: int


class Posts(BaseModel):
    posts: List[Post]


# The controller tells the agent which structure the final result must follow
controller = Controller(output_model=Posts)


async def main():
    task = 'Go to hackernews show hn and give me the first 5 posts'
    model = ChatOpenAI(model='gpt-4o')
    agent = Agent(task=task, llm=model, controller=controller)
    history = await agent.run()
    result = history.final_result()
    if result:
        # The final result is a JSON string; validate it against the model
        parsed: Posts = Posts.model_validate_json(result)
        for post in parsed.posts:
            print('\n--------------------------------')
            print(f'Title: {post.post_title}')
            print(f'URL: {post.post_url}')
            print(f'Comments: {post.num_comments}')
            print(f'Hours since post: {post.hours_since_post}')
    else:
        print('No result')


if __name__ == '__main__':
    asyncio.run(main())
```


@ -1,414 +0,0 @@
---
description: "Connect to a remote browser or launch a new local browser."
applyTo: '**'
---
## Overview
Browser Use supports a wide variety of ways to launch or connect to a browser:
- Launch a new local browser using playwright/patchright chromium (the default)
- Connect to a remote browser using CDP or WSS
- Use an existing playwright `Page`, `Browser`, or `BrowserContext` object
- Connect to a local browser already running using `browser_pid`
<Tip>
Don't want to manage your own browser infrastructure? Try [☁️ Browser Use Cloud](https://browser-use.com) ➡️
We provide automatic CAPTCHA solving, proxies, human-in-the-loop automation, and more!
</Tip>
## Connection Methods
### Method A: Launch a New Local Browser (Default)
Launch a local browser using built-in default (playwright `chromium`) or a provided `executable_path`:
```python
from browser_use import Agent, BrowserSession
# If no executable_path provided, uses Playwright/Patchright's built-in Chromium
browser_session = BrowserSession(
# Path to a specific Chromium-based executable (optional)
executable_path='/Applications/Google Chrome.app/Contents/MacOS/Google Chrome', # macOS
# For Windows: 'C:\\Program Files\\Google\\Chrome\\Application\\chrome.exe'
# For Linux: '/usr/bin/google-chrome'
# Use a specific data directory on disk (optional, set to None for incognito)
user_data_dir='~/.config/browseruse/profiles/default', # this is the default
# ... any other BrowserProfile or playwright launch_persistent_context config...
# headless=False,
)
agent = Agent(
task="Your task here",
llm=llm,
browser_session=browser_session,
)
```
We support most `chromium`-based browsers in `executable_path`, including [Brave](https://github.com/browser-use/browser-use/tree/main/examples/browser/stealth.py), [patchright chromium](https://github.com/Kaliiiiiiiiii-Vinyzu/patchright), [rebrowser](https://rebrowser.net/), Edge, and more. See [`examples/browser/stealth.py`](https://github.com/browser-use/browser-use/tree/main/examples/browser) for more. We do not support Firefox or Safari at the moment.
<Warning>
[As of Chrome v136](https://github.com/browser-use/browser-use/issues/1520), driving browsers with the default profile is [no longer supported](https://developer.chrome.com/blog/remote-debugging-port) for security reasons. Browser-Use has transitioned to creating a new dedicated profile for agents in: `~/.config/browseruse/profiles/default`. You can [open this profile](https://superuser.com/questions/377186/how-do-i-start-chrome-using-a-specified-user-profile) and log into everything you need your agent to have access to, and it will persist over time.
</Warning>
### Method B: Connect Using Existing Playwright Objects
Pass existing Playwright `Page`, `BrowserContext`, `Browser`, and/or `playwright` API object to `BrowserSession(...)`:
```python
from browser_use import Agent, BrowserSession
from playwright.async_api import async_playwright
# from patchright.async_api import async_playwright # stealth alternative
async with async_playwright() as playwright:
    browser = await playwright.chromium.launch()
    context = await browser.new_context()
    page = await context.new_page()

    browser_session = BrowserSession(
        page=page,
        # browser_context=context,  # all these are supported
        # browser=browser,
        # playwright=playwright,
    )

    agent = Agent(
        task="Your task here",
        llm=llm,
        browser_session=browser_session,
    )
```
You can also pass `page` directly to `Agent(...)` as a shortcut.
```python
agent = Agent(
task="Your task here",
llm=llm,
page=page,
)
```
### Method C: Connect to Local Browser Using Browser PID
Connect to a browser with open `--remote-debugging-port`:
```python
from browser_use import Agent, BrowserSession
# First, start Chrome with remote debugging:
# "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" --remote-debugging-port=9242
# Then connect using the process ID
browser_session = BrowserSession(browser_pid=12345) # Replace with actual Chrome PID
agent = Agent(
task="Your task here",
llm=llm,
browser_session=browser_session,
)
```
### Method D: Connect to remote Playwright Node.js Browser Server via WSS URL
Connect to Playwright Node.js server providers:
```python
from browser_use import Agent, BrowserSession
# Connect to a playwright server
browser_session = BrowserSession(wss_url="wss://your-playwright-server.com/ws")
agent = Agent(
task="Your task here",
llm=llm,
browser_session=browser_session,
)
```
### Method E: Connect to Remote Browser via CDP URL
Connect to any remote Chromium-based browser:
```python
from browser_use import Agent, BrowserSession
# Connect to Chrome via CDP
browser_session = BrowserSession(cdp_url="http://localhost:9222")
agent = Agent(
task="Your task here",
llm=llm,
browser_session=browser_session,
)
```
## Security Considerations
<Warning>
When using any browser profile, the agent will have access to:
- All its logged-in sessions and cookies
- Saved passwords (if autofill is enabled)
- Browser history and bookmarks
- Extensions and their data
Always review the task you're giving to the agent and ensure it aligns with your security requirements!
Use `Agent(sensitive_data={'https://auth.example.com': {x_key: value}})` for any secrets, and restrict the browser with `BrowserSession(allowed_domains=['https://*.example.com'])`.
</Warning>
## Best Practices
1. **Use isolated profiles**: Create separate Chrome profiles for different agents to limit scope of risk:
```python
browser_session = BrowserSession(
user_data_dir='~/.config/browseruse/profiles/banking',
# profile_directory='Default'
)
```
2. **Limit domain access**: Restrict which sites the agent can visit:
```python
browser_session = BrowserSession(
allowed_domains=['example.com', 'http*://*.github.com'],
)
```
3. **Enable `keep_alive=True`** if you want to use a single `BrowserSession` with more than one agent:
```python
browser_session = BrowserSession(
keep_alive=True,
...
)
await browser_session.start() # start the session yourself before passing to Agent
...
agent = Agent(..., browser_session=browser_session)
await agent.run()
...
await browser_session.kill() # end the session yourself, shortcut for keep_alive=False + .stop()
```
## Re-Using a Browser
A `BrowserSession` starts when the browser is launched/connected, and ends when the browser process exits/disconnects. A session internally manages a single live playwright browser context, and is normally auto-closed by the agent when its task is complete (*if* the agent started the session itself). If you pass an existing `BrowserSession` into an Agent, or if you set `BrowserSession(keep_alive=True)`, the session will not be closed and can be re-used between agents.
Browser Use provides a number of ways to re-use profiles, sessions, and other configuration across multiple agents.
- ✅ sequential agents can re-use a single `user_data_dir` in new `BrowserSession`s
- ✅ sequential agents can re-use a single `BrowserSession` without closing it
- ❌ parallel agents cannot run separate `BrowserSession`s using the same `user_data_dir`
- ✅ parallel agents can run separate `BrowserSession`s using the same `storage_state`
- ✅ parallel agents can share a single `BrowserSession`, working in different tabs
- ⚠️ parallel agents can share a single `BrowserSession`, working in the same tab
<Important>
Multiple `BrowserSession`s (aka chrome processes) cannot share the same `user_data_dir` at the same time, but they can share a `storage_state` file or `BrowserProfile` config.
</Important>
### Sequential Agents, Same Profile, Different Browser
If you are only running one agent & browser at a time, they can re-use the same `user_data_dir` sequentially.
```python
from browser_use import Agent
from browser_use.browser import BrowserProfile
from langchain_openai import ChatOpenAI
reused_profile = BrowserProfile(user_data_dir='~/.config/browseruse/profiles/default')
agent1 = Agent(
task="The first task...",
llm=ChatOpenAI(model="gpt-4o-mini"),
browser_profile=reused_profile, # pass the profile in, it will auto-create a session
)
await agent1.run()
agent2 = Agent(
task="The second task...",
llm=ChatOpenAI(model="gpt-4o-mini"),
browser_profile=reused_profile, # agent will auto-create its own new session
)
await agent2.run()
```
> Never mix different browser versions or `executable_path`s with the same `user_data_dir`. Once a newer browser version has run, it applies migrations to the directory, and older browsers won't be able to read it.
### Sequential Agents, Same Profile, Same Browser
If you are only running one agent at a time, they can re-use the same active `BrowserSession` and avoid having to relaunch chrome.
Each agent will start off looking at the same tab the last agent ended off on.
```python
from browser_use import Agent, BrowserSession
from langchain_openai import ChatOpenAI
reused_session = BrowserSession(
user_data_dir='~/.config/browseruse/profiles/default',
keep_alive=True, # don't close browser after 1st agent.run() ends
)
await reused_session.start() # when keep_alive=True, session must be started manually
agent1 = Agent(
task="The first task...",
llm=ChatOpenAI(model="gpt-4o-mini"),
browser_session=reused_session,
)
await agent1.run()
agent2 = Agent(
task="The second task...",
llm=ChatOpenAI(model="gpt-4o-mini"),
browser_session=reused_session, # re-use the same session
)
await agent2.run()
await reused_session.close()
```
### Parallel Agents, Same Browser, Multiple Tabs
```python
import asyncio
from browser_use import Agent, BrowserSession
from langchain_openai import ChatOpenAI
shared_browser = BrowserSession(
storage_state='/tmp/cookies.json',
user_data_dir=None,
keep_alive=True,
headless=True,
)
await shared_browser.start() # when keep_alive=True, you must start the session yourself
agent1 = Agent(
task="The first task...",
llm=ChatOpenAI(model="gpt-4o-mini"),
browser_session=shared_browser, # pass the session in
)
agent2 = Agent(
task="The second task...",
llm=ChatOpenAI(model="gpt-4o-mini"),
browser_session=shared_browser, # re-use the same session
)
await asyncio.gather(agent1.run(), agent2.run()) # run in parallel
await shared_browser.close()
```
### Parallel Agents, Same Browser, Same Tab
<Warning>
⚠️ This mode is not recommended. Agents are not yet optimized to share the same tab in the same browser; they may interfere with each other or cause errors.
</Warning>
```python
import asyncio
from browser_use import Agent, BrowserSession
from langchain_openai import ChatOpenAI
from playwright.async_api import async_playwright
playwright = await async_playwright().start()
browser = await playwright.chromium.launch(headless=True)
context = await browser.new_context()
shared_page = await context.new_page()
await shared_page.goto('https://example.com', wait_until='domcontentloaded')
shared_session = BrowserSession(page=shared_page, keep_alive=True)
await shared_session.start()
agent1 = Agent(
task="Fill out the form in section A...",
llm=ChatOpenAI(model="gpt-4o-mini"),
browser_session=shared_session
)
agent2 = Agent(
task="Fill out the form in section B...",
llm=ChatOpenAI(model="gpt-4o-mini"),
browser_session=shared_session,
)
await asyncio.gather(agent1.run(), agent2.run()) # run in parallel
await shared_session.kill()
```
### Parallel Agents, Same Profile, Different Browsers
<Tip>
This mode is the recommended default.
</Tip>
To share a single set of configuration or cookies, but still have agents working in their own browser sessions (potentially in parallel), use our provided `BrowserProfile` object.
The recommended way to re-use cookies and localStorage state between separate parallel sessions is to use the [`storage_state`](https://docs.browser-use.com/customize/browser-settings#storage-state) option.
```bash
# open a browser to log into sites you want the Agent to have access to
playwright open https://example.com/ --save-storage=/tmp/auth.json
playwright open https://example.com/ --load-storage=/tmp/auth.json
```
```python
import asyncio
from browser_use import Agent
from browser_use.browser import BrowserProfile, BrowserSession
shared_profile = BrowserProfile(
headless=True,
user_data_dir=None, # use dedicated tmp user_data_dir per session
storage_state='/tmp/auth.json', # load/save cookies to/from json file
keep_alive=True, # don't close the browser after the agent finishes
)
# Both sessions are created from the same shared profile
window1 = BrowserSession(browser_profile=shared_profile)
await window1.start()
agent1 = Agent(browser_session=window1)
window2 = BrowserSession(browser_profile=shared_profile)
await window2.start()
agent2 = Agent(browser_session=window2)
await asyncio.gather(agent1.run(), agent2.run()) # run in parallel
await window1.save_storage_state() # write storage state (cookies, localStorage, etc.) to auth.json
await window2.save_storage_state() # you must decide when to save manually
# can also reload the cookies from the file into the active session if they change
await window1.load_storage_state()
await window1.close()
await window2.close()
```
---
## Troubleshooting
### Chrome Won't Connect
If you're having trouble connecting:
1. **Close all Chrome instances** before trying to launch with a custom profile
2. **Check if Chrome is running with debugging port**:
```bash
ps aux | grep chrome | grep remote-debugging-port
```
3. **Verify the executable path** is correct for your system
4. **Check profile permissions** - ensure your user has read/write access
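
When checking the debugging port from Python rather than the shell, a small probe can help. The sketch below is not part of Browser Use: it uses only the standard library and assumes Chrome was started with `--remote-debugging-port` and exposes the usual DevTools HTTP endpoint `/json/version` on that port. The function name `cdp_version` and the default port are illustrative.

```python
import json
import urllib.error
import urllib.request


def cdp_version(base_url="http://localhost:9222", timeout=2.0):
    """Return the browser's /json/version payload, or None if unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/json/version", timeout=timeout) as resp:
            return json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, or a response that is not JSON
        return None
```

If the call returns `None`, nothing usable is listening on that port, so the browser is either not running or was started without the debugging flag.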
### Profile Lock Issues
If you get a "profile is already in use" error:
1. Close all Chrome instances
2. The profile will automatically be unlocked when BrowserSession starts
3. Alternatively, manually delete the `SingletonLock` file in the profile directory
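
For the manual route, the following is a minimal sketch of deleting the lock file from Python. It assumes the lock is a file or symlink named `SingletonLock` directly inside the profile directory, as described above, and the helper name `remove_singleton_lock` is hypothetical. Only run something like this after every Chrome instance using the profile has been closed.

```python
from pathlib import Path


def remove_singleton_lock(user_data_dir):
    """Delete a stale SingletonLock in `user_data_dir`; return True if one was removed."""
    lock = Path(user_data_dir).expanduser() / "SingletonLock"
    # The lock may be a symlink whose target no longer exists, in which case
    # exists() is False, so check is_symlink() as well.
    if lock.is_symlink() or lock.exists():
        lock.unlink()
        return True
    return False
```

Example: `remove_singleton_lock('~/.config/browseruse/profiles/default')`.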
<Note>
For more configuration options, see the [Browser Settings](/customize/browser-settings) documentation.
</Note>
### Profile Version Issues
The browser version you run must always be equal to or greater than the version used to create the `user_data_dir`.
If you see errors like `Failed to parse Extensions` when launching, you're likely attempting to run an older browser with an incompatible `user_data_dir` that's already been migrated to a newer Chrome version.
Playwright ships a version of chromium that's newer than the default stable Google Chrome release channel, so this can happen if you try to use
a profile created by the default playwright chromium (e.g. `user_data_dir='~/.config/browseruse/profiles/default'`) with an older
local browser like `executable_path='/Applications/Google Chrome.app/Contents/MacOS/Google Chrome'`.


@ -1,198 +0,0 @@
---
description: "Handle sensitive information securely and avoid sending PII & passwords to the LLM."
applyTo: '**'
---
## Handling Sensitive Data
When working with sensitive information like passwords or PII, you can use the `Agent(sensitive_data=...)` parameter to provide sensitive strings that the model can use in actions without ever seeing them directly.
```python
agent = Agent(
task='Log into example.com as user x_username with password x_password',
sensitive_data={
'https://example.com': {
'x_username': 'abc@example.com',
'x_password': 'abc123456', # 'x_placeholder': '<actual secret value>',
},
},
)
```
<Note>
You should also configure [`BrowserSession(allowed_domains=...)`](https://docs.browser-use.com/customize/browser-settings#allowed-domains) to prevent the Agent from visiting URLs not needed for the task.
</Note>
### Basic Usage
Here's a basic example of how to use sensitive data:
```python
import asyncio
from dotenv import load_dotenv
load_dotenv()
from langchain_openai import ChatOpenAI
from browser_use import Agent, BrowserSession
llm = ChatOpenAI(model='gpt-4o', temperature=0.0)
# Define sensitive data
# The LLM will only see placeholder names (x_member_number, x_passphrase), never the actual values
sensitive_data = {
'https://*.example.com': {
'x_member_number': '123235325',
'x_passphrase': 'abcwe234',
},
}
# Use the placeholder names in your task description
task = """
1. go to https://travel.example.com
2. sign in with your member number x_member_number and private access code x_passphrase
3. extract today's list of travel deals as JSON
"""
# Recommended: Limit the domains available for the entire browser so the Agent can't be tricked into visiting untrusted URLs
browser_session = BrowserSession(allowed_domains=['https://*.example.com'])
agent = Agent(
task=task,
llm=llm,
sensitive_data=sensitive_data, # Pass the sensitive data to the agent
browser_session=browser_session, # Pass the restricted browser_session to limit URLs Agent can visit
use_vision=False, # Disable vision or else the LLM might see entered values in screenshots
)
async def main():
    await agent.run()

if __name__ == '__main__':
    asyncio.run(main())
```
In this example:
1. The LLM only ever sees the `x_member_number` and `x_passphrase` placeholders in prompts
2. When the model wants to use your password, it outputs `x_passphrase`, and we replace it with the actual value in the DOM
3. When sensitive data appears in the content of the current page, we replace it in the page summary fed to the LLM, so the model never has it in its state
4. The browser will be entirely prevented from going to any site not under `https://*.example.com`
This approach ensures that sensitive information remains secure while still allowing the agent to perform tasks that require authentication.
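
The masking described in step 3 can be pictured with a small sketch. This is an illustration of the idea only, not the library's implementation: the function name `redact` is hypothetical, and the `<secret>...</secret>` tag is borrowed from the placeholder syntax this page uses for model output.

```python
def redact(text, secrets):
    """Replace every secret value found in `text` with its placeholder tag."""
    # Replace longer values first so that a short secret which is a substring
    # of a longer one cannot split the longer match.
    ordered = sorted(secrets.items(), key=lambda kv: len(kv[1]), reverse=True)
    for name, value in ordered:
        if value:  # skip empty values: replacing "" would corrupt the text
            text = text.replace(value, f"<secret>{name}</secret>")
    return text
```

Because the replacement is a plain string match, it only hides values that appear verbatim in the page text; a value rendered in a screenshot is not covered, which is why `use_vision=False` is set in the example above.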
---
### Best Practices
- Always restrict your sensitive data to only the exact domains that need it: `https://travel.example.com` is better than `*.example.com`
- Always restrict [`BrowserSession(allowed_domains=[...])`](https://docs.browser-use.com/customize/browser-settings#allowed-domains) to only the domains the agent needs to visit to accomplish its task. This helps guard against prompt injection attacks, jailbreaks, and LLM mistakes.
- Only use `sensitive_data` for strings that can be inputted verbatim as text. The LLM never sees the actual values, so it can't "understand" them, adapt them, or split them up for multiple input fields. For example, you can't ask the Agent to click through a datepicker UI to input the sensitive value `1990-12-31`. For these situations you can implement a [custom function](/customize/custom-functions) the LLM can call that updates the DOM using Python / JS.
- Don't use `sensitive_data` for login credentials. It's better to use [`storage_state`](https://docs.browser-use.com/customize/browser-settings#storage-state) or a [`user_data_dir`](/customize/browser-settings#user-data-dir) to log into the sites the agent needs in advance and reuse the cookies:
```bash
# open a browser to log into the sites you need & save the cookies
$ playwright open https://accounts.google.com --save-storage auth.json
```
Then use those cookies when the agent runs:
```python
agent = Agent(..., browser_session=BrowserSession(storage_state='./auth.json'))
```
<Warning>
Warning: Vision models still see the screenshot of the page by default - where the sensitive data might be visible.
It's recommended to set `Agent(use_vision=False)` when working with `sensitive_data`.
</Warning>
<a name="allowed_domains"></a>
<a name="domain-pattern-format"></a>
### Allowed Domains
Domain patterns in `sensitive_data` follow the same format as [`allowed_domains`](https://docs.browser-use.com/customize/browser-settings#allowed-domains):
- `example.com` - Matches only `https://example.com/*`
- `*.example.com` - Matches `https://example.com/*` and any subdomain `https://*.example.com/*`
- `http*://example.com` - Matches both `http://` and `https://` protocols for `example.com/*`
- `chrome-extension://*` - Matches any Chrome extension URL e.g. `chrome-extension://anyextensionid/options.html`
> **Security Warning**: For security reasons, certain patterns are explicitly rejected:
>
> - Wildcards in TLD part (e.g., `example.*`) are **not allowed** (`google.*` would match `google.ninja`, `google.pizza`, etc. which is a bad idea)
> - Embedded wildcards (e.g., `g*e.com`) are rejected to prevent overly broad matches
> - Multiple wildcards like `*.*.domain` are not supported currently, open an issue if you need this feature
The default protocol when no scheme is specified is now `https://` for enhanced security.
For convenience the system will validate that all domain patterns used in `Agent(sensitive_data)` are also included in `BrowserSession(allowed_domains)`.
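
To make the pattern rules above easier to reason about, here is a rough standalone matcher that follows them as written. It is a sketch under stated assumptions, not the matching code Browser Use ships: the function names are hypothetical, the scheme defaults to `https` when omitted, and only a single leading `*.` (or a bare `*` host) is accepted as a host wildcard.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlparse


def split_pattern(pattern):
    """Split a pattern into (scheme_glob, host_glob), defaulting to https."""
    scheme, sep, host = pattern.partition("://")
    if not sep:
        scheme, host = "https", pattern
    return scheme.lower(), host.lower()


def is_safe_pattern(pattern):
    """Reject the pattern shapes listed above as unsafe."""
    _, host = split_pattern(pattern)
    if host == "*":
        return True  # e.g. chrome-extension://*
    if host.startswith("*."):
        host = host[2:]  # a single leading "*." is the only host wildcard allowed
    return "*" not in host


def url_matches(url, pattern):
    """Return True if `url` is covered by `pattern`."""
    if not is_safe_pattern(pattern):
        return False
    scheme_glob, host_glob = split_pattern(pattern)
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if not fnmatchcase(parsed.scheme.lower(), scheme_glob):
        return False
    if host_glob.startswith("*."):
        # "*.example.com" covers the bare domain and any subdomain
        bare = host_glob[2:]
        return host == bare or host.endswith("." + bare)
    return fnmatchcase(host, host_glob)
```

Matching on the parsed hostname rather than the raw URL string is deliberate: it keeps a URL such as `https://badexample.com/` from slipping past a `*.example.com` pattern through a naive suffix check.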
### Missing or Empty Values
When working with sensitive data, keep these details in mind:
- If a key referenced by the model (`<secret>key_name</secret>`) is missing from your `sensitive_data` dictionary, a warning will be logged but the substitution tag will be preserved.
- If you provide an empty value for a key in the `sensitive_data` dictionary, it will be treated the same as a missing key.
- The system will always attempt to process all valid substitutions, even if some keys are missing or empty.
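
The behavior described in this list can be sketched as follows. This is an illustrative reimplementation rather than the library's code: `substitute_secrets` is a hypothetical name, and `sensitive_data` here is the flat inner dictionary of placeholder names to values for a single domain.

```python
import logging
import re

logger = logging.getLogger(__name__)
SECRET_TAG = re.compile(r"<secret>(.*?)</secret>")


def substitute_secrets(text, sensitive_data):
    """Replace <secret>key</secret> tags with values; keep tags that cannot be filled."""

    def fill(match):
        key = match.group(1)
        value = sensitive_data.get(key)
        if not value:  # missing and empty values are treated the same
            logger.warning("No value for sensitive key %r; leaving tag in place", key)
            return match.group(0)
        return value

    # Every tag is processed independently, so one missing key does not
    # prevent the other substitutions from happening.
    return SECRET_TAG.sub(fill, text)
```

For example, with `{"x_user": "alice", "x_empty": ""}`, the text `<secret>x_empty</secret>/<secret>x_user</secret>` becomes `<secret>x_empty</secret>/alice`.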
---
### Full Example
Here's a more complex example demonstrating multiple domains and sensitive data values.
```python
import asyncio
from dotenv import load_dotenv
load_dotenv()
from langchain_openai import ChatOpenAI
from browser_use import Agent, BrowserSession
llm = ChatOpenAI(model='gpt-4o', temperature=0.0)
# Domain-specific sensitive data
sensitive_data = {
'https://*.google.com': {'x_email': '...', 'x_pass': '...'},
'chrome-extension://abcd1243': {'x_api_key': '...'},
'http*://example.com': {'x_authcode': '123123'}
}
# Set browser session with allowed domains that match all domain patterns in sensitive_data
browser_session = BrowserSession(
allowed_domains=[
'https://*.google.com',
'chrome-extension://abcd1243',
'http://example.com', # Explicitly include http:// if needed
'https://example.com' # By default, only https:// is matched
]
)
# Pass the sensitive data to the agent
agent = Agent(
task="Log into Google, then check my account information",
llm=llm,
sensitive_data=sensitive_data,
browser_session=browser_session,
use_vision=False,
)
async def main():
    await agent.run()

if __name__ == '__main__':
    asyncio.run(main())
```
With this approach:
1. The Google credentials (`x_email` and `x_pass`) will only be used on Google domains (any subdomain, https only)
2. The API key (`x_api_key`) will only be used on pages served by the specific Chrome extension `abcd1243`
3. The auth code (`x_authcode`) will only be used on `http://example.com/*` or `https://example.com/*`


@ -1,294 +0,0 @@
---
description: "Guide to using different LangChain chat models with Browser Use"
applyTo: '**'
---
## Overview
Browser Use supports various LangChain chat models. Here's how to configure and use the most popular ones. The full list is available in the [LangChain documentation](https://python.langchain.com/docs/integrations/chat/).
## Model Recommendations
We have yet to test performance across all models. Currently, we achieve the best results using GPT-4o, with 89% accuracy on the [WebVoyager Dataset](https://browser-use.com/posts/sota-technical-report). DeepSeek-V3 is 30 times cheaper than GPT-4o. Gemini-2.0-exp is also gaining popularity in the community because it is currently free.
We also support local models, like Qwen 2.5, but be aware that small models often return the wrong output structure, which leads to parsing errors. We believe that local models will improve significantly this year.
<Note>
All models require their respective API keys. Make sure to set them in your
environment variables before running the agent.
</Note>
## Supported Models
All LangChain chat models that support tool calling are available. We document the most popular ones here.
### OpenAI
OpenAI's GPT-4o models are recommended for best performance.
```python
from langchain_openai import ChatOpenAI
from browser_use import Agent
# Initialize the model
llm = ChatOpenAI(
model="gpt-4o",
temperature=0.0,
)
# Create agent with the model
agent = Agent(
task="Your task here",
llm=llm
)
```
Required environment variables:
```bash .env
OPENAI_API_KEY=
```
### Anthropic
```python
from langchain_anthropic import ChatAnthropic
from browser_use import Agent
# Initialize the model
llm = ChatAnthropic(
model_name="claude-3-5-sonnet-20240620",
temperature=0.0,
timeout=100, # Increase for complex tasks
)
# Create agent with the model
agent = Agent(
task="Your task here",
llm=llm
)
```
And add the variable:
```bash .env
ANTHROPIC_API_KEY=
```
### Azure OpenAI
```python
from langchain_openai import AzureChatOpenAI
from browser_use import Agent
from pydantic import SecretStr
import os
# Initialize the model
llm = AzureChatOpenAI(
model="gpt-4o",
api_version='2024-10-21',
azure_endpoint=os.getenv('AZURE_OPENAI_ENDPOINT', ''),
api_key=SecretStr(os.getenv('AZURE_OPENAI_KEY', '')),
)
# Create agent with the model
agent = Agent(
task="Your task here",
llm=llm
)
```
Required environment variables:
```bash .env
AZURE_OPENAI_ENDPOINT=https://your-endpoint.openai.azure.com/
AZURE_OPENAI_KEY=
```
### Gemini
> [!IMPORTANT]
> `GEMINI_API_KEY` was the old environment variable name; as of 2025-05 it should be called `GOOGLE_API_KEY`.
```python
from langchain_google_genai import ChatGoogleGenerativeAI
from browser_use import Agent
from dotenv import load_dotenv
# Read GOOGLE_API_KEY into env
load_dotenv()
# Initialize the model
llm = ChatGoogleGenerativeAI(model='gemini-2.0-flash-exp')
# Create agent with the model
agent = Agent(
task="Your task here",
llm=llm
)
```
Required environment variables:
```bash .env
GOOGLE_API_KEY=
```
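
If an existing `.env` file still uses the old variable name, a small shim can bridge the rename while you migrate. The sketch below is an assumption-labeled convenience, not something Browser Use or LangChain provides: `ensure_google_api_key` is a hypothetical helper that copies `GEMINI_API_KEY` into `GOOGLE_API_KEY` only when the new name is unset.

```python
import os


def ensure_google_api_key(env=None):
    """Copy a legacy GEMINI_API_KEY into GOOGLE_API_KEY if only the old name is set."""
    env = os.environ if env is None else env
    if not env.get("GOOGLE_API_KEY") and env.get("GEMINI_API_KEY"):
        env["GOOGLE_API_KEY"] = env["GEMINI_API_KEY"]
    # Returns None when neither variable is set
    return env.get("GOOGLE_API_KEY")
```

Call it after `load_dotenv()` and before constructing the model. Renaming the variable in the `.env` file remains the cleaner long-term fix.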
### DeepSeek-V3
The community likes DeepSeek-V3 for its low price, no rate limits, open-source nature, and good performance.
The example is available [here](https://github.com/browser-use/browser-use/blob/main/examples/models/deepseek.py).
```python
from langchain_deepseek import ChatDeepSeek
from browser_use import Agent
from pydantic import SecretStr
from dotenv import load_dotenv
import os
load_dotenv()
api_key = os.getenv("DEEPSEEK_API_KEY")
# Initialize the model
llm=ChatDeepSeek(base_url='https://api.deepseek.com/v1', model='deepseek-chat', api_key=SecretStr(api_key))
# Create agent with the model
agent = Agent(
task="Your task here",
llm=llm,
use_vision=False
)
```
Required environment variables:
```bash .env
DEEPSEEK_API_KEY=
```
### DeepSeek-R1
We support DeepSeek-R1. It's not fully tested yet, and more functionality will be added over time, such as output of its reasoning content.
The example is available [here](https://github.com/browser-use/browser-use/blob/main/examples/models/deepseek-r1.py).
It does not support vision. The model is open-source so you could also use it with Ollama, but we have not tested it.
```python
from langchain_deepseek import ChatDeepSeek
from browser_use import Agent
from pydantic import SecretStr
from dotenv import load_dotenv
import os
load_dotenv()
api_key = os.getenv("DEEPSEEK_API_KEY")
# Initialize the model
llm=ChatDeepSeek(base_url='https://api.deepseek.com/v1', model='deepseek-reasoner', api_key=SecretStr(api_key))
# Create agent with the model
agent = Agent(
task="Your task here",
llm=llm,
use_vision=False
)
```
Required environment variables:
```bash .env
DEEPSEEK_API_KEY=
```
### Ollama
Many users asked for local models. Here they are.
1. Download Ollama from [here](https://ollama.ai/download)
2. Run `ollama pull model_name`. Pick a model which supports tool-calling from [here](https://ollama.com/search?c=tools)
3. Run `ollama start`
```python
from langchain_ollama import ChatOllama
from browser_use import Agent
# Initialize the model
llm=ChatOllama(model="qwen2.5", num_ctx=32000)
# Create agent with the model
agent = Agent(
task="Your task here",
llm=llm
)
```
Required environment variables: None!
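Before starting an agent it can help to confirm that the local server is up and that the model from step 2 has been pulled. The sketch below assumes Ollama's default address `http://localhost:11434` and an `/api/tags` response shaped like `{"models": [{"name": "qwen2.5:latest"}]}`; both are assumptions to check against your Ollama version. The helper names are hypothetical.

```python
import json
from typing import Optional
from urllib.error import URLError
from urllib.request import urlopen


def installed_models(tags_payload: dict) -> list[str]:
    """Extract model names from an /api/tags style payload (assumed shape)."""
    return [m.get("name", "") for m in tags_payload.get("models", [])]


def has_model(tags_payload: dict, wanted: str) -> bool:
    """True if `wanted` matches an installed model, ignoring the ':tag' suffix."""
    base = wanted.split(":")[0]
    return any(name.split(":")[0] == base for name in installed_models(tags_payload))


def fetch_tags(base_url: str = "http://localhost:11434", timeout: float = 2.0) -> Optional[dict]:
    """Return the parsed tag list, or None if the server is unreachable."""
    try:
        with urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            return json.load(resp)
    except (URLError, OSError, ValueError):
        # Connection refused, timeout, or a body that is not valid JSON.
        return None
```

A caller could run `fetch_tags()` first and only build `ChatOllama(model="qwen2.5", ...)` when `has_model(tags, "qwen2.5")` is true.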
### Novita AI
[Novita AI](https://novita.ai) is an LLM API provider that offers a wide range of models. Note: choose a model that supports function calling.
```python
from langchain_openai import ChatOpenAI
from browser_use import Agent
from pydantic import SecretStr
from dotenv import load_dotenv
import os
load_dotenv()
api_key = os.getenv("NOVITA_API_KEY")
# Initialize the model
llm = ChatOpenAI(base_url='https://api.novita.ai/v3/openai', model='deepseek/deepseek-v3-0324', api_key=SecretStr(api_key))
# Create agent with the model
agent = Agent(
task="Your task here",
llm=llm,
use_vision=False
)
```
Required environment variables:
```bash .env
NOVITA_API_KEY=
```
### X AI
[X AI](https://x.ai) is the LLM API provider behind the Grok models. Note: choose a model that supports function calling.
```python
from langchain_openai import ChatOpenAI
from browser_use import Agent
from pydantic import SecretStr
from dotenv import load_dotenv
import os
load_dotenv()
api_key = os.getenv("GROK_API_KEY")
# Initialize the model
llm = ChatOpenAI(
base_url='https://api.x.ai/v1',
model='grok-3-beta',
api_key=SecretStr(api_key)
)
# Create agent with the model
agent = Agent(
task="Your task here",
llm=llm,
use_vision=False
)
```
Required environment variables:
```bash .env
GROK_API_KEY=
```
## Coming soon
(We are working on it)
- Groq
- GitHub
- Fine-tuned models


@ -1,76 +0,0 @@
---
description: "Customize the system prompt to control agent behavior and capabilities"
applyTo: '**'
---
## Overview
You can customize the system prompt in two ways:
1. Extend the default system prompt with additional instructions
2. Override the default system prompt entirely
<Note>
Custom system prompts allow you to modify the agent's behavior at a
fundamental level. Use this feature carefully as it can significantly impact
the agent's performance and reliability.
</Note>
### Extend System Prompt (recommended)
To add additional instructions to the default system prompt:
```python
extend_system_message = """
REMEMBER the most important RULE:
ALWAYS open first a new tab and go first to url wikipedia.com no matter the task!!!
"""
```
### Override System Prompt
<Warning>
Not recommended! If you must override the [default system
prompt](https://github.com/browser-use/browser-use/blob/main/browser_use/agent/system_prompt.md),
make sure to test the agent yourself.
</Warning>
To override the default system prompt:
```python
from browser_use import Agent
from langchain_openai import ChatOpenAI

# Define your complete custom prompt
override_system_message = """
You are an AI agent that helps users with web browsing tasks.
[Your complete custom instructions here...]
"""
# Create agent with custom system prompt
agent = Agent(
    task="Your task here",
    llm=ChatOpenAI(model='gpt-4'),
    override_system_message=override_system_message
)
```
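The difference between extending and overriding can be pictured with a small stand-alone function. This is a conceptual sketch only: `build_system_prompt` is a hypothetical name, and the rule that an extension is appended after an override is an assumption of the sketch, not a statement about how Browser Use assembles its prompt internally.

```python
from typing import Optional


def build_system_prompt(
    default: str,
    extend: Optional[str] = None,
    override: Optional[str] = None,
) -> str:
    """Combine a default prompt with optional extend/override text.

    Conceptual model only (assumption): override replaces the default,
    and extend is appended to whichever base prompt is in effect.
    """
    base = override if override is not None else default
    if extend:
        return f"{base}\n{extend}"
    return base
```

Under this model, extending keeps every default rule and adds new ones, while overriding discards the defaults, which is why the override path needs more testing.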
### Extend Planner System Prompt
You can customize the behavior of the planning agent by extending its system prompt:
```python
from browser_use import Agent
from langchain_openai import ChatOpenAI

extend_planner_system_message = """
PRIORITIZE gathering information before taking any action.
Always suggest exploring multiple options before making a decision.
"""
# Create agent with extended planner system prompt
llm = ChatOpenAI(model='gpt-4o')
planner_llm = ChatOpenAI(model='gpt-4o-mini')
agent = Agent(
    task="Your task here",
    llm=llm,
    planner_llm=planner_llm,
    extend_planner_system_message=extend_planner_system_message
)
```


@ -5,8 +5,10 @@ description = "Add your description here"
readme = "README.md"
requires-python = ">=3.13"
dependencies = [
"black>=25.1.0",
"browser-use[memory]==0.3.3",
"chardet>=5.2.0",
"isort>=6.0.1",
"lmnr[all]>=0.6.10",
"patchright>=1.52.5",
]

11
run.py

@ -1,9 +1,10 @@
import sys
import subprocess
import os
import requests
from datetime import datetime
import argparse
import os
import subprocess
import sys
from datetime import datetime
import requests
#!/usr/bin/env python3


@ -1,7 +1,7 @@
from lib.browser_use.agents import *
from lib.browser_use.clean_resources import *
from lib.browser_use.func import *
from lib.browser_use.model import *
from lib.browser_use.init_profile import *
from lib.browser_use.sensitive_data import *
from lib.browser_use.agents import *
from lib.browser_use.model import *
from lib.browser_use.scanner import *
from lib.browser_use.sensitive_data import *


@ -1,31 +1,26 @@
import asyncio
import os
import json
from typing import Dict, Any, Optional
import os
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Any, Dict, Optional
from browser_use import Agent, BrowserSession, Controller
from patchright.async_api import async_playwright as async_patchright
from lib.browser_use import (
GetProfile,
GetSensitiveData,
clean_resources,
)
from lib.utils import (
logger,
config,
)
from lib.browser_use import GetProfile, GetSensitiveData, clean_resources
from lib.llm import CreateChatGoogle, get_prompt
from lib.utils import config, logger
# Exponential backoff settings
INITIAL_BACKOFF = int(os.getenv("INITIAL_BACKOFF", "60")) # seconds
MAX_BACKOFF = int(os.getenv("MAX_BACKOFF", "600")) # seconds
@dataclass
class RetryTask:
"""Represents a task to be retried."""
task_type: str # "oauth_list" or "oauth_login"
url: str
oauth_provider: Optional[str] = None
@ -33,19 +28,23 @@ class RetryTask:
next_retry_time: Optional[datetime] = None
max_retries: int = 5
# Global retry queue
retry_queue: list[RetryTask] = []
retry_queue_lock = asyncio.Lock()
async def add_to_retry_queue(task: RetryTask):
"""Add a task to the retry queue."""
async with retry_queue_lock:
# Check for a duplicate task
existing_task = None
for existing in retry_queue:
if (existing.task_type == task.task_type and
existing.url == task.url and
existing.oauth_provider == task.oauth_provider):
if (
existing.task_type == task.task_type
and existing.url == task.url
and existing.oauth_provider == task.oauth_provider
):
existing_task = existing
break
@ -53,11 +52,16 @@ async def add_to_retry_queue(task: RetryTask):
# If the task already exists, update its retry count
existing_task.retry_count = task.retry_count
existing_task.next_retry_time = task.next_retry_time
print(f"📝 기존 작업 업데이트: {task.task_type} - {task.url} (재시도: {task.retry_count})")
print(
f"📝 기존 작업 업데이트: {task.task_type} - {task.url} (재시도: {task.retry_count})"
)
else:
# Add a new task
retry_queue.append(task)
print(f" 재시도 큐에 작업 추가: {task.task_type} - {task.url} (재시도: {task.retry_count})")
print(
f" 재시도 큐에 작업 추가: {task.task_type} - {task.url} (재시도: {task.retry_count})"
)
async def process_retry_queue():
"""Process the retry queue."""
@ -82,20 +86,25 @@ async def process_retry_queue():
else:
await _handle_retry_failure(task)
elif task.task_type == "oauth_login":
result = await _test_oauth_login_internal(task.url, task.oauth_provider)
result = await _test_oauth_login_internal(
task.url, task.oauth_provider
)
if result:
print(f"✅ 재시도 성공: {task.oauth_provider} 로그인 - {task.url}")
print(
f"✅ 재시도 성공: {task.oauth_provider} 로그인 - {task.url}"
)
else:
await _handle_retry_failure(task)
except Exception as e:
print(f"❌ 재시도 중 에러: {e}")
await _handle_retry_failure(task)
async def _handle_retry_failure(task: RetryTask):
"""Handle a failed retry."""
if task.retry_count < task.max_retries:
task.retry_count += 1
wait_time = min(INITIAL_BACKOFF * (2 ** task.retry_count), MAX_BACKOFF)
wait_time = min(INITIAL_BACKOFF * (2**task.retry_count), MAX_BACKOFF)
task.next_retry_time = datetime.now() + timedelta(seconds=wait_time)
await add_to_retry_queue(task)
print(f"{wait_time}초 후 재시도 예정: {task.task_type} - {task.url}")
@ -103,6 +112,7 @@ async def _handle_retry_failure(task: RetryTask):
print(f"❌ 최대 재시도 횟수 초과: {task.task_type} - {task.url}")
logger(f"❌ 최대 재시도 횟수 초과: {task.task_type} - {task.url}")
async def get_retry_queue_status():
"""Return the retry queue status."""
async with retry_queue_lock:
@ -114,12 +124,17 @@ async def get_retry_queue_status():
"url": task.url,
"oauth_provider": task.oauth_provider,
"retry_count": task.retry_count,
"next_retry_time": task.next_retry_time.isoformat() if task.next_retry_time else None
"next_retry_time": (
task.next_retry_time.isoformat()
if task.next_retry_time
else None
),
}
for task in retry_queue
]
],
}
async def _run_agent_with_retry(agent_config):
"""Internal helper for running an Agent (includes retry logic)."""
agent = None
@ -134,25 +149,30 @@ async def _run_agent_with_retry(agent_config):
browser_profile=await GetProfile(),
)
agent = Agent(
browser_session=session,
**agent_config["agent_params"]
)
agent = Agent(browser_session=session, **agent_config["agent_params"])
response = await agent.run()
await clean_resources(agent, session)
if any(keyword in str(response) for keyword in [
"429", "resource_exhausted", "resourceexhausted",
"quota", "rate limit", "too many requests",
"exceeded", "limit reached"
]):
if any(
keyword in str(response)
for keyword in [
"429",
"resource_exhausted",
"resourceexhausted",
"quota",
"rate limit",
"too many requests",
"exceeded",
"limit reached",
]
):
print(f"⚠️ API 쿼터 에러 발생, 재시도 큐에 추가: {url}")
task = RetryTask(
task_type=agent_config.get("task_type", "unknown"),
url=url,
retry_count=try_cnt + 1,
next_retry_time=datetime.now() + timedelta(seconds=INITIAL_BACKOFF)
next_retry_time=datetime.now() + timedelta(seconds=INITIAL_BACKOFF),
)
await add_to_retry_queue(task)
return None
@ -166,7 +186,9 @@ async def _run_agent_with_retry(agent_config):
try_cnt += 1
if try_cnt >= 3:
error_msg = f"최대 재시도 횟수 초과."
logger(f"{url} - {agent_config['log_context']} 실패: {error_msg}: {e}")
logger(
f"{url} - {agent_config['log_context']} 실패: {error_msg}: {e}"
)
print(f"{url} - {agent_config['log_context']} 실패: {error_msg}")
return None
@ -197,7 +219,8 @@ async def _extract_oauth_list_internal(url: str):
"llm": CreateChatGoogle(config.GOOGLE_MODEL),
"planner_llm": (
CreateChatGoogle(config.GOOGLE_PLANNER_MODEL)
if config.GOOGLE_PLANNER_MODEL and os.getenv("ENABLE_PLANNER_MODEL_OAUTH_LIST")
if config.GOOGLE_PLANNER_MODEL
and os.getenv("ENABLE_PLANNER_MODEL_OAUTH_LIST")
else None
),
"controller": Controller(
@ -206,7 +229,7 @@ async def _extract_oauth_list_internal(url: str):
),
"extend_system_message": prompt,
"extend_planner_system_message": prompt,
}
},
}
response = await _run_agent_with_retry(agent_config)
@ -241,17 +264,25 @@ async def extract_oauth_list(url: str):
return await _extract_oauth_list_internal(url)
except Exception as e:
error_str = str(e).lower()
if any(keyword in error_str for keyword in [
"429", "resource_exhausted", "resourceexhausted",
"quota", "rate limit", "too many requests",
"exceeded", "limit reached"
]):
if any(
keyword in error_str
for keyword in [
"429",
"resource_exhausted",
"resourceexhausted",
"quota",
"rate limit",
"too many requests",
"exceeded",
"limit reached",
]
):
print(f"⚠️ API 쿼터 에러 발생, 재시도 큐에 추가: {url}")
task = RetryTask(
task_type="oauth_list",
url=url,
retry_count=1,
next_retry_time=datetime.now() + timedelta(seconds=INITIAL_BACKOFF)
next_retry_time=datetime.now() + timedelta(seconds=INITIAL_BACKOFF),
)
await add_to_retry_queue(task)
return []
@ -282,7 +313,8 @@ async def _test_oauth_login_internal(url: str, oauth_provider: str):
"llm": CreateChatGoogle(config.GOOGLE_MODEL),
"planner_llm": (
CreateChatGoogle(config.GOOGLE_PLANNER_MODEL)
if config.GOOGLE_PLANNER_MODEL and os.getenv("ENABLE_PLANNER_MODEL_OAUTH_LOGIN")
if config.GOOGLE_PLANNER_MODEL
and os.getenv("ENABLE_PLANNER_MODEL_OAUTH_LOGIN")
else None
),
"controller": Controller(
@ -291,7 +323,7 @@ async def _test_oauth_login_internal(url: str, oauth_provider: str):
),
"extend_system_message": prompt,
"extend_planner_system_message": prompt,
}
},
}
response = await _run_agent_with_retry(agent_config)
@ -312,26 +344,36 @@ async def test_oauth_login(url: str, oauth_provider: str):
return await _test_oauth_login_internal(url, oauth_provider)
except Exception as e:
error_str = str(e).lower()
if any(keyword in error_str for keyword in [
"429", "resource_exhausted", "resourceexhausted",
"quota", "rate limit", "too many requests",
"exceeded", "limit reached"
]):
if any(
keyword in error_str
for keyword in [
"429",
"resource_exhausted",
"resourceexhausted",
"quota",
"rate limit",
"too many requests",
"exceeded",
"limit reached",
]
):
print(f"⚠️ API 쿼터 에러 발생, 재시도 큐에 추가: {oauth_provider} - {url}")
task = RetryTask(
task_type="oauth_login",
url=url,
oauth_provider=oauth_provider,
retry_count=1,
next_retry_time=datetime.now() + timedelta(seconds=INITIAL_BACKOFF)
next_retry_time=datetime.now() + timedelta(seconds=INITIAL_BACKOFF),
)
await add_to_retry_queue(task)
return False
else:
raise e
async def start_retry_queue_processor():
"""Start the retry queue processor in the background."""
async def queue_processor():
while True:
try:
@ -345,6 +387,7 @@ async def start_retry_queue_processor():
asyncio.create_task(queue_processor())
print("🔄 재시도 큐 처리기 시작됨")
# Automatically start the background processor when the module loads
# (in a real application it is better to call this from the main function)
def init_retry_system():
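The quota keyword list appears three times in this file, and the check inside `_run_agent_with_retry` tests `str(response)` without lowercasing while the other two call `.lower()` first. One way to keep the checks consistent is a single helper, sketched below together with the backoff formula from `_handle_retry_failure`. `QUOTA_KEYWORDS`, `is_quota_error`, and `backoff_delay` are hypothetical names; the defaults mirror `INITIAL_BACKOFF=60` and `MAX_BACKOFF=600` from the diff.

```python
QUOTA_KEYWORDS = (
    "429",
    "resource_exhausted",
    "resourceexhausted",
    "quota",
    "rate limit",
    "too many requests",
    "exceeded",
    "limit reached",
)


def is_quota_error(text: object) -> bool:
    """Case-insensitive substring check against the quota keyword list."""
    lowered = str(text).lower()
    return any(keyword in lowered for keyword in QUOTA_KEYWORDS)


def backoff_delay(retry_count: int, initial: int = 60, maximum: int = 600) -> int:
    """Exponential backoff in seconds: initial * 2**retry_count, capped at maximum."""
    return min(initial * (2 ** retry_count), maximum)


# Delays for retry counts 1..5 with the default settings.
print([backoff_delay(n) for n in range(1, 6)])
```

Since the retry count is incremented before the delay is computed, the first scheduled wait under this formula is 120 seconds rather than 60. Broad keywords such as `"exceeded"` can also match unrelated messages, so a stricter list may be worth considering.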


@ -1,5 +1,6 @@
from pathlib import Path
async def clean_resources(agent=None, session=None):
"""Clean up resources."""
storage_state_temp_path = Path("./data/storage_state_temp.json").resolve()


@ -1,14 +1,14 @@
import os
import json
import os
from pathlib import Path
from dotenv import load_dotenv
from browser_use import BrowserProfile
import json
import os
from dotenv import load_dotenv
# Load environment variables
load_dotenv(override=True)
async def setup_storage_state():
"""Setup browser storage state for session persistence."""
# Get the script directory to ensure correct path resolution
@ -24,10 +24,10 @@ async def setup_storage_state():
if storage_state_temp_path.exists():
storage_state_temp_path.unlink()
with open(storage_state_path, 'r') as f:
with open(storage_state_path, "r") as f:
storage_data = json.load(f)
with open(storage_state_temp_path, 'w') as f:
with open(storage_state_temp_path, "w") as f:
json.dump(storage_data, f, indent=4)
print(f"🔄 Using existing storage state: {storage_state_temp_path}")


@ -1,9 +1,11 @@
import os
from lib.browser_use.func import *
# Initialize configuration
proxy_url = setup_proxy()
async def GetProfile():
storage_state_path = await setup_storage_state()
@ -11,11 +13,13 @@ async def GetProfile():
try:
if storage_state_path and os.path.exists(storage_state_path):
# Test if file can be read properly, if not, skip it
with open(storage_state_path, 'r', encoding='utf-8') as f:
with open(storage_state_path, "r", encoding="utf-8") as f:
f.read()
storage_state = storage_state_path
else:
print("⚠️ Storage state file not found or inaccessible, proceeding without it.")
print(
"⚠️ Storage state file not found or inaccessible, proceeding without it."
)
storage_state = None
except (UnicodeDecodeError, FileNotFoundError):
# If there's an encoding error, don't use the storage state
@ -25,20 +29,16 @@ async def GetProfile():
# Security settings
disable_security=True,
stealth=True,
# Display settings
headless=False,
device_scale_factor=1,
window_size={"width": 1600, "height": 900},
viewport={"width": 1600, "height": 900},
# Data persistence
user_data_dir=None,
storage_state=storage_state,
# Network settings
proxy={"server": proxy_url} if proxy_url else None,
# Additional arguments
args=get_browser_args(),
)


@ -1,6 +1,8 @@
from typing import List
from pydantic import BaseModel
# Output model
class OAuth(BaseModel):
provider: str


@ -1,10 +1,21 @@
import asyncio
import os
import csv
import os
from lib.browser_use.agents import (
extract_oauth_list,
get_retry_queue_status,
start_retry_queue_processor,
test_oauth_login,
)
from lib.utils import is_html_url, notify_backend, read_lines_between
from lib.utils.progress import (
current_progress,
load_progress,
progress_file,
save_progress,
)
from lib.utils import notify_backend, read_lines_between, is_html_url
from lib.browser_use.agents import extract_oauth_list, test_oauth_login, start_retry_queue_processor, get_retry_queue_status
from lib.utils.progress import current_progress, load_progress, save_progress, progress_file
async def scan_one_url(url: str, skip_html_check: bool = False):
"""Unified URL scan: extract the OAuth list, then attempt a login for each OAuth provider."""
@ -45,9 +56,7 @@ async def scan_one_url(url: str, skip_html_check: bool = False):
# Step 2: attempt a separate login for each OAuth provider
for i, oauth_entry in enumerate(oauth_entries):
print(
f"\n🔄 OAuth 로그인 테스트 {i+1}/{len(oauth_entries)}: {oauth_entry}"
)
print(f"\n🔄 OAuth 로그인 테스트 {i+1}/{len(oauth_entries)}: {oauth_entry}")
# Wait time between OAuth attempts
if i > 0:
@ -82,11 +91,13 @@ async def main_loop(
prev_progress = load_progress()
if prev_progress and prev_progress.get("start_line") == start_line:
print("📋 이전 진행 상황을 발견했습니다:")
print(f" - 이전 완료: {prev_progress['current_index']}/{prev_progress['total']}")
print(
f" - 이전 완료: {prev_progress['current_index']}/{prev_progress['total']}"
)
print(f" - 마지막 처리: {prev_progress.get('current_url', 'N/A')}")
resume = input("이어서 진행하시겠습니까? (y/n): ").lower().strip()
if resume == 'y':
if resume == "y":
start_index = prev_progress.get("current_index", 0)
current_progress["current_index"] = start_index
# Keep the total count at the original list length
@ -99,8 +110,12 @@ async def main_loop(
current_url_index = current_progress["current_index"]
current_progress["current_url"] = url
print(f"\n🔄 Processing {current_url_index + 1}/{current_progress['total']}: {url}")
print(f"📍 {os.path.basename(filepath)}{start_line + current_url_index}번째 줄")
print(
f"\n🔄 Processing {current_url_index + 1}/{current_progress['total']}: {url}"
)
print(
f"📍 {os.path.basename(filepath)}{start_line + current_url_index}번째 줄"
)
# Check and print the retry queue status
retry_status = await get_retry_queue_status()
@ -116,7 +131,9 @@ async def main_loop(
# Check the retry queue status after the scan completes
retry_status_after = await get_retry_queue_status()
if retry_status_after["queue_length"] > 0:
print(f"📊 스캔 완료 후 재시도 큐 상태: {retry_status_after['queue_length']}개 작업 대기 중")
print(
f"📊 스캔 완료 후 재시도 큐 상태: {retry_status_after['queue_length']}개 작업 대기 중"
)
# Move on to the next URL
current_progress["current_index"] = current_url_index + 1
@ -128,7 +145,9 @@ async def main_loop(
retry_status = await get_retry_queue_status()
if retry_status["queue_length"] == 0:
break
print(f"⏳ 재시도 큐에 {retry_status['queue_length']}개 작업 남음. 30초 후 다시 확인...")
print(
f"⏳ 재시도 큐에 {retry_status['queue_length']}개 작업 남음. 30초 후 다시 확인..."
)
await asyncio.sleep(30)
print(f"\n🎉 모든 스캔이 완료되었습니다! ({total_count}개 URL)")


@ -3,6 +3,7 @@
import json
import os
def GetSensitiveData():
"""
Reads sensitive data from a .sensitive.json file in the current directory.
@ -10,12 +11,12 @@ def GetSensitiveData():
Returns:
dict: A dictionary containing the sensitive data.
"""
file_path = os.path.join(os.getcwd(), '.sensitive.json')
file_path = os.path.join(os.getcwd(), ".sensitive.json")
if not os.path.exists(file_path):
return None
with open(file_path, 'r') as file:
with open(file_path, "r") as file:
sensitive_data = json.load(file)
return sensitive_data


@ -1,3 +1,2 @@
from lib.llm.create import *
from lib.llm.prompt import *


@ -4,6 +4,7 @@ from dotenv import load_dotenv
# Load environment variables (GOOGLE_API_KEY required)
load_dotenv(override=True)
def CreateChatGoogle(model: str):
"""Create a Google model for Browser Use."""
if model == "fallback":


@ -1,6 +1,8 @@
from typing import Union, Type
from typing import Type, Union
from pydantic import BaseModel
def get_prompt(type: str) -> tuple[str, Type[BaseModel]] | str:
"""
Returns the prompt.
@ -9,29 +11,36 @@ def get_prompt(type: str) -> tuple[str, Type[BaseModel]] | str:
:return: the matching prompt string, or a (prompt, model) tuple
"""
if type.lower() == "auth":
from lib.llm.prompt._get_oauth import prompt, model
from lib.llm.prompt._get_oauth import model, prompt
return prompt, model
elif type.lower() in ["google", "google account"]:
from lib.llm.prompt.google import prompt, model
from lib.llm.prompt.google import model, prompt
return prompt, model
elif type.lower() in ["microsoft", "microsoftonline"]:
from lib.llm.prompt.microsoft import prompt, model
from lib.llm.prompt.microsoft import model, prompt
return prompt, model
elif type.lower() in ["meta", "facebook"]:
from lib.llm.prompt.facebook import prompt, model
from lib.llm.prompt.facebook import model, prompt
return prompt, model
elif type.lower() in ["apple"]:
from lib.llm.prompt.apple import prompt, model
from lib.llm.prompt.apple import model, prompt
return prompt, model
elif type.lower() in ["github"]:
from lib.llm.prompt.github import prompt, model
from lib.llm.prompt.github import model, prompt
return prompt, model
else:
from lib.llm.prompt._fallback import model, prompt
return prompt, model
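The `if`/`elif` chain in `get_prompt` maps provider aliases to prompt packages, which can also be expressed as a lookup table. The sketch below is one possible restructuring, not the repository's code: `_PROVIDER_ALIASES`, `resolve_prompt_module`, and `load_prompt` are hypothetical names, and `load_prompt` assumes each package exposes `prompt` and `model` as the imports in the diff suggest.

```python
import importlib

# Alias -> package name under lib.llm.prompt (taken from the branches above).
_PROVIDER_ALIASES = {
    "auth": "_get_oauth",
    "google": "google",
    "google account": "google",
    "microsoft": "microsoft",
    "microsoftonline": "microsoft",
    "meta": "facebook",
    "facebook": "facebook",
    "apple": "apple",
    "github": "github",
}


def resolve_prompt_module(kind: str) -> str:
    """Return the dotted module path for a provider alias, or the fallback."""
    name = _PROVIDER_ALIASES.get(kind.lower(), "_fallback")
    return f"lib.llm.prompt.{name}"


def load_prompt(kind: str):
    """Import the resolved package lazily and return (prompt, model)."""
    module = importlib.import_module(resolve_prompt_module(kind))
    return module.prompt, module.model
```

Adding a provider then means adding one dictionary entry instead of another branch, and the lazy import is preserved because `importlib.import_module` runs only when `load_prompt` is called.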


@ -1,2 +1,2 @@
from lib.llm.prompt._fallback.prompt import prompt
from lib.llm.prompt._fallback.model import model
from lib.llm.prompt._fallback.prompt import prompt


@ -1,6 +1,9 @@
from pydantic import BaseModel
class model(BaseModel):
msg: str | None = None
status: str | None = None # "success", "mfa_required", "blocked", "sso_not_found", "login_page_not_found", "invalid_credentials"
status: str | None = (
None # "success", "mfa_required", "blocked", "sso_not_found", "login_page_not_found", "invalid_credentials"
)
final_url: str | None = None


@ -1,2 +1,2 @@
from lib.llm.prompt._get_oauth.prompt import prompt
from lib.llm.prompt._get_oauth.model import model
from lib.llm.prompt._get_oauth.prompt import prompt


@ -1,5 +1,6 @@
from pydantic import BaseModel
class model(BaseModel):
msg: str | None = None
url: str | None = None


@ -1,2 +1,2 @@
from lib.llm.prompt.apple.prompt import prompt
from lib.llm.prompt.apple.model import model
from lib.llm.prompt.apple.prompt import prompt


@ -1,6 +1,9 @@
from pydantic import BaseModel
class model(BaseModel):
msg: str | None = None
status: str | None = None # "success", "mfa_required", "apple_blocked", "sso_not_found", "login_page_not_found", "invalid_credentials"
status: str | None = (
None # "success", "mfa_required", "apple_blocked", "sso_not_found", "login_page_not_found", "invalid_credentials"
)
final_url: str | None = None


@ -1,2 +1,2 @@
from lib.llm.prompt.facebook.prompt import prompt
from lib.llm.prompt.facebook.model import model
from lib.llm.prompt.facebook.prompt import prompt


@ -1,6 +1,9 @@
from pydantic import BaseModel
class model(BaseModel):
msg: str | None = None
status: str | None = None # "success", "mfa_required", "facebook_blocked", "sso_not_found", "login_page_not_found", "invalid_credentials"
status: str | None = (
None # "success", "mfa_required", "facebook_blocked", "sso_not_found", "login_page_not_found", "invalid_credentials"
)
final_url: str | None = None


@ -1,4 +1,5 @@
import os
# Extended planner prompt
prompt = f"""
You are a web automation agent.


@ -1,2 +1,2 @@
from lib.llm.prompt.github.prompt import prompt
from lib.llm.prompt.github.model import model
from lib.llm.prompt.github.prompt import prompt


@ -1,6 +1,9 @@
from pydantic import BaseModel
class model(BaseModel):
msg: str | None = None
status: str | None = None # "success", "mfa_required", "github_blocked", "sso_not_found", "login_page_not_found", "invalid_credentials"
status: str | None = (
None # "success", "mfa_required", "github_blocked", "sso_not_found", "login_page_not_found", "invalid_credentials"
)
final_url: str | None = None


@ -1,2 +1,2 @@
from lib.llm.prompt.google.prompt import prompt
from lib.llm.prompt.google.model import model
from lib.llm.prompt.google.prompt import prompt


@ -1,6 +1,9 @@
from pydantic import BaseModel
class model(BaseModel):
msg: str | None = None
status: str | None = None # "success", "mfa_required", "google_blocked", "sso_not_found", "login_page_not_found", "invalid_credentials"
status: str | None = (
None # "success", "mfa_required", "google_blocked", "sso_not_found", "login_page_not_found", "invalid_credentials"
)
final_url: str | None = None


@ -1,2 +1,2 @@
from lib.llm.prompt.microsoft.prompt import prompt
from lib.llm.prompt.microsoft.model import model
from lib.llm.prompt.microsoft.prompt import prompt


@ -1,6 +1,9 @@
from pydantic import BaseModel
class model(BaseModel):
msg: str | None = None
status: str | None = None # "success", "mfa_required", "microsoft_blocked", "sso_not_found", "login_page_not_found", "invalid_credentials"
status: str | None = (
None # "success", "mfa_required", "microsoft_blocked", "sso_not_found", "login_page_not_found", "invalid_credentials"
)
final_url: str | None = None


@ -1,7 +1,7 @@
# export from show_info
from lib.utils.agent_info import *
from lib.utils.data import *
from lib.utils.config import *
from lib.utils.data import *
from lib.utils.parsing.is_html import *
from lib.utils.parsing.read_txt import *


@ -1,13 +1,17 @@
import os
from dotenv import load_dotenv
from lib.utils.config import (
BACKEND_URL,
GOOGLE_API_KEY,
GOOGLE_MODEL,
GOOGLE_PLANNER_MODEL,
)
import os
from dotenv import load_dotenv
load_dotenv(override=True)
def show_info():
print("🔧 환경 설정:")
print(browser_use_version())
@ -40,7 +44,10 @@ def browser_use_version():
def env_cheker():
if GOOGLE_API_KEY is None:
raise ValueError("GOOGLE_API_KEY 환경변수가 설정되지 않았습니다.")
if GOOGLE_PLANNER_MODEL != None and (not os.getenv("ENABLE_PLANNER_MODEL_OAUTH_LOGIN") or not os.getenv("ENABLE_PLANNER_MODEL_OAUTH_LIST")):
if GOOGLE_PLANNER_MODEL != None and (
not os.getenv("ENABLE_PLANNER_MODEL_OAUTH_LOGIN")
or not os.getenv("ENABLE_PLANNER_MODEL_OAUTH_LIST")
):
print(
"⚠️ GOOGLE_PLANNER_MODEL이 설정되어 있지만, ENABLE_PLANNER_MODEL_OAUTH_LOGIN 또는 ENABLE_PLANNER_MODEL_OAUTH_LIST가 활성화되지 않았습니다."
)
@ -50,9 +57,8 @@ def env_cheker():
print(
"‼️ 하지만 현재 Planner 모델을 사용하는 것이 권장되지 않습니다. 이 기능은 오작동을 일으킬 수 있습니다."
)
print(
"⚠️ 이 경고는 1초동안 정지합니다."
)
print("⚠️ 이 경고는 1초동안 정지합니다.")
# Sleep for 1 second after this warning
import time
time.sleep(1)


@ -1,5 +1,7 @@
import os
from dotenv import load_dotenv
load_dotenv(verbose=True, override=True)
BACKEND_URL = os.getenv("BACKEND_URL", "http://localhost:11081")


@ -2,6 +2,7 @@ import requests
from lib.utils.config import BACKEND_URL
def notify_backend(target_url):
# Notify the backend that the scan has started
try:


@ -1,9 +1,10 @@
from pathlib import Path
from datetime import datetime
from pathlib import Path
# Predefined file path
FILE_PATH = Path("data/log.txt")
def logger(msg: str) -> None:
try:
"""


@ -1,5 +1,6 @@
import requests
def is_html_url(url: str, timeout: float = 10.0) -> bool:
"""
Sends a HEAD request to the given URL and checks whether the Content-Type response header indicates HTML.
@ -16,17 +17,18 @@ def is_html_url(url: str, timeout: float = 10.0) -> bool:
if not response.ok:
return False
content_type = response.headers.get('Content-Type', '')
content_type = response.headers.get("Content-Type", "")
# Treat the response as HTML if Content-Type starts with 'text/html'
return content_type.lower().startswith('text/html')
return content_type.lower().startswith("text/html")
except requests.RequestException:
return False
if __name__ == '__main__':
if __name__ == "__main__":
test_urls = [
'https://www.example.com',
'https://api.github.com', # JSON API, so probably not HTML
'https://raw.githubusercontent.com' # Various types, such as text files
"https://www.example.com",
"https://api.github.com", # JSON API, so probably not HTML
"https://raw.githubusercontent.com", # Various types, such as text files
]
for url in test_urls:
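The header test in `is_html_url` can be isolated into a pure function, which makes it checkable without network access. The sketch below is a slightly stricter variant than the `startswith("text/html")` check in the diff: it strips parameters such as `charset` and compares the media type exactly. Accepting `application/xhtml+xml` is an added assumption, and `is_html_content_type` is a hypothetical name.

```python
def is_html_content_type(content_type: str) -> bool:
    """Return True if a Content-Type header value denotes an HTML document."""
    # Drop parameters such as "; charset=UTF-8" and normalise case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    # Assumption of this sketch: XHTML pages should be scanned as well.
    return media_type in ("text/html", "application/xhtml+xml")
```

Some servers reject or mishandle `HEAD` requests, so a caller may want to fall back to a `GET` when the `HEAD` response is not OK.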


@ -20,10 +20,12 @@ def read_lines_between(filepath: str, start_line: int, end_line: int) -> list[st
"""
if start_line < 1 or end_line < start_line:
raise ValueError("start_line은 1 이상이어야 하며, end_line은 start_line 이상이어야 합니다.")
raise ValueError(
"start_line은 1 이상이어야 하며, end_line은 start_line 이상이어야 합니다."
)
selected_lines: list[str] = []
with open(filepath, 'r', encoding='utf-8') as f:
with open(filepath, "r", encoding="utf-8") as f:
for idx, line in enumerate(f, start=1):
if idx < start_line:
# Not yet at the start of the range
@ -32,5 +34,5 @@ def read_lines_between(filepath: str, start_line: int, end_line: int) -> list[st
# Past the end of the range, so stop
break
# Strip the trailing newline and append to the list
selected_lines.append(line.rstrip('\n'))
selected_lines.append(line.rstrip("\n"))
return selected_lines
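The same line-range read can be written with `itertools.islice`, which skips and stops without an explicit index check. This is an alternative sketch rather than the repository's implementation. It assumes, as the loop above suggests, that both `start_line` and `end_line` are 1-based and inclusive; `read_lines_between_islice` is a hypothetical name.

```python
from itertools import islice


def read_lines_between_islice(filepath: str, start_line: int, end_line: int) -> list[str]:
    """Return lines start_line..end_line (1-based, inclusive) without newlines."""
    if start_line < 1 or end_line < start_line:
        raise ValueError("start_line must be >= 1 and end_line must be >= start_line")
    with open(filepath, "r", encoding="utf-8") as f:
        # islice uses 0-based, end-exclusive indices, hence start_line - 1.
        return [line.rstrip("\n") for line in islice(f, start_line - 1, end_line)]
```

Like the original loop, this stops reading once the range is exhausted, so a large file is not loaded in full.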


@ -7,12 +7,14 @@ from pathlib import Path
current_progress = {"current_index": 0, "total": 0, "current_url": "", "start_line": 0}
progress_file = Path("data/scan_progress.json")
def save_progress():
"""Save the current progress to a file."""
progress_file.parent.mkdir(parents=True, exist_ok=True)
with open(progress_file, "w", encoding="utf-8") as f:
json.dump(current_progress, f, ensure_ascii=False, indent=2)
def load_progress():
"""Load previous progress from a file."""
if os.path.exists(progress_file):
@ -23,6 +25,7 @@ def load_progress():
return None
return None
def signal_handler(signum, frame):
"""Ctrl+C signal handler."""
print("\n" + "=" * 60)
@ -34,7 +37,7 @@ def signal_handler(signum, frame):
print(
f" - domains.txt의 {current_progress['start_line'] + current_progress['current_index']}번째 줄"
)
if current_progress['total'] > 0:
if current_progress["total"] > 0:
print(
f" - 진행률: {current_progress['current_index']}/{current_progress['total']} ({current_progress['current_index']/current_progress['total']*100:.1f}%)"
)
@ -43,6 +46,7 @@ def signal_handler(signum, frame):
print(f"💾 진행 상황이 {progress_file}에 저장되었습니다.")
exit(0)
def setup_signal_handler():
"""Register the signal handler."""
signal.signal(signal.SIGINT, signal_handler)
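Because progress is written from a signal handler, an interrupted write could leave a truncated JSON file behind. One common mitigation is to write to a temporary file and rename it into place. The sketch below swaps in that technique; it is not what `save_progress` in the diff does, and `save_progress_atomic` and `load_progress_from` are hypothetical names that take explicit arguments instead of module globals.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Optional


def save_progress_atomic(progress: dict, path: Path) -> None:
    """Write progress as JSON via a temp file, then rename it over `path`."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(progress, f, ensure_ascii=False, indent=2)
        # os.replace swaps the file in one step on the same filesystem.
        os.replace(tmp_name, path)
    except BaseException:
        # Remove the partial temp file before re-raising.
        if os.path.exists(tmp_name):
            os.remove(tmp_name)
        raise


def load_progress_from(path: Path) -> Optional[dict]:
    """Return saved progress, or None if the file is missing or unreadable."""
    try:
        with open(path, "r", encoding="utf-8") as f:
            return json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        return None
```

With this layout a reader sees either the previous complete file or the new complete file, never a half-written one.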


@ -1,32 +1,35 @@
import asyncio
import argparse
import asyncio
import os
import sys
from dotenv import load_dotenv
from lib.utils import env_cheker
from lib.browser_use.scanner import main_loop
from lib.utils.progress import setup_signal_handler, progress_file
from lib.utils import env_cheker
from lib.utils.progress import progress_file, setup_signal_handler
# Load the .env file
load_dotenv(verbose=True, override=True)
# Check environment variables
env_cheker()
def setup_environment():
"""Load environment variables and initialize related libraries."""
# Load the .env file
load_dotenv(verbose=True, override=True)
# Initialize Laminar (optional)
if os.getenv("LMNR_PROJECT_API_KEY"):
# Check environment variables
env_cheker()
# Initialize Laminar (optional)
if os.getenv("LMNR_PROJECT_API_KEY"):
try:
from lmnr import Laminar
Laminar.initialize(project_api_key=os.getenv("LMNR_PROJECT_API_KEY"))
except ImportError:
print("⚠️ Laminar 라이브러리가 설치되지 않았습니다. 관련 기능이 비활성화됩니다.")
def main():
"""Application main entry point."""
# Set up the signal handler
setup_signal_handler()
def parse_arguments():
"""Parse command-line arguments."""
parser = argparse.ArgumentParser(
prog="domain_scanner",
description="도메인 목록 파일에서 지정한 줄 범위를 읽어 SSO 스캔을 수행합니다.",
@ -48,11 +51,18 @@ def main():
parser.add_argument(
"-skh",
"--skip-html-check",
action='store_true', # Changed to a flag
action="store_true",
help="HTML 페이지 체크를 건너뛰고 모든 URL을 스캔합니다.",
)
args = parser.parse_args()
return parser.parse_args()
def main():
"""Application main entry point."""
setup_environment()
setup_signal_handler()
args = parse_arguments()
try:
asyncio.run(
@ -64,15 +74,16 @@ def main():
)
)
except KeyboardInterrupt:
# signal_handler takes care of this, so no extra handling is needed here
pass
print("\n프로그램이 사용자에 의해 중단되었습니다.")
sys.exit(1)
finally:
# Delete the progress file on normal exit
if os.path.exists(progress_file):
try:
os.remove(progress_file)
print("진행 상황 파일이 삭제되었습니다.")
except OSError as e:
print(f"오류: 진행 상황 파일을 삭제하지 못했습니다. {e}")
print(f"오류: 진행 상황 파일을 삭제하지 못했습니다. {e}", file=sys.stderr)
if __name__ == "__main__":
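In Python a `finally` clause runs on every exit path, including `KeyboardInterrupt` and `sys.exit`, so cleanup placed there is not limited to normal termination. If the intent is to delete the progress file only after a successful run and keep it for resuming after an interruption, a `try`/`except`/`else` layout is one option. A minimal sketch, with hypothetical names `run_with_progress_cleanup`, `job`, and `progress_path`:

```python
import os
from typing import Callable


def run_with_progress_cleanup(job: Callable[[], None], progress_path: str) -> int:
    """Run `job`; remove the progress file only if it finishes normally."""
    try:
        job()
    except KeyboardInterrupt:
        # Interrupted: keep the progress file so the next run can resume.
        return 1
    else:
        # Reached only when job() raised nothing.
        if os.path.exists(progress_path):
            os.remove(progress_path)
        return 0
```

The return value can be passed to `sys.exit`, which keeps the exit code decision in one place.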

72
uv.lock generated

@ -94,6 +94,26 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/50/cd/30110dc0ffcf3b131156077b90e9f60ed75711223f306da4db08eff8403b/beautifulsoup4-4.13.4-py3-none-any.whl", hash = "sha256:9bbbb14bfde9d79f38b8cd5f8c7c85f4b8f2523190ebed90e950a8dea4cb1c4b", size = 187285, upload-time = "2025-04-15T17:05:12.221Z" },
]
[[package]]
name = "black"
version = "25.1.0"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "click" },
{ name = "mypy-extensions" },
{ name = "packaging" },
{ name = "pathspec" },
{ name = "platformdirs" },
]
sdist = { url = "https://files.pythonhosted.org/packages/94/49/26a7b0f3f35da4b5a65f081943b7bcd22d7002f5f0fb8098ec1ff21cb6ef/black-25.1.0.tar.gz", hash = "sha256:33496d5cd1222ad73391352b4ae8da15253c5de89b93a80b3e2c8d9a19ec2666", size = 649449, upload-time = "2025-01-29T04:15:40.373Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/98/87/0edf98916640efa5d0696e1abb0a8357b52e69e82322628f25bf14d263d1/black-25.1.0-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:8f0b18a02996a836cc9c9c78e5babec10930862827b1b724ddfe98ccf2f2fe4f", size = 1650673, upload-time = "2025-01-29T05:37:20.574Z" },
{ url = "https://files.pythonhosted.org/packages/52/e5/f7bf17207cf87fa6e9b676576749c6b6ed0d70f179a3d812c997870291c3/black-25.1.0-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:afebb7098bfbc70037a053b91ae8437c3857482d3a690fefc03e9ff7aa9a5fd3", size = 1453190, upload-time = "2025-01-29T05:37:22.106Z" },
{ url = "https://files.pythonhosted.org/packages/e3/ee/adda3d46d4a9120772fae6de454c8495603c37c4c3b9c60f25b1ab6401fe/black-25.1.0-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:030b9759066a4ee5e5aca28c3c77f9c64789cdd4de8ac1df642c40b708be6171", size = 1782926, upload-time = "2025-01-29T04:18:58.564Z" },
{ url = "https://files.pythonhosted.org/packages/cc/64/94eb5f45dcb997d2082f097a3944cfc7fe87e071907f677e80788a2d7b7a/black-25.1.0-cp313-cp313-win_amd64.whl", hash = "sha256:a22f402b410566e2d1c950708c77ebf5ebd5d0d88a6a2e87c86d9fb48afa0d18", size = 1442613, upload-time = "2025-01-29T04:19:27.63Z" },
{ url = "https://files.pythonhosted.org/packages/09/71/54e999902aed72baf26bca0d50781b01838251a462612966e9fc4891eadd/black-25.1.0-py3-none-any.whl", hash = "sha256:95e8176dae143ba9097f351d174fdaf0ccd29efb414b362ae3fd72bf0f710717", size = 207646, upload-time = "2025-01-29T04:15:38.082Z" },
]
[[package]]
name = "browser-use"
version = "0.3.3"
@ -140,16 +160,20 @@ name = "browser-use-test"
version = "0.1.0"
source = { virtual = "." }
dependencies = [
{ name = "black" },
{ name = "browser-use", extra = ["memory"] },
{ name = "chardet" },
{ name = "isort" },
{ name = "lmnr", extra = ["all"] },
{ name = "patchright" },
]
[package.metadata]
requires-dist = [
{ name = "black", specifier = ">=25.1.0" },
{ name = "browser-use", extras = ["memory"], specifier = "==0.3.3" },
{ name = "chardet", specifier = ">=5.2.0" },
{ name = "isort", specifier = ">=6.0.1" },
{ name = "lmnr", extras = ["all"], specifier = ">=0.6.10" },
{ name = "patchright", specifier = ">=1.52.5" },
]
@ -241,6 +265,18 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/20/94/c5790835a017658cbfabd07f3bfb549140c3ac458cfc196323996b10095a/charset_normalizer-3.4.2-py3-none-any.whl", hash = "sha256:7f56930ab0abd1c45cd15be65cc741c28b1c9a34876ce8c17a2fa107810c0af0", size = 52626, upload-time = "2025-05-02T08:34:40.053Z" },
]
[[package]]
name = "click"
version = "8.2.1"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "colorama", marker = "sys_platform == 'win32'" },
]
sdist = { url = "https://files.pythonhosted.org/packages/60/6c/8ca2efa64cf75a977a0d7fac081354553ebe483345c734fb6b6515d96bbc/click-8.2.1.tar.gz", hash = "sha256:27c491cc05d968d271d5a1db13e3b5a184636d9d930f148c50b038f0d0646202", size = 286342, upload-time = "2025-05-20T23:19:49.832Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/85/32/10bb5764d90a8eee674e9dc6f4db6a0ab47c8c4d0d83c27f7c39ac415a4d/click-8.2.1-py3-none-any.whl", hash = "sha256:61a3265b914e850b85317d0b3109c7f8cd35a670f963866005d6ef1d5175a12b", size = 102215, upload-time = "2025-05-20T23:19:47.796Z" },
]
[[package]]
name = "colorama"
version = "0.4.6"
@ -582,6 +618,15 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/59/91/aa6bde563e0085a02a435aa99b49ef75b0a4b062635e606dab23ce18d720/inflection-0.5.1-py2.py3-none-any.whl", hash = "sha256:f38b2b640938a4f35ade69ac3d053042959b62a0f1076a5bbaa1b9526605a8a2", size = 9454, upload-time = "2020-08-22T08:16:27.816Z" },
]
[[package]]
name = "isort"
version = "6.0.1"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/b8/21/1e2a441f74a653a144224d7d21afe8f4169e6c7c20bb13aec3a2dc3815e0/isort-6.0.1.tar.gz", hash = "sha256:1cb5df28dfbc742e490c5e41bad6da41b805b0a8be7bc93cd0fb2a8a890ac450", size = 821955, upload-time = "2025-02-26T21:13:16.955Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/c1/11/114d0a5f4dabbdcedc1125dee0888514c3c3b16d3e9facad87ed96fad97c/isort-6.0.1-py3-none-any.whl", hash = "sha256:2dc5d7f65c9678d94c88dfc29161a320eec67328bc97aad576874cb4be1e9615", size = 94186, upload-time = "2025-02-26T21:13:14.911Z" },
]
[[package]]
name = "jinja2"
version = "3.1.6"
@ -758,6 +803,15 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/43/e3/7d92a15f894aa0c9c4b49b8ee9ac9850d6e63b03c9c32c0367a13ae62209/mpmath-1.3.0-py3-none-any.whl", hash = "sha256:a0b2b9fe80bbcd81a6647ff13108738cfb482d481d826cc0e02f5b35e5c88d2c", size = 536198, upload-time = "2023-03-07T16:47:09.197Z" },
]
[[package]]
name = "mypy-extensions"
version = "1.1.0"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/a2/6e/371856a3fb9d31ca8dac321cda606860fa4548858c0cc45d9d1d4ca2628b/mypy_extensions-1.1.0.tar.gz", hash = "sha256:52e68efc3284861e772bbcd66823fde5ae21fd2fdb51c62a211403730b916558", size = 6343, upload-time = "2025-04-22T14:54:24.164Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/79/7b/2c79738432f5c924bef5071f933bcc9efd0473bac3b4aa584a6f7c1c8df8/mypy_extensions-1.1.0-py3-none-any.whl", hash = "sha256:1be4cccdb0f2482337c4743e60421de3a356cd97508abadd57d47403e94f5505", size = 4963, upload-time = "2025-04-22T14:54:22.983Z" },
]
[[package]]
name = "networkx"
version = "3.4.2"
@ -1522,6 +1576,15 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/9e/6a/b8f0fd8c513667b59b85d3969a5af65a5f2410ff41aff04d597ed5b872d0/patchright-1.52.5-py3-none-win_arm64.whl", hash = "sha256:f406911b5b3b21d70e3b1d1a2780b732575e31f2b012483622cc764166a31d78", size = 30670751, upload-time = "2025-06-05T21:54:23.336Z" },
]
[[package]]
name = "pathspec"
version = "0.12.1"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/ca/bc/f35b8446f4531a7cb215605d100cd88b7ac6f44ab3fc94870c120ab3adbf/pathspec-0.12.1.tar.gz", hash = "sha256:a482d51503a1ab33b1c67a6c3813a26953dbdc71c31dacaef9a838c4e29f5712", size = 51043, upload-time = "2023-12-10T22:30:45Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/cc/20/ff623b09d963f88bfde16306a54e12ee5ea43e9b597108672ff3a408aad6/pathspec-0.12.1-py3-none-any.whl", hash = "sha256:a0d503e138a4c123b27490a4f7beda6a01c6f288df0e4a8b79c7eb0dc7b4cc08", size = 31191, upload-time = "2023-12-10T22:30:43.14Z" },
]
[[package]]
name = "pillow"
version = "11.2.1"
@ -1552,6 +1615,15 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/67/32/32dc030cfa91ca0fc52baebbba2e009bb001122a1daa8b6a79ad830b38d3/pillow-11.2.1-cp313-cp313t-win_arm64.whl", hash = "sha256:225c832a13326e34f212d2072982bb1adb210e0cc0b153e688743018c94a2681", size = 2417234, upload-time = "2025-04-12T17:49:08.399Z" },
]
[[package]]
name = "platformdirs"
version = "4.3.8"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/fe/8b/3c73abc9c759ecd3f1f7ceff6685840859e8070c4d947c93fae71f6a0bf2/platformdirs-4.3.8.tar.gz", hash = "sha256:3d512d96e16bcb959a814c9f348431070822a6496326a4be0911c40b5a74c2bc", size = 21362, upload-time = "2025-05-07T22:47:42.121Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/fe/39/979e8e21520d4e47a0bbe349e2713c0aac6f3d853d0e5b34d76206c439aa/platformdirs-4.3.8-py3-none-any.whl", hash = "sha256:ff7059bb7eb1179e2685604f4aaf157cfd9535242bd23742eadc3c13542139b4", size = 18567, upload-time = "2025-05-07T22:47:40.376Z" },
]
[[package]]
name = "playwright"
version = "1.52.0"