Add comprehensive documentation for Browser Use features

- Introduced custom output format instructions with example code.
- Detailed connection methods for launching and connecting to browsers, including local and remote options.
- Provided guidelines for handling sensitive data securely, including best practices and examples.
- Documented supported LangChain chat models with setup instructions and environment variable requirements.
- Added instructions for customizing the system prompt to control agent behavior.
2026-06-04 06:21:52 +09:00 · 2025-06-21 16:18:16 +09:00 · 638a3d47ce
commit 638a3d47ce
parent 34ee66b4e8
10 changed files with 3056 additions and 0 deletions
--- a/.github/instructions/browser-use.instructions.md
+++ b/.github/instructions/browser-use.instructions.md
@@ -0,0 +1,82 @@
+---
+applyTo: '**'
+---
+## 🧠 General Guidelines for Contributing to `browser-use`
+
+**Browser-Use** is an AI agent that autonomously interacts with the web. It takes a user-defined task, navigates web pages using Chromium via Playwright, processes HTML, and repeatedly queries a language model (like `gpt-4o`) to decide the next action—until the task is completed.
+
+### 🗂️ File Documentation
+
+When you create a **new file**:
+
+* **For humans**: At the top of the file, include a docstring in natural language explaining:
+
+  * What this file does.
+  * How it fits into the browser-use system.
+  * Whether it introduces a new abstraction or replaces an old one.
+* **For LLMs/AI**: Include structured metadata using standardized comments such as:
+
+  ```python
+  # @file purpose: Defines <purpose>
+  ```
+
+---
+
+### 🧰 Development Rules
+
+* ✅ **Always use [`uv`](https://github.com/astral-sh/uv) instead of `pip`**
+  For deterministic and fast dependency installs.
+
+```bash
+uv venv --python 3.11
+source .venv/bin/activate
+uv sync
+```
+
+* ✅ **Use real model names**
+  Do **not** replace `gpt-4o` with `gpt-4`. `gpt-4o` is a distinct, supported model release.
+
+* ✅ **Type-safe coding**
+  Use **Pydantic v2 models** for all internal action schemas, task inputs/outputs, and controller I/O. This ensures robust validation and LLM-call integrity.
+
+---
+
+## ⚙️ Adding New Actions
+
+To add a new action that your browser agent can execute:
+
+```python
+from playwright.async_api import Page
+from browser_use import ActionResult, Controller
+
+controller = Controller()
+
+@controller.registry.action("Search the web for a specific query")
+async def search_web(query: str, page: Page):
+    # Implement your logic here, e.g., query a search engine and return results
+    result = ...
+    return ActionResult(extracted_content=result, include_in_memory=True)
+```
+
+### Notes:
+
+* Use descriptive names and docstrings for each action.
+* Prefer returning `ActionResult` with structured content to help the agent reason better.
+
+---
+
+## 🧠 Creating and Running an Agent
+
+To define a task and run a browser-use agent:
+
+```python
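+import asyncio
+
+from browser_use import Agent
+from langchain_openai import ChatOpenAI
+
+task = "Find the CEO of OpenAI and return their name"
+model = ChatOpenAI(model="gpt-4o")  # reads OPENAI_API_KEY from the environment
+agent = Agent(task=task, llm=model, controller=controller)  # controller: defined in the previous example
+history = asyncio.run(agent.run())  # agent.run() is a coroutine, so drive it with an event loop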
+```
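
Note on the type-safe coding rule in the file above: the rule names Pydantic v2 but does not show a schema. What follows is a minimal sketch of what such an action schema might look like, not the project's actual code. `SearchWebParams`, its fields, and `parse_params` are hypothetical names chosen for illustration; only `BaseModel`, `ConfigDict`, `Field`, and `ValidationError` come from Pydantic v2 itself.

```python
from typing import Optional

from pydantic import BaseModel, ConfigDict, Field, ValidationError


class SearchWebParams(BaseModel):
    """Hypothetical input schema for a search action (illustration only)."""

    # Reject unknown keys so a malformed LLM tool call fails validation.
    model_config = ConfigDict(extra="forbid")

    query: str = Field(min_length=1, description="Search terms")
    max_results: int = Field(default=5, ge=1, le=20)


def parse_params(raw: dict) -> Optional[SearchWebParams]:
    """Validate raw LLM output; return None if it does not match the schema."""
    try:
        return SearchWebParams.model_validate(raw)
    except ValidationError:
        return None


valid = parse_params({"query": "browser-use docs"})
invalid = parse_params({"query": "", "limit": 3})  # empty query and an unknown key
```

Under these assumptions, `valid` is a populated model with `max_results` defaulted to 5, while `invalid` is `None`, so malformed model output can be rejected before any browser action runs.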