Show HN: Webctl – Browser automation for agents based on CLI instead of MCP

Original link: https://github.com/cosinusalpha/webctl

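## webctl: CLI browser automation for AI agents and humans

webctl is a command-line tool for controlling a browser, designed for both AI agents and direct human use. Unlike traditional browser automation, **webctl puts the user in control of context**: data can be filtered *before* it is processed, which is crucial for large language models. Key features include semantic targeting with ARIA roles (e.g. `role=button name~="Submit"`), filtering through Unix tools (grep, jq), and a stateless CLI that talks to a persistent daemon managing the browser. This design makes caching, scripting, parallelization, and integration with AI agents straightforward.

**Core commands:** `start` (launch the browser), `navigate` (go to a URL), `snapshot` (capture page elements), `click`, `type` (enter text), `wait`, and `stop`. `snapshot` output can be limited and filtered for efficiency. Cookies persist across sessions, and profiles provide separate browsing contexts.

**For AI agents:** webctl offers a clear, consistent interface. Config files (`CLAUDE.md`, `GEMINI.md`) simplify integration. The tool is designed to be instructed with prompts such as "use webctl to browse…".

**Installation:** `pip install webctl` (requires Python 3.11+), then run `webctl setup` to download Chromium. More details and the source code are available on [GitHub](https://github.com/cosinusalpha/webctl).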

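## Webctl: browser automation from the command line

Webctl is a new browser automation tool positioned as a simpler alternative to frameworks such as Playwright. It aims to bridge the gap between simple tools like `curl` and full browser automation. Created by cosinusalpha, it addresses the "context dump" problem common to AI browser tools, where unnecessary data is sent to the LLM.

Webctl uses a Unix-style CLI that lets users filter browser output *before* it reaches an AI agent, which improves token efficiency. A daemon architecture persists browser state (cookies, sessions). Semantic targeting with ARIA roles makes element selection more reliable. In essence it is described as "Playwright in the terminal", built for local AI agents that need to interact with web applications, for example to manage tasks on an intranet. State persistence is still experimental, but the developer considers this approach more controllable and efficient. It is similar to Vercel's Agent Browser, but the initial post does not spell out the specific differences.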

Original

Browser automation for AI agents and humans, built on the command line.

```bash
webctl start
webctl navigate "https://google.com"
webctl type 'role=combobox name~="Search"' "best restaurants nearby" --submit
webctl snapshot --interactive-only --limit 20
webctl stop --daemon
```

MCP browser tools have a fundamental problem: the server controls what enters your context. With Playwright MCP, every response includes the full accessibility tree plus console messages (default: "info" level). After a few page queries, your context is full.

CLI flips this around: you control what enters context.

```bash
# Filter before context
webctl snapshot --interactive-only --limit 30      # Only buttons, links, inputs
webctl snapshot --within "role=main"               # Skip nav, footer, ads

# Pipe through Unix tools
webctl snapshot | grep -i "submit"                 # Find specific elements
webctl --format jsonl snapshot | jq '.data.role'   # Extract with jq
webctl snapshot | head -50                         # Truncate output
```
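The same "filter before context" idea applies when a script drives webctl instead of a shell. The minimal Python sketch below reads `webctl --format jsonl snapshot` output and keeps only selected roles, up to a cap. This is roughly what `--roles` plus `--limit` do. The sketch's assumption about the output shape comes only from the `jq '.data.role'` example: each JSONL line is taken to be an object with a nested `data` dict holding `role`. The `name` field and the function name `filter_snapshot` are hypothetical.

```python
import json
from typing import Iterable, Iterator


def filter_snapshot(lines: Iterable[str], roles: set[str], limit: int) -> Iterator[dict]:
    """Yield at most `limit` snapshot records whose data.role is in `roles`.

    Assumption: one JSON object per line with a nested "data" dict, as
    suggested by `jq '.data.role'`. Other lines are skipped.
    """
    kept = 0
    for line in lines:
        if kept >= limit:
            break
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate non-JSON noise such as event lines
        data = record.get("data") if isinstance(record, dict) else None
        if isinstance(data, dict) and data.get("role") in roles:
            kept += 1
            yield data
```

For example, `webctl --format jsonl snapshot | python filter.py` could pass `sys.stdin` to `filter_snapshot(sys.stdin, {"button", "link"}, limit=30)`. Here `filter.py` is a hypothetical script name. Malformed lines are skipped rather than treated as fatal.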

Beyond filtering, CLI gives you:

| Capability | CLI | MCP |
|---|---|---|
| Filter output | Built-in flags + grep/jq/head | Server decides |
| Debug | Run same command as agent | Opaque |
| Cache | `webctl snapshot > cache.txt` | Every call hits server |
| Script | Save to `.sh`, version control | Ephemeral |
| Timeout | `timeout 30 webctl ...` | Internal only |
| Parallelize | `parallel`, `xargs`, `&` | Server-dependent |
| Human takeover | Same commands | Different interface |
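When a script rather than a shell calls webctl, the Cache and Timeout rows can be combined. The sketch below is a hypothetical helper; `cached_run` is not part of webctl. It reuses a cached snapshot if one exists. Otherwise it runs the command under a timeout. It is a Python counterpart of `webctl snapshot > cache.txt` and `timeout 30 webctl ...`.

```python
import subprocess
from pathlib import Path


def cached_run(cmd: list[str], cache_file: Path, timeout: float = 30.0) -> str:
    """Return cached stdout if present; otherwise run `cmd` and cache its stdout.

    Raises subprocess.TimeoutExpired or subprocess.CalledProcessError on
    failure; in that case nothing is written to the cache.
    """
    if cache_file.exists():
        return cache_file.read_text()
    # check=True turns a non-zero exit into CalledProcessError.
    # timeout kills the child and raises TimeoutExpired.
    result = subprocess.run(cmd, capture_output=True, text=True,
                            timeout=timeout, check=True)
    cache_file.write_text(result.stdout)
    return result.stdout
```

Usage: `cached_run(["webctl", "snapshot", "--interactive-only"], Path("cache.txt"))`. The cache is written only after a successful run, so a timed-out or failed command never leaves a partial cache file behind.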

```bash
pip install webctl      # Requires Python 3.11+
webctl setup            # Downloads Chromium (~150MB)
```

Verify it works:

```bash
webctl start
webctl navigate "https://example.com"
webctl snapshot --interactive-only
webctl stop --daemon
```

Install from source

```bash
git clone https://github.com/cosinusalpha/webctl
cd webctl
uv sync && uv run webctl setup
```

Linux system dependencies

```bash
playwright install-deps chromium
# Or manually: sudo apt-get install libnss3 libatk1.0-0 libatk-bridge2.0-0 ...
```

Browser stays open across commands. Cookies persist to disk.

```bash
webctl start                    # Visible browser
webctl start --mode unattended  # Headless
webctl -s work start            # Named profile (separate cookies)
webctl stop --daemon            # Shutdown everything
```

Semantic targeting based on ARIA roles, stable across CSS refactors:

```text
role=button                     # Any button
role=button name="Submit"       # Exact match
role=button name~="Submit"      # Contains (preferred)
role=textbox name~="Email"      # Input field
role=link name~="Sign in"       # Link
```
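The query strings are simple enough to parse and match by hand. The following is a hypothetical Python sketch, not webctl's own parser. It covers only the forms shown above. It also assumes how webctl maps its syntax: `name~=` is treated as a case-insensitive substring match and `name=` as a case-sensitive exact match. This mirrors the default and `exact=True` behavior of Playwright's `get_by_role`.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical grammar for the forms above (quotes around the name optional):
#   role=<role>                  role=<role> name="<text>"
#   role=<role> name~="<text>"
_QUERY_RE = re.compile(
    r'^role=(?P<role>[a-z]+)'
    r'(?:\s+name(?P<op>~?=)(?:"(?P<quoted>[^"]*)"|(?P<bare>\S+)))?$'
)


@dataclass(frozen=True)
class Query:
    role: str
    name: Optional[str] = None
    exact: bool = False


def parse_query(text: str) -> Query:
    m = _QUERY_RE.match(text.strip())
    if m is None:
        raise ValueError(f"unsupported query: {text!r}")
    name = m.group("quoted") if m.group("quoted") is not None else m.group("bare")
    return Query(role=m.group("role"), name=name, exact=m.group("op") == "=")


def matches(q: Query, role: str, name: str) -> bool:
    """Check one accessibility node (role, accessible name) against a query."""
    if role != q.role:
        return False
    if q.name is None:
        return True
    if q.exact:
        return name == q.name
    # Assumed: case-insensitive substring, like get_by_role's default.
    return q.name.lower() in name.lower()
```

Substring matching is why `name~=` is the preferred form: `role=button name~="Submit"` still matches a button whose label changes to "Submit form".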

```bash
webctl snapshot                                    # Human-readable
webctl --quiet navigate "..."                      # Suppress events
webctl --result-only --format jsonl navigate "..." # Pure JSON, final result only
```

```bash
webctl navigate "https://..."   # Go to URL
webctl back                     # History back
webctl forward                  # History forward
webctl reload                   # Refresh
```

```bash
webctl snapshot                           # Full a11y tree
webctl snapshot --interactive-only        # Buttons, links, inputs only
webctl snapshot --limit 30                # Cap output
webctl snapshot --within "role=main"      # Scope to container
webctl snapshot --roles "button,link"     # Filter by role
webctl query "role=button name~=Submit"   # Debug query, get suggestions
webctl screenshot --path shot.png         # Screenshot
```
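`webctl query` is described as returning suggestions when an element is not found, but the document does not say how those suggestions are computed. One plausible approach, sketched here as an assumption, is to fuzzy-match the wanted name against the accessible names on the page using Python's `difflib`. `suggest_names` is a hypothetical helper.

```python
import difflib


def suggest_names(wanted: str, available: list[str], n: int = 3,
                  cutoff: float = 0.6) -> list[str]:
    """Return up to `n` accessible names similar to `wanted`, best match first.

    Matching is case-insensitive; names are returned in their original form.
    """
    by_lower = {name.lower(): name for name in available}
    close = difflib.get_close_matches(wanted.lower(), list(by_lower),
                                      n=n, cutoff=cutoff)
    return [by_lower[c] for c in close]
```

With this approach, a typo such as `name~="Sbumit"` would still surface "Submit" as a candidate, while unrelated labels fall below the similarity cutoff.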

```bash
webctl click 'role=button name~="Submit"'
webctl type 'role=textbox name~="Email"' "user@example.com"
webctl type 'role=textbox name~="Search"' "query" --submit  # Type + Enter
webctl select 'role=combobox name~="Country"' --label "Germany"
webctl check 'role=checkbox name~="Remember"'
webctl press Enter
webctl scroll down
webctl upload 'role=button name~="Upload"' --file ./doc.pdf
```

```bash
webctl wait network-idle
webctl wait 'exists:role=button name~="Continue"'
webctl wait 'visible:role=dialog'
webctl wait 'hidden:role=progressbar'
webctl wait 'url-contains:"/dashboard"'
```
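Each `wait` condition pairs a kind with an argument. Conceptually, a wait is a check repeated until it succeeds or a timeout expires. The sketch below is hypothetical: webctl may rely on Playwright's built-in waiting rather than polling. The 30-second default and 0.1-second interval are illustrative values, not webctl settings.

```python
import time
from typing import Callable

# Prefixed kinds taken from the examples above; "network-idle" has no argument.
WAIT_KINDS = {"exists", "visible", "hidden", "url-contains"}


def parse_wait(condition: str) -> tuple[str, str]:
    """Split a condition such as 'visible:role=dialog' into (kind, argument)."""
    if condition == "network-idle":
        return ("network-idle", "")
    kind, sep, arg = condition.partition(":")  # split on the first colon only
    if not sep or kind not in WAIT_KINDS:
        raise ValueError(f"unknown wait condition: {condition!r}")
    if kind == "url-contains":
        arg = arg.strip('"')  # 'url-contains:"/dashboard"' -> '/dashboard'
    return (kind, arg)


def poll_until(check: Callable[[], bool], timeout: float = 30.0,
               interval: float = 0.1) -> bool:
    """Call `check` until it returns True (-> True) or `timeout` passes (-> False)."""
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Splitting only on the first colon keeps arguments intact when they contain colons themselves, for example a full URL after `url-contains:`.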

```bash
webctl status                   # Current state (includes console error counts)
webctl save                     # Persist cookies now
webctl sessions                 # List profiles
webctl pages                    # List tabs
webctl focus p2                 # Switch tab
webctl close-page p1            # Close tab
```

```bash
webctl console                  # Get last 100 logs
webctl console --count          # Just counts by level (LLM-friendly)
webctl console --level error    # Filter to errors only
webctl console --follow         # Stream new logs continuously
webctl console -n 50 -l warn    # Last 50 warnings
```
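`--count` is LLM-friendly because it reduces what could be hundreds of log lines to a handful of numbers. The following is a minimal sketch of that idea and of `-n`/`-l` style selection. It assumes log entries are dicts with a `level` key; the real entry format and the exact `--count` output are not documented here, and both function names are hypothetical.

```python
from collections import Counter
from typing import Optional


def count_by_level(entries: list[dict]) -> dict[str, int]:
    """Collapse log entries into counts per level, e.g. {"error": 2}."""
    return dict(Counter(entry.get("level", "unknown") for entry in entries))


def last_entries(entries: list[dict], level: Optional[str] = None,
                 n: int = 100) -> list[dict]:
    """Return the last `n` entries, optionally only those at `level`."""
    picked = [e for e in entries if level is None or e.get("level") == level]
    return picked[-n:] if n > 0 else []
```

An agent can check the counts first and fetch full error entries only when the error count is non-zero.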

```bash
webctl setup                    # Install browser
webctl doctor                   # Diagnose installation
webctl init                     # Add to agent configs (CLAUDE.md, etc.)
webctl config show              # Show settings
webctl config set idle_timeout 1800
```
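Settings live in `~/.config/webctl/config.json`. The sketch below shows one way a `config set` command could update such a file. It assumes the file is a flat JSON object and that numeric CLI values should be stored as numbers. The helper name is hypothetical, and the source does not state the unit of `idle_timeout`.

```python
import json
from pathlib import Path


def config_set(path: Path, key: str, raw_value: str) -> dict:
    """Set `key` in a flat JSON config file and return the updated config.

    The CLI passes every value as a string; decode it as JSON when possible
    ("1800" -> 1800, "true" -> True), otherwise keep the plain string.
    """
    config = json.loads(path.read_text()) if path.exists() else {}
    try:
        value = json.loads(raw_value)
    except json.JSONDecodeError:
        value = raw_value
    config[key] = value
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

Under these assumptions, `config_set(Path.home() / ".config/webctl/config.json", "idle_timeout", "1800")` would store the number `1800` rather than the string `"1800"`.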

Tell your AI agent to use webctl. The easiest way:

```bash
webctl init                     # Creates CLAUDE.md, GEMINI.md, etc.
webctl init --agents claude     # Only specific agents
```

Or manually add to your agent's config:

```text
For web browsing, use webctl CLI. Run `webctl agent-prompt` for instructions.
```
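If you add this instruction manually to several agent files, making the edit idempotent avoids duplicate lines. This hypothetical helper is not necessarily how `webctl init` works. It appends the line above to a file such as `CLAUDE.md` only if the line is not already present.

```python
from pathlib import Path

WEBCTL_LINE = ("For web browsing, use webctl CLI. "
               "Run `webctl agent-prompt` for instructions.")


def ensure_line(path: Path, line: str = WEBCTL_LINE) -> bool:
    """Append `line` to an agent config file unless it is already present.

    Creates the file if missing. Returns True if the file was changed.
    """
    text = path.read_text() if path.exists() else ""
    if line in text.splitlines():
        return False
    if text and not text.endswith("\n"):
        text += "\n"  # keep the new line separate from existing content
    path.write_text(text + line + "\n")
    return True
```

Running it repeatedly over `CLAUDE.md` and `GEMINI.md` changes each file at most once.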

This section is designed to be read by AI agents directly.

Control a browser via CLI. Start with `webctl start`, end with `webctl stop --daemon`.

Commands:

```bash
webctl start                              # Open browser
webctl navigate "URL"                     # Go to URL
webctl snapshot --interactive-only        # See clickable elements
webctl click 'role=button name~="Text"'   # Click element
webctl type 'role=textbox name~="Field"' "text"           # Type
webctl type 'role=textbox name~="Field"' "text" --submit  # Type + Enter
webctl select 'role=combobox' --label "Option"            # Dropdown
webctl wait 'exists:role=button name~="..."'              # Wait for element
webctl stop --daemon                      # Close browser
```

Query syntax:

- `role=button`: by ARIA role (button, link, textbox, combobox, checkbox)
- `name~="partial"`: partial match (preferred, more robust)
- `name="exact"`: exact match

Example: login

```bash
webctl start
webctl navigate "https://site.com/login"
webctl type 'role=textbox name~="Email"' "user@example.com"
webctl type 'role=textbox name~="Password"' "secret" --submit
webctl wait 'url-contains:"/dashboard"'
```
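The "Script" row of the comparison table applies to flows like this login. The following is a minimal Python sketch that assumes `webctl` is on `PATH`. Each step is an argument list, so query strings such as `role=textbox name~="Email"` reach webctl intact without shell quoting. The email address is a placeholder. `run_steps` and the 60-second per-step timeout are hypothetical.

```python
import subprocess
from typing import Callable, Optional, Sequence

# Placeholder credentials; the steps mirror the login example above.
LOGIN_STEPS: list[list[str]] = [
    ["webctl", "start"],
    ["webctl", "navigate", "https://site.com/login"],
    ["webctl", "type", 'role=textbox name~="Email"', "user@example.com"],
    ["webctl", "type", 'role=textbox name~="Password"', "secret", "--submit"],
    ["webctl", "wait", 'url-contains:"/dashboard"'],
]


def run_steps(steps: Sequence[list[str]],
              runner: Optional[Callable[[list[str]], object]] = None) -> None:
    """Run each command in order; the first failure raises and stops the flow."""

    def default_runner(cmd: list[str]) -> None:
        # check=True raises CalledProcessError on a non-zero exit status.
        subprocess.run(cmd, check=True, timeout=60)

    run = runner or default_runner
    for cmd in steps:
        run(cmd)
```

Calling `run_steps(LOGIN_STEPS)` executes the flow. Passing a custom `runner` (for example `print`) gives a dry run that shows the commands without touching a browser.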

Tips:

- Use `--interactive-only` to reduce output (only buttons, links, inputs)
- Use `name~=` for partial matching (handles minor text changes)
- Use `webctl query "..."` if an element is not found; it shows suggestions
- Use `--quiet` to suppress event output
- Sessions persist cookies: log in once, stay logged in
- Check `webctl status` for console error counts before investigating
- Use `webctl console --count` for a log summary, `--level error` for details

```text
┌─────────────┐     TCP/IPC      ┌─────────────┐
│   CLI       │ ◄──────────────► │   Daemon    │
│  (webctl)   │    JSON-RPC      │  (browser)  │
└─────────────┘                  └─────────────┘
      │                                 │
      ▼                                 ▼
  Agent/User                      Chromium + Playwright
```

- CLI: stateless, sends commands to the daemon
- Daemon: manages the browser, auto-starts on first command
- Profiles: `~/.local/share/webctl/profiles/`
- Config: `~/.config/webctl/config.json`
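To make the split concrete, here is a toy model of the architecture above: a stateless client opens a fresh connection per command, and a daemon keeps state that outlives every connection. It is a sketch under stated assumptions only. The newline-delimited JSON-RPC 2.0 framing, the `navigate`/`status` method names, and the use of `socket.socketpair()` in place of the real TCP/IPC link are all hypothetical. A real daemon would drive Chromium through Playwright instead of updating a dict.

```python
import json
import socket
import threading


def handle(request: dict, state: dict) -> dict:
    """Daemon side: apply one JSON-RPC request to long-lived state."""
    method, params = request.get("method"), request.get("params", {})
    if method == "navigate":
        state["url"] = params["url"]
        result = {"url": state["url"]}
    elif method == "status":
        result = {"url": state.get("url")}
    else:
        return {"jsonrpc": "2.0", "id": request.get("id"),
                "error": {"code": -32601, "message": "Method not found"}}
    return {"jsonrpc": "2.0", "id": request.get("id"), "result": result}


def serve(conn: socket.socket, state: dict) -> None:
    """Answer newline-delimited requests until the client disconnects."""
    with conn, conn.makefile("r", encoding="utf-8") as reader:
        for line in reader:
            reply = handle(json.loads(line), state)
            conn.sendall((json.dumps(reply) + "\n").encode("utf-8"))


def call(method: str, params: dict, state: dict) -> dict:
    """CLI side: open a fresh connection, send one request, read one reply."""
    client, server = socket.socketpair()  # stand-in for the TCP/IPC link
    worker = threading.Thread(target=serve, args=(server, state))
    worker.start()
    with client, client.makefile("r", encoding="utf-8") as reader:
        request = {"jsonrpc": "2.0", "id": 1, "method": method, "params": params}
        client.sendall((json.dumps(request) + "\n").encode("utf-8"))
        reply = json.loads(reader.readline())
    worker.join()  # the daemon side ends once the client closes its socket
    return reply
```

All state lives on the daemon side. Two separate `call(...)` invocations therefore behave like two separate `webctl` processes that still see the same browser.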

License: MIT
