Show HN: Webctl – Browser automation for agents based on CLI instead of MCP

Original link: https://github.com/cosinusalpha/webctl

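## webctl: CLI browser automation for AI agents and humans

webctl is a command-line tool for controlling a browser, designed for both AI agents and direct human use. Unlike conventional browser automation, **webctl prioritizes user control over context**: output can be filtered *before* it is processed, which matters a great deal for large language models.

Key features include semantic targeting with ARIA roles (for example `role=button name~="Submit"`), filtering through Unix tools (grep, jq), and a stateless CLI that talks to a persistent daemon managing the browser. This enables caching, scripting, parallelization, and easy integration with AI agents.

**Core commands:** `start` (launch the browser), `navigate` (go to a URL), `snapshot` (capture page elements), `click`, `type`, `wait`, and `stop`. `snapshot` output can be capped and filtered for efficiency. Cookies persist across sessions, and profiles allow separate browsing contexts.

**For AI agents:** webctl offers a clear, consistent interface. Agent config files (`CLAUDE.md`, `GEMINI.md`) simplify integration. The tool is designed to be easy to direct with prompts such as "use webctl to browse…".

**Installation:** `pip install webctl` (requires Python 3.11+), then run `webctl setup` to download Chromium. More details and the source code are available on [GitHub](https://github.com/cosinusalpha/webctl).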

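## Webctl: browser automation from the command line

Webctl is a new browser automation tool, positioned as a simpler alternative to frameworks such as Playwright and intended to bridge the gap between simple tools like `curl` and full browser automation. Created by cosinusalpha, it addresses the "context dump" problem common in AI browser tools, where unnecessary data is sent to the LLM.

Webctl uses a Unix-style CLI that lets users filter browser output *before* it reaches the AI agent, improving token efficiency. It has a daemon architecture for persisting browser state (cookies, sessions) and uses semantic targeting with ARIA roles for more reliable element selection.

In essence, it is described as "Playwright in the terminal", built for local AI agents that need to interact with web applications, for example managing tasks on an intranet. State persistence is still experimental, but the developer believes this approach offers a more controllable and efficient solution. It resembles Vercel's Agent Browser, though the initial post does not detail the specific differences.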

Original text

Browser automation for AI agents and humans, built on the command line.

webctl start
webctl navigate "https://google.com"
webctl type 'role=combobox name~="Search"' "best restaurants nearby" --submit
webctl snapshot --interactive-only --limit 20
webctl stop --daemon

MCP browser tools have a fundamental problem: the server controls what enters your context. With Playwright MCP, every response includes the full accessibility tree plus console messages (default: "info" level). After a few page queries, your context is full.

CLI flips this around: you control what enters context.

# Filter before context
webctl snapshot --interactive-only --limit 30      # Only buttons, links, inputs
webctl snapshot --within "role=main"               # Skip nav, footer, ads

# Pipe through Unix tools
webctl snapshot | grep -i "submit"                 # Find specific elements
webctl --format jsonl snapshot | jq '.data.role'   # Extract with jq
webctl snapshot | head -50                         # Truncate output
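The `grep` and `head` filters above can be folded into one small helper. This is a minimal sketch, not part of webctl: `filter_snapshot` is a hypothetical name, and it assumes the human-readable snapshot prints one element per line, which the source does not confirm. It reads from stdin, so it can be tried on any text without a running browser.

```shell
#!/bin/sh
# filter_snapshot PATTERN [MAX_LINES]
# Hypothetical helper (not a webctl command).
# Reads snapshot text on stdin, keeps lines matching PATTERN
# (case-insensitive), and caps the result at MAX_LINES (default 20).
filter_snapshot() {
    pattern=$1
    max_lines=${2:-20}
    grep -i -- "$pattern" | head -n "$max_lines"
}

# Intended use (needs a running webctl session):
#   webctl snapshot --interactive-only | filter_snapshot "submit" 10
```

Keeping the filter in a function means an agent and a human debugging the same page apply exactly the same reduction before anything reaches the context window.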

Beyond filtering, CLI gives you:

| Capability | CLI | MCP |
|---|---|---|
| Filter output | Built-in flags + grep/jq/head | Server decides |
| Debug | Run same command as agent | Opaque |
| Cache | `webctl snapshot > cache.txt` | Every call hits server |
| Script | Save to .sh, version control | Ephemeral |
| Timeout | `timeout 30 webctl ...` | Internal only |
| Parallelize | `parallel`, `xargs`, `&` | Server-dependent |
| Human takeover | Same commands | Different interface |

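The Cache and Timeout capabilities listed in the comparison can be combined in a few lines of shell. The sketch below is illustrative only: `cached_run` is a hypothetical helper, the 30-second limit is copied from the comparison, and `timeout` is assumed to be the GNU coreutils command. It never calls webctl itself; it wraps whatever command it is given.

```shell
#!/bin/sh
# cached_run CACHE_FILE COMMAND [ARGS...]
# Hypothetical helper (not a webctl command).
# If CACHE_FILE exists and is non-empty, print it and skip the command.
# Otherwise run COMMAND under a 30-second timeout, store its output in
# CACHE_FILE, and print it. On failure nothing is cached and the
# command's exit status (124 on timeout) is returned.
cached_run() {
    cache_file=$1
    shift
    if [ -s "$cache_file" ]; then
        cat "$cache_file"
        return 0
    fi
    # Write to a temp file first so a failed run never leaves a
    # truncated cache behind.
    tmp_file="$cache_file.tmp.$$"
    if timeout 30 "$@" > "$tmp_file"; then
        mv "$tmp_file" "$cache_file"
        cat "$cache_file"
    else
        rc=$?
        rm -f "$tmp_file"
        return "$rc"
    fi
}

# Intended use (needs a running webctl session):
#   cached_run cache.txt webctl snapshot --interactive-only
```

Delete the cache file to force a fresh snapshot, for example after a navigation or click changes the page.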
pip install webctl      # Requires Python 3.11+
webctl setup            # Downloads Chromium (~150MB)

Verify it works:

webctl start
webctl navigate "https://example.com"
webctl snapshot --interactive-only
webctl stop --daemon

Install from source:

git clone https://github.com/cosinusalpha/webctl
cd webctl
uv sync && uv run webctl setup

Linux system dependencies:

playwright install-deps chromium
# Or manually: sudo apt-get install libnss3 libatk1.0-0 libatk-bridge2.0-0 ...

Browser stays open across commands. Cookies persist to disk.

webctl start                    # Visible browser
webctl start --mode unattended  # Headless
webctl -s work start            # Named profile (separate cookies)
webctl stop --daemon            # Shutdown everything

Semantic targeting based on ARIA roles - stable across CSS refactors:

role=button                     # Any button
role=button name="Submit"       # Exact match
role=button name~="Submit"      # Contains (preferred)
role=textbox name~="Email"      # Input field
role=link name~="Sign in"       # Link

webctl snapshot                                    # Human-readable
webctl --quiet navigate "..."                      # Suppress events
webctl --result-only --format jsonl navigate "..." # Pure JSON, final result only

# Navigation
webctl navigate "https://..."   # Go to URL
webctl back                     # History back
webctl forward                  # History forward
webctl reload                   # Refresh

# Observation
webctl snapshot                           # Full a11y tree
webctl snapshot --interactive-only        # Buttons, links, inputs only
webctl snapshot --limit 30                # Cap output
webctl snapshot --within "role=main"      # Scope to container
webctl snapshot --roles "button,link"     # Filter by role
webctl query "role=button name~=Submit"   # Debug query, get suggestions
webctl screenshot --path shot.png         # Screenshot

# Interaction
webctl click 'role=button name~="Submit"'
webctl type 'role=textbox name~="Email"' "user@example.com"
webctl type 'role=textbox name~="Search"' "query" --submit  # Type + Enter
webctl select 'role=combobox name~="Country"' --label "Germany"
webctl check 'role=checkbox name~="Remember"'
webctl press Enter
webctl scroll down
webctl upload 'role=button name~="Upload"' --file ./doc.pdf

# Wait conditions
webctl wait network-idle
webctl wait 'exists:role=button name~="Continue"'
webctl wait 'visible:role=dialog'
webctl wait 'hidden:role=progressbar'
webctl wait 'url-contains:"/dashboard"'

# Sessions and tabs
webctl status                   # Current state (includes console error counts)
webctl save                     # Persist cookies now
webctl sessions                 # List profiles
webctl pages                    # List tabs
webctl focus p2                 # Switch tab
webctl close-page p1            # Close tab

# Console logs
webctl console                  # Get last 100 logs
webctl console --count          # Just counts by level (LLM-friendly)
webctl console --level error    # Filter to errors only
webctl console --follow         # Stream new logs continuously
webctl console -n 50 -l warn    # Last 50 warnings

# Setup and config
webctl setup                    # Install browser
webctl doctor                   # Diagnose installation
webctl init                     # Add to agent configs (CLAUDE.md, etc.)
webctl config show              # Show settings
webctl config set idle_timeout 1800

Tell your AI agent to use webctl. The easiest way:

webctl init                     # Creates CLAUDE.md, GEMINI.md, etc.
webctl init --agents claude     # Only specific agents

Or manually add to your agent's config:

For web browsing, use webctl CLI. Run `webctl agent-prompt` for instructions.

This section is designed to be read by AI agents directly.

Control a browser via CLI. Start with `webctl start`, end with `webctl stop --daemon`.

Commands:

webctl start                              # Open browser
webctl navigate "URL"                     # Go to URL
webctl snapshot --interactive-only        # See clickable elements
webctl click 'role=button name~="Text"'   # Click element
webctl type 'role=textbox name~="Field"' "text"           # Type
webctl type 'role=textbox name~="Field"' "text" --submit  # Type + Enter
webctl select 'role=combobox' --label "Option"            # Dropdown
webctl wait 'exists:role=button name~="..."'              # Wait for element
webctl stop --daemon                      # Close browser

Query syntax:

  • role=button - By ARIA role (button, link, textbox, combobox, checkbox)
  • name~="partial" - Partial match (preferred, more robust)
  • name="exact" - Exact match
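Quoting these queries by hand inside shell scripts is easy to get wrong, so a tiny builder can help. The following is a sketch under assumptions: `role_query` is a hypothetical helper, it always uses the partial-match form, and it does not escape double quotes inside the name because the source does not describe webctl's escaping rules.

```shell
#!/bin/sh
# role_query ROLE [NAME]
# Hypothetical helper (not a webctl command).
# Prints a query string in the syntax listed above. With NAME it uses
# the partial-match operator (name~=); without NAME it matches by role
# only. NAME must not contain double quotes (escaping is not handled).
role_query() {
    if [ -n "${2:-}" ]; then
        printf 'role=%s name~="%s"\n' "$1" "$2"
    else
        printf 'role=%s\n' "$1"
    fi
}

# Intended use (needs a running webctl session):
#   webctl click "$(role_query button Submit)"
```

Calling `role_query button Submit` prints `role=button name~="Submit"`, the same string the examples in this section pass to `webctl click`.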

Example - Login:

webctl start
webctl navigate "https://site.com/login"
webctl type 'role=textbox name~="Email"' "user@example.com"
webctl type 'role=textbox name~="Password"' "secret" --submit
webctl wait 'url-contains:"/dashboard"'
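The login steps above can be wrapped in a function that stops at the first failing step. This is a rough sketch rather than a recommended implementation: it assumes webctl exits non-zero when a command fails (not stated in the source) and that a browser has already been started with `webctl start`. The `WEBCTL` variable is a hypothetical override hook added here for dry runs; it is not a webctl feature.

```shell
#!/bin/sh
# WEBCTL is a hypothetical override hook (not a webctl feature): set it
# to "echo" to print each command's arguments instead of driving a browser.
WEBCTL=${WEBCTL:-webctl}

# login URL EMAIL PASSWORD
# Runs the four login steps in order. The && chain stops at the first
# step that returns a non-zero status, and that status is returned.
login() {
    "$WEBCTL" navigate "$1" &&
    "$WEBCTL" type 'role=textbox name~="Email"' "$2" &&
    "$WEBCTL" type 'role=textbox name~="Password"' "$3" --submit &&
    "$WEBCTL" wait 'url-contains:"/dashboard"'
}

# Dry run, no browser needed:
#   WEBCTL=echo; login "https://site.com/login" "user@example.com" "secret"
```

Because sessions persist cookies, a script like this typically only needs to run once per profile.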

Tips:

  • Use `--interactive-only` to reduce output (only buttons, links, inputs)
  • Use `name~=` for partial matching (handles minor text changes)
  • Use `webctl query "..."` if an element is not found; it shows suggestions
  • Use `--quiet` to suppress event output
  • Sessions persist cookies: log in once, stay logged in
  • Check `webctl status` for console error counts before investigating
  • Use `webctl console --count` for a log summary, `--level error` for details

┌─────────────┐     TCP/IPC      ┌─────────────┐
│   CLI       │ ◄──────────────► │   Daemon    │
│  (webctl)   │    JSON-RPC      │  (browser)  │
└─────────────┘                  └─────────────┘
      │                                 │
      ▼                                 ▼
  Agent/User                      Chromium + Playwright

  • CLI: Stateless, sends commands to daemon
  • Daemon: Manages browser, auto-starts on first command
  • Profiles: ~/.local/share/webctl/profiles/
  • Config: ~/.config/webctl/config.json

License: MIT
