# Show HN: On-device browser agent (Qwen) running locally in Chrome

Original link: https://github.com/RunanywhereAI/on-device-browser-agent

## Local Browser: On-Device Web Automation

Local Browser is a Chrome extension that performs AI-powered web automation **entirely on your device**, prioritizing privacy and offline capability. It uses WebLLM with WebGPU for local LLM inference, with no cloud APIs or API keys required.

The extension uses a multi-agent system (a **Planner** for strategic task decomposition and a **Navigator** for tactical action execution) to browse, click, type, and extract data from web pages. Users enter tasks through a React popup, e.g. "Search for 'WebGPU' on Wikipedia…".

**Key features:** full privacy, offline operation after a ~1GB initial model download (Qwen2.5-1.5B-Instruct by default, with Phi-3.5-mini and Llama-3.2 options), and automatic rebuilds during development.

**Requirements:** Chrome 124+, Node.js 18+, a WebGPU-capable GPU, and sufficient disk space. This is a proof of concept focused on text-based DOM analysis within a single tab; it may struggle with complex tasks or pages that block content scripts.

A Chrome extension by RunanywhereAI (github.com/runanywhereai) that lets users run a browser agent locally on-device. The agent is powered by Alibaba's Qwen model (with Liquid LFM also mentioned) running via WebGPU, and can perform tasks inside the browser, such as opening the All-In Podcast on YouTube. The project currently ships a mobile SDK, with Web SDK support in development. Commenters were impressed by the performance of the small 3B Qwen model and curious about the potential of hosting larger local models (such as gpt-oss-20b) through tools like Ollama. The discussion also touched on security concerns, in particular malicious use resembling browser-based cryptocurrency mining, which could lead to distributed botnets running large language models.

Original README

A Chrome extension that uses WebLLM to run AI-powered web automation entirely on-device. No cloud APIs, no API keys, fully private.

demo.mp4
## Features
  • On-Device AI: Uses WebLLM with WebGPU acceleration for local LLM inference
  • Multi-Agent System: Planner + Navigator agents for intelligent task execution
  • Browser Automation: Navigate, click, type, extract data from web pages
  • Privacy-First: All AI runs locally, no data leaves your device
  • Offline Support: Works offline after initial model download
## Requirements
  • Chrome 124+ (required for WebGPU in service workers)
  • Node.js 18+ and npm
  • GPU with WebGPU support (most modern GPUs work)
## Installation
  1. Clone and install dependencies:

    cd local-browser
    npm install
  2. Build the extension:

  3. Load in Chrome:

    • Open chrome://extensions
    • Enable "Developer mode" (top right)
    • Click "Load unpacked"
    • Select the dist folder from this project
  4. First run:

    • Click the extension icon in your toolbar
    • The first run will download the AI model (~1GB)
    • This is cached for future use
## Usage
  1. Navigate to any webpage
  2. Click the Local Browser extension icon
  3. Type a task like:
    • "Search for 'WebGPU' on Wikipedia and extract the first paragraph"
    • "Go to example.com and tell me what's there"
    • "Find the search box and search for 'AI news'"
  4. Watch the AI execute the task step by step

## Development

The development build watches for changes and rebuilds automatically.

## Project Structure

local-browser/
├── manifest.json           # Chrome extension manifest (MV3)
├── src/
│   ├── background/         # Service worker
│   │   ├── index.ts        # Entry point & message handling
│   │   ├── llm-engine.ts   # WebLLM wrapper
│   │   └── agents/         # AI agent system
│   │       ├── base-agent.ts
│   │       ├── planner-agent.ts
│   │       ├── navigator-agent.ts
│   │       └── executor.ts
│   ├── content/            # Content scripts
│   │   ├── dom-observer.ts # Page state extraction
│   │   └── action-executor.ts
│   ├── popup/              # React popup UI
│   │   ├── App.tsx
│   │   └── components/
│   └── shared/             # Shared types & constants
└── dist/                   # Build output
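The content-script split above (`dom-observer.ts` extracting page state for the LLM) can be sketched roughly as follows. This is illustrative only: the `PageElement` shape and `summarizePage` helper are hypothetical names, not the repo's actual API.

```typescript
// Hypothetical sketch of what dom-observer.ts might do: flatten the page's
// interactive elements into an indexed text list the Navigator can reason over.
interface PageElement {
  tag: string;          // e.g. "button", "input", "a"
  text: string;         // visible text or placeholder
  interactive: boolean; // clickable / typeable
}

function summarizePage(elements: PageElement[], maxItems = 50): string {
  return elements
    .filter((el) => el.interactive)   // ignore non-actionable nodes
    .slice(0, maxItems)               // keep the prompt small for a 1.5B model
    .map((el, i) => `[${i}] <${el.tag}> ${el.text.trim()}`)
    .join("\n");
}
```

Indexing each element lets the model refer to targets by number ("click [3]") instead of emitting brittle CSS selectors.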
## How It Works
  1. User enters a task in the popup UI
  2. Planner Agent analyzes the task and creates a high-level strategy
  3. Navigator Agent examines the current page DOM and decides on the next action
  4. Content Script executes the action (click, type, extract, etc.)
  5. Loop continues until task is complete or fails
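The plan/act loop in the steps above can be sketched like this. The interface and function names are assumptions for illustration; the real `executor.ts` may be structured differently.

```typescript
// Minimal sketch of the plan -> act -> observe loop described above.
interface Action { type: string; done?: boolean }

interface NavigatorAgent {
  // Decide the next action from the plan and the current page state.
  nextAction(plan: string, pageState: string): Action;
}

function runTask(
  plan: string,
  navigator: NavigatorAgent,
  execute: (a: Action) => string, // content script applies the action, returns new page state
  maxSteps = 10                   // hard cap so a confused model cannot loop forever
): Action[] {
  const history: Action[] = [];
  let pageState = "";
  for (let step = 0; step < maxSteps; step++) {
    const action = navigator.nextAction(plan, pageState);
    history.push(action);
    if (action.done) break;       // task complete (or declared failed)
    pageState = execute(action);  // observe the result before the next decision
  }
  return history;
}
```

The step cap is the safety valve: small local models occasionally repeat an action indefinitely, so bounding the loop is what turns "continues until complete or fails" into guaranteed termination.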

The extension uses a two-agent architecture inspired by Nanobrowser:

  • PlannerAgent: Strategic planning, creates step-by-step approach
  • NavigatorAgent: Tactical execution, chooses specific actions based on page state

Both agents output structured JSON that is parsed and executed.
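Parsing structured JSON from a small local model usually has to tolerate markdown fences and stray prose around the object. A minimal sketch of such a parser (the helper name is hypothetical, not taken from the repo):

```typescript
// Hypothetical helper: extract and validate the JSON object an agent returns.
// Small instruct models often wrap JSON in ```json fences or add commentary.
function parseAgentJson<T>(raw: string): T {
  // Strip a markdown code fence if the model added one.
  const fenced = raw.match(/```(?:json)?\s*([\s\S]*?)```/);
  const body = fenced ? fenced[1] : raw;
  // Fall back to the first {...} span if extra prose surrounds the object.
  const start = body.indexOf("{");
  const end = body.lastIndexOf("}");
  if (start === -1 || end === -1) throw new Error("no JSON object found");
  return JSON.parse(body.slice(start, end + 1)) as T;
}
```

A failed parse here is a natural place to re-prompt the agent rather than abort the whole task.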

## Models

Default model: Qwen2.5-1.5B-Instruct-q4f16_1-MLC (~1GB)

Alternative models (configured in src/shared/constants.ts):

  • Phi-3.5-mini-instruct-q4f16_1-MLC (~2GB, better reasoning)
  • Llama-3.2-1B-Instruct-q4f16_1-MLC (~0.7GB, smaller)
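The model options above could be registered in `src/shared/constants.ts` along these lines. This is a sketch: the field names and registry shape are assumptions, while the model IDs and approximate sizes come from the list above.

```typescript
// Sketch of a model registry such as src/shared/constants.ts might contain.
// IDs are the MLC model identifiers listed above; sizes are approximate downloads.
interface ModelOption {
  id: string;
  approxSizeGB: number;
  notes: string;
}

const MODELS: ModelOption[] = [
  { id: "Qwen2.5-1.5B-Instruct-q4f16_1-MLC", approxSizeGB: 1.0, notes: "default" },
  { id: "Phi-3.5-mini-instruct-q4f16_1-MLC", approxSizeGB: 2.0, notes: "better reasoning" },
  { id: "Llama-3.2-1B-Instruct-q4f16_1-MLC", approxSizeGB: 0.7, notes: "smaller" },
];

const DEFAULT_MODEL = MODELS[0].id;
```

Keeping the list in one constant makes swapping the default a one-line change and gives the popup UI a single source of truth for a model picker.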
## Troubleshooting

WebGPU not available
  • Update Chrome to version 124 or later
  • Check chrome://gpu to verify WebGPU status
  • Some GPUs may not support WebGPU

Model fails to download or load
  • Ensure you have enough disk space (~2GB free)
  • Check browser console for errors
  • Try clearing the extension's storage and reloading

Extension doesn't respond on a page
  • Some pages block content scripts (chrome://, extension pages)
  • Try on a regular webpage like wikipedia.org

Extension not working after Chrome update

  • Go to chrome://extensions
  • Click the reload button on the extension
## Limitations
  • POC Scope: This is a proof-of-concept, not production software
  • No Vision: Uses text-only DOM analysis (no screenshot understanding)
  • Single Tab: Only works with the currently active tab
  • Basic Actions: Supports navigate, click, type, extract, scroll, wait
  • Model Size: Smaller models may struggle with complex tasks
## Tech Stack
  • WebLLM: On-device LLM inference with WebGPU
  • React: Popup UI
  • TypeScript: Type-safe development
  • Vite + CRXJS: Chrome extension bundling
  • Chrome Extension Manifest V3: Modern extension architecture

## Acknowledgments

This project is inspired by:

  • Nanobrowser - Multi-agent web automation (MIT License)
  • WebLLM - In-browser LLM inference (Apache-2.0 License)
## Third-Party Licenses

| Package | License |
|---------|---------|
| @mlc-ai/web-llm | Apache-2.0 |
| React | MIT |
| Vite | MIT |
| @crxjs/vite-plugin | MIT |
| TypeScript | Apache-2.0 |

## License

MIT License - See LICENSE file for details.
