A proxy that embeds every web page you visit and lets you run similarity searches.
- Each successful HTTP GET response (status 200, excluding localhost) is re-fetched via pure.md to obtain clean Markdown.
- The cleaned text is embedded with llm.
- A minimal Flask UI provides search and cached-page views.
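To make that flow concrete, here is roughly the same pipeline performed by hand with curl and the llm CLI. The pure.md prefix form, the example URL, the collection name pages and the item ID example-article are illustrative assumptions rather than the plugin's actual internals:

# Fetch a cleaned Markdown copy of a page (pure.md used here as a URL prefix; adjust if its documented usage differs)
curl -s https://pure.md/https://example.com/some-article > page.md

# Embed the text and store it in a collection so it can be searched later
llm embed pages example-article -m sentence-transformers/Qwen/Qwen3-Embedding-0.6B -i page.md --store

# Query the collection for pages similar to a free-text query
llm similar pages -c "topic I remember reading about"

The plugin automates these steps for every page you browse and exposes the search through its Flask UI.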
This is not a stand-alone program; it is a plugin for llm. If you are not using llm yet, install it with pipx first.
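For example:

pipx install llm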
Now you can install this plugin:
llm install git+https://github.com/mlang/llm-embed-proxy
To run a local embedding model, install the llm-sentence-transformers plugin and register/download a model. This step is optional if you have an OpenAI API key and prefer to use their embedding endpoint instead.
llm install llm-sentence-transformers
llm sentence-transformers register Qwen/Qwen3-Embedding-0.6B
Then launch the proxy with the registered model:

llm embed-proxy --model sentence-transformers/Qwen/Qwen3-Embedding-0.6B
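If you would rather use OpenAI's embedding endpoint (the optional route mentioned above), something along these lines should work instead; 3-small is llm's built-in alias for OpenAI's text-embedding-3-small, and the assumption here is that embed-proxy accepts any llm embedding model ID:

llm keys set openai
llm embed-proxy --model 3-small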
Point your browser/system proxy to localhost:8080 and visit http://localhost:8080/ to search.
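Many command-line tools also honour the standard proxy environment variables, so for a quick shell-only setup you can export these instead of changing the system-wide proxy settings:

export http_proxy=http://localhost:8080
export https_proxy=http://localhost:8080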
llm-embed-proxy uses mitmproxy under the hood. If you haven't used mitmproxy before, the first time you launch llm embed-proxy, mitmproxy will generate a CA certificate in ~/.mitmproxy/.
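For a quick test from the command line, you can point curl at the proxy and at that generated certificate explicitly (example.com is just a placeholder):

curl --proxy http://localhost:8080 --cacert ~/.mitmproxy/mitmproxy-ca-cert.pem https://example.com/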
To avoid certificate warnings, you can add the mitmproxy CA certificate to your system trust store. On a Debian system, for example:
sudo cp ~/.mitmproxy/mitmproxy-ca-cert.pem /usr/local/share/ca-certificates/mitmproxy-ca-cert.crt
sudo /sbin/update-ca-certificates
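Once the certificate is in the system trust store, clients that rely on it (curl, wget and most other Debian CLI tools) should accept proxied HTTPS connections without extra flags:

curl --proxy http://localhost:8080 https://example.com/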