Composing APIs and CLIs in the LLM era

Original link: https://walters.app/blog/composing-apis-clis

## A shift in LLM agent design: shells replace tools (early 2026)

A key debate in LLM agent development is how best to equip models with tools. Traditionally, agents have used well-defined, fine-grained tools, but a growing trend is to lean on `exec_bash` calls that drive existing command-line interfaces (CLIs). Inspired by the power of Unix shell command composition, this approach has notable advantages: lower token costs through pipelines, scriptability, and an interface familiar to both humans and machines. The author demonstrates this with access to Google Docs & Groups. Rather than building custom code, they use `Restish` to interpret Google's OpenAPI specs, effectively turning API documentation into executable commands. OAuth authentication is likewise streamlined with `oauth2c`, replacing hundreds of lines of Python with a concise shell script. For services *without* a ready-made API (such as Google Groups), the author successfully reverse-engineered one by capturing network traffic in the browser, sanitizing the data, and prompting an LLM to generate a client. This highlights the potential of LLMs to bridge the gap when an official API is lacking. Ultimately, the author advocates prioritizing CLI composition, arguing that it reduces maintenance, minimizes bugs, and lets individuals build powerful agents with minimal bespoke code.

## LLMs, APIs, and the rise of CLIs

A recent Hacker News thread sparked discussion about building against APIs and CLIs with large language models (LLMs). The core idea is to have the LLM drive `curl`-like tools directly from an API specification (OAS), bypassing the Model Context Protocol (MCP) or wrappers like "restish". Users report that giving an LLM an OAS link plus authorization details makes it quick to integrate API functionality. While effective, handling authentication securely remains a challenge, since an LLM cannot manage secrets the way an MCP server can. Many commenters argue that CLIs are especially well suited to LLM-driven copilots: they are ubiquitous in training data, have a universal help system, and their sequential nature aligns with token generation. The conversation suggests LLMs will increasingly interact with command-line tools directly, potentially displacing existing skills-based approaches, with further work focused on "hooks" to improve agents' tool use and prevent problematic loops.

Original article

It’s early 2026. Industry practice is divided on how to structure tool descriptions within the context window of an LLM. One strategy is to provide top-level tools that perform fine-grained actions (e.g. list pull requests in a GitHub repo). Another increasingly popular strategy is to eschew new tools per se and to simply inform the model of useful shell commands it may invoke. In both cases reusable skills can be defined that give the model tips on how to perform useful work with the tools; the main difference is whether the model emits a direct tool call or instead an exec_bash call containing a reference to CLIs.

To me it is clear that the latter represents an innovation on the former. The best feature of the unix shell is command composition. Enabling the model to form pipelines of tool calls without re-prompting the model after each stage should present huge savings in token cost. The resulting pipelines can also be saved to scripts or be customized and interactively executed by human operators.
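
To make this concrete, here is the kind of pipeline an agent could emit in a single exec_bash call for a task like “close every stale pull request”; the repository name, cutoff date, and exact gh/jq invocation are my own illustration rather than anything from a real agent transcript:

# One exec_bash payload: list open PRs, select the stale ones, close them
$ gh pr list --repo example/repo --state open --json number,updatedAt \
    | jq -r '.[] | select(.updatedAt < "2025-06-01") | .number' \
    | xargs -I{} gh pr close {} --repo example/repo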

The command line is an interface compatible with humans and machines. If the model is adept at using it (it’s already text), why fall back to a machine-native protocol?

One good response is that MCP is an easy way to expose SaaS functionality to agents. In lieu of MCP, how can we achieve that? I’ll answer this question by providing two quite different examples from my recent work: giving an agent access to Google Docs and to Google Groups.

HTTP APIs

I wanted my agent to be able to read my Google Docs (listing and exporting documents is easy; comments are harder). I did the obvious thing and spun up a Google Cloud project, pasted the API documentation into an LLM, and the result was a gdrive CLI with subcommands to list files and to export a particular one.

That worked. But as in the title of this post, the best code is no code. This script seemed entirely like boilerplate which shouldn’t have to exist. This would be true even if the script were to use an SDK rather than make HTTP calls directly. In reality, Google—and many SaaS vendors—already define a program which can be used to call all of their APIs. It’s their OpenAPI spec! The program just needs a sufficient interpreter.
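
Stripped of the SDK or HTTP plumbing, the work amounts to requests like the following against the Drive v3 endpoints (a rough sketch: the file ID is a placeholder and the token handling is elided):

# List files visible to the authorized user
$ curl -H "Authorization: Bearer $ACCESS_TOKEN" \
    "https://www.googleapis.com/drive/v3/files"

# Export one document as plain text
$ curl -H "Authorization: Bearer $ACCESS_TOKEN" \
    "https://www.googleapis.com/drive/v3/files/FILE_ID/export?mimeType=text/plain"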

I Googled around and was thrilled to discover Restish, a tool which nearly perfectly matches my philosophy. If OpenAPI specs are programs, Restish is their interpreter. Sample usage (cribbed from their docs):

# Register a new API called `cool-api`
$ restish api configure cool-api https://api.rest.sh/openapi.yaml

# This will show you all available commands from the API description.
$ restish cool-api --help

# Call an API operation (`list-images`)
$ restish -r cool-api -H 'Accept: application/json' list-images | jq '.[0].name'
"Dragonfly macro"

Restish even generates shell completions for the API endpoints (subcommands) and parameters (options/args)!

I have only two complaints:

  • Restish wants to handle API authorization for me (persisting e.g. OAuth tokens). I want it to just be an “interpreter for OpenAPI programs”. I’ll manage my own auth flows and inject my own tokens.
  • Executing commands against an API spec requires registering the spec with Restish ahead of time. See above—I want just an interpreter.

Both points imply that I’ll want a wrapper script around restish. The wrapper script will manage the second issue (it will create a temporary spec directory to satisfy Restish). The script will also perform my desired authorization flow and inject tokens into Restish ephemerally.
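
A minimal sketch of that wrapper, assuming Restish keeps its registered APIs under the home directory (the spec URL is a placeholder, and the access token is obtained separately as described below):

# Register the spec in a throwaway config location so nothing persists
tmp_home=$(mktemp -d)
HOME="$tmp_home" restish api configure google "https://example.com/google-openapi.yaml"

# Call the operation, injecting a token we manage ourselves
HOME="$tmp_home" restish google drive-files-list \
    -H "Authorization: Bearer $ACCESS_TOKEN"

rm -rf "$tmp_home"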

API Authorization

oauth2c is a command-line client for OAuth 2.0-compliant authorization servers. You give it the authorization parameters (issuer URL, grant type, ...) and it runs the ensuing flow (usually by opening your browser), then prints the resulting tokens to stdout.

With this missing piece, what was previously a couple-hundred lines of dense Python is now an order-of-magnitude smaller shell script which performs the logical equivalent of oauth2c "https://accounts.google.com/..." | restish google drive-files-list.
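
Fleshed out a little, that one-liner looks roughly like this; the client credentials and scope are placeholders, the exact oauth2c flags and output format should be checked against its --help, and extracting the token with jq assumes the token response reaches stdout as JSON:

# Run the Authorization Code flow against Google's issuer; capture the token response
token_json=$(oauth2c "https://accounts.google.com" \
    --client-id "$CLIENT_ID" --client-secret "$CLIENT_SECRET" \
    --grant-type authorization_code --response-types code \
    --scopes "https://www.googleapis.com/auth/drive.readonly")

# Pull out the access token and hand it to Restish as a header
access_token=$(printf '%s' "$token_json" | jq -r '.access_token')
restish google drive-files-list -H "Authorization: Bearer $access_token"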

The final script can be found at bmwalters/gdrive-client. One more interesting trick in there is how fish completions are forwarded.

Detour: secure token storage for macOS CLI scripts

I ran the macOS security(1) man page through Claude Opus 4.5 and the model made a very interesting discovery.

-T appPath	Specify an application which may access this item (multiple -T options are allowed)

It turns out that the keychain remembers which application stored the password—by default this is probably security itself or perhaps my shell; I haven’t checked—and that application is permitted to read back the password without user-interactive authorization. Providing the -T flag to security when creating the password allows overriding said program entry, and crucially the empty string may be used to remove the default application entry.

In other words this code:

security add-generic-password -T "" ...

will prevent security find-generic-password from simply returning the secret, even when invoked immediately after secret creation. In practice, attempts to read the secret will prompt me for my device passcode, which is definitely good enough for my use case.
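
Spelled out with the rest of the flags (the service name, account, and token variable here are illustrative, not the ones my script actually uses):

# Store the token; -T "" leaves the trusted-application list empty, -U updates in place
$ security add-generic-password -U -T "" \
    -a "$USER" -s gdrive-access-token -w "$ACCESS_TOKEN"

# Reading it back now requires interactive approval
$ security find-generic-password -a "$USER" -s gdrive-access-token -w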

Putting it all together, I had a CLI that, when invoked, would try to use the stored access token with Restish (no passcode prompt needed). If the access token was invalid, it would invoke oauth2c to refresh the token and retry; this would prompt me for my device passcode. If that also failed, it would invoke the Authorization Code flow using oauth2c, which would seamlessly open my browser, and retry the command on success.
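
The control flow is a short fallback chain. In sketch form, where call_api stands in for the Restish invocation, stored_access_token, refresh_with_oauth2c, and auth_code_flow_with_oauth2c stand in for the keychain read and the two oauth2c paths, and a failed request is assumed to exit non-zero:

# Fallback chain: cached token -> refresh -> full Authorization Code flow
call_api() { restish google drive-files-list -H "Authorization: Bearer $1"; }

token=$(stored_access_token)                  # cached token; no prompt
if ! call_api "$token"; then
    token=$(refresh_with_oauth2c)             # prompts for the device passcode
    if ! call_api "$token"; then
        token=$(auth_code_flow_with_oauth2c)  # opens the browser
        call_api "$token"
    fi
fi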

All with only shell pipelines, no bespoke code. Vastly reduced surface area for future maintenance and for bugs to hide in.

Adversarial interoperability

I wanted an API for reading the pollenpub Google Group, so that my agent could use it as a Q&A knowledge base while developing this blog site. However my research turned up no such API from Google.

I love using LLMs to solve this class of problem. My workflow is as follows:

  1. Open a fresh private browser (to capture any authorization flow, if needed).
  2. Open Devtools > Network and filter to HTML, XHR, WS, Other.
  3. Perform the actions that I would like to automate, i.e. load the Google Group site, navigate to the next page, and read a particular conversation.
  4. Firefox Devtools > Network > right click > “Save All As HAR”.
  5. Run the file through cloudflare/har-sanitizer (the jq one-liner after this list is a handy way to survey what the capture contains).
  6. Prompt an LLM with: “in this directory there is a large HAR file captured while I did actions xyz; please create a Python client for this API”.
  7. Edit the generated file to add a meaningful User-Agent string with a backlink.
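
The HAR file is plain JSON, so before prompting the model it is easy to survey what was captured (standard HAR fields; the filename is a placeholder):

# List each captured request's method, URL, and response type
$ jq -r '.log.entries[] | "\(.request.method) \(.request.url) -> \(.response.content.mimeType)"' \
    capture.har | sort -u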

I’ve repeated this workflow about three times and I have near-term plans for a couple more.

Note that I haven’t tried combining the above two workflows yet; I haven’t asked the model to produce an OpenAPI spec plus reverse-engineered OAuth parameters for any site, but that’s a logical next step.

Conclusion