LLM 时代构建 API 和 CLI

LLM 时代构建 API 和 CLI
Composing APIs and CLIs in the LLM era

原始链接: https://walters.app/blog/composing-apis-clis

## LLM 代理设计中的转变：Shells 取代工具 (2026年初) LLM 代理开发中的一个关键争论在于如何最好地为模型配备工具。传统上，代理使用定义明确、细粒度的工具，但一种日益增长的趋势是利用 `exec_bash` 调用来利用现有的命令行界面 (CLI)。这种方法受 Unix shell 命令组合的强大功能启发，具有显著的优势——通过管道降低 token 成本，可脚本化，以及对人类和机器都熟悉的界面。作者通过访问 Google Docs & Groups 演示了这一点。他们没有构建自定义代码，而是使用 `Restish` 来解释 Google 的 OpenAPI 规范，有效地将 API 文档转换为可执行命令。 OAuth 身份验证也使用 `oauth2c` 进行了简化，从而用简洁的 shell 脚本取代了数百行 Python 代码。对于*没有*现成 API 的服务（如 Google Groups），作者成功地通过捕获浏览器中的网络流量、清理数据并提示 LLM 生成客户端来逆向工程 API。这突显了 LLM 在缺乏官方 API 时弥合差距的潜力。最终，作者提倡优先考虑 CLI 组合，认为它可以减少维护、最大限度地减少错误，并使个人能够使用最少的定制代码构建强大的代理。

## LLM、API 和 CLI 的兴起最近的 Hacker News 讨论强调了一种增长趋势：利用命令行界面 (CLI) 和 `curl` 等工具与大型语言模型 (LLM) 结合，用于 API 交互，可能挑战模型调用程序 (MCP) 的主导地位。许多开发者发现，只需向 LLM 提供 API 规范（如 OpenAPI/OAS），并让它们利用现有的 shell 工具，就能取得成功。这种方法提供了可组合性和可发现性，能够快速创建和集成技能。然而，安全的凭证管理仍然是一个挑战。还有人正在构建像 `tpmjs.com` 这样的平台，自动生成文档并提供多种交互方式——MCP 服务器、CLI 和 REST 端点，从而为代理提供灵活性。一个关键的争论集中在 MCP 的结构化数据与解析 CLI 输出的灵活性之间的权衡。虽然 MCP 强制执行模式，但 CLI 由于其在训练数据中的普遍性以及顺序性，为 LLM 提供了一个更自然的接口。最终，这场讨论表明，正在向赋予代理更广泛的交互方式转变，超越单一标准化方法。

原文

It’s early 2026. Industry practice is divided on how to structure tool descriptions within the context window of an LLM. One strategy is to provide top-level tools that perform fine grained actions (e.g. list pull requests in a GitHub repo). Another increasingly popular strategy is to eschew new tools per se and to simply inform the model of useful shell commands it may invoke. In both cases reusable skills can be defined that give the model tips on how to perform useful work with the tools; the main difference is whether the model emits a direct tool call or instead an exec_bash call containing a reference to CLIs.

To me it is clear that the latter represents an innovation on the former. The best feature of the unix shell is command composition. Enabling the model to form pipelines of tool calls without re-prompting the model after each stage should present huge savings in token cost. The resulting pipelines can also be saved to scripts or be customized and interactively executed by human operators.

The command line is an interface compatible with humans and machines. If the model is adept at using it (it’s already text), why fall back to a machine-native protocol?

One good response is that MCP is an easy way to expose SaaS functionality to agents. In lieu of MCP, how can we achieve that? I’ll answer this question by providing two quite different examples from my recent work: giving an agent access to Google Docs and to Google Groups.

HTTP APIs

comments are harder). I did the obvious thing and spun up a Google Cloud project, pasted the API documentation into an LLM, and the result was a gdrive CLI with subcommands to list files and to export a particular one.

That worked. But as in the title of this post, the best code is no code. This script seemed entirely like boilerplate which shouldn’t have to exist. This would be true even if the script were to use an SDK rather than make HTTP calls directly. In reality, Google—and many SaaS vendors—already define a program which can be used to call all of their APIs. It’s their OpenAPI spec! The program just needs a sufficient interpreter.

I Googled around and was thrilled to discover Restish, a tool which nearly perfectly matches my philosophy. If OpenAPI specs are programs, Restish is their interpreter. Sample usage (cribbed from their docs):

              # Register a new API called `cool-api`
$ restish api configure cool-api https://api.rest.sh/openapi.yaml

# This will show you all available commands from the API description.
$ restish cool-api --help

# Call an API operation (`list-images`)
$ restish -r cool-api -H 'Accept: application/json' list-images | jq '.[0].name'
"Dragonfly macro"

            

Restish even generates shell completions for the API endpoints (subcomands) and parameters (options/args)!

I have only two complaints:

Restish wants to handle API authorization for me (persisting e.g. OAuth tokens). I want it to just be an “interpreter for OpenAPI programs”. I’ll manage my own auth flows and inject my own tokens.
Executing commands against an api spec requires registering the spec with Restish ahead of time. See above—I want just an interpreter.

Both points imply that I’ll want a wrapper script around restish. The wrapper script will manage the second issue (it will create a temporary spec directory to satisfy Restish). The script will also perform my desired authorization flow and inject tokens into Restish ephemerally.

API Authorization

Looking back at the omnibus script that I generated initially, it contained an OAuth 2.0 client to hit Google’s authorization flow, get tokens, and refresh them upon expiry. OAuth 2.0 is a standard. A particular set of parameters (Google’s OAuth URL, client id, client secret, grant type, scopes) could be thought of as a valid program in the OAuth 2.0 client language. I, again, just needed an interpreter. I, again, found one.

oauth2c is a command-line client for OAuth 2.0-compliant authorization servers. You input the aforementioned program (i.e. URL, grant type, ...) and it begins the ensuing flow (usually by opening your browser) then prints the resulting tokens to stdout.

With this missing piece, what was previously a couple-hundred lines of dense Python is now an order-of-magnitude smaller shell script which performs the logical equivalent of oauth2c "https://accounts.google.com/..." | restish google drive-files-list.

The final script can be found at bmwalters/gdrive-client. One more interesting trick in there is how fish completions are forwarded.

Detour: secure token storage for macOS CLI scripts

While I’m dispensing pro-tips, I should also note that I found a pretty cool and under-documented way to securely store data (like a long-lived refresh token) from a macOS shell script.

Let me introduce the problem. The results of Google’s OAuth flow are a short-lived access token (to hit APIs; valid for about an hour) and a long-lived refresh token (to mint new access tokens; valid for 6 months). I wasn’t comfortable with leaving that refresh token exposed on my machine. Services like the AWS CLI do indeed store plaintext credentials in ~, but those tend to expire much more frequently than 6 months.

I knew I wanted to reach for the macOS Keychain, and in particular some security level that would require biometrics / passcode when reading the refresh token.

macOS ships a handy keychain CLI named security. You can store secrets in Keychain with invocations like security add-generic-password -s google-api -a my-account -w $REFRESH_TOKEN. But biometrics are not trivially supported, and web search advised me to create a small Swift wrapper. After doing so, I learned that any kSecAttrAccessControl attribute that would lead to biometrics or device passcode would result in the binary requiring real signed entitlements through the Apple Developer Program. I was a bit stuck looking for a solution to what seemed to be a simple requirement.

man page through Claude Opus 4.5 and the model made a very interesting discovery.

              -T appPath	Specify an application which may access this item (multiple -T options are allowed)

            

It turns out that the keychain remembers which application stored the password—by default this is probably security itself or perhaps my shell; I haven’t checked—and that application is permitted to read back the password without user-interactive authorization. Providing the -T flag to security when creating the password allows overriding said program entry, and crucially the empty string may be used to remove the default application entry.

In other words this code:

              security add-generic-password -T"" ...

            

will prevent security find-generic-password from simply returning the secret, even when invoked immediately after secret creation. In practice, attempts to read the secret will prompt me for my device passcode, which is definitely good enough for my use case.

Putting it all together, I had a CLI that, when invoked, would try to use the stored access token with Restish (no passcode prompt needed). If the access token was invalid, it would invoke oauth2c to refresh the token and retry. This would prompt me for my devcie passcode. If that also failed, it would invoke the Authorization Code flow using oauth2c which would seamlessly open my browser and retry the command on success.

All with only shell pipelines, no bespoke code. Vastly reduced surface area for future maintenance and for bugs to hide in.

Adversarial interoperability

pollenpub to serve as a Q&A knowledge base while developing this blog site. However my research turned up no such API from Google.

I love using LLMs to solve this class of problem. My workflow is as follows:

Open a fresh private browser (to capture any authorization flow, if needed).
Open Devtools > Network and filter to HTML, XHR, WS, Other.
Perform the actions that I would like to automate, i.e. load the Google Group site, navigate to the next page, and read a particular conversation.
Firefox Devtools > Network > right click > “Save All As HAR”.
Run the file through cloudflare/har-sanitizer
Prompt an LLM with: “in this directory there is a large HAR file captured while I did actions xyz; please create a Python client for this API”.
Edit the generated file to add a meaningful User-Agent string with a backlink.

I’ve repeated this workflow about three times and I have near-term plans for a couple more.

Note that I haven’t tried combining the above two workflows yet; I haven’t asked the model to produce an OpenAPI spec + reverse-engineered OAuth parameters for any site yet, but that’s a logical next step.

Conclusion

There’s a lot of power in composing CLIs. You get human interaction and current-generation-LLM interaction for the price of one. And with some creativity, it’s often possible for one individual to maintain CLIs in place of an MCP server that has been developed for a given service, or even to do so before the comparable MCP server has been written.