Handling secrets (somewhat) securely in shells

Original link: https://linus.schreibt.jetzt/posts/shell-secrets.html

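## Secret handling in shell scripts: summary

When handling sensitive information such as API tokens in a shell, standard practice can inadvertently leak it. Putting a secret directly on the command line (for example `curl -H 'Authorization: Bearer '`) exposes it through the process list (`/proc` on Linux). `hidepid` mitigates this, but the better approach is to keep secrets off the command line entirely. Alternatives include writing the header to a file with restrictive permissions, or using process substitution (`curl -H @<(echo '...')`) to create a temporary, non-persistent "file" that holds the secret. Shell history remains a risk, however. To keep the secret out of history, hold it in a shell variable and fill that variable with `read -r -s` rather than with a literal `token=...` assignment, which would itself be recorded. Retrieving the secret from the clipboard or a password manager also works. Avoid exporting the variable into the environment, because that propagates the secret to child processes and raises the risk of accidental exposure, especially with tools like Terraform. These precautions may seem excessive, but knowing the leakage routes is essential to handling sensitive data safely, and they highlight the potential benefit of using safer programming languages.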

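This Hacker News discussion centers on handling secrets securely in shell scripts. The original post links to an article on secret management, which prompted a conversation about practical techniques. One key suggestion is to use **named FIFOs** (first in, first out), essentially named pipes, to pass sensitive credentials between commands. Rather than relying on standard input/output or storing the secret in a file or in shell history, one command writes the secret to the FIFO and another reads it directly. This keeps the secret from being stored on disk. A user asked for and received a link to the `mkfifo` manual page, which demonstrates how simply these named pipes can be created, used, and removed for secure data transfer. It is a technique for minimizing the exposure of sensitive information during shell operations.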
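The FIFO idea from the discussion can be sketched roughly as follows. This is a minimal illustration, not something taken from the thread: the variable names are made up, `example-token` is a placeholder, and `cat` stands in for whatever command would really consume the secret.

```shell
# Create the FIFO inside a private directory so other users cannot open it.
umask 077
fifo_dir=$(mktemp -d)
fifo="$fifo_dir/auth-header"
mkfifo "$fifo"

# Writer: printf is a builtin, so the placeholder token never appears as a
# process argument. The write blocks until a reader opens the FIFO.
printf 'Authorization: Bearer %s\n' 'example-token' > "$fifo" &
writer_pid=$!

# Reader: cat stands in for the real consumer of the secret.
received=$(cat "$fifo")
wait "$writer_pid"

# Remove the FIFO and its directory once the transfer is done.
rm -r "$fifo_dir"
```

The data only passes through a kernel pipe buffer, so nothing is left in the directory beyond the FIFO entry itself, which the last line removes.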

Original article

Sometimes, you need to deal with secrets in an interactive shell. Say, for example, you want to do things with the API of a GitLab instance for which you require authentication:

$ curl -fsSLH 'Authorization: Bearer 1s7zo2a-mzsLP6yAo2SM' https://gitlab.example.com/api/v4/projects

Oh no!

Process information leakage

By doing that, you’ve just made the token available to everything on your system that can see your processes! Process command lines are visible to all processes through /proc on most Linux distributions. This is how tools like ps and pgrep work on Linux – they walk through the per-process directories in /proc and read files describing the process, like stat or status and cmdline. You can use the hidepid mount option for the proc filesystem to prevent users from inspecting processes of other users.
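To see this for yourself, here is a small Linux-only sketch that reads the command line of a process you did not pass any file handles to. The background command and the `example-token` placeholder are purely illustrative.

```shell
# Start a throwaway background process whose argument stands in for a secret.
# The trailing ':' keeps bash from replacing itself with sleep.
bash -c 'sleep 2; :' demo 'Bearer example-token' &
pid=$!
sleep 0.5   # give the child time to start up

# /proc/<pid>/cmdline holds the argument list, separated by NUL bytes.
cmdline=$(tr '\0' ' ' < "/proc/$pid/cmdline")
printf '%s\n' "$cmdline"

# Clean up the throwaway process.
kill "$pid" 2>/dev/null
wait "$pid" 2>/dev/null
```

The printed line includes the placeholder token, which is what any other process reading the same file would see.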

macOS also hides other users’ processes by default.

However, many tools allow you to avoid passing secrets on the command line at all, and this is usually a better approach because you can apply it even on systems where you don’t have the necessary access to change mount options for /proc. In the curl example, you can write the header to a file and have curl read it from there instead of from the command line directly:

$ umask 077 # prevent the file from being readable for other users
# echo is a shell builtin, so it doesn't show up in the process table
$ echo 'Authorization: Bearer 1s7zo2a-mzsLP6yAo2SM' > auth-header
$ curl -fsSLH @auth-header https://gitlab.example.com/api/v4/projects

But Unix-like systems support fancy files that don’t behave like simple files, which lets you avoid actually storing the secret. Many shells support so-called “process substitution”, which launches a subshell and provides its output as a virtual file that doesn’t actually represent persistent storage, instead being a buffer which can only be read from once.

$ echo <(echo secret token)
/dev/fd/63
$ curl -fsSLH @<(echo 'Authorization: Bearer 1s7zo2a-mzsLP6yAo2SM') https://gitlab.example.com/api/v4/projects

This should prevent leakage of the token via the process table entirely.
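One way to convince yourself of this is to look at the arguments a command actually receives. The sketch below assumes Bash; `show_args` is a hypothetical helper that prints its arguments the way a process listing would expose them.

```shell
# Hypothetical helper: print each argument on its own line.
show_args() { printf 'arg: %s\n' "$@"; }

# The process substitution is replaced by a path before the command runs,
# so the command only ever sees that path, never the header text.
seen=$(show_args -H @<(echo 'Authorization: Bearer example-token'))
printf '%s\n' "$seen"
```

The second argument comes out as something like `@/dev/fd/63`; the exact path depends on the shell and the system.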

So you’re done with your work and you exit your shell, and…

Shell history leakage

After going to all the effort of not putting the token in a file, your shell has helpfully saved the command you ran in your history file for all your processes to steal! One way to avoid this is to prevent the command from being written to history. Bash has a configuration variable named HISTCONTROL, which when set to include ignorespace prevents commands prefixed with whitespace from being saved in history. This is inconvenient though! History is really helpful for iterating on a command that you haven’t got quite right yet.
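For reference, the setting might look like this in a Bash startup file. Putting it in `~/.bashrc` is an assumption about your setup, and the commented-out curl line just reuses the example from above with the token elided.

```shell
# In ~/.bashrc (Bash-specific)
HISTCONTROL=ignorespace    # skip commands that start with a space
# HISTCONTROL=ignoreboth   # the same, plus skip consecutive duplicate commands

# Afterwards, a command typed with a leading space is left out of history:
#  curl -fsSLH @<(echo 'Authorization: Bearer ...') https://gitlab.example.com/api/v4/projects
```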

Fortunately, there’s another approach we can take here. Using a shell variable, we can avoid putting the secret in any shell commands directly:

$ token=1s7zo2a-mzsLP6yAo2SM
$ curl -fsSLH @<(echo "Authorization: Bearer $token") https://gitlab.example.com/api/v4/projects

But wait – the token= command ends up in the history again! Let’s try that again:

$ read -r token
1s7zo2a-mzsLP6yAo2SM
$ curl -fsSLH @<(echo "Authorization: Bearer $token") https://gitlab.example.com/api/v4/projects

Using read instead of setting the token directly in a command prevents the token from being saved in history, while the read command itself stays conveniently available there. You can even add the -s option to the read command to prevent the token from being displayed on the screen as you type or paste it in.
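If you use this often, the read call could be wrapped in a small function. The sketch below assumes Bash; `read_secret` is a hypothetical helper, and the here-string only simulates typed input so the example runs unattended.

```shell
# Hypothetical helper: read one line into the named variable without echoing it.
read_secret() {
    local prompt=$1 varname=$2
    # IFS= keeps leading and trailing whitespace, -r keeps backslashes literal,
    # -s disables echo when the input comes from a terminal.
    IFS= read -r -s -p "$prompt" "$varname"
    local status=$?
    # With -s the newline typed by the user is not echoed either, so print one.
    if [ -t 0 ]; then echo >&2; fi
    return "$status"
}

# Interactive use would be:  read_secret 'Token: ' token
# Here the input is simulated with a here-string instead of the keyboard.
read_secret 'Token: ' token <<< '  example token  '
```

Only the `read_secret` call ends up in history, not the value it reads.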

Another approach that can be helpful here is getting the secret from the output of a command.

$ token=$(wl-paste || xsel -b || pbpaste) # get the token from the clipboard
$ token=$(rbw get gitlab-access-token) # get the token from a command-line password manager

This is more versatile and allows for some more convenient shortcuts than the read-based approach; it also copes with secrets that have leading or trailing whitespace or span several lines, which a plain read -r would trim or cut off at the first newline.

Why not environment?

You may have noticed that I set the variable using name=value rather than the very common export name=value. This is because export marks a variable as exported, i.e. stores it in the process environment, which is inherited by child processes. Putting secrets in environment variables is common but somewhat risky, because it makes the secrets available to every process that inherits the environment – many of which have no business with them! This can result in the secrets being leaked, especially by accident, when programs dump all their environment variables into a log for debugging purposes or similar.
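The difference is easy to check with a child shell. In this sketch, `demo_token` and `demo_exported` are hypothetical variable names and `example-token` is a placeholder value.

```shell
demo_token=example-token              # plain shell variable, not exported
export demo_exported=example-token    # exported into the environment

# Ask a child process what it can see; "unset" is printed for missing variables.
plain_seen=$(sh -c 'printf %s "${demo_token-unset}"')
exported_seen=$(sh -c 'printf %s "${demo_exported-unset}"')

printf 'plain: %s\nexported: %s\n' "$plain_seen" "$exported_seen"
# plain: unset
# exported: example-token
```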

How much of a problem this is depends a lot on the use case. Programs that implement all of their functionality themselves are generally quite safe, since they don’t propagate their environment any further. However, other programs delegate functionality to other processes. This is good for many use cases! Git, for example, will invoke SSH when fetching from or pushing to a remote repository. You can use the PATH environment variable to influence where ssh is found, and replace it with a wrapper script that adds authentication behaviour or similar if all else fails.

Some software invokes more complex systems of processes, however. For instance, Terraform launches a process for each configured provider. Each of these inherits secrets from the process environment, providing more room for accidental leakage. That’s why I try to avoid using environment variables for secrets when possible, preferring shell variables that aren’t inherited by child processes and have to be passed into the commands that need them explicitly.
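For a tool that only accepts a secret through its environment, one possible middle ground is a per-command assignment, which places the variable in that one command's environment without exporting it from the shell. This is a sketch of a standard shell feature rather than something the article prescribes; `DEMO_API_TOKEN` is a made-up variable name.

```shell
demo_token=example-token   # stays a plain shell variable

# Prefixing a command with NAME=value sets NAME for that command only.
with_token=$(DEMO_API_TOKEN=$demo_token sh -c 'printf %s "${DEMO_API_TOKEN-unset}"')

# A command started afterwards does not inherit it.
without_token=$(sh -c 'printf %s "${DEMO_API_TOKEN-unset}"')

printf 'first: %s\nsecond: %s\n' "$with_token" "$without_token"
```

The command that receives the variable still passes it on to its own children, so this narrows the exposure rather than removing it.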

Conclusion

The approach I take here may be described as somewhat paranoid, and the cost of actually forming the habit of handling secrets this way may be greater than the risk of leaking secrets in the ways I describe. That’s up to individuals (or company security policy authors) to evaluate for themselves. There are also many bases that I don’t cover and routes through which sufficiently-smart malware could easily still obtain the secrets I’m working with. But I definitely feel a lot more comfortable when secrets are never written to persistent unencrypted files, and being aware of these leakage vectors is helpful to avoid that!

In my personal opinion, the pitfalls involved are also a testament to how we probably should be using less Bash and preferring languages where the obvious way to do something is safer. That’s a pretty low bar, given that not many languages that we use on a daily basis are shells that function on the principle of executing processes for everything; but I’m also intrigued by the potential that type systems have for “tagging” secrets and preventing their propagation beyond where they’re needed.