Techniques I use to create a great user experience for shell scripts

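This post describes a shell script the author developed to help with the evaluation process for the One Billion Row Challenge. The script provides comprehensive error handling and input validation, clear and colorful output, detailed progress reporting, strategic error handling with "set -e" and "set +e", platform-specific adaptations, and timestamped file outputs for multiple runs, and it supports users in running and analyzing benchmarks. The author used six techniques to improve the script's usability and safety, including clear error messages, usage instructions, ANSI colors for better visibility, printing the exact command before executing it, selective exit on error, and timestamps for logging. The code is available in the One Billion Row Challenge repository. The author invites others to share their opinions or suggestions on shell scripting best practices.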

"You should go and check out the shell script in the repo because it's very nice. It has colored output, it's super safe... it's really a masterclass in terms of writing shell scripts."

Thank you Gunnar Morling for the shout-out! 😊

In January 2024, I, along with a few dozen performance-minded geeks, got nerd-sniped into participating in Gunnar's One Billion Row Challenge 1️⃣🐝🏎️.

Gunnar quickly became overwhelmed as the (unpaid) evaluator of a constant stampede of entries. I jumped in to help him automate the evaluation steps with a shell script and received the above testimonial from Gunnar at his JavaZone talk (check it out to hear all about the performance techniques used in the challenge: "1BRC–Nerd Sniping the Java Community" by Gunnar Morling).

Here are 6 techniques I used in the #1BRC shell script to make it robust, safe and fun for Gunnar to use:

1. Comprehensive Error Handling and Input Validation

I believe that clear error messages are crucial for a good user experience. That's why I implemented thorough error handling and input validation throughout the script. For example:

if [ -z "$1" ]; then
    echo "Usage: evaluate.sh <fork name> (<fork name 2> ...)"
    echo " for each fork, there must be a 'calculate_average_<fork name>.sh' script and an optional 'prepare_<fork name>.sh'."
    exit 1
fi

This approach helps users quickly identify and resolve issues, saving them time and frustration.
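The same idea extends to checking every argument before any work starts. Here is a minimal sketch, assuming a hypothetical helper named `check_fork` (it is not taken from evaluate.sh) with an optional directory argument that I added only to make the example easy to try:

```shell
#!/usr/bin/env bash

# Hypothetical helper: succeeds when the required script for a fork exists.
# The second argument (directory to search, default ".") is an assumption
# made for this sketch.
check_fork() {
  local fork="$1"
  local dir="${2:-.}"
  if [ ! -f "$dir/calculate_average_$fork.sh" ]; then
    echo "ERROR: $dir/calculate_average_$fork.sh does not exist." >&2
    return 1
  fi
}
```

Calling `check_fork "$fork" || exit 1` for each argument at the top of a script means a typo in a fork name is reported immediately rather than minutes into a benchmark run.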

2. Clear and Colorful Output

To make the script's output more readable and user-friendly, I used ANSI color codes to highlight important information, warnings, and errors. For instance:

echo -e "${BOLD_RED}ERROR${RESET}: ./calculate_average_$fork.sh does not exist." >&2

This visual distinction helps users quickly grasp the nature of each message.
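Variables such as `BOLD_RED` and `RESET` have to be defined before they are used. The sketch below shows one way to do that with standard ANSI escape sequences. Turning colors off when stderr is not a terminal or when `NO_COLOR` is set is my own assumption for this example, and the helper names `setup_colors`, `print_error` and `print_warning` are hypothetical:

```shell
#!/usr/bin/env bash

# Hypothetical setup: define color variables, or leave them empty when
# stderr is not a terminal or NO_COLOR is set (assumption for this sketch).
setup_colors() {
  if [ -t 2 ] && [ -z "${NO_COLOR:-}" ]; then
    BOLD_RED='\033[1;31m'
    BOLD_YELLOW='\033[1;33m'
    RESET='\033[0m'
  else
    BOLD_RED=''
    BOLD_YELLOW=''
    RESET=''
  fi
}

# Messages go to stderr so they do not mix with a script's regular output.
print_error()   { echo -e "${BOLD_RED}ERROR${RESET}: $*" >&2; }
print_warning() { echo -e "${BOLD_YELLOW}WARNING${RESET}: $*" >&2; }

setup_colors
```

With empty color variables the messages degrade to plain text, which keeps log files and piped output free of escape codes.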

3. Detailed Progress Reporting

I wanted users to understand exactly what the script was doing at each step. To achieve this, I implemented a function that prints each command before executing it:

function print_and_execute() {
  echo "+ $*" >&2  # print the command, prefixed with "+" like set -x
  "$@"             # then run it with its arguments intact
}

This matches the output format of Bash's builtin set -x tracing, but gives the script author more granular control of what is printed.

This level of transparency not only keeps users informed but also aids in debugging if something goes wrong.
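One limitation of echoing the arguments directly is that an argument containing spaces looks like several arguments in the printed line. The following is a sketch of a variant, under the hypothetical name `print_and_execute_quoted`, that uses Bash's `printf '%q'` so the printed command can be pasted back into a shell. It is an illustration, not the function used in evaluate.sh:

```shell
#!/usr/bin/env bash

# Hypothetical variant: print each argument shell-quoted, then run the command.
print_and_execute_quoted() {
  printf '+' >&2
  printf ' %q' "$@" >&2   # %q escapes spaces and other special characters
  printf '\n' >&2
  "$@"                    # the command's exit status becomes the function's
}

# Prints "+ echo hello\ world" on stderr and "hello world" on stdout.
print_and_execute_quoted echo "hello world"
```

Because the trace goes to stderr, it can be redirected or silenced independently of the command's own output.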

4. Strategic Error Handling with "set -e" and "set +e"

I wanted to ensure that the script would exit immediately if there was an error in the script itself, but also allow it to continue running if individual forks encountered issues. To achieve this, I used the Bash options "set -e" and "set +e" strategically throughout the script. Here's how I implemented this technique:

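set -eo pipefail

for fork in "$@"; do
  set +e # a failure in prepare, test or the benchmark for one fork should not exit the script early

  print_and_execute source "./prepare_$fork.sh"

  print_and_execute $TIMEOUT ./test.sh "$fork" # $TIMEOUT is left unquoted so it splits into command and arguments

  set -e # from here on, errors in the script itself are fatal again
done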

This approach gives the script author fine-grained control over which errors cause the script to exit and which can be handled in other ways.
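To make the pattern concrete, here is a small self-contained sketch. The functions `run_fork` and `evaluate_forks` are hypothetical stand-ins that I made up for this example; the real script runs prepare, test and benchmark steps instead. The point is only to show the exit status being captured while `set +e` is active and a summary being printed afterwards:

```shell
#!/usr/bin/env bash
set -eo pipefail

# Hypothetical stand-in for one fork's run: fails for names starting with "broken".
run_fork() {
  case "$1" in
    broken*) echo "running $1: failed" >&2; return 1 ;;
    *)       echo "running $1: ok" >&2; return 0 ;;
  esac
}

evaluate_forks() {
  local failed=()
  local fork status
  for fork in "$@"; do
    set +e            # a failing fork must not abort the whole evaluation
    run_fork "$fork"
    status=$?         # remember the result before re-enabling errexit
    set -e            # errors in the script itself are fatal again
    if [ "$status" -ne 0 ]; then
      failed+=("$fork")
    fi
  done
  echo "failed forks: ${failed[*]:-none}"
}
```

Running `evaluate_forks alice broken_bob carol` processes all three names and ends with `failed forks: broken_bob`, whereas with `set -e` left on throughout, the script would stop at the second fork.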

5. Platform-Specific Adaptations

Knowing that users might run this script on different operating systems, I added logic to detect the OS and adjust the script's behavior accordingly:

if [ "$(uname -s)" == "Linux" ]; then
  TIMEOUT="timeout -v $RUN_TIME_LIMIT"
else # macOS
  if [ -x "$(command -v gtimeout)" ]; then
    TIMEOUT="gtimeout -v $RUN_TIME_LIMIT" # from `brew install coreutils`
  else
    echo -e "${BOLD_YELLOW}WARNING${RESET} gtimeout not available, install with \`brew install coreutils\` or benchmark runs may take indefinitely long."
  fi
fi

This ensures a consistent experience across different environments. For example, many #1BRC participants developed on macOS while the evaluation machine ran Linux.
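An alternative to branching on the operating system name is to probe for the command itself. This is a different technique from the `uname` check above, shown only as a sketch. The function name `find_timeout_cmd` is hypothetical, and the candidate list assumes that `timeout` and `gtimeout` are the only names worth trying:

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the first available timeout command, or fail.
find_timeout_cmd() {
  local candidate
  for candidate in timeout gtimeout; do
    if command -v "$candidate" >/dev/null 2>&1; then
      echo "$candidate"
      return 0
    fi
  done
  return 1
}

if timeout_cmd=$(find_timeout_cmd); then
  echo "using $timeout_cmd"
else
  echo "WARNING: no timeout command found; runs will not be time-limited" >&2
fi
```

Probing for the capability also covers systems that are neither Linux nor macOS, at the cost of no longer documenting which platform each branch was written for.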

6. Timestamped File Outputs for Multiple Runs

To support multiple benchmark runs without overwriting previous results, I implemented a system of timestamped file outputs. This allows users to run the script multiple times and keep a historical record of all results. Here's how I did it:

filetimestamp=$(date +"%Y%m%d%H%M%S")

HYPERFINE_OPTS="--warmup 0 --runs $RUNS --export-json $fork-$filetimestamp-timing.json --output ./$fork-$filetimestamp.out"

echo "Raw results saved to file(s):"
for fork in "$@"; do
  if [ -f "$fork-$filetimestamp-timing.json" ]; then
    # Append the timing JSON to the run's .out file, then drop the temporary JSON
    cat "$fork-$filetimestamp-timing.json" >> "$fork-$filetimestamp.out"
    rm "$fork-$filetimestamp-timing.json"
  fi

  if [ -f "$fork-$filetimestamp.out" ]; then
    echo "  $fork-$filetimestamp.out"
  fi
done
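The naming scheme can be tried in isolation with the short sketch below. The helper `output_file_for` is a hypothetical name used only for this example. The timestamp is computed once per script run, so every file written during that run shares it:

```shell
#!/usr/bin/env bash

# Computed once, so all files from the same run carry the same suffix.
filetimestamp=$(date +"%Y%m%d%H%M%S")

# Hypothetical helper: build the output file name for a fork.
output_file_for() {
  echo "$1-$filetimestamp.out"
}

# Prints the fork name followed by a 14-digit timestamp and ".out".
output_file_for myfork
```

Because the format runs from year down to second, sorting the file names alphabetically also sorts them chronologically. The resolution is one second, so two runs started within the same second would write to the same file.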


Check out the complete benchmark evaluation script in the #1BRC repo: evaluate.sh

By implementing these techniques, I aimed to create a user-friendly, informative, and robust shell script that provides a great experience for users running and analyzing benchmarks. I hope these ideas inspire you to enhance the user experience in your own shell scripts!

I'd love to hear your thoughts on these shell scripting techniques or any other tips you have for creating great user experiences in scripts. Feel free to join the discussion:

