OpenBSD IO Benchmarking: How Many Jobs Are Worth It?

Original link: https://rsadowski.de/posts/2025/fio_simple_benckmarking/

The I/O performance of OpenBSD 7.7 was evaluated with the fio benchmarking tool, focusing on random read and write operations on a Crucial P3 Plus SSD. The tests examined how different job counts (1-32) affect throughput and latency. The best performance was reached at 6-8 concurrent jobs, where IOPS and latency are balanced; beyond that, contention increases and further gains stall. Read throughput peaked at 1712 MiB/s (8 jobs), while write throughput plateaued around 1428 MiB/s (18 jobs). Latency rose sharply as the job count grew. A comparison with Linux (kernel 6.12.21) showed a considerable performance advantage for Linux (O_DIRECT was not set in the Linux benchmark). The ThinkPad X1 Carbon performed noticeably worse than the workstation used for the initial tests. The takeaway is that blindly adding jobs does not guarantee better I/O performance on OpenBSD; careful tuning to find the "sweet spot" is essential, especially in real-world multitasking scenarios where too many threads hurt system responsiveness. Future tests will look at USB performance.

Original article

This post explores that question through detailed fio(1) benchmarking, looking at random reads, random writes, and latency — all running on a recent build of OpenBSD 7.7-current.

OpenBSD 7.7 (GENERIC.MP) #624: Wed Apr  9 09:38:45 MDT 2025
    [email protected]:/usr/src/sys/arch/amd64/compile/GENERIC.MP
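
This build string is what the kernel reports as kern.version and can be checked on any OpenBSD system with sysctl(8):

    $ sysctl kern.version
    kern.version=OpenBSD 7.7 (GENERIC.MP) #624: Wed Apr  9 09:38:45 MDT 2025
        [email protected]:/usr/src/sys/arch/amd64/compile/GENERIC.MP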

Test Setup #

  • Storage: 1TB Crucial P3 Plus SSD M.2 2280 PCIe 4.0 x4 3D-NAND QLC (CT1000P3PSSD8)
  • Tool: fio, installed via OpenBSD packages
  • Test File Size: 64 GB (to bypass RAM cache)
  • Block Size: 4 KiB
  • I/O Depth: 32
  • Job Counts Tested: 1 to 32
  • Runtime: 30s per test, 10s ramp-up
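
The post doesn't show the exact command line for this sweep, so the following is a sketch reconstructed from the parameters above; the job name and file name are placeholders, and numjobs was varied from 1 to 32 (with --rw=randwrite for the write runs):

# Sketch of a single run of the main sweep; the exact invocation is
# not given in the post and is reconstructed from the listed parameters.
fio --name=randread-sweep \
    --filename=benchfile \
    --rw=randread \
    --bs=4k \
    --iodepth=32 \
    --numjobs=8 \
    --size=64g \
    --time_based \
    --runtime=30 \
    --ramp_time=10 \
    --group_reporting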

Results at a Glance #

[Charts: Throughput vs. Job Count (Random Read), Throughput vs. Job Count (Random Write), Average Read Latency, Average Write Latency]

Summary Tables #

Random Read Performance #

numjobs  Total BW (MiB/s)  IOPS     Avg Latency (µs)  Notes
1        473.0             121,318  26.41             Baseline
2        808.3             207,333  30.80             Strong scaling
4        1,302.4           334,219  37.97             Excellent parallel read gain
8        1,712.0           439,728  58.22             Near peak performance
18       1,715.6           439,661  117.12            Saturation reached
32       1,618.6           415,603  180.56            Slight regression, high latency

Random Write Performance #

numjobs  Total BW (MiB/s)  IOPS     Avg Latency (µs)  Notes
1        265.6             68,223   58.64             Baseline
2        476.6             122,246  63.27             Good scaling
4        829.9             212,610  70.84             Steady performance increase
8        1,259.1           323,439  95.72             Approaching write peak
18       1,428.1           366,830  172.17            Plateau with rising latency
32       1,408.2           361,404  230.42            Regression due to contention

Latency Overview (Read vs Write) #

numjobs  Read Latency (µs)  Write Latency (µs)  Notes
1        26.41              58.64               Minimal latency, sequential load
2        30.80              63.27               Low contention
4        37.97              70.84               Balanced performance
8        58.22              95.72               Sweet spot for throughput vs latency
18       117.12             172.17              Steep latency increase
32       180.56             230.42              High CPU & queue contention

Observations #

  1. OpenBSD scales I/O quite well up to a point — notably better than expected.
  2. Job count sweet spot: Between 6 and 8 jobs gave the best balance of IOPS and latency.
  3. Too many jobs degrade performance due to increased contention and CPU overhead.
  4. NVMe write performance is sensitive to concurrency on OpenBSD, more so than reads.

fio(1) Linux vs. OpenBSD #

Based on this test script, I ran a simple benchmark comparing Linux 6.12.21-amd64 ([email protected]) (x86_64-linux-gnu-gcc-14 (Debian 14.2.0-19)) with OpenBSD 7.7 (GENERIC.MP) #624: Wed Apr 9 09:38:45 MDT 2025 ([email protected]), both on a ThinkPad X1 Carbon Gen 10 (14" Intel).

#!/bin/sh

# Common fio parameters
BLOCK_SIZE="4k"
IODEPTH="1"
RUNTIME="30"
SIZE="1G"
FILENAME="benchfile"

# Output directory
OUTPUT_DIR="./fio-results"
mkdir -p "$OUTPUT_DIR"

# numjobs to test
NUMJOBS_LIST="1 2 4 8 16 32"

# Test types
for RW in randread randwrite; do
  echo "Starting $RW tests..."
  for J in $NUMJOBS_LIST; do
    OUTFILE="$OUTPUT_DIR/${RW}-${J}.json"
    echo "Running $RW with numjobs=$J..."

    fio --name="test-$RW" \
        --filename="$FILENAME" \
        --rw="$RW" \
        --bs="$BLOCK_SIZE" \
        --iodepth="$IODEPTH" \
        --numjobs="$J" \
        --size="$SIZE" \
        --time_based \
        --runtime="$RUNTIME" \
        --group_reporting \
        --output-format=json \
        --output="$OUTFILE"
  done
done
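
The script leaves one JSON file per combination in ./fio-results. Here is a sketch of pulling bandwidth, IOPS, and mean latency out of those files with jq(1); the field names assume fio 3.x JSON output (lat_ns), and this is just one way to tabulate the data, not necessarily how the table below was built:

#!/bin/sh
# Summarise the JSON results written by the benchmark script above.
# Assumes jq(1) is installed; with --group_reporting, .jobs[0] holds
# the aggregated statistics. fio reports bw in KiB/s and latency in ns.
printf 'file\tBW(MiB/s)\tIOPS\tavg lat(us)\n'
for F in ./fio-results/*.json; do
  # The I/O direction is encoded in the file name by the script above.
  case "$F" in
    */randread-*) DIR=read ;;
    *)            DIR=write ;;
  esac
  jq -r --arg dir "$DIR" --arg file "$F" '
    .jobs[0][$dir]
    | [$file, (.bw / 1024), .iops, (.lat_ns.mean / 1000)]
    | @tsv' "$F"
done
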
numjobs  OpenBSD BW (MiB/s)  OpenBSD IOPS  OpenBSD Avg Latency (µs)  Linux BW (MiB/s)  Linux IOPS  Linux Avg Latency (µs)
2        5.78                1478          1343                      13.23             3388        595
4        9.92                2538          1563                      25.75             6592        605
8        13.32               3403          2316                      40.56             10382       735
16       13.89               3549          4511                      53.17             13613       1169
32       14.02               3579          8758                      53.83             13780       2317

[Charts: Throughput vs. Job Count (Random Write), Average Write Latency]

Please mind the gap. The Linux test was not even run with direct=1 (see below for details). There is a lot of potential for OpenBSD.

     direct=bool
            If value is true, use non-buffered I/O. This is usually O_DIRECT.
            Note that OpenBSD and ZFS on Solaris don't support direct I/O. On
            Windows the synchronous ioengines don't support direct I/O.
            Default: false.
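
To see how much of that gap is just the Linux page cache, the Linux run could be repeated with non-buffered I/O, which only takes one extra flag in the fio call from the script above. A sketch (Linux only, not something measured in this post):

# Linux-only variant of the fio call from the comparison script above:
# --direct=1 requests O_DIRECT and bypasses the page cache. Per the
# fio(1) excerpt above, OpenBSD does not support direct I/O, so the
# flag stays off there.
fio --name=test-randwrite \
    --filename=benchfile \
    --rw=randwrite \
    --bs=4k \
    --iodepth=1 \
    --numjobs=4 \
    --size=1G \
    --time_based \
    --runtime=30 \
    --group_reporting \
    --direct=1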

What I also noticed is that performance on the ThinkPad was dramatically worse than on the workstation.

Conclusion #

If you’re tuning I/O performance on OpenBSD — whether for databases, file servers, or personal use — don’t fall into the “more jobs = more performance” trap. Our tests clearly show:

  • 6 to 8 parallel jobs is optimal for both reads and writes.
  • Beyond that, latency suffers and throughput gains are negligible.

I wanted to get a quick but solid overview of how well OpenBSD handles disk I/O across increasing thread counts. The results matched my expectations — scaling up works to a point, but there are trade-offs. What these benchmarks don’t show is that once the number of threads grows too large, my KDE desktop becomes almost unusable. This is something to keep in mind for real-world multitasking scenarios.

An upcoming test I plan to run will involve RW performance on USB sticks, which could offer more insight as we stress more subsystems.
