OpenBSD IO Benchmarking: How Many Jobs Are Worth It?

Original link: https://rsadowski.de/posts/2025/fio_simple_benckmarking/

The I/O performance of OpenBSD 7.7 was evaluated with the fio benchmarking tool, focusing on random read and write operations on a Crucial P3 Plus SSD. The tests examined how different job counts (1-32) affect throughput and latency. The best performance was reached at 6-8 concurrent jobs, where IOPS and latency are balanced; beyond that, contention increases and further gains stall. Read throughput peaked at 1712 MiB/s (8 jobs), while write throughput plateaued around 1428 MiB/s (18 jobs). Latency rose sharply as the job count grew. A comparison with Linux (kernel 6.12.21) showed a considerable performance advantage for Linux (O_DIRECT was not set in the Linux benchmark). The ThinkPad X1 Carbon performed noticeably worse than the workstation used for the initial tests. The takeaway is that blindly adding jobs does not guarantee better I/O performance on OpenBSD; careful tuning to find the "sweet spot" is essential, especially in real-world multitasking scenarios where too many threads hurt system responsiveness. Future tests will look at USB performance.

Original article

This post explores that question through detailed fio(1) benchmarking, looking at random reads, random writes, and latency — all running on a recent build of OpenBSD 7.7-current.

OpenBSD 7.7 (GENERIC.MP) #624: Wed Apr  9 09:38:45 MDT 2025
    [email protected]:/usr/src/sys/arch/amd64/compile/GENERIC.MP
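
This build string is what the kernel reports as kern.version and can be checked on any OpenBSD system with sysctl(8):

    $ sysctl kern.version
    kern.version=OpenBSD 7.7 (GENERIC.MP) #624: Wed Apr  9 09:38:45 MDT 2025
        [email protected]:/usr/src/sys/arch/amd64/compile/GENERIC.MP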

Test Setup #

  • Storage: 1TB Crucial P3 Plus SSD M.2 2280 PCIe 4.0 x4 3D-NAND QLC (CT1000P3PSSD8)
  • Tool: fio, installed via OpenBSD packages
  • Test File Size: 64 GB (to bypass RAM cache)
  • Block Size: 4 KiB
  • I/O Depth: 32
  • Job Counts Tested: 1 to 32
  • Runtime: 30s per test, 10s ramp-up
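
The post doesn't show the exact command line for this sweep, so the following is a sketch reconstructed from the parameters above; the job name and file name are placeholders, and numjobs was varied from 1 to 32 (with --rw=randwrite for the write runs):

# Sketch of a single run of the main sweep; the exact invocation is
# not given in the post and is reconstructed from the listed parameters.
fio --name=randread-sweep \
    --filename=benchfile \
    --rw=randread \
    --bs=4k \
    --iodepth=32 \
    --numjobs=8 \
    --size=64g \
    --time_based \
    --runtime=30 \
    --ramp_time=10 \
    --group_reporting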

Results at a Glance #

[Charts: Throughput vs. Job Count (Random Read), Throughput vs. Job Count (Random Write), Average Read Latency, Average Write Latency]

Summary Tables #

Random Read Performance #

numjobs  Total BW (MiB/s)  IOPS     Avg Latency (µs)  Notes
1        473.0             121,318  26.41             Baseline
2        808.3             207,333  30.80             Strong scaling
4        1,302.4           334,219  37.97             Excellent parallel read gain
8        1,712.0           439,728  58.22             Near peak performance
18       1,715.6           439,661  117.12            Saturation reached
32       1,618.6           415,603  180.56            Slight regression, high latency

Random Write Performance #

numjobs  Total BW (MiB/s)  IOPS     Avg Latency (µs)  Notes
1        265.6             68,223   58.64             Baseline
2        476.6             122,246  63.27             Good scaling
4        829.9             212,610  70.84             Steady performance increase
8        1,259.1           323,439  95.72             Approaching write peak
18       1,428.1           366,830  172.17            Plateau with rising latency
32       1,408.2           361,404  230.42            Regression due to contention

Latency Overview (Read vs Write) #

numjobs  Read Latency (µs)  Write Latency (µs)  Notes
1        26.41              58.64               Minimal latency, sequential load
2        30.80              63.27               Low contention
4        37.97              70.84               Balanced performance
8        58.22              95.72               Sweet spot for throughput vs latency
18       117.12             172.17              Steep latency increase
32       180.56             230.42              High CPU & queue contention

Observations #

  1. OpenBSD scales I/O quite well up to a point — notably better than expected.
  2. Job count sweet spot: Between 6 and 8 jobs gave the best balance of IOPS and latency.
  3. Too many jobs degrade performance due to increased contention and CPU overhead.
  4. NVMe write performance is sensitive to concurrency on OpenBSD, more so than reads.

fio(1) Linux vs. OpenBSD #

Based on this test script, I ran a simple benchmark comparing Linux 6.12.21-amd64 ([email protected]) (x86_64-linux-gnu-gcc-14 (Debian 14.2.0-19)) with OpenBSD 7.7 (GENERIC.MP) #624: Wed Apr 9 09:38:45 MDT 2025 ([email protected]), both on a ThinkPad X1 Carbon Gen 10 (14" Intel).

#!/bin/sh

# Common fio parameters
BLOCK_SIZE="4k"
IODEPTH="1"
RUNTIME="30"
SIZE="1G"
FILENAME="benchfile"

# Output directory
OUTPUT_DIR="./fio-results"
mkdir -p "$OUTPUT_DIR"

# numjobs to test
NUMJOBS_LIST="1 2 4 8 16 32"

# Test types
for RW in randread randwrite; do
  echo "Starting $RW tests..."
  for J in $NUMJOBS_LIST; do
    OUTFILE="$OUTPUT_DIR/${RW}-${J}.json"
    echo "Running $RW with numjobs=$J..."

    fio --name="test-$RW" \
        --filename="$FILENAME" \
        --rw="$RW" \
        --bs="$BLOCK_SIZE" \
        --iodepth="$IODEPTH" \
        --numjobs="$J" \
        --size="$SIZE" \
        --time_based \
        --runtime="$RUNTIME" \
        --group_reporting \
        --output-format=json \
        --output="$OUTFILE"
  done
done
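
The script leaves one JSON file per combination in ./fio-results. Here is a sketch of pulling bandwidth, IOPS, and mean latency out of those files with jq(1); the field names assume fio 3.x JSON output (lat_ns), and this is just one way to tabulate the data, not necessarily how the table below was built:

#!/bin/sh
# Summarise the JSON results written by the benchmark script above.
# Assumes jq(1) is installed; with --group_reporting, .jobs[0] holds
# the aggregated statistics. fio reports bw in KiB/s and latency in ns.
printf 'file\tBW(MiB/s)\tIOPS\tavg lat(us)\n'
for F in ./fio-results/*.json; do
  # The I/O direction is encoded in the file name by the script above.
  case "$F" in
    */randread-*) DIR=read ;;
    *)            DIR=write ;;
  esac
  jq -r --arg dir "$DIR" --arg file "$F" '
    .jobs[0][$dir]
    | [$file, (.bw / 1024), .iops, (.lat_ns.mean / 1000)]
    | @tsv' "$F"
done
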
numjobs  OpenBSD BW (MiB/s)  OpenBSD IOPS  OpenBSD Avg Latency (µs)  Linux BW (MiB/s)  Linux IOPS  Linux Avg Latency (µs)
2        5.78                1478          1343                      13.23             3388        595
4        9.92                2538          1563                      25.75             6592        605
8        13.32               3403          2316                      40.56             10382       735
16       13.89               3549          4511                      53.17             13613       1169
32       14.02               3579          8758                      53.83             13780       2317

[Charts: Throughput vs. Job Count (Random Write), Average Write Latency]

Please mind the gap. The Linux test was not even run with direct=1 (see below for details). There is a lot of potential for OpenBSD.

     direct=bool
            If value is true, use non-buffered I/O. This is usually O_DIRECT.
            Note that OpenBSD and ZFS on Solaris don't support direct I/O. On
            Windows the synchronous ioengines don't support direct I/O.
            Default: false.
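
To see how much of that gap is just the Linux page cache, the Linux run could be repeated with non-buffered I/O, which only takes one extra flag in the fio call from the script above. A sketch (Linux only, not something measured in this post):

# Linux-only variant of the fio call from the comparison script above:
# --direct=1 requests O_DIRECT and bypasses the page cache. Per the
# fio(1) excerpt above, OpenBSD does not support direct I/O, so the
# flag stays off there.
fio --name=test-randwrite \
    --filename=benchfile \
    --rw=randwrite \
    --bs=4k \
    --iodepth=1 \
    --numjobs=4 \
    --size=1G \
    --time_based \
    --runtime=30 \
    --group_reporting \
    --direct=1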

What I also noticed is that performance on the ThinkPad was dramatically worse than on the workstation.

Conclusion #

If you’re tuning I/O performance on OpenBSD — whether for databases, file servers, or personal use — don’t fall into the “more jobs = more performance” trap. Our tests clearly show:

  • 6 to 8 parallel jobs is optimal for both reads and writes.
  • Beyond that, latency suffers and throughput gains are negligible.

I wanted to get a quick but solid overview of how well OpenBSD handles disk I/O across increasing thread counts. The results matched my expectations — scaling up works to a point, but there are trade-offs. What these benchmarks don’t show is that once the number of threads grows too large, my KDE desktop becomes almost unusable. This is something to keep in mind for real-world multitasking scenarios.

An upcoming test I plan to run will involve RW performance on USB sticks, which could offer more insight as we stress more subsystems.
