Show HN: Dut – a fast Linux disk usage calculator

Original link: https://codeberg.org/201984/dut

The utility "dut" accurately analyzes the sizes of files under a given directory, including their associated hard links. Its output is modeled on NCDU's, in a pure ASCII format suitable for the Linux terminal, and the display is configurable through command-line arguments, including the maximum depth of entries shown. By default, dut lists the biggest directories under the current working directory along with their sizes and how much of each size is shared through hard links. Passing "-n 10" limits the output to the ten entries with the largest total space consumption, and the "-d" argument limits the depth, so that, for example, "dut -n 10 -d 1" shows only the ten largest immediate subdirectories. The tool also distinguishes hard-linked data from unique data: the first column reports how much space an entry takes up on disk, while the second column reports how much of that size is shared with other entries through hard links, so the difference between the two is the space unique to that entry. To install dut after cloning the repository, run "make install"; a custom install prefix can be set with "make install PREFIX=", for example to install into "~/.local/bin". The full list of options is available via "dut -h". Benchmarks show that on large directory hierarchies dut runs significantly faster than other popular utilities such as pdu, dust, and du, making it a valuable addition to any Linux administrator's toolkit; the detailed benchmark comparisons are reproduced below.

dut is a fast disk usage calculator written in C. It outperforms the standard "du" command, and once the system's caches are warm it outperforms every other tool tested as well. The project set out to beat popular tools such as pdu and dust. dut presents a hierarchy of directory sizes together with hard-link sizes; its link tracking takes inspiration from ncdu, although the author considers dut's display format less user-friendly and welcomes suggestions for improving it. To install, simply download and compile dut and place it somewhere on your PATH, such as /usr/local/bin. The Git history documents the various approaches tried during development. The core of the program is two data structures: a queue of directories waiting to be processed, and a binary heap that stores statistics for files and directories. An initial attempt built on C++ queues and mutexes performed poorly, which led to a custom solution. Further gains came from using fstatat(2), statx(2), and getdents(2), and from minimizing interaction between threads. Since little has previously been written about the advantages of fstatat and statx over the traditional stat functions, these notes may benefit other developers.
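
The syscall points above are worth unpacking. As a hedged illustration (this is hypothetical code, not dut's; dut is multithreaded and reportedly uses getdents(2) and statx(2) directly, and the sum_dir helper below is invented for this sketch), here is a minimal single-threaded C program using the openat(2)/fstatat(2) pattern, which lets the kernel resolve each name relative to an already-open directory instead of re-walking a full path string on every stat:

/* sum.c -- minimal sketch, not dut's implementation.
 * Sums on-disk sizes depth-first using openat(2) + fstatat(2).
 * Build: cc -O2 -o sum sum.c    Run: ./sum /some/dir */
#define _GNU_SOURCE
#include <dirent.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

static long long sum_dir(int dirfd)
{
    long long total = 0;
    DIR *d = fdopendir(dirfd);              /* takes ownership of dirfd */
    if (!d) { close(dirfd); return 0; }

    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        if (!strcmp(e->d_name, ".") || !strcmp(e->d_name, ".."))
            continue;

        struct stat st;
        /* Relative lookup: no full path string is ever built. */
        if (fstatat(dirfd, e->d_name, &st, AT_SYMLINK_NOFOLLOW) != 0)
            continue;

        if (S_ISDIR(st.st_mode)) {
            int fd = openat(dirfd, e->d_name,
                            O_RDONLY | O_DIRECTORY | O_NOFOLLOW);
            if (fd >= 0)
                total += sum_dir(fd);       /* depth-first, like dut */
        } else {
            total += (long long)st.st_blocks * 512;  /* on-disk bytes */
        }
    }
    closedir(d);
    return total;
}

int main(int argc, char **argv)
{
    int fd = open(argc > 1 ? argv[1] : ".", O_RDONLY | O_DIRECTORY);
    if (fd < 0) { perror("open"); return 1; }
    printf("%lld bytes\n", sum_dir(fd));
    return 0;
}

This sketch makes no attempt at hard-link deduplication or parallelism; it exists only to show the relative-lookup pattern.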

Features

  • Accurate counting of hard links with an output inspired by NCDU.
  • Pure ASCII output which is fully compatible with the plain Linux tty.
  • Configurable output format. The maximum depth of files shown can be changed with a simple command-line argument.

Examples

By default, dut will output a tree of the biggest directories it finds under your current directory.

$ dut -n 10
 2.4G    0B       /- Local
 2.4G    0B     /- AppData
 2.4G    0B   /- NetworkService
 2.4G    0B |- ServiceProfiles
 2.5G   63M |- servicing
 5.2G  423M |   /- FileRepository
 5.2G  426M | /- DriverStore
 9.6G  2.5G |- System32
  12G  7.2G /- WinSxS
  29G  225M .

The -n 10 option limits it to 10 rows. To limit the depth shown, use -d <n>.

$ dut -n 10 -d 1
 964M    0B |- MEMORY.DMP
1010M    0B |- SoftwareDistribution
 1.2G  1.0G |- SysWOW64
 1.3G  208M |- assembly
 1.8G  1.8G |- SystemApps
 2.4G    0B |- ServiceProfiles
 2.5G   63M |- servicing
 9.6G  2.5G |- System32
  12G  7.2G /- WinSxS
  29G  225M .

The first column in the output tells you how much space a given entry takes up on your disk. This can be an overcount, however, because of hard links (identical files that are only stored once on the disk). Hard links under a directory are deduplicated in the first column's number, but hard links that go outside of a directory to somewhere else will still be counted here.

That's where the second column comes in. It tells you how much of an entry's size is shared with other entries outside of it because of hard links. In the output above, we can see that most of the entries have a lot of data shared with other entries, but the root directory only has 225M shared with others. This tells us that there's a lot of hard links going between all of the entries shown above.

If you want to see how much of an entry's size is unique to just it, subtract the second column from the first. For example, WinSxS above occupies 12G, of which 7.2G is shared through hard links, so roughly 4.8G of it is unique.
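
The README doesn't show how this link tracking is implemented, but the core idea (which dut credits to ncdu) can be sketched: a file with st_nlink greater than one is counted only the first time its (st_dev, st_ino) pair is seen during a scan. A deliberately naive C illustration, with the invented names already_counted and table, and a linear list standing in for whatever structure dut really uses:

/* Illustrative only -- not dut's data structure. Count an inode's size
 * the first time its (device, inode) pair is seen; report later
 * sightings as already counted so the total isn't inflated. */
#include <stdbool.h>
#include <stddef.h>
#include <sys/stat.h>

struct seen { dev_t dev; ino_t ino; };
static struct seen table[4096];       /* real code wants a hash table */
static size_t nseen;

static bool already_counted(const struct stat *st)
{
    if (st->st_nlink < 2)
        return false;                 /* not hard-linked: always count */
    for (size_t i = 0; i < nseen; i++)
        if (table[i].dev == st->st_dev && table[i].ino == st->st_ino)
            return true;              /* hard link seen before: skip */
    if (nseen < sizeof table / sizeof table[0])
        table[nseen++] = (struct seen){ st->st_dev, st->st_ino };
    return false;                     /* first sighting: count it */
}

A traversal would then add a file's size to the running total only when already_counted() returns false; dut presumably keeps enough extra bookkeeping to split each entry's total into the two columns shown above.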

The full list of options can be seen with dut -h.

How to build

dut comes with a Makefile, so to install it on your system run:

git clone https://codeberg.org/201984/dut.git
sudo make install

The default install location is /usr/local/bin, but this can be changed by specifying a PREFIX value. For example, to install to ~/.local/bin:

make install PREFIX=$HOME/.local

Benchmarks

dut is remarkably fast, but it doesn't win in all cases. It loses to a couple programs when Linux's disk caches aren't populated yet, which is usually the first time you run it on a certain directory. On subsequent runs, dut beats everything else by a significant margin.

Benchmarked programs:

  • du
  • pdu
  • dust
  • gdu
  • dua

If you know of a faster program, let me know and I'll add it to these benchmarks.

Benchmark 1: Measuring performance from Linux's disk cache

The first benchmark is calculating the total disk usage of both of the SSDs in my laptop. I did warm-up runs beforehand to make sure everything was cached, so this benchmark doesn't touch the disk at all.

Specs

  • CPU: i5-10500H
  • RAM: 16 GB
  • OS: Arch Linux, kernel 6.8.4

In order to make things fair, I forced dut and dust to output in color and show 60 rows. I also added a 10 second sleep between each program's run to limit the effects of thermal throttling.

$ hyperfine 'dut -Cn 60 /' 'du -sh /' 'pdu /' 'dust -n 60 /' 'gdu --non-interactive /' 'dua /' -s 'sleep 10' -i
Benchmark 1: dut -Cn 60 /
  Time (mean ± σ):     467.4 ms ±  11.7 ms    [User: 410.3 ms, System: 4595.4 ms]
  Range (min … max):   442.5 ms … 485.4 ms    10 runs

Benchmark 2: du -sh /
  Time (mean ± σ):      3.566 s ±  0.049 s    [User: 0.775 s, System: 2.743 s]
  Range (min … max):    3.486 s …  3.615 s    10 runs

  Warning: Ignoring non-zero exit code.

Benchmark 3: pdu /
  Time (mean ± σ):     732.1 ms ±  13.8 ms    [User: 1887.3 ms, System: 6123.5 ms]
  Range (min … max):   717.6 ms … 755.8 ms    10 runs

Benchmark 4: dust -n 60 /
  Time (mean ± σ):      1.438 s ±  0.031 s    [User: 3.068 s, System: 6.962 s]
  Range (min … max):    1.397 s …  1.481 s    10 runs

Benchmark 5: gdu --non-interactive /
  Time (mean ± σ):      1.361 s ±  0.103 s    [User: 7.556 s, System: 7.034 s]
  Range (min … max):    1.298 s …  1.569 s    10 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark 6: dua /
  Time (mean ± σ):      1.459 s ±  0.133 s    [User: 4.054 s, System: 9.640 s]
  Range (min … max):    1.346 s …  1.659 s    10 runs

Summary
  dut -Cn 60 / ran
    1.57 ± 0.05 times faster than pdu /
    2.91 ± 0.23 times faster than gdu --non-interactive /
    3.08 ± 0.10 times faster than dust -n 60 /
    3.12 ± 0.30 times faster than dua /
    7.63 ± 0.22 times faster than du -sh /

The warning about a non-zero exit code was due to du reporting an error for not being able to access directories in /proc and /root.

Benchmark 2: SSD Performance

This benchmark operates on the same filesystem as above, except I'm flushing the disk caches in between runs. As a result, all the data has to be read from the SSD each time instead of coming from RAM.

This is a more niche use-case since most of the time dut will be running from the cache. It only has to read from the disk on its first run in a particular directory.

Drives:

  • Intel 660p 512G
  • SX8200PNP-512GT-S

$ sudo hyperfine 'dut -Cn 60 /' 'du -sh /' 'pdu /' 'dust -n 60 /' 'gdu --non-interactive /' 'dua /' -s 'sleep 10' -i -M 3 -p 'echo 1 > /proc/sys/vm/drop_caches'
Benchmark 1: dut -Cn 60 /
  Time (mean ± σ):      5.773 s ±  0.184 s    [User: 0.406 s, System: 4.694 s]
  Range (min … max):    5.561 s …  5.881 s    3 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark 2: du -sh /
  Time (mean ± σ):     20.779 s ±  0.058 s    [User: 0.767 s, System: 3.709 s]
  Range (min … max):   20.712 s … 20.819 s    3 runs

  Warning: Ignoring non-zero exit code.

Benchmark 3: pdu /
  Time (mean ± σ):      4.279 s ±  0.292 s    [User: 1.701 s, System: 5.543 s]
  Range (min … max):    4.072 s …  4.613 s    3 runs

Benchmark 4: dust -n 60 /
  Time (mean ± σ):      5.009 s ±  0.348 s    [User: 2.608 s, System: 6.211 s]
  Range (min … max):    4.726 s …  5.397 s    3 runs

Benchmark 5: gdu --non-interactive /
  Time (mean ± σ):      4.090 s ±  0.081 s    [User: 7.027 s, System: 6.989 s]
  Range (min … max):    4.040 s …  4.183 s    3 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark 6: dua /
  Time (mean ± σ):      6.269 s ±  0.133 s    [User: 4.541 s, System: 12.786 s]
  Range (min … max):    6.162 s …  6.418 s    3 runs

Summary
  gdu --non-interactive / ran
    1.05 ± 0.07 times faster than pdu /
    1.22 ± 0.09 times faster than dust -n 60 /
    1.41 ± 0.05 times faster than dut -Cn 60 /
    1.53 ± 0.04 times faster than dua /
    5.08 ± 0.10 times faster than du -sh /

Benchmark 3: HDD Performance

This benchmark repeats the previous one, except on an HDD instead of an SSD. Some of the Rust programs perform quite badly in this scenario, but dua still narrowly beats dut.

The test location is my home directory on an old Linux installation. There are approximately 26k subdirectories.

The drive being measured is a 2 terabyte 5400rpm Western Digital WD20EFRX connected to my laptop with a USB enclosure.

$ sudo hyperfine 'dut -Cn 60' 'du -sh' 'pdu' 'dust -n 60' 'gdu --non-interactive' 'dua' -s 'sleep 10' -i -M 3 -p 'echo 1 > /proc/sys/vm/drop_caches'
Benchmark 1: dut -Cn 60
  Time (mean ± σ):     36.720 s ±  0.350 s    [User: 0.078 s, System: 0.740 s]
  Range (min … max):   36.411 s … 37.100 s    3 runs

Benchmark 2: du -sh
  Time (mean ± σ):     44.810 s ±  0.043 s    [User: 0.108 s, System: 0.657 s]
  Range (min … max):   44.767 s … 44.854 s    3 runs

Benchmark 3: pdu
  Time (mean ± σ):     81.361 s ±  0.954 s    [User: 0.320 s, System: 0.935 s]
  Range (min … max):   80.675 s … 82.451 s    3 runs

Benchmark 4: dust -n 60
  Time (mean ± σ):     86.991 s ±  2.449 s    [User: 0.337 s, System: 1.042 s]
  Range (min … max):   84.411 s … 89.286 s    3 runs

Benchmark 5: gdu --non-interactive
  Time (mean ± σ):     41.096 s ±  0.229 s    [User: 1.086 s, System: 1.165 s]
  Range (min … max):   40.837 s … 41.273 s    3 runs

Benchmark 6: dua
  Time (mean ± σ):     34.472 s ±  0.965 s    [User: 9.107 s, System: 29.192 s]
  Range (min … max):   33.733 s … 35.564 s    3 runs

Summary
  dua ran
    1.07 ± 0.03 times faster than dut -Cn 60
    1.19 ± 0.03 times faster than gdu --non-interactive
    1.30 ± 0.04 times faster than du -sh
    2.36 ± 0.07 times faster than pdu
    2.52 ± 0.10 times faster than dust -n 60

Why are pdu and dust so bad on HDD?

It's hard to say. My best guess is that they have a really HDD-unfriendly access pattern, since they both use Rayon for multithreading, which uses FIFO ordering for tasks. This results in them doing a breadth-first search of the filesystem, whereas dut and du both use depth-first search. I don't know why one ordering is better than the other, but the difference is pretty drastic.

I also think that ordering is the reason dut doesn't do so well on SSD either, but I'm not so sure of that.
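
To make the ordering difference concrete, here is a hedged sketch (not code from dut, du, or Rayon; the queue and push names are invented) of a FIFO work queue driving a scan. Every directory discovered is appended to the back of the queue, so the traversal proceeds level by level across the whole tree, and consecutive reads can land in unrelated regions of a spinning disk; the recursive depth-first order, by contrast, finishes one subtree before touching the next.

/* bfs.c -- illustrative breadth-first directory walk (error handling
 * kept minimal). Build: cc -O2 -o bfs bfs.c    Run: ./bfs /some/dir */
#define _GNU_SOURCE
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static char **queue;
static size_t head, tail, cap;

static void push(char *path)            /* append to the BACK */
{
    if (tail == cap) {
        cap = cap ? cap * 2 : 64;
        queue = realloc(queue, cap * sizeof *queue);
    }
    queue[tail++] = path;
}

int main(int argc, char **argv)
{
    push(strdup(argc > 1 ? argv[1] : "."));

    while (head < tail) {
        char *dir = queue[head++];      /* pop from the FRONT: FIFO = BFS */
        printf("%s\n", dir);            /* directories print level by level */

        DIR *d = opendir(dir);
        if (d) {
            struct dirent *e;
            while ((e = readdir(d)) != NULL) {
                if (!strcmp(e->d_name, ".") || !strcmp(e->d_name, ".."))
                    continue;
                if (e->d_type != DT_DIR)  /* DT_UNKNOWN possible on some fs */
                    continue;
                char *sub;
                if (asprintf(&sub, "%s/%s", dir, e->d_name) >= 0)
                    push(sub);          /* queued behind every sibling */
            }
            closedir(d);
        }
        free(dir);
    }
    free(queue);
    return 0;
}

Popping from the back of the same queue (LIFO) would turn this loop into a depth-first scan, which is the friendlier pattern for a spinning disk according to the results above.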
