> As someone who maintains critical open-source software, I can strongly empathize, even if it's not an approach I would take.

Pretty strange thing to say in the context of a closed-source app.
---

Thank you for creating and sharing this utility.

I ran it over my Postgres development directories, which contain many almost-identical files, and it saved me about 1.7GB. The project doesn't have any license associated with it; if you don't mind, can you please license it under a license of your choice? As a gesture of thanks, I have attempted to improve the installation step slightly in this pull request: https://github.com/ttkb-oss/dedup/pull/6
---

Regarding the second point, this isn't correct for ZFS: "If several files contain the same pieces (blocks) of data or any other pool data occurs more than once in the pool, ZFS stores just one copy of it. Instead of storing many copies of a book it stores one copy and an arbitrary number of pointers to that one copy." [0] So changing one byte of a large file will not suddenly result in writing the whole file to disk again.

[0] https://www.truenas.com/docs/references/zfsdeduplication/
---

This applies to modifying a byte. But inserting a byte will change every block from that point on and force a rewrite.

Of course, that is true of most filesystems.
---

APFS shares blocks, so only the blocks that changed are no longer shared. Since a block is the smallest atomic unit in a filesystem (except maybe an inode), that's the best level of granularity to expect.
---

Now you understand why this app costs more than 2x the price of alternatives such as diskDedupe.

Any halfway-competent developer can write some code that takes a SHA-256 hash of all your files and uses the Apple filesystem APIs to replace duplicates with shared clones. I know Swift; I could probably do it in an hour or two. Should you trust my bodgy quick script? Heck no. The author, John Siracusa, has been a professional programmer for decades and is an exceedingly meticulous kind of person. I've been listening to the ATP podcast where they've talked about it, and the app has undergone an absolute ton of testing. Look at the guardrails on the FAQ page (https://hypercritical.co/hyperspace/) for an example of some of the extra steps the app takes to keep things safe. Plus, you can review all the proposed file changes before you touch anything. You're not paying for the functionality, but rather for the care and safety that goes around it. Personally, I would trust this app over just about any other on the Mac.
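To make the "quick script" idea concrete, here is a minimal sketch of that naive approach (not how Hyperspace actually works, and with none of its safety checks; the function names are my own). It groups files by SHA-256 digest and replaces each duplicate with a clone of the first match. On APFS, `FileManager.copyItem` produces a copy-on-write clone, so no file data is duplicated:

```swift
import Foundation
import CryptoKit

// Hash a file's full contents. A real tool would stream large files
// instead of mapping them all at once.
func sha256(of url: URL) throws -> String {
    let data = try Data(contentsOf: url, options: .mappedIfSafe)
    return SHA256.hash(data: data).map { String(format: "%02x", $0) }.joined()
}

// Replace every duplicate with a CoW clone of the first file seen.
func dedupe(_ urls: [URL]) throws {
    let fm = FileManager.default
    var firstSeen: [String: URL] = [:]
    for url in urls {
        let digest = try sha256(of: url)
        if let original = firstSeen[digest] {
            // Clone to a temporary name, then atomically swap it into
            // place so a crash never leaves the file missing.
            let tmp = url.deletingLastPathComponent()
                .appendingPathComponent(".dedupe-\(UUID().uuidString)")
            try fm.copyItem(at: original, to: tmp) // CoW clone on APFS
            _ = try fm.replaceItemAt(url, withItemAt: tmp)
        } else {
            firstSeen[digest] = url
        }
    }
}
```

Even this toy version hints at the hard parts: preserving metadata, handling files that change mid-scan, and never destroying the only copy of anything.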
---

Yeah, the "lack of involvement" remark was more in response to ZFS doing this, not this app. I may have crossed the streams with the other threads about ZFS if it's not directly in this thread.
---

Most EULAs disclaim liability for data loss and suggest users keep good backups. I haven't read a EULA in a long time, but I think most of them do so.
---

If Apple is anything like where I work, there's probably a three-year-old bug ticket in their system about it and no real mandate from upper management to allocate resources for it.
---

Am I really so old that I remember this being the default for most software about 10 years ago? Are people already so used to the subscription trap that they think this is a new model?
---

I also really like this pricing model.

I wish it were more obvious how to do it with other software. Often there's a learning curve in the way before you can see the value.
---

FWIW, when I wrote a tool like this I used same size plus some hash function (not MD5, maybe SHA-1, I don't remember). Comparing first and last bytes is a good idea; I didn't think of that.
---

I'd hash the first 1024 bytes of all files and go from there if there are any collisions. That way you don't need to hash whole (large) files, only those whose prefix hashes match.
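As a minimal sketch of that staged idea (the 1 KiB prefix size and the function names are just illustrative), you'd bucket files by size plus prefix hash and only fully hash the groups that collide:

```swift
import Foundation
import CryptoKit

// Stage 1: a cheap key, file size plus SHA-256 of the first 1 KiB.
func prefixKey(of url: URL, prefixLength: Int = 1024) throws -> String {
    let size = try url.resourceValues(forKeys: [.fileSizeKey]).fileSize ?? 0
    let handle = try FileHandle(forReadingFrom: url)
    defer { try? handle.close() }
    let prefix = try handle.read(upToCount: prefixLength) ?? Data()
    let digest = SHA256.hash(data: prefix).map { String(format: "%02x", $0) }.joined()
    return "\(size)-\(digest)"
}

// Stage 2 candidates: only groups with more than one member need a
// full-file hash (or a byte-for-byte compare) to confirm real duplicates.
func duplicateCandidates(in urls: [URL]) throws -> [[URL]] {
    var groups: [String: [URL]] = [:]
    for url in urls {
        groups[try prefixKey(of: url), default: []].append(url)
    }
    return groups.values.filter { $0.count > 1 }
}
```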
---

Depending on the medium, the penalty of reading single bytes at scattered locations could be comparable to reading the whole file. Maybe not a big win.
---

Just want to mention: Apple ships a modified version of the copy command (good old cp) that supports the cloning feature of APFS via the -c flag.
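For example, on an APFS volume, `cp -c big-file.mov big-file-copy.mov` completes almost instantly and allocates no new data blocks; the copy is a clone until one of the two files is modified.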
---

And in case your cp doesn't support it, you could also do it by invoking Python with PyObjC. Something like `import Foundation; Foundation.NSFileManager.defaultManager().copyItemAtPath_toPath_error_(...)`.
---

They might not have ported it. They could have a Python `__getattr__` implementation that returns a callable that simply uses `objc_msgSend` under the hood.
---

Yes, that's how it's described in this talk transcript: https://asciiwwdc.com/2017/sessions/715

> Let's say for simplification we have three metadata regions that report the entirety of what the file system might be tracking, things like file names, time stamps, where the blocks actually live on disk, and that we also have two regions labeled file data. If you recall, during the conversion process the goal is to only replace the metadata and not touch the file data. We want that to stay exactly where it is, as if nothing had happened to it. So the first thing that we're going to do is identify exactly where the metadata is, and as we're walking through it we'll start writing it into the free space of the HFS+ volume. What this gives us is crash protection and the ability to recover in the event that conversion doesn't actually succeed. Now the metadata is identified. We'll then start to write it out to disk, and at this point, if we were doing a dry-run conversion, we'd end here. If we're completing the process, we will write the new superblock on top of the old one, and now we have an APFS volume.
---

I have file A that exists in two places and I run this.

If I then modify A_0, does this modify A_1 as well, or just reify the new state of A_0 while leaving A_1 untouched?
---

> Which means if you actually edited those files, you might fill up your HD much more quickly than you expected.

I'm not sure if this is what you intended, but just to be sure: writing changes to a cloned file doesn't immediately duplicate the entire file again in order to write those changes. The changes are written out-of-line, and the identical blocks are only stored once. From the docs [1] posted in a sibling comment:

> Modifications to the data are written elsewhere, and both files continue to share the unmodified blocks. You can use this behavior, for example, to reduce storage space required for document revisions and copies. The figure below shows a file named "My file" and its copy "My file copy" that have two blocks in common and one block that varies between them. On file systems like HFS Plus, they'd each need three on-disk blocks, but on an Apple File System volume, the two common blocks are shared.

[1]: https://developer.apple.com/documentation/foundation/file_sy...
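A quick way to convince yourself of both behaviors at once (a minimal sketch, assuming an APFS volume and throwaway file names): clone a file, modify only the clone, and observe that the original is untouched, even though the unmodified blocks are still shared on disk.

```swift
import Foundation

let fm = FileManager.default
let original = URL(fileURLWithPath: "a.txt")
let clone = URL(fileURLWithPath: "b.txt")

try "hello world".write(to: original, atomically: true, encoding: .utf8)
try fm.copyItem(at: original, to: clone)      // a CoW clone on APFS

var data = try Data(contentsOf: clone)
data.append(contentsOf: Array("!".utf8))      // modify only the clone
try data.write(to: clone)

print(try String(contentsOf: original, encoding: .utf8)) // "hello world"
print(try String(contentsOf: clone, encoding: .utf8))    // "hello world!"
```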
---

Everyone misunderstood your comment for a reason. You wrote:

> but it only found 1GB of savings on a 8.1GB folder

That is quite a saving, and that's what everyone understood from your comment.
---

I think this is somewhat funny.

His comment is pretty understandable if you've done frontend work in JavaScript. node_modules is so ripe for duplicate content that some tools explicitly call out that they're disk-efficient (it's literally in the tagline for pnpm, "Fast, disk space efficient package manager": https://github.com/pnpm/pnpm).

So he got OK results (~13% savings) on possibly the best target content available in a user's home directory. Then he got results so bad it's utterly not worth doing on the rest (0.10%, not 10%: literally 1/10 of a single percent).

Deduplication isn't super simple, isn't always obviously better, and can require other system resources in unexpected ways (e.g., lots of CPU and RAM). It's a cool tech to fiddle with on a NAS, and I'm generally a fan of modern CoW filesystems (including APFS). But I want to be really clear: this is picking spare change out of the couch. Penny wise, pound foolish.

The only people who are likely to actually save anything buying this app probably already know it, and have a large set of real options available. Everyone else is falling into the "download more RAM" trap.
---

I wrote a similar (but simpler) script which replaces a file with a hardlink if it has the same content as another.

My main motivation was the packages of Python virtual envs, where I often have similar packages installed, and even if the versions are different, many files still match. Some of the packages are quite huge, e.g. NumPy, PyTorch, TensorFlow, etc. I got quite some disk space savings from this.

https://github.com/albertz/system-tools/blob/master/bin/merg...
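The core of such a hardlink approach fits in a few lines. A minimal sketch (the function name is mine, and the content check is the slow, safe byte-for-byte one); note the caveat raised elsewhere in this thread that, unlike an APFS clone, hardlinked files can no longer be modified independently:

```swift
import Foundation

// Replace `duplicate` with a hard link to `original` when their
// contents are byte-for-byte identical. After this, a write through
// either path is visible through both.
func mergeIntoHardlink(original: URL, duplicate: URL) throws {
    let fm = FileManager.default
    guard fm.contentsEqual(atPath: original.path, andPath: duplicate.path) else {
        return // not actually identical, leave it alone
    }
    let tmp = duplicate.deletingLastPathComponent()
        .appendingPathComponent(".merge-\(UUID().uuidString)")
    try fm.linkItem(at: original, to: tmp)                // hard link first
    _ = try fm.replaceItemAt(duplicate, withItemAt: tmp)  // then swap atomically
}
```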
---

If you'd like. In the blog post he says he wrote the prototype in an afternoon. Hyperspace does try hard to preserve unique metadata, along with other protections.
---

CoW is a feature of ReFS, which shipped with Server 2016. "DevDrive" is just a marketing term for a ReFS volume with file system filters placed in async mode or optionally disabled altogether.
---

No, because it isn't getting rid of the duplicate; it's using a feature of APFS that allows duplicates to exist separately while sharing the same internal data.
---

Replacing duplicates with hard links would be extremely dangerous. Software which expects to be able to modify file A without modifying the previously-identical file B would break.
---

Judging by this sub-thread, the process really is harder to explain than it appears on the surface. The basic idea is simple, but the implementation requires deeper knowledge.
---

This operates within one drive, possibly within one magical APFS "partition" (or whatever APFS calls partitions); I can't remember.
---

On a related note: are there any utilities that can measure the disk usage of a folder taking (APFS) cloned files into account?
---

These are often Time Machine snapshots. Nuking those can free up quite a bit of space.
---

Even without Time Machine, loads of storage is spent on "System", especially now with Apple Intelligence (even when it's turned off).
---

There's no magic around it; macOS just doesn't do a good job explaining it with the built-in tools. Just use DaisyDisk or something. It's all there and can be examined.
---

Reminds me of Avira's Luke Filewalker. I wonder if they needed any special agreement with Lucasfilm/Disney. I couldn't find any info on it, and their website doesn't mention Star Wars at all.
---

Downloaded it. Ran it. It tells me "900" files can be cleaned, with no summary and no list. But I was at least asked to buy the app. Why would I buy the app if I have no idea whether it'll help?
---

They had long discussions about the pricing on the podcast the author is part of (atp.fm). It went through a few iterations of a one-time purchase, a fee for each time you free up space, and a subscription. There will always be people unhappy with any choice.

Edit: Apparently both are possible in the end: https://hypercritical.co/hyperspace/#purchase
---

People who want the app to stick around and continue to be developed.

I worry about that with Procreate. It feels like it's priced too low to be sustainable.
---

Claude 3.7 just rewrote the whole thing (just based on reading the webpage description) as a command-line app for me, so there's that.

And because it has no Internet access yet (and because I prompted it to use a workaround like this in that circumstance), the first thing it asked me to do (after hallucinating the functionality, then catching itself) was run `curl https://hypercritical.co/hyperspace/ | sed 's/<[^>]*>//g' | grep -v "^$" | clip` ("clip" is a bash function I wrote to pipe things onto the clipboard, or spit them back out, in a cross-platform Linux/Mac way).
---

The Mac App Store page has the pricing at the bottom, in the In-App Purchases section.

TL;DR: $49 for a lifetime unlock, $19/year, or $9/month. It could definitely be easier to find.
---

The price does seem very high. It's probably a niche product, and I'd imagine developers are the ones who would see the biggest savings. Hopefully it works out for them.
---

> Hyperspace can't be installed on "Macintosh HD" because macOS version 15 or later is required.

macOS 15 was released in September 2024; this feels far too soon to drop support for older versions.
---

I can't remember for sure, but there may also have been a recent file system API he said he needed, or a bug that he had to wait on a fix for.
---

Came here to post the same thing. I would love to try the application, but I guess not if the developer is deliberately excluding my device (which cannot run the bleeding-edge OS).
---

In fairness, I don't think you can describe it as bleeding edge when we're 5 months into the annual 12-month upgrade cycle. It's recent, but not exactly an early-adopter version at this point.
---

Expensive. Keeping us on the expensive hardware treadmill. My guess is that it cannot be listed in the App Store unless it's only for Macs released in the last 11 months.
---

As a web dev, it's been fun listening to the Accidental Tech Podcast, where Siracusa has been talking (or ranting) about the ins and outs of developing modern Mac apps in Swift and SwiftUI.
---

RTL and accessibility are on the roadmap, the latter for this next version, IIRC.

I'd argue there's a lot more to iced than just being a quick toolkit; the Elm Architecture really shines for GUI apps.
---

For those mentioning that there's no price listed: it's not that easy, as the App Store price varies by country. You can open the App Store link and then look at "In-App Purchases", though.

For me on the German store it looks like this:

So it supports both one-time purchases and subscriptions, depending on what you prefer. More about that here: https://hypercritical.co/hyperspace/#purchase
---

It would be interesting if payments bought a certain amount of saved space, with the rate based on current storage prices, to keep it competitive with the cost of just expanding storage.
---

It's interesting how Linux tools are all free while even trivial Mac tools are being sold. Nothing against someone trying to monetize, but the Linux culture sure is nice!
---

It's not that nice to call work someone spent months on "trivial" without knowing anything about the internals and what they ran into.
---

By now, you know it's not de-duping anything; nothing is deleted.

Of course, the copy-on-write clone functionality has been available since APFS became the default file system in 2017.
---

A $20 one-year licence for something that probably has a FOSS equivalent on Linux...

However, considering Apple will never ever ever allow user-replaceable storage on a laptop, this might be worth it.
---

The cost reflects the fact that people won't use it regularly. The developer is offering lifetime unlocks, lower-cost tiers for shorter timeframes, etc.
---

Copy-on-write means that a copy is performed only when you make the first change (and even then only the part that changes is copied; the rest is used from the original file). Until then, copying is free.

[0] https://github.com/ttkb-oss/dedup