(comments)

Original link: https://news.ycombinator.com/item?id=40213731

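The commenters discuss various projects, several of them written in Nim, that expose data as filesystems, along with the state of user-space filesystems on Apple's macOS. One mentions the difficulty of sharing some projects while their code is still in rough shape, and others are excited about FUSE (Filesystem in Userspace), which lets a filesystem generate requested data on demand instead of creating every file in advance. They suggest possibilities such as exposing a program's configuration through files rather than through traditional config files or IPC. On macOS's limitations, they discuss possible reasons Apple has not published a FUSE-like API, focusing on system stability and on making sure the right use cases are served, and note that Linux works for this kind of exploration instead. One recounts a past cache optimization that mounted CI cache archives on demand. Other suggestions include integration with alternative archive tools and an in-memory filesystem. Overall, the conversation shows enthusiasm for programming, ideas about system optimization, and views on working with different operating systems.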


Original text


Oh this is cool! I recently wrapped libfuse in Nim, and after porting the 'hello' filesystem example I made one which is more or less exactly this. However, with my version you pipe data in and have to provide a mountpoint, then when it's done it writes the result out over stdout. That means you can inline it in a pipe chain, but also that you have to make sure to grab the output.

At the moment I'm exploring other stuff which could be made into file systems. I've got a statusbar thing for the Nimdow window manager which allows you to write contents to individual files and it creates a bar with blocks on them as the output. It makes it super easy to swap out what is on your bar which is pretty neat.

Another tool I've made is a music player. It uses libvlc and, when given a folder, it reads all the media with ID3 tags and sets up folders like 'by-artist', 'by-album', etc. Each file is named as ' - ' and contains the full path to the actual file. To play a song you cat one of these files into 'control/current' and write the word play to 'control/command'. There's a bit more to it than that, like a playlist feature and some more commands, but that's the basic idea. The goal is to have a super-scriptable music player.



Here's an idea: recursively mount code files/projects. Use something like tree-sitter to extract class and function definitions and make each into a "file" within the directory representing the actual file. Need to get an idea for how a codebase is structured? Just `tree` it :)

Getting deeper into the rabbit hole, maybe imports could be resolved into symlinks and such. Plenty of interesting possibilities!
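A minimal sketch of the "mount a codebase" idea, assuming Python source and substituting Python's standard-library `ast` module for tree-sitter (which is what you would use to cover many languages). It builds the nested class/function outline that such a filesystem would expose as directories and prints it in a `tree`-like layout; nothing is actually mounted, and `outline`/`render` are hypothetical names.

```python
import ast


def outline(source: str) -> dict:
    """Nest class and function definitions the way a mounted code tree would
    expose them as directories. Only definitions that are direct children of
    a module, class, or function body are found; a def inside an `if` or
    `try` block is skipped in this sketch."""

    def walk(node: ast.AST) -> dict:
        children = {}
        for child in ast.iter_child_nodes(node):
            if isinstance(child, (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)):
                children[child.name] = walk(child)
        return children

    return walk(ast.parse(source))


def render(tree: dict, indent: str = "") -> list:
    """Format the outline with the box-drawing prefixes that `tree` uses."""
    lines = []
    items = list(tree.items())
    for i, (name, sub) in enumerate(items):
        last = i == len(items) - 1
        lines.append(indent + ("└── " if last else "├── ") + name)
        lines.extend(render(sub, indent + ("    " if last else "│   ")))
    return lines


if __name__ == "__main__":
    src = (
        "class Player:\n"
        "    def play(self): pass\n"
        "    def stop(self): pass\n"
        "\n"
        "def main(): pass\n"
    )
    print("\n".join(render(outline(src))))
```

For the sample source this prints `Player` with `play` and `stop` nested under it, followed by `main`. Resolving imports into symlinks, as suggested above, would need a second pass over `ast.Import`/`ast.ImportFrom` nodes.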



The audio player is unfortunately not on GitHub yet, I've still got a few kinks to work out before it's in a shareable state. The statusbar project was also shared mostly so the other Nimdow users could play around with it, so the code quality is quite sub-par.


JNRowe beat me to it! Feel free to browse the rest of my GitHub, it's mostly Nim code. And if you want to talk Nim or FUSE you can join the Nim Discord server (or the Matrix or IRC bridge) or post on the Nim forum.


This makes me think, it would be nice if there was an easy built-in way to expose information about a process using the filesystem. Something like "cat /proc/$pid/fs/current_track" to get a name of a current song from a music player, or "ls /proc/$pid/fs/tabs" to list open tabs in my browser (and maybe use this to grab the html or embedded images).

I mean right now it's possible to do this using FUSE, but that's convoluted and nobody does it.



Can you actually write to the /proc directory? Working with libfuse I've had much of the same ideas, basically allow programs to expose information as files. The beauty of the fuse system is also that you only have to respond to requests when they happen. So you don't have to actually create all these files until someone asks for them. Another idea I've had is to expose the configuration of a program through a filesystem. Instead of having a config file and a refresh command or a complicated IPC you could simply write to files to change the config.
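To make the "only respond when asked" point concrete, here is a hypothetical in-process model (no FUSE involved): each exposed path has a reader callback that computes its contents at read time, plus an optional writer callback, so writing a file changes a live config value. In a real FUSE filesystem these callbacks would be hooked up to the binding's read/write handlers; that wiring is omitted, and `VirtualFiles` and `expose` are invented names.

```python
from typing import Callable, Dict, List, Optional


class VirtualFiles:
    """Maps virtual paths to callbacks; nothing is stored or pre-generated."""

    def __init__(self) -> None:
        self._readers: Dict[str, Callable[[], str]] = {}
        self._writers: Dict[str, Callable[[str], None]] = {}

    def expose(self, path: str, reader: Callable[[], str],
               writer: Optional[Callable[[str], None]] = None) -> None:
        self._readers[path] = reader
        if writer is not None:
            self._writers[path] = writer

    def listdir(self) -> List[str]:
        return sorted(self._readers)

    def read(self, path: str) -> str:
        # Contents are computed here, only when a read request arrives.
        return self._readers[path]()

    def write(self, path: str, data: str) -> None:
        writer = self._writers.get(path)
        if writer is None:
            raise PermissionError(f"{path} is read-only")
        writer(data)


if __name__ == "__main__":
    config = {"volume": 50}
    fs = VirtualFiles()
    fs.expose(
        "/config/volume",
        reader=lambda: f"{config['volume']}\n",
        writer=lambda text: config.update(volume=int(text.strip())),
    )
    fs.write("/config/volume", "80\n")  # like: echo 80 > /config/volume
    print(fs.read("/config/volume"), end="")  # like: cat /config/volume
```

The config change takes effect immediately on write, which is the "no refresh command, no IPC" property the comment describes.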


Useful enough that it should be an OS-level standard feature, imho.

Unix-like OSes allow mounting disk images to explore their contents. But there are many more file formats where exploring files-inside-files is useful. Compressed archives, for one. Some file managers support those, but (imho) application-level is not the optimal layer to put this functionality.

Could be implemented with a kind of driver-per-filetype.



Really what you'd like to see is a way to write the mount command for each file type (do one thing well) and another command to detect the file type and dispatch accordingly (probably similar to the `file` command), all in user space.

The only thing standing in the way of this today is that MacOS doesn't expose a user space file system API. You can do this on Linux, Windows, and BSDs today.

(No, file provider extensions don't cut it, Apple devs who read this, please give us a FUSE equivalent, we know it exists).
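The "detect, then dispatch" half of that design could look roughly like this sketch: sniff a file's leading bytes the way `file` does, then build the command line for a per-format FUSE mounter. `fuse-zip` and `archivemount` are real tools that take an archive and a mountpoint, but the mapping table, the two formats covered, and the helper names are illustrative assumptions. The code only builds the argument list and does not run it.

```python
import shlex
import sys
from typing import List, Optional

# Assumed mapping from detected format to a mount helper (illustrative only).
MOUNTERS = {
    "zip": ["fuse-zip"],
    "tar": ["archivemount"],
}


def detect(header: bytes) -> Optional[str]:
    """Identify a format from its magic bytes, like a tiny subset of `file`."""
    if header.startswith(b"PK\x03\x04"):  # zip local file header
        return "zip"
    if header[257:262] == b"ustar":  # POSIX/GNU tar magic at offset 257
        return "tar"
    return None  # e.g. empty zips (PK\x05\x06) are not recognized here


def mount_command(path: str, mountpoint: str) -> List[str]:
    with open(path, "rb") as f:
        header = f.read(512)
    kind = detect(header)
    if kind is None:
        raise ValueError(f"no mounter registered for {path}")
    return MOUNTERS[kind] + [path, mountpoint]


if __name__ == "__main__":
    if len(sys.argv) == 3:
        print(shlex.join(mount_command(sys.argv[1], sys.argv[2])))
```

Keeping detection and the per-format mounters separate is what lets each mounter "do one thing well"; adding a format means adding one signature and one table entry.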



I want stuff to work like ZipMagic did in the early 90s/ early 2000s.

You could cd into zip files, they would act as directories and files at the same time.

I seem to remember Linus saying a file could act like a directory in Linux a long time ago too.

Though I don't think Linux has filters for the filesystem like Windows does so implementation might be more tricky.



Recent macOS versions do have a general purpose built-in API for user mode filesystems. That API is incompatible with FUSE. The big problem is it is undocumented and you need an entitlement from Apple to use it, and Apple won’t give you that.

Apple do have a publicly available API for cloud file systems (Dropbox-style products), but it makes a lot of assumptions which makes it effectively unusable for other use cases.

Then there are third party solutions like osxfuse. These have the problem that they rely on kernel extensions and Apple keeps on making those harder and harder, and is aiming to get rid of them; plus, they are all now proprietary licensed, albeit often with a free license for open source use.

One approach that does work without any kernel extensions or private APIs is to make your user filesystem an NFS server and then mount that. One competitor to osxfuse does that, but it is also proprietary.



Yes, that’s not what I’m talking about though

MacFUSE uses the kernel mode VFS API

I’m talking about the undocumented user mode filesystem LiveFS/UserFS/com.apple.filesystems.lifs API which was added in Monterey (macOS 12), and since Ventura (macOS 13) is used to implement the OOTB FAT and exFAT filesystem support. Using that requires private entitlements (e.g. com.apple.private.LiveFS.connection) which Apple (thus far) won’t give to anyone else



It is designed for the cloud storage use case (Dropbox, Google Drive, etc.): it creates local copies of files and syncs them with remote ones. Not what you want to do in the general case.


I take it from the downvotes people think what I said is factually wrong?

If that's the case, I wish someone would point out where I've got it wrong instead of just silently downvoting



Well that requires a kext so it's a nonstarter, and fuse-t uses NFS which is extremely janky and unreliable on MacOS.

The fundamental issue is that macOS doesn't provide an API for this natively.



> fuse-t uses NFS which is extremely janky and unreliable on MacOS

I was wondering what issues you were talking about, and then I found this - https://github.com/macos-fuse-t/fuse-t/issues/45 - data corruption

> The fundamental issue is that macOS doesn't provide an API for this natively.

The API is there, Apple just doesn’t want to give anyone outside of Apple the entitlement that lets them use it. I don’t understand why Apple won’t.

Well, I understand that would require them to document it, ship public headers, and support it for external developers - but why not?



> The API is there, Apple just doesn’t want to give anyone outside of Apple the entitlement that lets them use it.

If no one can call it, it's not an API, it's an implementation detail. And I don't even think it's exposed by headers, just alluded to by people who claim APFS is implemented in user space.

> I was wondering what issues you were talking about, and then I found this

Worse than this, it's possible to DoS a Mac with an NFS server just by refusing to reply to a request. That's unacceptable for a user space file system (although FUSE is only kinda better, in that it can force processes that read from the FS into uninterruptible sleep that prevents them from being killed).

> Well, I understand that would require them to document it, ship public headers, and support it for external developers - but why not?

Because Apple doesn't give a fuck about developers. Every developer will eventually learn this, but for those that haven't: Apple doesn't want you writing software for their platform unless you're an Apple employee on an Apple team paid to do it. It's why their docs suck, it's why to learn anything you need to watch ADC videos instead of reading manpages, and it's why all the cool stuff is behind protected entitlements that you can't get or will be limited in using.



No, it's almost certainly not because they don't give a fuck about developers. They definitely do.

It's much more likely that they want to:

a. Dogfood the API using internal use cases first when they can still make changes to the API without breaking anything. Note that the latest MacOS releases moved some filesystems into userspace using this new API. They probably learned some stuff by doing that.

b. Work out how to protect system stability from crappy userland filesystems. As you point out, bugs in FUSE providers can hang apps.

c. Work out how such an API interacts with their sandboxing system and how to avoid FUSE-style filesystems being used to subvert the sandbox. This is a common source of exploits in FUSE-style systems and is one of the key learnings from GNU/Hurd: UNIX software is written on the assumption that filing systems aren't malicious and invalidating that assumption creates new bug classes.

d. Work out what the most important use cases are and try to ensure those use cases will have a good or at least uniform UX first.

Providing a FUSE-like API is presumably also just not a high priority. By far the most common use case in terms of number of users is the Dropbox use case. FUSE is mostly used for toys and experiments beyond that (like filefs). Those matter and I'm sure there are friendly geeks on the Darwin team who'd like to enable those, but Linux also works for exploration. Certainly Apple management would not be happy about an engineer who decided to enable nerd experimentation but undermined the security system whilst doing so.

And it's worth remembering that you can have root on macOS. It means disabling SIP and adding a kernel boot arg, but that only takes a few minutes and then you can grant apps any entitlements you like:

https://github.com/osy/AMFIExemption

That's no good for people who aren't developers, but most FUSE filesystems are designed for developers anyway.



Maybe? It's kind of hard to tell. It's not exactly easy to write any of these servers from scratch to find out. But I wouldn't be surprised - they want app developers to be using the file provider extension API, which is unsuitable for everyone who isn't making a Dropbox clone.

That link is very interesting. It doesn't smell like any other Apple API as they're exposing a vtable with good documentation comments. It would be interesting to hack with this with SIP disabled to see how it works. I'm especially curious about how mount/unmount work and how the plugin registers itself with the OS, or what application is the client/host.



> why don't they

It would make macOS more of a general-purpose OS, would increase the amount of functionality from which third parties would benefit, but Apple themselves would likely not. That would increase the number and variety of tech support requests, ever so slightly but still, and would introduce a few new attack surfaces.

Instead, Apple's strategy is to tighten the macOS more and more, and turn it into a specialist OS completely controlled by Apple, with a few companies like Adobe and Ableton licensing access to its internals.



Apple used to be a much more developer-friendly company. That is part of what got them where they are now: so many developers use Macs, which in turn encourages business software vendors to support Macs.

Stuff like this is of little interest to ordinary users (at least not directly), but appeals to developers.

By de-emphasising the developer experience, they are undermining one of the factors that got them to where they are today.



I've been using OSX since 2003, and developing on it for more than ten years. At no point have I seen anything that it's reasonable to call "tightening macOS", let alone the absurd claim of complete control except for an inner circle of elite companies.

The closest thing would be adding the attestation system, so that unsigned binaries have to be explicitly given permission to run... once. That's a security feature which trades a bit of convenience for a lot of protection, especially for the average user. I have no problem with that sort of thing.

I see this sort of sentiment very frequently from non-users of the operating system, but never from those of us who actually use it. Go figure.



This was a core design feature of reiserfsv4, but Linux ultimately refused to merge it, probably not helped by the whole murdering-his-wife thing.


> This was a core design feature of reiserfsv4, but Linux ultimately refused to merge it

IIRC, because it contained these strange beasts which functioned as both files and directories - i.e. cat would return data, but then you could cd into them and run ls. Linus (among others) didn’t want to permit those violations of the file-directory dichotomy into the Linux kernel.



Oh that's funny, I remember a much, much earlier Linus mentioning how this would be possible in Linux; I didn't know anyone actually did it.

I think you really should be able to "cd" into any kind of structured data.



It allows you to use any tool available for regular files, on the files-in-files as well.

As opposed to extract contents and then work on that (requiring extra steps + disk space). Or be limited to what specialized utilities support.



We used this in GitLab CI. Unfortunately, the only way it deals with artifacts is by putting them in Zip files, so the cache between builds was stored as a Zip file. However, fully extracting it before each build would sometimes take as much time as, if not more than, just building fresh. Mounting a Zip file as a filesystem allows extracting entries on demand, at the time a file access would've been made. This was a notable speedup in our compilation process.
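The speedup described above relies on the zip format's central directory: a reader can locate and decompress one entry without touching the others. A minimal illustration with Python's standard `zipfile` module (not the FUSE mounter used in that CI setup; `read_member` is a made-up helper name):

```python
import io
import zipfile


def read_member(archive, name: str) -> bytes:
    """Return one entry's bytes without extracting the rest of the archive.

    `archive` may be a path or a seekable binary file object. ZipFile reads
    the central directory at the end of the file, seeks to this entry's
    offset, and decompresses only that entry."""
    with zipfile.ZipFile(archive) as zf:
        with zf.open(name) as member:
            return member.read()


if __name__ == "__main__":
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("cache/a.o", b"object A")
        zf.writestr("cache/b.o", b"object B")
    print(read_member(buf, "cache/b.o"))  # b'object B'
```

A FUSE zip mounter does essentially this per `open()`/`read()`, which is why a build that touches only part of the cache pays only for that part.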


> Compressed archives, for one

You can look inside of archives pretty easily with `lsar` (part of the unar package). It works with disk images like ISO 9660 files too

But yes, especially for nested archives, having deeper OS support would be nice.



It exists :-)

For zip archives, there are fuse-zip and mount-zip, which are FUSE filesystems.

As an intermediate between the OS level and the application level, there is the desktop-environment level: gvfs for GNOME and KIO for KDE, but each is compatible only within its own ecosystem.



Would be nice to have something that integrates with 7z - it supports a lot of weird archive types, including "weird" ones I care about (for example PE files, better known as ".exe files").


I thought archivemount already did that. Am I missing something?

Anyway, even if that's not what you are looking for, FUSE is a more general mechanism that will allow you to do what you want (well, it seems like, at least) and much more.



This is really neat, but when I saw the headline I got excited that it was something I have been looking for / considering writing, and I figure the comments here would be a good place to ask if something like this exists:

Is there a FUSE filesystem that runs in-memory (like tmpfs) while mounted, and then when dismounted it serializes to a single file on disk? The closest I can find are FUSE drivers that mount archive files, but then you don't get things like symlinks.



Closest I found: https://github.com/guardianproject/libsqlfs

> The libsqlfs library implements a POSIX style file system on top of an SQLite database. It allows applications to have access to a full read/write file system in a single file, complete with its own file hierarchy and name space. This is useful for applications which needs structured storage, such as embedding documents within documents, or management of configuration data or preferences. Libsqlfs can be used as an shared library, or it can be built as a FUSE (Linux File System in User Space) module to allow a libsqlfs database to be accessed via OS level file system interfaces by normal applications.
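As a rough idea of what "a whole file tree in one file" can mean, here is a toy path-to-bytes store on SQLite using Python's `sqlite3` module, including symlink records (the feature the question above asked about). The table layout and the `SqliteStore` class are assumptions for illustration, not libsqlfs's actual schema or API, and there is no FUSE layer.

```python
import sqlite3
from typing import List


class SqliteStore:
    """Files and symlinks kept as rows in a single SQLite database file."""

    def __init__(self, db_path: str) -> None:
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS entries ("
            " path TEXT PRIMARY KEY,"
            " kind TEXT NOT NULL CHECK (kind IN ('file', 'link')),"
            " data BLOB NOT NULL)"
        )

    def _put(self, path: str, kind: str, data: bytes) -> None:
        with self.conn:  # commits on success, rolls back on error
            self.conn.execute(
                "INSERT OR REPLACE INTO entries (path, kind, data) VALUES (?, ?, ?)",
                (path, kind, data),
            )

    def write(self, path: str, data: bytes) -> None:
        self._put(path, "file", data)

    def symlink(self, target: str, path: str) -> None:
        # Targets are stored as absolute paths in this sketch.
        self._put(path, "link", target.encode())

    def read(self, path: str, _hops: int = 0) -> bytes:
        row = self.conn.execute(
            "SELECT kind, data FROM entries WHERE path = ?", (path,)
        ).fetchone()
        if row is None:
            raise FileNotFoundError(path)
        kind, data = row
        if kind == "link":
            if _hops >= 8:
                raise OSError(f"too many levels of symbolic links: {path}")
            return self.read(data.decode(), _hops + 1)
        return data

    def paths(self) -> List[str]:
        return [p for (p,) in self.conn.execute("SELECT path FROM entries ORDER BY path")]
```

Pointing it at a path such as `project.db` keeps the whole tree in that one file; directories are only implicit in the path strings here, which a real filesystem would need to model explicitly.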



Why does it have to be in memory?

I’m sure you’re already aware of this, but there are all kinds of very real scenarios that could lead to corrupted data if you’re only flushing the buffer upon unmounting.

Sounds like you’ve got an interesting problem you’re trying to solve though.



I can't think of anything _exactly_ like that, but I think you can get close by just copying some type of image file to /tmp and then moving it to disk when you're done after unmounting.


/tmp isn't stored in memory; it's usually a normal on-disk filesystem that's cleared regularly. You want /dev/shm instead, which is a purely in-memory filesystem on normal Linux systems.


What!? you didn't add a versioned database layer on a server with code stored in clearcase that stored those ClearCase config specs to manage the configuration of your config specs to manage the configuration of your version control system that had your application configuration in it?! How did you even operate? /s


Ha. I did this back in 2003. It's surprisingly fast, and makes it simple to do granular locking.

I used it as a per-user database for a web-templating language for a giant web-site building tool.



this is cool but it's fuse.. which is not so cool.

these days i reach for jq.. I've recently become interested in duckdb too.

Using a tool that is specialized for the format is usually more ideal than a generic one treating everything as files.

There is a lot in JSON that can't be represented in a flexible way as files and directories. For instance: what if a key has "/" in it? What happens to lists and their order when you re-serialize? How are hashes represented? How can you tell if a parent is an object or a list? Inserting an item into a list means a ton of error-prone renames.. the list goes on

(edited for formatting)
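To see those problems concretely, here is one possible mapping of JSON onto paths in Python (an assumption for illustration, not the project's actual scheme): keys are percent-escaped so "/" can appear in them, arrays become zero-padded index names so `ls` order matches array order, each container gets a `.kind` marker so objects and arrays can be told apart, and leaves keep their JSON encoding so `1`, `"1"` and `true` stay distinct. The helper names are hypothetical.

```python
import json
from typing import Any, Dict


def escape(key: str) -> str:
    """Make a JSON key usable as a single file name (reversibly)."""
    key = key.replace("%", "%25").replace("/", "%2F")
    if key.startswith("."):
        # Keeps keys from clashing with ".kind", "." or "..".
        key = "%2E" + key[1:]
    return key  # note: an empty key "" is not handled in this sketch


def flatten(value: Any, path: str = "") -> Dict[str, str]:
    """Map a JSON value to {file path: file contents}."""
    entries: Dict[str, str] = {}
    if isinstance(value, dict):
        entries[path + "/.kind"] = "object"
        for key, child in value.items():
            entries.update(flatten(child, f"{path}/{escape(key)}"))
    elif isinstance(value, list):
        entries[path + "/.kind"] = "array"
        for index, child in enumerate(value):
            # Zero-padding keeps name order == array order (up to 10,000 items).
            entries.update(flatten(child, f"{path}/{index:04d}"))
    else:
        entries[path] = json.dumps(value)
    return entries


if __name__ == "__main__":
    doc = {"a/b": 1, "tags": ["x", True], ".kind": None}
    for p, contents in sorted(flatten(doc).items()):
        print(p, contents)
```

The insertion problem shows up directly: flattening `["a", "b", "c"]` instead of `["b", "c"]` moves `"b"` from `/0000` to `/0001`, so inserting at the front of an array costs one rename per existing element.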



This looks awesome, I need to give it a try asap. I can very well see myself using this to navigate or search inside JSON files


When I saw the title I thought it was a meme.

But wow, what a clever idea. Not sure I'd ever need to reach for it personally, as I do most data processing in a higher-level language, but I can imagine people finding use cases for it.

Nice out of the box thinking



It's an interesting idea, but I think the usefulness would be greatly enhanced if it could handle JSON arrays; in my experience, most of the JSON structures I need contain array elements.


Neat. Now how about a filesystem that takes a directory of files and exposes it as a single json file? You could call it the Filesystem File, and mount it in the File Filesystem if you wanted...


Daniel Stenberg, of cURL fame, co-wrote an Amiga editor called FrexxEd where the open buffers were exposed as files in the filesystem.

Meaning you could write any shell script to manipulate an open buffer (not that important, as it also exposed all editor functionality both via IPC through ARexx and via FPL, a C-like scripting language), and that you could e.g. compile without saving. That was very helpful on a system where a lot of people might only have a single floppy drive: being able to keep the compiler in the drive and compile straight from the in-memory version in RAM meant you didn't have to keep swapping floppies. (Just remember to save before actually trying to run the program; there was no memory protection...)



Classic Mac OS in the 80s had MPW (Macintosh Programmer's Workshop), which treated open text windows as files and selections within those windows as files, so it wasn't uncommon for part of an otherwise ordinary documentation file to say "click here and hit enter", which would use the selected text as stdin for some semi-ported Unix tool. (There was no memory protection or multitasking, so true pipelines with backpressure didn't work.)


I guess support for XML would be tricky, because XML is just way more complex format than the ones already supported. It is still essentially a tree, but with additional structure.

Representing elements and their contents is easy enough. But attributes, comments, processing instructions, entities... And remember, an XML document can include a DTD (it does not have to be in a separate file).

To present it as a file system in a useful, non-convoluted way? I will be very, very interested if it's possible, but not holding my breath.



On the one hand, I can’t help but point out you forgot to mention the other big inherent complexity that would make XML-as-FS a uniquely complex beast: namespaces.

On the other hand, I can’t help but point out that a related technology comes very close to demonstrating how you might map XML to a file system: XPath. Probably the biggest issue would be syntax, and again largely due to namespaces.
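A small Python sketch of that XPath-style naming, using the standard `xml.etree.ElementTree` module: each element becomes a positional step like `book[2]` and each attribute an `@name` entry. It also surfaces two of the difficulties raised above: namespaced tags come back in ElementTree's `{uri}local` form, which can itself contain "/", and ElementTree's parser skips comments and processing instructions entirely. `xml_paths` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def xml_paths(root: ET.Element) -> List[str]:
    """List XPath-like paths for every element and attribute (text is ignored)."""
    out: List[str] = []

    def walk(elem: ET.Element, path: str) -> None:
        out.append(path)
        for name in elem.attrib:
            out.append(f"{path}/@{name}")
        seen: Dict[str, int] = {}
        for child in elem:
            # XPath positions are 1-based and counted per tag name.
            seen[child.tag] = seen.get(child.tag, 0) + 1
            walk(child, f"{path}/{child.tag}[{seen[child.tag]}]")

    walk(root, "/" + root.tag)
    return out


if __name__ == "__main__":
    doc = ET.fromstring(
        '<lib xmlns:d="http://example.com/ns">'
        "<!-- skipped by the parser -->"
        '<book id="1"/><book id="2"><d:title>T</d:title></book>'
        "</lib>"
    )
    print("\n".join(xml_paths(doc)))
```

The last path printed is `/lib/book[2]/{http://example.com/ns}title[1]`, where the namespace URI's slashes make it unusable as-is for file names, which is exactly the syntax problem namespaces create.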



OP's ffs does not target FreeBSD, and it seems like the referenced system is FreeBSD-only. Claiming naming rights is a stretch here.