| > I should have just used on-disk mode from the start, but only now know better.
Yeah, I saw the recent post about reducing rqlite disk space usage. Using the on-disk SQLite as both the FSM and the Raft snapshot makes a lot of sense here. I'm curious whether you've had concerns about write amplification, though? Because we have only the periodic Raft snapshots and the FSM is in-memory, during high write volumes we're only really hammering disk with the Raft logs.

> Do you find it in the field much with Nomad? I've managed to introduce new Raft Entry types very infrequently during rqlite's 10 years of development; only once did someone hit it in the field with rqlite.

My understanding is that rqlite Raft entries are mostly SQL statements (is that right?). Where Nomad is somewhat different (and probably closer to the OP) is that the Raft entries are application-level entries. For entries that are commands like "stop this job"[0], upgrades are simple. The tricky entries are where the entry is "upsert this large, deeply nested object that I've serialized", like the Job or Node (where the workloads run). The typical bug here is that you've added a field way down in the guts of one of these objects that's a pointer to a new struct. When old versions deserialize the message they ignore the new field, and that's easy to reason about. But if the leader is still on an old version and the new code deserializes the old object (or your new code is just reading in the Raft snapshot on startup), you need to make sure you're not missing any nil pointer checks. Without sum types enforced at compile time (i.e. Option/Maybe), we have to catch all of these via code review and a lot of tedious upgrade testing.

> it requires discipline on the part of the end-users too.

Oh, for sure. Nomad runs into some commercial realities here around how much discipline we can demand from end-users. =)

[0] https://github.com/hashicorp/nomad/blob/v1.8.2/nomad/fsm.go#...

[1] https://github.com/hashicorp/nomad/blob/v1.8.2/nomad/fsm.go#...
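To make the nil-pointer upgrade bug described above concrete, here's a minimal, hypothetical Go sketch (these are not Nomad's real types): an entry serialized by an older leader simply lacks the new field, and the newer apply code has to nil-check it.

```go
package main

import "encoding/json"

// Hypothetical types, not Nomad's actual schema.
type Affinity struct{ Weight int }

type Job struct {
	ID string
	// Added in a newer version. Entries written by older leaders (or read
	// back from an old Raft snapshot) won't contain it, so it comes back nil.
	Affinity *Affinity
}

func applyUpsertJob(raw []byte) (int, error) {
	var job Job
	if err := json.Unmarshal(raw, &job); err != nil {
		return 0, err
	}
	// The classic upgrade bug is forgetting this guard.
	if job.Affinity == nil {
		return 0, nil // fall back to some default behaviour
	}
	return job.Affinity.Weight, nil
}

func main() {
	oldEntry := []byte(`{"ID":"web"}`) // as serialized by the previous version
	if _, err := applyUpsertJob(oldEntry); err != nil {
		panic(err)
	}
}
```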
| I think the structure is very simple. It's just a lot of items; your comment, for example, is item 41207393, as in https://news.ycombinator.com/item?id=41207393
I think that just gets written to disk as something like file41207393 when you click reply. When the system needs an item, it checks whether it's cached in memory and otherwise reads it from disk, and I think that is pretty much the whole memory system. Some other stuff, like user IDs, works in the same sort of way.
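If it helps to picture it, here's a rough, hypothetical Go sketch of that read-through pattern (the file naming is guessed from the comment above; this is not HN's actual code):

```go
package itemstore

import (
	"fmt"
	"os"
	"sync"
)

// ItemStore serves items from an in-memory cache, falling back to
// one-file-per-item storage on disk (hypothetical layout: "file<id>").
type ItemStore struct {
	mu    sync.Mutex
	cache map[int][]byte
}

func New() *ItemStore {
	return &ItemStore{cache: make(map[int][]byte)}
}

func (s *ItemStore) Get(id int) ([]byte, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if b, ok := s.cache[id]; ok {
		return b, nil // hot path: already in memory
	}
	b, err := os.ReadFile(fmt.Sprintf("file%d", id)) // e.g. file41207393
	if err != nil {
		return nil, err
	}
	s.cache[id] = b
	return b, nil
}
```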
| I do feel like this largely summarizes as "we built our own SQLite + Raft replication", yeah. But without SQLite's battle-tested reliability or the ability to efficiently offload memory back to disk.
So, basically, https://litestream.io/. But perhaps faster switching thanks to an explicit Raft setup? I'm not a Litestream user so I'm not sure about the subtleties, but it sounds awfully similar.

That overly-simplified summary aside, I quite like the idea and I think the post does a pretty good job of selling the concept. For a lot of systems it'll scale more than well enough to handle most or all of your business even if you become abnormally successful, and the performance will be absurdly good compared to almost anything else.
| You should probably RTFA before making broad assumptions about their solution and how it works. Most of what you wrote is both incorrect and already addressed in the article.
| As for your second question, I don't think you'd benefit much from that, for two reasons:

- rqlite is a Raft-based system, with quorum requirements. Running a 2-node system doesn't make much sense. [1]

- Secondly, all writes go to the Raft leader (rqlite makes sure this happens transparently if you don't initially contact the Leader node [2]). A load balancer, in this case, isn't going to allow you to "spread load". What a load balancer is useful for when it comes to rqlite is making life simpler for clients -- they just hit the load balancer, and it will find some rqlite node to handle the request (redirecting to the Leader if needed). There's a rough sketch of that setup after the links below.

[1] https://rqlite.io/docs/clustering/general-guidelines/#cluste...

[2] https://rqlite.io/docs/faq/#can-any-node-execute-a-write-req...
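As a rough illustration of the "clients just hit the load balancer" setup, here's a minimal Go sketch. The load balancer address is made up, and it assumes rqlite's documented /db/execute HTTP endpoint; check the rqlite docs for the exact request format.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"log"
	"net/http"
)

func main() {
	// Hypothetical load balancer address sitting in front of the cluster.
	lb := "http://rqlite-lb.internal:4001"

	// rqlite's execute endpoint takes a JSON array of SQL statements;
	// whichever node the LB picks routes the write to the Leader.
	stmts := bytes.NewBufferString(`["INSERT INTO foo(name) VALUES('fiona')"]`)

	resp, err := http.Post(lb+"/db/execute", "application/json", stmts)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	body, _ := io.ReadAll(resp.Body)
	fmt.Println(string(body))
}
```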
| Thank you - so my takeaway is that rqlite is well suited for distributed "publishing" of data à la etcd, but it is possible to use it as a Postgres replacement. I will give it a go.
| SQLite doesn't do Raft. There isn't any simple way to do replicated SQLite. (In fact, writing your own database is probably the simplest way currently, if SQLite + Raft is actually what you want.)
| You don't even need a RAM disk, IMHO; databases already cache everything in memory, and only writes reach the disk.
Just try it yourself: cold-start your database and run a fairly large SELECT twice.
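A quick way to see this, sketched in Go (the driver, connection string, and table name are placeholders, not taken from the comment above): time the same large query twice against a freshly restarted database.

```go
package main

import (
	"database/sql"
	"fmt"
	"log"
	"time"

	_ "github.com/lib/pq" // assumed Postgres driver; any database works for this test
)

func main() {
	// Hypothetical connection string -- point it at a database you've just restarted.
	db, err := sql.Open("postgres", "postgres://localhost/app?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	query := "SELECT count(*) FROM events" // stand-in for a "fairly large select"

	for i := 1; i <= 2; i++ {
		start := time.Now()
		var n int64
		if err := db.QueryRow(query).Scan(&n); err != nil {
			log.Fatal(err)
		}
		// The second run is usually much faster: the buffer pool and OS page cache are warm.
		fmt.Printf("run %d: counted %d rows in %s\n", i, n, time.Since(start))
	}
}
```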
| Also, the OS will cache a lot of the reads even if your database isn't sophisticated enough or tuned correctly. It could still be a fun exercise, as with all things on here.
| Big vibes of "We are very smart, see how smart we are?" from the blog post.
These kinds of people usually suck to work with. I'm glad they've found a startup to sink so I don't have to deal with them.
| This is fascinating, thanks for the data! I agree with the other reply to this: I probably should've said that it's easy to get a machine with 100s of GB of RAM instead of saying it's "cheap".
| I've got a handful of small Go applications where I just have a "go generate" command that generates the entire dataset as Go, so the data set ends up compiled into the binary. Works great.
https://emoji.boats/ is the most public-facing of these.

I've also built a whole class of micro-services that pull their entire dataset from an API on startup, hold it resident, and update it on occasion. These have been amazing for speeding up certain classes of lookup for us where we don't always need entirely up-to-date data.
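For anyone who hasn't seen the pattern, here's a hypothetical sketch of what that can look like (made-up names, not the emoji.boats code): a go:generate directive produces a Go source file holding the whole dataset, and lookups are plain map reads at runtime.

```go
// Package emoji illustrates the "compile the dataset into the binary" idea.
package emoji

// The generator tool and output file are hypothetical; `go generate`
// would rewrite dataset_gen.go whenever the source data changes.
//go:generate go run ./cmd/gen -out dataset_gen.go

// dataset is the kind of table the generator would emit.
var dataset = map[string]string{
	"anchor":   "\u2693",
	"sailboat": "\u26F5",
}

// Lookup is an ordinary in-memory read; no database or file I/O involved.
func Lookup(name string) (string, bool) {
	v, ok := dataset[name]
	return v, ok
}
```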
| My first thought was, “oh, I used to do this when I wrote Common Lisp, it’s funny someone rediscovered that technique in ”.
But no, just more Lispers.
| I get your point and I don't doubt the project you're talking about was a mess, but the file system is a database, and it can be a very good choice, depending on exactly what you're doing.
| Why not SQLite? Put the JSON in a single column, and maybe copy some parts of it or some metadata into another two or three columns. It should be faster than the filesystem for reading multiple rows.
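A minimal sketch of that layout in Go, assuming the mattn/go-sqlite3 driver and made-up table and column names:

```go
package main

import (
	"database/sql"
	"log"

	_ "github.com/mattn/go-sqlite3" // assumed driver; any SQLite driver works
)

func main() {
	db, err := sql.Open("sqlite3", "items.db")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// One column holds the raw JSON document; a couple of frequently
	// queried fields are copied out into their own columns.
	if _, err := db.Exec(`CREATE TABLE IF NOT EXISTS items (
		id         INTEGER PRIMARY KEY,
		created_at TEXT,
		kind       TEXT,
		body       TEXT  -- the full JSON
	)`); err != nil {
		log.Fatal(err)
	}
	if _, err := db.Exec(`CREATE INDEX IF NOT EXISTS idx_items_kind ON items(kind)`); err != nil {
		log.Fatal(err)
	}

	if _, err := db.Exec(
		`INSERT INTO items (created_at, kind, body) VALUES (?, ?, ?)`,
		"2024-08-10T12:00:00Z", "comment", `{"text":"hello","by":"someone"}`,
	); err != nil {
		log.Fatal(err)
	}
}
```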
| Check out https://eclipsestore.io (previously named Microstream) if you're into Java and interested in some of the ideas presented in this article. You use regular objects, such as Records, and regular code, such as java.util.stream, for processing, and the library does snapshotting to disk.
I haven't tried it out, but just thinking of how many fewer organizational hoops I would have to jump through makes me want to try it out:

- No ordering a database from database operations.
- No ordering a port opening from network operations.
- No ordering of certificates.
- The above times 3 for development, test, and production.
- Not having to run database containers during development.

I think the sweet spot for me would be services that I don't expect to grow beyond a single node, where there is an acceptance of a small amount of downtime during service windows.
| This reminds me of the heated discussions around jQuery by some so-called performance-driven devs, which culminated in this website:
https://youmightnotneedjquery.com/

The overwhelming majority underestimates the beauty, effort, and experience that go into abstractions. There are some true geniuses doing fantastic work to deliver syntactic sugar, while the critics mock the somewhat larger bundle size for "a couple of lines frequently used." That's why.

In the end, a good framework is more than just an abstraction. It guarantees consistency and accessibility. My advice: try to understand the source code, if possible, before reinventing the wheel. What starts out as fun quickly becomes a burden. If there weren't any edge cases or different conditions, you wouldn't need an abstraction. Been there, done that.
| We didn't want to build something complicated, so we implemented our own Raft consensus layer. Have you considered just using Redis?
| Well, that's only 7 years of working with people to learn from; it's not nothing, but it's not enough credentials to make me go from "it's a horrible idea" to "I must be missing something".
| Who cares about my experience - I am just a random guy posting on the internet.
I am also not claiming I have a way to run every other new software company.
| This is cool! I’m always excited by people trying simpler things, as a big fan of using Boring Technology.
But I have some bad news: you haven't built a system without a database, you've just built your own database, one without transactions and with weak durability properties.

> Hold on, what if you've made changes since the last snapshot? And this is the clever bit: you ensure that every time you change parts of RAM, we write a transaction to disk.

This is actually not an easy thing to do. If your shutdowns are always clean SIGTERMs, yes, you can reliably flush writes to disk. But if you get a SIGKILL at the wrong time, or don't handle an I/O error correctly, you're probably going to lose data. (Postgres' 20-year fsync issue was one of these: https://archive.fosdem.org/2019/schedule/event/postgresql_fs...)

The open secret in database land is that for all we talk about transactional guarantees and durability, the reality is that those properties only start to show up in the very, very, _very_ long tail of edge cases, many of which are easily remedied by some combination of humans getting paged and end users developing workarounds (e.g. double-entry bookkeeping). This is why MySQL's default isolation level can lose writes: there are usually enough safeguards in any given system that it doesn't matter.

A lot of what you're describing as "database issues" doesn't sound to me like DB issues so much as latency issues caused by not colocating your service with your DB. By hand-rolling a DB implementation using Raft, you've also colocated storage with your service.

> Screenshotbot runs on their CI, so we get API requests 100s of times for every single commit and Pull Request.

I'm sorry, but I don't think this was as persuasive as you meant it to be. This is the type of workload that, to be snarky about it, I could run off my phone[0]
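For what it's worth, here's a minimal Go sketch (not Screenshotbot's code) of the "write a transaction to disk before touching RAM" step, with the error handling that the fsync story above is really about:

```go
package main

import (
	"encoding/json"
	"log"
	"os"
)

// txn is a toy transaction record appended to a log file before the
// corresponding in-memory mutation is allowed to happen.
type txn struct {
	Op  string `json:"op"`
	Key string `json:"key"`
	Val string `json:"val"`
}

func appendTxn(f *os.File, t txn) error {
	b, err := json.Marshal(t)
	if err != nil {
		return err
	}
	if _, err := f.Write(append(b, '\n')); err != nil {
		return err
	}
	// If Sync fails you cannot assume the page cache still holds good data
	// (the lesson of the Postgres fsync saga) -- treat it as fatal.
	return f.Sync()
}

func main() {
	f, err := os.OpenFile("txlog", os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0o644)
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	if err := appendTxn(f, txn{Op: "set", Key: "user:1", Val: "alice"}); err != nil {
		log.Fatal(err) // only after this returns nil may the in-memory state change
	}
}
```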
| Honestly, SQLite is just a great option. Stored locally, so you have that fast disk access. It does great for small, medium, and even larger databases. And you just have a file.
| 1. If your entire cluster goes down, do you permanently lose state?
2. Are network requests / other ephemeral things also saved to the snapshot?
| Cloudflare's Durable Objects seem similar to this article's "objects in RAM", but I think you still have to do some minimal serialization.
| The in-memory state can be whatever you want, which means you can build up your own application-specific indexing and querying functions. You could just use SQLite with :memory: for the Raft FSM, but if you can build/find an in-memory transaction store (we use our own go-memdb), then reading from the state is just function calls. Protecting yourself from stale reads or write skew is trivial; every object you write has a Raft index, so you can write APIs like "query a follower for object foo and wait till it's at least at index 123" (there's a rough sketch of that pattern below). It sweeps away a lot of "magic" that normally you'd shove into an RDBMS or other external store.

That being said, I'd be hesitant to pick this kind of architecture for a new startup outside of the "infrastructure" space... you are effectively building your own database here, though. You need to pick (or write) good primitives for things like your inter-node RPC, on-disk persistence, in-memory transactional state store, etc. Upgrades are especially challenging, because the new code can try to write entities to the Raft log that nodes still on the previous version don't understand (or worse, misunderstand because the way they're handled has changed!). There's no free lunch.
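Here's a hypothetical Go sketch of that "wait for the follower to reach Raft index N, then read" pattern (this is not Nomad's or go-memdb's actual API, just the shape of the idea):

```go
package store

import (
	"context"
	"sync"
	"time"
)

// Store is a toy in-memory FSM: a map of objects plus the last Raft
// index that has been applied to it.
type Store struct {
	mu           sync.RWMutex
	appliedIndex uint64
	objects      map[string]string
}

func New() *Store {
	return &Store{objects: make(map[string]string)}
}

// Apply is called by the Raft layer for each committed log entry.
func (s *Store) Apply(index uint64, key, val string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.objects[key] = val
	s.appliedIndex = index
}

// GetAtLeast answers "give me key, but only once you've applied at least
// minIndex" -- how a client avoids stale reads from a follower after a write.
func (s *Store) GetAtLeast(ctx context.Context, key string, minIndex uint64) (string, bool, error) {
	for {
		s.mu.RLock()
		caughtUp := s.appliedIndex >= minIndex
		val, ok := s.objects[key]
		s.mu.RUnlock()
		if caughtUp {
			return val, ok, nil
		}
		select {
		case <-ctx.Done():
			return "", false, ctx.Err()
		case <-time.After(10 * time.Millisecond): // simple polling; real code would block on a watch
		}
	}
}
```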