| This is solved by sorting by timestamp first, then by random PK UUID. I don't think slightly simpler batch queries justify leaking time and quantity information, or the complexity of handling two types of IDs. |
| When running a batched migration, it is important to batch using a strictly monotonic field so that new rows won't get inserted into an already-processed range. |
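A keyset-style batch loop over a strictly monotonic key can be sketched as follows (the `fetch_batch`/`process` callables and the `id` column are hypothetical stand-ins for your data access layer):

```python
# Sketch of a keyset-paginated batch migration. Batching on a strictly
# monotonic key (e.g. a bigserial PK) means rows inserted after the run
# started always sort after the last processed key, so they can never
# land inside an already-processed range.

def migrate_in_batches(fetch_batch, process, batch_size=1000):
    """fetch_batch(after_key, limit) -> rows ordered by key; process(rows) -> None."""
    last_key = 0  # assumes keys are positive and strictly increasing
    while True:
        rows = fetch_batch(last_key, batch_size)
        if not rows:
            break
        process(rows)
        last_key = rows[-1]["id"]  # resume strictly after the last key seen
    return last_key
```

With a random (UUIDv4) key there is no such "strictly after" point, which is why the parent's concern applies.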
| I don’t understand how that’s an issue. Do you have an example of a possible attack using UUIDv7 timestamp? Is there evidence of this being a real security flaw? |
| The draft spec for UUIDv7 has details about the security considerations: https://www.ietf.org/archive/id/draft-peabody-dispatch-new-u...
The way I see it, UUIDv7 in itself is great for some uses but not for all. You always have to remember that a v7 carries the ID's creation time as metadata with it, whether you want it or not, and if you let external users get the v7, they can get that metadata. I'm not a security expert, but I know enough to know that you should only give the minimal data to a user. My only guess is that, v7 being so new, attacks aren't widespread for now. And I know why the author decided not to focus on "if UUID is the right format for a key", because the answer is no 99% of the time. |
| How would it be fine, e.g. for e-commerce, which is arguably a very large portion of the use cases?
With a sequential ID you would be immediately leaking how many orders a day your business is getting. |
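For example, with purely sequential order IDs, two test purchases a day apart are enough to estimate volume (all numbers invented for illustration):

```python
# A competitor places one order on Monday and one on Tuesday and reads
# the sequential IDs off their receipts. The difference is your daily volume.
monday_order_id = 100_000
tuesday_order_id = 101_500

orders_per_day = tuesday_order_id - monday_order_id
print(orders_per_day)  # ~1500 orders/day, leaked by the ID scheme alone
```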
| The best solution is to have a serial identifier internally and a generated ID for external use. And yes, it shouldn't be a UUID, as they are user-hostile; it should be something like 6-10 letters+digits. |
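A minimal sketch of such an external ID generator, assuming an 8-character length and an alphabet with visually ambiguous characters removed (the exact alphabet and length are choices, not requirements):

```python
import secrets
import string

# Letters+digits minus the easily confused 0/O and 1/l/I (57 symbols,
# so 8 characters give roughly 46 bits of randomness).
ALPHABET = "".join(c for c in string.ascii_letters + string.digits
                   if c not in "0O1lI")

def external_id(length=8):
    # secrets (not random) for unguessable, user-facing identifiers
    return "".join(secrets.choice(ALPHABET) for _ in range(length))
```

The generated ID would be stored in its own uniquely indexed column, mapped to the internal serial PK, which never leaves the database.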
| I never said it's "not going to impact anyone", I said it's not comparable.
"Denial of service" is just not a security bug on the same level as "zomg my credit card is stolen". I have argued this many times before, e.g. https://news.ycombinator.com/item?id=39377795. It's easy to knock most sites offline regardless just by flooding them with traffic; you don't need long-term plans like this (which are likely to be detected before they even have an effect).
> This does happen and break people
I have never heard of a deliberate attack using this method. |
| Conversely: if using bigserial for a primary key introduces crippling performance problems to your system, maybe that is a problem you need to address either way. |
| I wrote that style of session mapping for a project long ago. It was fairly easy, but a massive pain in the ass to debug. Ended up needing to record the mappings in the backend for a period of time. |
| That wouldn't work for integration scenarios where the other system needs to store the ID so it can refer to it later. How would you provide IDs for integration purposes? |
| There is unlikely to be significant performance degradation for int vs. bigint, but it will be a huge PITA if, 10 years later, with tons of legacy code written, that table grows past ~2B rows (the limit of a signed 32-bit int). |
| Postgres doesn't necessarily pad to 8 bytes; it depends on the next column's type. EDB has a good writeup on this (https://www.2ndquadrant.com/en/blog/on-rocks-and-sand/), but also here's a small example:
`foo` has an INT followed by an INT, and its table size is 35 MB. `bar` has an INT followed by a BIGINT, and its table size is 43 MB; this is the same size for `baz`, despite `baz` being a BIGINT followed by a BIGINT. |
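Postgres's alignment rules behave much like C struct padding, and Python's `struct` module illustrates the same effect on a typical 64-bit platform (this is an analogy to the table layouts above, not a measurement of Postgres itself):

```python
import struct

# Native alignment ('@'): an 8-byte field placed after a 4-byte int forces
# 4 bytes of padding, so int+bigint occupies the same 16 bytes as two bigints.
print(struct.calcsize("@ii"))  # like foo: INT, INT       -> 8 bytes
print(struct.calcsize("@iq"))  # like bar: INT, BIGINT    -> 16 bytes (4 padding)
print(struct.calcsize("@qq"))  # like baz: BIGINT, BIGINT -> 16 bytes
```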
| > But as has been pointed out, most apps are never, ever going to have close to 2 billion users, or for that matter 2 billion anything.
"All computers will be fine with 640kb of RAM." |
| > cherry picked some fictional example
What an incredibly rude and dismissive accusation. Here's my table: https://github.com/arp242/goatcounter/blob/master/db/schema.... – the number of IDs is actually 7, not 6. I can give a lot more details and context on all of that and why it works the way it works; the savings are certainly not insignificant or theoretical, but save me real money in server costs every month. But I have little interest in talking to you further, since you're just going to insult everyone who tries to inject the slightest nuance into "always use bigserial"-extremism. So good day to you. |
| Why “bigint generated always as identity” instead of bigserial, instead of Postgres' uuid data type?
Postgres' UUID datatype: https://www.postgresql.org/docs/current/datatype-uuid.html#D...
django.db.models.fields.UUIDField: https://docs.djangoproject.com/en/5.0/ref/models/fields/#uui... :
> class UUIDField: A field for storing universally unique identifiers. Uses Python’s UUID class. When used on PostgreSQL and MariaDB 10.7+, this stores in a uuid datatype, otherwise in a char(32)
> [...] Lookups on PostgreSQL and MariaDB 10.7+: Using iexact, contains, icontains, startswith, istartswith, endswith, or iendswith lookups on PostgreSQL don’t work for values without hyphens, because PostgreSQL and MariaDB 10.7+ store them in a hyphenated uuid datatype type.
From the sqlalchemy.types.Uuid docs: https://docs.sqlalchemy.org/en/20/core/type_basics.html#sqla... :
> Represent a database agnostic UUID datatype.
> For backends that have no “native” UUID datatype, the value will make use of CHAR(32) and store the UUID as a 32-character alphanumeric hex string.
> For backends which are known to support UUID directly or a similar uuid-storing datatype such as SQL Server’s UNIQUEIDENTIFIER, a “native” mode enabled by default allows these types will be used on those backends.
> In its default mode of use, the Uuid datatype expects Python uuid objects, from the Python uuid module
From the docs for the uuid Python module: https://docs.python.org/3/library/uuid.html :
> class uuid.SafeUUID: Added in version 3.7.
> safe: The UUID was generated by the platform in a multiprocessing-safe way
And there's not yet a uuid.uuid7() in the uuid Python module. UUIDv7 leaks timing information (https://news.ycombinator.com/item?id=40886496), which is ironic because UUIDs are usually used to avoid the "guess an autoincrement integer key" issue. |
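(CPython has since added `uuid.uuid7()` in 3.14.) On older Pythons, the UUIDv7 layout from the draft — a 48-bit millisecond timestamp, version and variant bits, and random filler — can be sketched like this; it is a rough illustration of the bit layout, not a spec-complete implementation:

```python
import os
import time
import uuid

def uuid7_sketch():
    # 48-bit unix timestamp in milliseconds in the top bits...
    ms = time.time_ns() // 1_000_000
    # ...and 80 random bits below it.
    rand = int.from_bytes(os.urandom(10), "big")
    value = (ms << 80) | rand
    value &= ~(0xF << 76)   # clear the 4 version bits
    value |= 0x7 << 76      # set version 7
    value &= ~(0x3 << 62)   # clear the 2 variant bits
    value |= 0x2 << 62      # set the RFC 4122 variant (binary 10)
    return uuid.UUID(int=value)
```

The timestamp sitting in the top bits is exactly the "leaks timing information" point: anyone holding the UUID can shift right by 80 and read the creation time.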
| Best practice in Postgres is to always use the text data type and combine it with check constraints when you need an exact length or a max length.
See: https://wiki.postgresql.org/wiki/Don't_Do_This#Text_storage
Also, I think you're misunderstanding the article. They aren't talking about storing a UUID in a bigint. They're talking about having two different IDs: an incrementing bigint used internally within the db for PKs and FKs, and a separate UUID used as an external identifier that's exposed by your API. |
| That's not true: you can increment by 2, 10, 100, or any number. I'm not saying that's necessarily the best solution, but it's not true that you can't use it. |
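A toy sketch of the offset-plus-step idea (as with Postgres sequences using `INCREMENT BY 2` and different start values), showing how two writers can share one ID space without collisions:

```python
# Each writer gets the same step but a different offset, so their ID
# streams interleave and never overlap -- the same trick Postgres
# sequences support via START WITH / INCREMENT BY.
def make_sequence(start, step):
    n = start - step
    def next_id():
        nonlocal n
        n += step
        return n
    return next_id

node_a = make_sequence(1, 2)  # issues odd IDs: 1, 3, 5, ...
node_b = make_sequence(2, 2)  # issues even IDs: 2, 4, 6, ...
```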
| It does but it allows common names, the name of the deity, common words, &c. I suspect you have to do something like an earlier poster suggested -- strip out all the vowels to start with... |
| You really have to understand the usage patterns of a program to give a recommendation. Anyone who doesn't is just a grifter.
I don't trust anyone who gives recommendations sight unseen. |
| Most of the comments here seem to gear towards “you must always…”. But as you rightfully point out, “it depends”: it's an architectural trade-off depending on the required qualities. |
| Reconfiguring tables to use a different kind of unique ID (primary key in this context) can be a much bigger pain than an ordinary column rename if it is in use by foreign key constraints. |
| Call me naïve, but surely you can have fuzzing tests in CI?
I’m also going to use this as yet another example of why getting rid of QA in favor of Ship It Now was a bad idea. |
| i mean, you might care if the investors you’re trying to woo for that bridge round figure out your churn is a lot higher than you’re willing to admit… or worse, your traction is terrible. |
| We chose ULID for our Postgres PKs recently, and this article helped a lot in making that decision: https://brandur.org/nanoglyphs/026-ids
I personally prefer ULID since it is compatible with the UUID type, and you also get a timestamp lexicographically built into the ID, so sorting by ID also means sorting by timestamp. There are multiple PG extensions that make it easy to drop in and use. |
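A back-of-the-envelope sketch of the ULID layout — a 48-bit millisecond timestamp plus 80 random bits, Crockford base32-encoded to 26 characters — which shows why lexicographic order tracks creation time (a simplified illustration, not one of the production PG extensions):

```python
import os
import time

# Crockford's base32 alphabet is in ascending ASCII order, so comparing
# encoded strings compares the underlying 128-bit values, timestamp first.
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"

def ulid():
    ms = time.time_ns() // 1_000_000          # 48-bit timestamp
    rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits
    value = (ms << 80) | rand
    # Encode 128 bits as 26 base32 characters, most significant first.
    return "".join(CROCKFORD[(value >> shift) & 0x1F]
                   for shift in range(125, -1, -5))
```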
| there is also a problem of data locality and blocks present in caches (page cache, buffer pool) at any given time, in general -- UUIDv4 is losing to bigint and UUIDv7 in this area |
| What database support is needed? Assuming Postgres already has the uuid type and that you can (and often should) do the actual generation of them in application code? |
| There’s no need since they are random, so you might as well generate them on your application servers which are easier to scale, to offload the write database. |
| > When UUIDs become your bottleneck.
When UUIDs become your bottleneck you'll be celebrating for picking UUIDs, because now you can move to a distributed architecture and not worry about IDs. |
| Thanks, good to hear.
If you are using PG, simply using its native UUID type instead of char(36) seems like an obvious no-opportunity-cost optimization, at least if you have a choice?
Generally, the principles of separation of concerns and mechanical sympathy should be top of mind when designing a lasting and purposeful database schema.
Finally, since folks often say “I like Stripe's typed random IDs” in these kinds of threads: Stripe are lying when they say their IDs are random. They have some random parts, but when analyzed in sets, a large chunk of the binary layout is clearly metadata, including embedded timestamps, shard and reference keys, and versioning, in varying combinations depending on the service. I estimate they typically have 48-64 bits of randomness. That's still plenty for most systems; you can do the same. Personally I am very fond of base58-encoded AES-encrypted bigserial+HMAC locators with a leading type prefix and a trailing metadata digit, and you can in a pinch even do this inside the database with plv8.
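The simpler Stripe-style shape — a human-readable type prefix plus a random body — can be sketched as below (the prefix name and sizes are invented; the AES-encrypted variant described above would replace the random body with an encrypted serial):

```python
import secrets

# Base62 keeps the ID URL-safe and copy-paste friendly.
BASE62 = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"

def typed_id(prefix, nbytes=9):
    # 9 random bytes = 72 bits, comfortably above the 48-64 bits
    # estimated for Stripe's IDs.
    n = int.from_bytes(secrets.token_bytes(nbytes), "big")
    digits = []
    while n:
        n, r = divmod(n, 62)
        digits.append(BASE62[r])
    return f"{prefix}_{''.join(reversed(digits)) or '0'}"
```

The prefix (e.g. a hypothetical `cus_` for customers) makes IDs self-describing in logs and support tickets, which is most of what people like about the scheme.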