| OCR-A looks cool, but my post above isn't saying I want the menus in a machine-readable font; rather, I want the human-relevant parts of the lookup to be both human-readable and machine-readable. |
| You misinterpret me. I am not saying that what you experienced was technically impossible (you saw a URL and later recreated a valid URL, at least partially), but that your interpretation of what happened is neither practical nor possible in the general case, nor is it necessary. Let's try an example:
You see a poster: "Samsung Galaxy, for real this time. Tune in on the 1st of April to watch the ceremony live, where we subjugate the last planet in the Milky Way."
Scenario 1: The poster has a QR code and a URL: https://events.samsung.com/press-room/world-domination. You memorize the hostname (events, samsung, com), and half a day later you manage to pull up Samsung's events page, select the intended event, and get to the live feed.
Scenario 2: The poster has a QR code and no URL. You memorize "samsung galaxy", "samsung event", or even just "samsung", and half a day later you type this into your browser's address bar, which gives you the live feed you were looking for as the first result, or at the very least Samsung's event page.
"Memorizable URLs" are not human-readable information but computer-readable information, constructed with certain rules to mimic human-readable information, e.g. the company name mangled to fit URL syntax. The original, unmangled information is easier to remember. |
| Just because something is OCRable doesn't make it structured data that can be used immediately. A table at a restaurant might have a QR code that takes me to a menu with the table number already encoded and pre-entered into the order page, ready to go. An OCRable table number does not give me that, and an OCRable URL like https://fragmede.restaurant/menu?table=42 might work for HNers, but most people won't recognise their table number in it when they go up to the bar to order.
|
| "Fragmede.menu" costs $35 a year, which is a rounding error for a restaurant, and it is a short enough domain for a customer who wants to view the menu and order. There's no need for "https://", which is implied. Adding "?table=42" is optional rather than necessary: besides presenting the menu, the website could offer ordering, with a small HTML input box at checkout for the table number or a pickup option.
|
| Scanning a QR code is far faster than typing a URL, and you need some sort of a computer to access the URL anyway, so providing a human-readable URL doesn't achieve anything. |
| Agreed. I like how most 1D barcodes have human-readable numbers/text printed under the barcode. For example, think of UPC barcodes on retail products. Not many 2D barcodes respect this convention. |
| A normal UPC barcode, as the parent said. The data is read along the horizontal dimension. The only reason the lines extend in the vertical direction is to make them easier to scan. |
| I've seen QR codes to join a Discord server that have text under them that looks like
which are just fine to scan or type in.
My own 'discovery' about QR codes a few years ago is that you can make them "version 2" sized (25x25 modules), which ought to be easy to read with a low-spec system, and still address an astronomical number of distinct URLs if you use uppercase characters, a reasonably short domain, and identifiers similar to random UUIDs. These were part of the system of "three-sided cards" https://mastodon.social/@UP8/111013706271196029, but new-style cards put the QR code on the front because (1) I have a huge amount of glossy paper that I can't print on the back of, (2) you can't read the QR code on the back if the card is stuck to the wall with mounting putty, and (3) three-sided cards struggled with branding, in that people didn't really understand the affordances they offered, a problem that the new-style cards attack in various ways. https://mastodon.social/@UP8/113541119391897096 (Note that the QR codes on both of those cards point not at safebooru but at a redirect that I control, which fits my QRL specification.)
Personally, I don't think any QR code for the web should ever need more than a "version 2" QR code, and printing a QR code that requires extra alignment markers is a sign of failure. (Sure, you can make a QR code with 2000 bytes of form data embedded in it, but should you? Random UUIDs are so numerous and redirects so cheap that every new-style card like that Yakumo Ran card has a unique id, because with inkjet printing it doesn't cost anything more.) |
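To make the small-QR-code budget concrete, here is a minimal Python sketch that counts the bits an uppercase URL needs in alphanumeric mode and compares them with the data capacity of a version 2 symbol. The domain `HTTPS://QR.EXAMPLE/` and the helper names are hypothetical; the sketch assumes a single alphanumeric segment (4-bit mode indicator, 9-bit character count as used for versions 1 to 9) and ignores the terminator and padding, which only fill leftover space.

```python
import secrets

# QR alphanumeric-mode character set; a character's value is its index.
ALNUM = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:"

# Data bits in a version 2 symbol per error-correction level
# (34, 28, 22 and 16 data codewords of 8 bits each).
V2_DATA_BITS = {"L": 272, "M": 224, "Q": 176, "H": 128}


def alnum_segment_bits(text: str) -> int:
    """Bits for a single alphanumeric segment in versions 1-9."""
    if any(ch not in ALNUM for ch in text):
        raise ValueError("text has characters outside alphanumeric mode")
    n = len(text)
    # mode indicator + character count + 11 bits per pair + 6 for an odd tail
    return 4 + 9 + 11 * (n // 2) + 6 * (n % 2)


def fits_version2(text: str, level: str = "M") -> bool:
    return alnum_segment_bits(text) <= V2_DATA_BITS[level]


def make_card_url(prefix: str, id_length: int) -> str:
    """Hypothetical helper: append a random id drawn from 0-9A-Z."""
    alphabet = ALNUM[:36]
    return prefix + "".join(secrets.choice(alphabet) for _ in range(id_length))


url = make_card_url("HTTPS://QR.EXAMPLE/", 18)
print(url, alnum_segment_bits(url), fits_version2(url, "M"))
```

An 18-character id over 36 symbols gives 36^18 (about 2^93) possible values, and the resulting 37-character URL needs 217 bits: enough room at error-correction level M (224 bits), but not at Q.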
| Not very useful, as it doesn't support alphanumeric mode.
> For our code, encoding mode is Alphanumeric (2), but we haven't implemented how to read that yet. Sorry! Try another QR code! |
| Thanks, this looks like an awesome tool. I wish more web explainers took this "explain your work" approach. It's great that it uses the content you put in to illustrate the step-by-step breakdown. |
| Some discussion happened here about that when it was in draft: https://news.ycombinator.com/item?id=27628178
tl;dr: It is sadly not the most efficient encoding (and they missed an opportunity to make it actually base41, which could have been URL-safe): as defined, it only needs 41 characters (since 41^3 = 68921 > 2^16 = 65536). The RFC is also not on the Standards Track; it's just "Category: Informational". I think a better approach is to recognize that there are many circumstances where different character sets make sense for encoding data. There's no need to write an RFC; instead, define a custom alphabet for each case, using something like base-x[1]. |
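For reference, a minimal Python sketch of the RFC 9285 Base45 encoding (two bytes become three characters, a single trailing byte becomes two), plus a check of the arithmetic above that three characters from a 41-symbol alphabet already cover 16 bits. The function names are illustrative, and only encoding is shown.

```python
# RFC 9285 Base45 alphabet (the same 45 characters as QR alphanumeric mode).
BASE45 = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:"


def b45encode(data: bytes) -> str:
    out = []
    # Each full 2-byte chunk n = a*256 + b is written as c + d*45 + e*45**2,
    # emitting c, d, e (least significant digit first).
    for i in range(0, len(data) - 1, 2):
        n = data[i] * 256 + data[i + 1]
        for _ in range(3):
            n, digit = divmod(n, 45)
            out.append(BASE45[digit])
    # A trailing odd byte (0..255) needs only two digits.
    if len(data) % 2:
        n = data[-1]
        for _ in range(2):
            n, digit = divmod(n, 45)
            out.append(BASE45[digit])
    return "".join(out)


def min_radix(symbols: int, bits: int) -> int:
    """Smallest radix r such that r**symbols >= 2**bits."""
    r = 2
    while r ** symbols < 2 ** bits:
        r += 1
    return r


print(b45encode(b"AB"))  # BB8 (example from RFC 9285)
print(min_radix(3, 16))  # 41, since 40**3 = 64000 < 65536 <= 41**3
```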
| Yeah, use a bignum. QR codes are not very big.
And while 3% overhead is fine, encoding the number as decimal digits in groups of 3 (QR numeric mode, 10 bits per group) costs only 0.3%, plus a couple of bits to round up to the nearest digit. |
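A minimal Python sketch of the bignum approach: treat the payload as one big integer, write it in decimal, and count what QR numeric mode charges for those digits (10 bits per group of 3, plus 4 or 7 bits for 1 or 2 leftover digits; the segment header is not counted). The 0x01 sentinel byte is an assumption of this sketch so that leading zero bytes survive the round trip; a scheme that stores the length separately would not need it.

```python
def to_numeric(data: bytes) -> str:
    # Sentinel 0x01 keeps leading zero bytes. Note: recent Python versions
    # limit int<->str conversion to 4300 digits by default (about 1.7 KB).
    return str(int.from_bytes(b"\x01" + data, "big"))


def from_numeric(digits: str) -> bytes:
    n = int(digits)
    raw = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return raw[1:]  # drop the sentinel byte


def numeric_bits(digit_count: int) -> int:
    """Data bits QR numeric mode uses for digit_count decimal digits."""
    return 10 * (digit_count // 3) + (0, 4, 7)[digit_count % 3]


payload = bytes(range(32))  # 256 bits of example data
digits = to_numeric(payload)
print(len(digits), numeric_bits(len(digits)))  # 78 260
assert from_numeric(digits) == payload
```

Here 256 bits of data cost 260 bits; the extra 4 bits are the rounding up to whole digits, and as the payload grows the relative overhead approaches 10/log2(1000), about 0.34%.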
| > Base45 does not use all 45 characters
Huh? I don't necessarily care about an exact "base45"; I care about QR code alphanumeric, which just so happens to be a (generic) base 45 character set. For QR code, two characters are encoded into 11 bits.
> in every slot.
I've worked with the QR code standards pretty seriously, and I am unfamiliar with the term "slots" being used by the standards. This is why I suspect you're referring specifically to RFC base45 (although the term isn't used there either), which QR code doesn't care about. I also don't care about RFC Base45 and would prefer a more bit-space-efficient method, such as the iterative divide-by-radix method, which I also call "natural base conversion".
> base45 takes 32 source bits
For QR code alphanumeric, 6 characters use 33 bits, not 32.
> way to calculate efficiency
The way we calculate this, for example 2025/2048, we've termed "bit space efficiency". I'm not sure how widely this term is used in the rest of the industry. On that matter, I thought I had read "the iterative divide by radix algorithm" in the industry, but after searching, it turns out to be a term novel to our work. This is also similar to the way Shannon originally calculated entropy and appears to be a fundamental representation of information. Of course log is useful, but it often results in partial bits or rounding (5.5 in the case of alphanumeric), which is somewhat absurd considering that the bit is the quantum of information, again as shown by Shannon. There is no such thing as a partial bit that can be communicated, and since information is fundamental to communication, we've found the fractional representation more informative and easier to work with.
Granted, in all of this, when I have done the math (and I have done a lot of math on this particular issue), there appeared to be some very extreme edge cases in the final QR code where some arbitrary data encoded in QR numeric mode was slightly more efficient than alphanumeric, but overall alphanumeric was more efficient almost all the time. There are other considerations, like padding and escaping, that make an exact calculation more difficult than it's worth. I just needed a "most of the time" calculation, and that's where I stopped. For more detail on my work: my BASE45 dates back to 2019, 2 years before the RFC, and I published a base 45 alphabet, BASE45, by March 1, 2020, a whole year before the RFC. A patent including BASE45 was submitted June 22, 2021: https://image-ppubs.uspto.gov/dirsearch-public/print/downloa... In fact, because of the issues and confusion surrounding base conversion, I wrote this tool in 2019: It is the first arbitrary base conversion tool on the web. It was also essential for our work with QR code and other base conversion issues. |
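A minimal Python sketch of the two ideas in this comment: converting an integer to an arbitrary alphabet by repeatedly dividing by the radix (the textbook technique the commenter calls "natural base conversion"; this is not their code or tool), and expressing "bit space efficiency" as an exact fraction instead of a logarithm. The function names are illustrative.

```python
from fractions import Fraction

ALNUM = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:"


def to_radix(n: int, alphabet: str) -> str:
    """Iterative divide-by-radix: remainders are digits, least significant first."""
    if n < 0:
        raise ValueError("n must be non-negative")
    radix = len(alphabet)
    digits = []
    while True:
        n, rem = divmod(n, radix)
        digits.append(alphabet[rem])
        if n == 0:
            break
    return "".join(reversed(digits))


def from_radix(s: str, alphabet: str) -> int:
    """Inverse conversion: multiply by the radix and add each digit (Horner's rule)."""
    n = 0
    for ch in s:
        n = n * len(alphabet) + alphabet.index(ch)
    return n


def bit_space_efficiency(radix: int, symbols: int, bits: int) -> Fraction:
    """Share of a `bits`-wide code space used by `symbols` digits of `radix`."""
    return Fraction(radix ** symbols, 2 ** bits)


print(to_radix(2025, ALNUM))  # 100
print(bit_space_efficiency(45, 2, 11))  # 2025/2048 (alphanumeric pairs)
print(bit_space_efficiency(10, 3, 10))  # 125/128 (numeric triples, 1000/1024)
```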
| For entropy codes to be effective on such short strings, you need a shared initial probability table. And if you have that, you are effectively back to special encoding modes for each character set. |
| I did a lot of trial and error and came to the same conclusion: for non-.COM addresses we had to use the full HTTPS:// prefix, otherwise iOS won't open them. Android opens any TLD, even unusual ones like .MD or .NZ. |
| > The linked "byte mode" table only has 45 individual chars.
No, that link is for alphanumeric mode, which uses 5.5 bits per character (45 * 45 = 2025 <= 2048, so it fits in 11 bits). |
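To illustrate the 11-bit packing, a minimal Python sketch that produces the alphanumeric-mode data bits for a string (mode indicator and character count omitted). Each pair becomes 45*first + second, a value up to 2024 that fits in 11 bits, and a leftover character takes 6 bits. "AC-42" is the worked example used in the QR code specification (ISO/IEC 18004); the function name is illustrative.

```python
ALNUM = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:"


def alnum_data_bits(text: str) -> str:
    """Data bits for QR alphanumeric mode (no mode indicator or count)."""
    values = [ALNUM.index(ch) for ch in text]  # ValueError if unsupported
    chunks = []
    for i in range(0, len(values) - 1, 2):
        # 45 * 45 = 2025 <= 2048, so each pair fits in 11 bits.
        chunks.append(format(values[i] * 45 + values[i + 1], "011b"))
    if len(values) % 2:
        # A single leftover character (0..44) fits in 6 bits.
        chunks.append(format(values[-1], "06b"))
    return "".join(chunks)


print(alnum_data_bits("AC-42"))  # 0011100111011100111001000010
```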
| This reminds me of a silly nerd lie I like to tell non-technical people: "Did you know that capital letters take up 4 more bits of storage than lowercase letters?" |
When QR codes first came out I thought they were really cool. But on re-entering meatspace after the pandemic, I was honestly saddened to see so many in-person venues start using QR codes. QR codes are machine-readable, but they sure aren't human-readable. Why can't we have both? For instance, plain text in a low-pixel font, with a dotted line underneath for error correction and alignment.