Thanks for the link. I was motivated to write the post after reading Moffat's book 'Managing Gigabytes', a pearl from the '90s. The authors mention this technique in the second edition.
The JPEG standard ITU T.81 (1992) describes the algorithm in flowcharts, so knowledge of array-based Huffman was probably already somewhat common in the 80s.
> It would be interesting to see a uniquely decodable code that is neither a prefix code nor one in reverse.

More interesting than I thought. First the adversarial answer; sure (edit: ah, I see someone else posted exactly the same!):

But it's a bad code, because we'd always be better off with a=1 and b=0.

The Kraft inequality gives the sets of codeword lengths that can be made uniquely decodable, and we can achieve any of those with Huffman coding. So there's never a reason to use a non-prefix code (assuming we are doing symbol coding, and not swapping to something else like ANS or arithmetic coding).

But hmmmm, I don't know if there exists a uniquely decodable code with the same set of lengths as an optimal Huffman code that is neither a prefix code nor one in reverse (a suffix code). If I was going to spend time on it, I'd look at https://en.wikipedia.org/wiki/Sardinas-Patterson_algorithm -- either to brute-force a counterexample, or to see if a proof is inspired by how it works.
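For anyone who wants to brute-force it, here's a minimal Haskell sketch of the Sardinas-Patterson test (function names mine; it assumes the codewords are distinct):

```haskell
import Data.List (nub, stripPrefix)

-- Dangling suffixes: nonempty w such that x ++ w == y for some x in xs, y in ys.
dangling :: [String] -> [String] -> [String]
dangling xs ys =
  [ w | x <- xs, y <- ys, Just w <- [stripPrefix x y], not (null w) ]

-- Sardinas-Patterson: a code with distinct codewords is uniquely
-- decodable iff no iterate of the dangling-suffix construction
-- contains a codeword. The sets only ever hold suffixes of
-- codewords, so the iteration terminates.
uniquelyDecodable :: [String] -> Bool
uniquelyDecodable code = go (nub (dangling code code)) []
  where
    go [] _ = True
    go frontier seen
      | any (`elem` code) frontier = False
      | otherwise =
          let next  = nub (dangling code frontier ++ dangling frontier code)
              fresh = filter (`notElem` (frontier ++ seen)) next
          in go fresh (frontier ++ seen)
```

For example, uniquelyDecodable ["1","101"] is True even though that code is neither prefix-free nor suffix-free, while uniquelyDecodable ["0","01","10"] is False ("010" parses two ways).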
It's a weird example, but what about

?

It is neither prefix-free nor suffix-free. Yet every occurrence of 0 corresponds to an occurrence of b. However, this is obviously inefficient. So I guess the question is whether there's an optimal code which is neither prefix-free nor suffix-free.

-------------- EDIT

I did some googling and found this webpage https://blog.plover.com/CS/udcodes.html where the author gives the following example of a uniquely decodable code:

I guess this is "almost" prefix-free, since the only prefix is c of d. If a message starts with 1, you could find the first 0 and then look at whether there's an odd or even number of 1's, so I think I can see how it's uniquely decodable. However, my crypto knowledge is too rusty to remember how to show whether this is an optimal code for some probability distribution.
That code in the EDIT is suboptimal: it doesn't saturate the Kraft inequality. You could make every codeword two bits and still encode 4 symbols, so that would be strictly better.
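For concreteness, here's a tiny helper (name mine) for checking that: the Kraft sum of a set of binary codeword lengths must be at most 1 for a uniquely decodable code to exist, and an optimal complete code saturates it at exactly 1.

```haskell
-- Kraft sum of binary codeword lengths: sum over l of 2^(-l).
-- <= 1 means some uniquely decodable (indeed prefix) code has these
-- lengths; < 1 means there is slack and the lengths are improvable.
kraftSum :: [Int] -> Double
kraftSum ls = sum [ 2 ^^ negate l | l <- ls ]
```

For instance, kraftSum [2,2,2,2] == 1.0 (four 2-bit codewords, saturated), while something like kraftSum [2,2,2,3] == 0.875 leaves slack and is strictly improvable.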
This is great! Are there any other similar tutorials that walk through writing a Haskell program, but with some more advanced features (monad transformers, lenses, etc.)?
Haskell really shines when you want to write high-level, declarative code. Performance in that style is generally fine for CLI / web-backend-type stuff. It has the tools to write pretty fast low-level code too, but they're fairly clunky, and if that's all you want to write, it's probably not the best tool for the job. They're pretty nice if you have a few focused hotspots you need to optimize.

It has pretty nice CPU profiling tools, so finding and optimizing CPU hotspots is fairly pleasant. Tracking down rogue memory leaks (which lazy evaluation makes more likely), on the other hand, can be extremely frustrating. If you look at the benchmarks game results [1], the fastest Haskell implementations are generally between 2 and 5 times slower than the fastest C versions, and they tend to be written in a highly imperative style.

[1]: https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
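The classic miniature of that lazy-evaluation failure mode, for anyone who hasn't been bitten by it yet:

```haskell
import Data.List (foldl')

-- foldl builds a chain of unevaluated thunks (((0+1)+2)+...) that is
-- only forced at the very end, so the whole chain sits on the heap
-- at once: a space leak.
leaky :: Integer
leaky = foldl (+) 0 [1 .. 10000000]

-- foldl' forces the accumulator at each step and runs in constant space.
fine :: Integer
fine = foldl' (+) 0 [1 .. 10000000]
```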
You're doing extremely simple constant arithmetic here. GHC can optimize this type of thing away to nothing as well. Are we talking about contrived examples or real numerical bottlenecks?
I think if your runtime performance is terrible with Haskell, the explanation is either that you're doing something a bit crazy or that your application is memory-constrained.
Huffman codes are conceptually isomorphic to arithmetic codes in which all symbol probabilities are 2^-k with k integer, so they have an obvious disadvantage: the symbol distribution they model is less accurate whenever the true probabilities aren't powers of two.
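A quick worked example of that inaccuracy (the helper is mine): with two symbols at probabilities 0.9 and 0.1, a Huffman code is forced to spend a whole bit per symbol, while the entropy, which arithmetic coding can approach, is much lower.

```haskell
-- Shannon entropy in bits per symbol.
entropy :: [Double] -> Double
entropy ps = negate (sum [ p * logBase 2 p | p <- ps, p > 0 ])

-- entropy [0.9, 0.1] ~= 0.469 bits/symbol, but a two-symbol Huffman
-- code has codeword lengths {1,1} and so spends 1.0 bit/symbol:
-- more than double the optimum.
```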
To rephrase using my experience: "performance aware" Haskell is about as "fast" as Go but needs more memory, and both are slower than the same Java code -- but both are way more fun to write ;). Optimising Go is easier for most people, though; in Haskell you _really_ need to know the internals (how to read Core and write unboxed code) and understand laziness.

My attempt at the 1 billion row challenge in Go and Haskell (and C), with comparisons to other fast implementations: https://github.com/Release-Candidate/1-billion-row-challenge...
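A tiny (contrived, but representative) example of the style that takes: bang patterns keep the accumulators strict, and with -O GHC will typically unbox them to raw machine integers, which you can confirm by reading the Core.

```haskell
{-# LANGUAGE BangPatterns #-}

-- Strict accumulator loop: the bangs stop a thunk chain from building
-- up, and with optimization GHC typically compiles this to a tight
-- loop over unboxed machine ints.
sumTo :: Int -> Int
sumTo n = go 0 1
  where
    go !acc !i
      | i > n     = acc
      | otherwise = go (acc + i) (i + 1)
```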
Arithmetic codes are not just marginally better. Asymmetric numeral systems, which are related to arithmetic coding, were a real breakthrough and are used in most modern compressors.
I mention this only because, when I learned the tree-based approach at uni, I simply wasn't aware that there was another way to do it, and I'm wondering how many of you that was true for as well.

While the tree approach is intuitive and illuminating, it probably makes more sense to work with in-place arrays: the situations where you care most about compression are usually the ones where you have a lot of data and want to run fast.
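To make the array flavor concrete, here's a minimal sketch of the decoder side (my own illustration, with plain lists standing in for the arrays a real implementation would use): canonical Huffman decoding driven by per-length tables, no tree anywhere. It assumes the symbols are already sorted canonically, i.e. by (code length, symbol).

```haskell
-- Per-length tables for a canonical Huffman code, one entry per code
-- length l: (l, first code value of length l, index of its first
-- symbol, number of codewords of length l).
tables :: [Int] -> [(Int, Int, Int, Int)]
tables lens = go 0 0 1
  where
    maxLen = maximum lens
    go code idx l
      | l > maxLen = []
      | otherwise  = (l, code, idx, cnt)
                   : go ((code + cnt) * 2) (idx + cnt) (l + 1)
      where cnt = length (filter (== l) lens)

-- Decode by integer comparisons against the tables: accumulate bits
-- into a value v; as soon as v falls inside the range of codes of the
-- current length, index straight into the symbol list.
decode :: [a] -> [Int] -> [Bool] -> [a]
decode syms lens = go
  where
    tbl = tables lens
    go [] = []
    go bs = let (i, rest) = one tbl 0 bs in (syms !! i) : go rest
    one ((_l, first, idx, cnt) : more) v (b : bs)
      | v' < first + cnt = (idx + (v' - first), bs)
      | otherwise        = one more v' bs
      where v' = 2 * v + fromEnum b
    one _ _ _ = error "malformed input"
```

The lengths [1,2,3,3] give the canonical code a=0, b=10, c=110, d=111, and decode "abcd" [1,2,3,3] (map (== '1') "100111") returns "bad".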