![]() |
|
![]() |
| Luckily utf-8 structure is _very_ trivial compared to the average parser. Not to say there can't be bugs, but that the internal states of a parser shouldn't be large, and can be exhaustively tested. |
![]() |
| https://en.wikipedia.org/wiki/UTF-8#Invalid_sequences_and_er...
> Many of the first UTF-8 decoders would decode these, ignoring incorrect bits and accepting overlong results. Carefully crafted invalid UTF-8 could make them either skip or create ASCII characters such as NUL, slash, or quotes. Invalid UTF-8 has been used to bypass security validations in high-profile products including Microsoft's IIS web server[26] and Apache's Tomcat servlet container.[27] RFC 3629 states "Implementations of the decoding algorithm MUST protect against decoding invalid sequences." |
![]() |
| > Teaching to _think_ is just as important as teaching to code, but this is seldom done
Oh, the arrogance of thinking that the other person doesn't think. |
![]() |
| It has 5 dependencies, one of which is optional, and another is serde itself: https://github.com/serde-rs/json/blob/master/Cargo.toml
I don’t think you’re measuring what you think you’re measuring when you say it has 3GB of dependencies. But I can’t say for sure because you don’t provide any evidence for it, you just declare it as true.If I were to guess, I’d say you’re doing a lot of #[derive(Serialize, Deserialize)] and it’s generating tons of code (derive does code generation, after all) and you’re measuring the total size of your target directory after all of this. But this is just a guess… other commenters have shown that a simple build produces code on the order of tens of MB… |
![]() |
| Because Rust has Cargo which makes it trivial to include dependencies.
C developers constantly reinvent because C dependency management is such a joke that no one bothers. |
![]() |
| Odd indeed.
It's also worth noting that the article has a real apples != oranges problem. Things like 'libsystemd' are not in any way comparable to your typical Rust crate. |
![]() |
| tokio and serde are certainly not the only ones. You can put almost all of my crates into that category too.
The problem with your framing is that you look at this as a "dependency addiction." But that doesn't fully explain everything. The `regex` crate is a good case study. If it were a C library, it would almost certainly have zero dependencies. But it isn't a C library. It exists in a context where I can encapsulate separately versioned libraries as dependencies with almost no impact on users of `regex`. Namely, it has two required dependencies: regex-syntax and regex-automata. It also has two optional dependencies: memchr and aho-corasick. This isn't a case of the regex crate farming out its core functionality to other projects. Indeed, it started as a single crate. And I split its code out into separately versioned crates that others can now use. And this has been a major ecosystem win: * memchr is used in all sorts of projects, and it promises to give you exactly the same implementation of substring search that the regex crate (and also ripgrep) use in your own projects. Indeed, that crate is used here! What would you do instead? If you were in C-land, you'd re-roll all of the specialized SIMD that's in memchr? For x86-64, aarch64 and wasm32 right? If you haven't done that sort of thing before, good luck. That'll be a long ramp-up time. * aho-corasick is packaged as a stand-alone Python library that is quite a bit faster than pyahocorasick: https://pypi.org/project/ahocorasick-rs/ There's tons of other projects on crates.io relying on aho-corasick specifically, separately from how its used inside of `regex`. * regex-syntax gives you a production grade regex parser. More than that, it gives you exactly the same parser used by the regex crate. People have used this for all sorts of things, including building their own regex engine without needing to re-create the parser (which is a significant simplification). * regex-automata gives you access to all of the internal APIs of the regex engine. This is all the stuff that is too complex to put into a general purpose regex library targeting the 99% use case. As far as I know, literally no other general purpose regex engine has ever attempted this because most regex engines are written in C or C++ where you'd be laughed out of the room for suggesting it because dependency management is such a clusterfuck. Yet, this has been a big benefit to other folks. The Yara project uses it for example, and the Helix editor uses it to search discontiguous strings: https://github.com/helix-editor/helix/pull/9422 (Instead of rolling your own regex engine, which is what I believe vim does.) This isn't dependency addiction. This is making use of separately versioned libraries to allow other projects to depend on battle tested components independent of their primary use case. Yet, if people repeat this kind of process---exposing internals like I did with the regex crate---then you wind up with a bigger dependency tree. Good dependency management is a trade-off. One the one hand, it enables the above to happen, which I think is an objectively Good Thing. But it also enables folks to depend on huge piles of code so easily that it actively discourages someone from writing their own base64 implementation. But as should be obvious, it doesn't prevent them from doing so: https://github.com/BurntSushi/ripgrep/blob/ea99421ec896fcc9a... Good dependency management is Pandora's box. It has been opened and it is never going to get closed again. Just looking on and calling it an addiction isn't going to take us anywhere. Instead, let's look at it as a trade-off. |
![]() |
| Moment you add something to std it's effectively died. Strong backwards guarantees will choke development of library.
Python std is where libraries go to die. |
![]() |
| I arrive at almost the same result as you, with 76MB.
I've also checked .cargo, .rustup, and my various cache folders (just in case) and haven't found any additional disk usage. OP is clearly mistaken. |
![]() |
| It deserializes a unicode string to a custom structure. Do not mistake it with C character-shuffling hello-world-programs.
Edit: s/to JSON/to a custom structure/ |
![]() |
| I’m on mobile so I can’t check at the moment. But I’d be shocked if the equivalent go binary was anywhere near as big, or took anywhere near as long to build.
I’ll check later |
![]() |
| Built in JSON encoding/decoding is one of the things I’ve enjoyed about Swift. It’s nice when it’s not necessary to shop around for libraries for common needs like that. |
![]() |
| Almost nobody is shopping around Rust JSON libraries unless they need some specific feature not provided by serde and serde_json. They are the default everyone reaches for. |
The utf-8 tricks make me very nervous since I have seen too many attacks with parser confusion. I for with serde for correctness not speed. I hope this was fuzzed all the way with a bunch of invalid utf-8 strings.