Instruction duration isn't constant even within the same architecture.
You cannot have branches in constant-time code.
I do wonder, though, how often CPU instructions have data-dependent execution times...
We had ~30 years of "undefined behaviour" practically meaning "do whatever the CPU does". It is not new that people want predictable behaviour; it simply wasn't a talking point because we already had it.
You pretty much answered your own question: ~20 years ago and earlier. But I think it is also worth pointing out that it has gotten worse; those 20 years have been a steady trickle of new foot-guns.
We have ill-defined behaviour, implementation-defined behaviour, erroneous behaviour, unspecified behaviour, and undefined behaviour.
Undefined behaviour isn't exactly what most people think it is.
> Is there some reason a cryptographic algorithm developer must track the latest release of a compiler?

Tracking the latest release is important because:

1. Distributions build (most? all?) libraries from source, using compilers and flags the algorithm authors can't control.
2. Today's latest release is the base of tomorrow's LTS.

If the people who know the most about these algorithms aren't tracking the latest compiler releases, then who else would be qualified to detect these issues before a compiler version bearing a problematic optimization is used for the next release of Debian or RHEL?

> Logically, therefore, must we not also expect CPU designers to also forego changes that could alter timing behavior?

Maybe? [1]

> freezing all compiler development

There are many, many interesting areas of compiler development beyond incremental application of increasingly niche optimizations. For instance, greater ability to demarcate code that is intended to be constant time. Or test suites that can detect when optimizations pose a threat to certain algorithms or implementations. Or optimizing the performance of the compiler itself.

Overall I agree with you somewhat. All engineers must constantly rail against entropy, and we are doomed to fail. But DJB is probably correct that a well-reasoned rant aimed at the community that both most desires and most produces the problematic optimizations has a better chance at changing the tide of opinion and shifting the rate at which all must diminish than yelling at chipmakers or the laws of thermodynamics.

[1] https://en.m.wikipedia.org/wiki/Spectre_(security_vulnerabil...
> short of regressing to in-order, non-speculative cores

I guess you are referring to GPU cores here. It is a joke, but it hints that in-order, non-speculative cores can nonetheless be powerful computers.
But then you're not writing C, except maybe as some wrappers. Wanting to use C isn't laziness. Making it nearly unfeasible to use C is the most suffering a C compiler can inflict.
I hung out in comp.std.c, read the C FAQ (https://c-faq.com/), and yes, read the actual language spec.

For every C operation you type, ask yourself what its "contract" is: what are the preconditions that the language or the library function expects the programmer to ensure in order for their behavior to be well-defined, and do you ensure them at the particular usage point? Also, what are the failure modes within the defined behavior (which result in values or states that may lead to precondition violations in subsequent operations)?

This contractual thinking is key to correctness in programs in general, not just in C. The consequences of incorrect code are just less predictable in C.
> Every study about bugs related to UB

...is about C++. There's an order of magnitude difference in the cognitive load required to visually spot UB in C code versus in C++ code.
You mean studies from Google, which explicitly has a culture of dumbing down software development, and heavily focuses on theoretical algorithmic skills rather than technical ones?
Google hires the best developers in the world. They pay well beyond anyone else except the other big SV tech giants, who compete for the best.

I don't work for them, but if money were my main motivator and they had jobs not too far from me, I would totally want to.

My point is: don't pretend you're superior to them. You're very likely not, and even if you are really good, they're still about the same level as you. If you think they're doing "dumb" development, I can only think you're suffering from a very bad case of https://en.wikipedia.org/wiki/Dunning%E2%80%93Kruger_effect , without meaning disrespect.
The Linux kernel nowadays uses the fact that signed overflow is UB to detect problems using sanitizers. It turns out that defined unsigned wraparound is now the hard problem.
Not all software is security-relevant. I run a lot of large simulations. I do not care at all if that software would crash on specially prepared inputs. I do care that it's as fast as possible.
I don't think it is unreasonable to have an option for "warn me about places that might be UB" that would tell you when the compiler removes something it considers dead code because it assumed UB doesn't happen.
I think this is a point of view that seems sensible, but probably hasn't really thought through how this works. For example:
What should the compiler emit here? Should it emit a bounds check? In the event the bounds check fails, what should it do? It is only through the practice of undefined behavior that the compiler can consistently generate code that avoids the bounds check. (We don't need it, because if `i` is out of bounds then it's undefined behavior and illegal.)

If you think this is bad, then you're arguing against memory-unsafe languages in general. A sane position is the one Rust takes, which is that by default, yes indeed, you should always generate the bounds check (unless you can prove it always succeeds). But there will likely always be hot inner loops where we need to discharge the bounds checks statically. Ideally that would be done with some kind of formal reasoning support, but the industry is far from that at the moment.

For a more in-depth read: https://blog.regehr.org/archives/213
We do. We just wish undefined were defined to be a bit less undefined, and are willing to sacrifice a bit of performance for higher debuggability and ability to reason.
The result of a binary search is undefined if the input is not sorted.
How do you expect the compiler to statically guarantee that this property holds in all the cases where you want to do a binary search?
If something is good for compiler developers, it is good for compiler users, in the sense that it makes it easier for the compiler developers to make the compilers we need.
I do not think there is a reason to fork. Just contribute. I found the GCC community very welcoming. But maybe don't come in with an "I need to take back the compiler from evil compiler writers" attitude.
I don’t really think there is either, but I figured it was a funny way to present the “there never was anything to prevent you from forking in the first place” argument.
> You almost may as well just design a new language, at that point.
Forget “almost”. Go compile this C code:
This is UB. And it has nothing whatsoever to do with optimizations: any sensible translation to machine code is a use-after-free, and an attacker can probably find a way to exploit that machine code to run arbitrary code and format your disk.

If you don’t like this, use a language without UB. But djb wants something different, I think: a way to tell the compiler not to introduce timing dependencies on certain values. This is a nice idea, but it needs hardware support! Your CPU may well implement ALU instructions with data-dependent timing. Intel, for example, reserves the right to do this unless you set an MSR to tell it not to. And you cannot set that MSR from user code, so what exactly is a compiler supposed to do?

https://www.intel.com/content/www/us/en/developer/articles/t...
It isn't just UB to dereference `ptr` after `free(ptr)` – it is UB to do anything with its value whatsoever. For example, this is UB:
Why is that? Well, I think because the C standard authors wanted to support the language being used on platforms with "fat pointers", in which a pointer is not just a memory address, but some kind of complex structure incorporating flags and capabilities (e.g. IBM System/38 and AS/400; Burroughs Large Systems; Intel iAPX 432, BiiN and i960 extended architecture; CHERI and ARM Morello). And, on such a system, they wanted to permit implementors to make `free()` a "pass-by-reference" function, so it would actually modify the value of its argument. (C natively doesn't have pass-by-reference, unlike C++, but there is nothing stopping a compiler adding it as an extension, then using it to implement `free()`.)

See this discussion of the topic from 8 years back: https://news.ycombinator.com/item?id=11235385

> And you cannot set that MSR from user code, so what exactly is a compiler supposed to do?

Set a flag in the executable which requires that MSR to be enabled. Then the OS will set the MSR when it loads the executable, or refuse to load it if it won't. Another option would be for the OS to expose a user-space API to read that MSR. And then the compiler emits a check at the start of security-sensitive code to call that API and abort if the MSR doesn't have the required value. Or maybe even, the OS could let you turn the MSR on/off on a per-thread basis, and just set it during security-sensitive processing.

Obviously, all these approaches require cooperation with the OS vendor, but often the OS vendor and compiler vendor are the same vendor (e.g. Microsoft) – and even when that isn't true, compiler and kernel teams often work closely together.
This is pretty persnickety and I imagine you're aware of this, but `free` is a weak symbol on Linux, so user code can replace it at whim. Your `foo` cannot be statically determined to be UB.
Or writing code that relies on inlining and/or tail-call optimization to successfully run at all without running out of stack... We've got some code that doesn't run if compiled at -O0 due to that.
It is, in fact, pretty hard, as evidenced by how often programmers fail at it. The macho attitude of "it's not hard, just write good code" is divorced from observable reality.
Some jurisdictions also set the speed limit at, e.g., the 85th percentile of drivers' speed (https://en.wikipedia.org/wiki/Speed_limit#Method), so some drivers are always going to be speeding.

(I'm one of those speeders, too; I drive with a mentality of safety > following the strict letter of the law; I'll prefer the speed of traffic if that's safer than strict adherence to the limit. That said, I know not all of my peers have the same priorities on the road.)
People write buffer overflows and memory leaks because they are not careful. The rest of the UB cases are things I have never seen, despite running sanitizers on a large codebase.
I am referring to undefined behavior.

For example, consider the case of integer overflow when adding two signed numbers. C considers this undefined behavior, making the program's behavior undefined. All bets are off, even if the program never makes use of the resulting value. C compilers are allowed to assume the overflow can never happen, which in some cases allows them to infer that numbers must fit within certain bounds, which allows them to do things like optimize away bounds checks written by the programmer.

A more reasonable language design choice would be to treat this as an operation that produces an unspecified integer result, or an implementation-defined result.

Edit: The following article helps clear up some common confusion about undefined behavior: https://blog.regehr.org/archives/213

Unfortunately this article, like most on the subject, perpetuates the notion that there are significant performance benefits to treating simple things like integer overflow as UB. E.g.: "I've heard that certain tight loops speed up by 30%-50% ..." Where that is true, the compiler could still emit the optimized form of the loop without UB-based inference; it would simply have to be guarded by a run-time check (outside the loop) that falls back to the slower code on the rare occasions when the assumptions do not hold.
It would also be nice if hardware would trap on signed integer overflow. Of course, since the most popular architectures do not, new architectures do not either.
For example, the recent realloc change in C23. I was surprised the previously used behaviour, even if inconsistent across implementations, was declared UB. Why not impdef?
According to some comments under this submission, even x86 assembly isn't suitable, or only under specific circumstances that are generally not available in userspace.
At this time, the idea of a constant-time operation embedded into a language’s semantics is not a thing. Similar for CPU architectures. Our computing base is about being fast and faster.
The version of this that I want to see is a CPU that gives you a core without caches or branch prediction, on which you can write custom code without having to worry about timing attacks.
Yes, but that does not change the fact that compiler writers have control of the standard, have had that control since probably C99, and have introduced new UB along with pushing the UB worldview.
Sigh... yes, I don't want any UB where it's not necessary.

But if you must have a concrete example, how about realloc? In C89 [1] (page 155), realloc with a 0 size and a non-NULL pointer was defined as free:

> If size is zero and ptr is not a null pointer, the object it points to is freed.

In C99 [2] (page 314), that sentence was removed, making it undefined behavior when it wasn't before. This is a pure example of behavior becoming undefined when it was not before. In C11 [3] (page 349), that sentence remains gone. In C17 [4] (page 254), we get an interesting addition:

> If size is zero and memory for the new object is not allocated, it is implementation-defined whether the old object is deallocated. If the old object is not deallocated, its value shall be unchanged.

So the behavior switches from undefined to implementation-defined. In C23 [5] (page 357), the wording completely changes to:

> ...or if the size is zero, the behavior is undefined.

So WG14 made it UB again after making it implementation-defined.

SQLite targets C89, but people compile it with modern compilers all the time, and those modern compilers generally default to at least C99, where the behavior is UB. I don't know if SQLite uses realloc that way, but if it does, are you going to call it buggy just because the authors stick to C89 and their users use later standards?

[1]: https://web.archive.org/web/20200909074736if_/https://www.pd...
[2]: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf
[3]: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf
[4]: https://web.archive.org/web/20181230041359if_/http://www.ope...
[5]: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3047.pdf
If SQLite wants exactly C89, it can just require -std=c89, and then people compiling it with a different standard target are to blame. This is just standard backwards incompatibility, nothing about UB (in other languages, requiring specific compiler/language versions is routine). Problems would arise even if it was changed from being a defined `free(x)` to being a defined `printf("here's the thing you realloc(x,0)'d: %p", x)`. (Whether the C standard should always be backwards compatible is a more interesting question, but it is orthogonal to UB.)

I do remember reading somewhere that a real platform in fact not handling size 0 properly (or having explicitly-defined behavior going against what the standard allowed?) was an argument for changing the standard requirement. It's certainly not because compiler developers had big plans for optimizing around it, given that both gcc and clang don't: https://godbolt.org/z/jjcGYsE7W. And I'm pretty sure there's no way this could amount to any optimization on non-extremely-contrived examples anyway.

I had edited one of my parent comments to mention realloc, so if we both landed on the same example, there's probably not that many significant other cases.
It specifically says that the use of C as a "portable assembler" is a use that the standards committee does not want to preclude.
Not sure how much clearer this can be.
Back in 1989, when C abstract machine semantics were closer to those of a portable macro processor, and stuff like the register keyword was actually something compilers cared about.
But that's not an excuse for having a bug; it's the exact evidence that it's not a bug at all. Calling the compiler buggy for not doing what you want when you commit undefined behavior is like calling dd buggy for destroying your data when you call it with the wrong arguments.