On clang, you can actually ask the compiler to warn about a missed vectorization of a given loop with "#pragma clang loop vectorize(enable)": https://godbolt.org/z/sP7drPqMT (and you can even make it an error).
There's even "#pragma clang loop vectorize(assume_safety)" to tell it that pointer aliasing won't be an issue (GCC has a similar "#pragma GCC ivdep"), which should eliminate most of the odd reasons for missed vectorization.
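A minimal sketch of both pragmas on a saxpy-style loop (the function names are made up for illustration). The `#if` guards are there because each pragma is compiler-specific, and clang also defines `__GNUC__`, so it has to be checked first. If clang can't honor `vectorize(enable)`, it reports a "loop not vectorized" warning rather than silently emitting scalar code.

```c
#include <stddef.h>

/* y[i] += a * x[i]. On clang, vectorize(enable) asks the loop vectorizer to
 * vectorize this loop; if it can't, clang emits a "loop not vectorized"
 * warning instead of quietly falling back to scalar code. */
void saxpy_checked(size_t n, float a, const float *x, float *y) {
#if defined(__clang__)
#pragma clang loop vectorize(enable)
#endif
    for (size_t i = 0; i < n; i++)
        y[i] += a * x[i];
}

/* Same loop, but promising the compiler that x and y never alias in a way
 * that breaks vectorization. If that promise is false, the results can be
 * wrong, so only use it when you know the pointers don't overlap. */
void saxpy_assume_safe(size_t n, float a, const float *x, float *y) {
#if defined(__clang__)          /* check clang first: it also defines __GNUC__ */
#pragma clang loop vectorize(assume_safety)
#elif defined(__GNUC__)
#pragma GCC ivdep
#endif
    for (size_t i = 0; i < n; i++)
        y[i] += a * x[i];
}
```

Compiling with optimizations (e.g. `-O2`) and looking at the remarks or the assembly is the easiest way to see whether the first loop was actually vectorized.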
Reminded me of how a famous Russian guy ran Atomic Heart on an Elbrus 8S.
Elbrus has its own x86 translator, though, and a pretty good one, AFAIK. Atomic Heart was kind of playable, at 15-25 fps.
The basics are here: https://box86.org/
It is an emulator, but:
> Because box86 uses the native versions of some “system” libraries, like libc, libm, SDL, and OpenGL, it’s easy to integrate and use with most applications, and performance can be surprisingly high in some cases. Wine can also be compiled/run as native.
Box64's documentation just covers installing the x64 Wine builds from the WineHQ repos, because most ARM repos don't host x64 software. It's even possible to run Steam and use its x64 Proton to run Windows games. At least on ARM; I'm not sure about RISC-V.
Wine's own documentation says it requires an emulator: https://wiki.winehq.org/Emulation
> As Wine Is Not an Emulator, all those applications can't run on other architectures with Wine alone.
Or do you mean providing the x86_64 Windows API as native RISC-V/ARM code to the emulator layer? That would require deeper integration with the emulator, but it's what Box64/box86 already does with some Linux libraries: intercept the API calls and redirect them to native libraries. I'm not sure whether it does that for Wine.
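Roughly, the "use native libraries" trick can be pictured like the hypothetical sketch below: when emulated x86-64 code calls an imported function that has a wrapper, the emulator reads the arguments out of the guest registers (RDI and RSI under the System V ABI), calls the host's native implementation, and writes the result back to RAX. Everything here (`GuestRegs`, `call_import`, the wrapper table) is invented for illustration and is not Box64's actual code. It also assumes guest and host share one address space, so a guest pointer can be handed straight to the native function.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical, heavily simplified guest (x86-64) register state. Under the
 * System V ABI, the first two integer/pointer arguments arrive in RDI and
 * RSI, and integer results are returned in RAX. */
typedef struct {
    uint64_t rdi, rsi, rax;
} GuestRegs;

typedef void (*Wrapper)(GuestRegs *r);

/* Each wrapper unpacks guest registers, calls the host's native function,
 * and stores the result where the guest code expects it. Assumes guest
 * pointers are directly usable by the host (shared address space). */
static void wrap_strlen(GuestRegs *r) {
    r->rax = (uint64_t)strlen((const char *)(uintptr_t)r->rdi);
}

static void wrap_strcmp(GuestRegs *r) {
    int res = strcmp((const char *)(uintptr_t)r->rdi,
                     (const char *)(uintptr_t)r->rsi);
    r->rax = (uint64_t)(int64_t)res;   /* sign-extend the int result */
}

typedef struct {
    const char *symbol;
    Wrapper fn;
} WrappedSymbol;

static const WrappedSymbol wrapped[] = {
    {"strlen", wrap_strlen},
    {"strcmp", wrap_strcmp},
};

/* Called when guest code jumps to an imported symbol. Returns 1 if a native
 * wrapper handled the call, 0 if the emulator must instead run the guest's
 * own x86 implementation of that function. */
int call_import(const char *symbol, GuestRegs *r) {
    for (size_t i = 0; i < sizeof wrapped / sizeof wrapped[0]; i++) {
        if (strcmp(wrapped[i].symbol, symbol) == 0) {
            wrapped[i].fn(r);
            return 1;
        }
    }
    return 0;
}
```

The payoff is that hot library code (string functions, math, graphics) runs at native speed; only the application's own x86 code has to be translated.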
>and more about "don't add any assembly programmer conveniences or other such cleverness, rely on compilers instead of frontend silicon when possible"
What are the advantages of that?
The biggest divide is that a RISC instruction can raise at most one exception, but something like an x86 rep mov can take an indefinite number of page faults.
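To make that concrete, here's a toy C model (not how any real CPU implements it) of why a single `rep movsb` can take several faults: RSI, RDI and RCX are updated after every byte, so when a fault stops the instruction midway, the registers already describe the remaining work, and the OS can map the page and simply re-execute the same instruction. `RepState`, `ToyMemory` and the 8-byte "pages" are hypothetical, and the direction flag is assumed clear.

```c
#include <stdint.h>

/* Hypothetical model of the registers rep movsb uses (direction flag assumed
 * clear, so addresses count upward). */
typedef struct {
    uint64_t rsi, rdi, rcx;
} RepState;

/* Toy memory: 64 bytes split into eight 8-byte "pages".
 * All addresses are assumed to stay below 64. */
typedef struct {
    uint8_t mem[64];
    uint64_t page_present;   /* bit p set => page p is mapped */
    unsigned faults;         /* faults taken so far */
} ToyMemory;

static int page_mapped(const ToyMemory *m, uint64_t addr) {
    return (int)((m->page_present >> (addr / 8)) & 1);
}

/* Execute rep movsb until it completes (returns 1) or faults (returns 0).
 * RSI/RDI/RCX are updated after every byte, so after a fault they describe
 * exactly the work that remains. */
int rep_movsb(RepState *s, ToyMemory *m) {
    while (s->rcx != 0) {
        if (!page_mapped(m, s->rsi) || !page_mapped(m, s->rdi)) {
            m->faults++;
            return 0;
        }
        m->mem[s->rdi] = m->mem[s->rsi];
        s->rsi++;
        s->rdi++;
        s->rcx--;
    }
    return 1;
}

/* "OS" side: on each fault, map the missing pages and re-execute the same
 * instruction. Returns how many faults the single instruction took. */
unsigned run_with_demand_paging(RepState *s, ToyMemory *m) {
    while (!rep_movsb(s, m)) {
        m->page_present |= 1ull << (s->rsi / 8);
        m->page_present |= 1ull << (s->rdi / 8);
    }
    return m->faults;
}
```

Copying 24 bytes from mapped pages 0-2 into unmapped pages 4-6 makes this one instruction fault three times before it finishes.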
Great post, as it also directly helps debunk the myth that the ARM instruction set somehow makes the whole CPU better than analogous x86 silicon. The ISA might be responsible for something like 0.1% (a guesstimate) of the total advantage; it's all RISC under the hood, and both ISAs need decoders. x86 might need a slightly bigger one, which amounts to accounting noise in terms of area.
Cf. https://chipsandcheese.com/2021/07/13/arm-or-x86-isa-doesnt-...
> You had literal students design chips that outperformed industry cores that took huge teams and huge investment
Everyone remember to thank our trans heroine Sophie Wilson (CBE).
Itanium was not RISC in any sensible way; it was VLIW. That pushed a lot of needless complexity into compilers and didn't deliver the savings.
Let's be real, it's about business models. POWER was and is backed by IBM. ARM won in mobile. Does this mean POWER and ARM are better than MIPS, SPARC, PA-RISC, Am29000, or i860? I don't think so.
Probably true now, but in ye olde days, some instructions existed primarily to make assembly programming more convenient.
Assembly programming is a real pain on the RISCiest of RISC architectures, like SPARC. Here's an example from https://www.cs.clemson.edu/course/cpsc827/material/Code%20Ge...:
• All branches (including the one caused by CALL, below) take place after execution of the following instruction.
• The position immediately after a branch is the “delay slot” and the instruction found there is the “delay instruction”.
• If possible, place a useful instruction in the delay slot (one which can safely be done whether or not a conditional branch is taken).
• If not, place a NOP in the delay slot.
• Never place any other branch instruction in a delay slot.
• Do not use SET in a delay slot (only half of it is really there).
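The delayed-branch rule in those bullets can be modeled in a few lines of C with a pc/npc pair. This is a hypothetical three-opcode toy ISA, not SPARC's real encodings, and it ignores the annul bit, but it shows why the instruction after a branch always runs: the branch only changes npc, the address fetched after the delay slot.

```c
#include <stddef.h>

/* Hypothetical toy ISA, not real SPARC encodings. */
typedef enum { OP_ADD, OP_BRANCH, OP_HALT } Op;

typedef struct {
    Op op;
    int imm;   /* OP_ADD: value added to acc; OP_BRANCH: target index */
} Insn;

/* Runs prog with SPARC-style delayed branches: a taken branch changes
 * control flow only after the instruction in its delay slot executes.
 * Returns the accumulator. */
int run_delayed(const Insn *prog, size_t len) {
    int acc = 0;
    size_t pc = 0, npc = 1;
    while (pc < len && prog[pc].op != OP_HALT) {
        Insn in = prog[pc];
        size_t next_npc = npc + 1;
        if (in.op == OP_ADD)
            acc += in.imm;
        else if (in.op == OP_BRANCH)
            next_npc = (size_t)in.imm;   /* takes effect after the delay slot */
        pc = npc;
        npc = next_npc;
    }
    return acc;
}
```

With { BRANCH 3, ADD 10, ADD 100, ADD 1, HALT }, the ADD 10 sitting in the delay slot still executes, so the result is 11 rather than 1. Putting a second branch in the delay slot makes the pc/npc pair jump somewhere surprising, which is why the bullets forbid it.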
Incredible result! This is a tremendous amount of work, and it does seem like RV is at its limits in some of these cases. The bit gather and scatter instructions should become an extension!
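Assuming "bit gather and scatter" here means the kind of operation x86's BMI2 PEXT and PDEP perform, a portable C fallback (what an emulator needs when the host has no equivalent instruction) loops once per set mask bit, which hints at why emulating them is expensive. The function names are illustrative.

```c
#include <stdint.h>

/* Bit gather (what PEXT computes): take the bits of src selected by mask
 * and pack them, in order, into the low bits of the result. */
uint64_t bit_gather(uint64_t src, uint64_t mask) {
    uint64_t out = 0;
    for (uint64_t bit = 1; mask != 0; bit <<= 1) {
        uint64_t lowest = mask & (~mask + 1);   /* lowest set bit of mask */
        if (src & lowest)
            out |= bit;
        mask &= mask - 1;                       /* clear that bit */
    }
    return out;
}

/* Bit scatter (what PDEP computes): deposit the low bits of src, in order,
 * into the positions of the set bits of mask. */
uint64_t bit_scatter(uint64_t src, uint64_t mask) {
    uint64_t out = 0;
    for (uint64_t bit = 1; mask != 0; bit <<= 1) {
        uint64_t lowest = mask & (~mask + 1);
        if (src & bit)
            out |= lowest;
        mask &= mask - 1;
    }
    return out;
}
```

The two are inverses on the masked bits: scattering a gathered value with the same mask gives back `src & mask`.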
It will be interesting to try out Box64 as soon as I get my hands on some suitable RISC-V hardware. I have played with RISC-V microcontrollers; they're quite nice to work with.
That screenshot shows 31 GB of RAM, which is distinctly more than the mentioned dev board has at max specs. Are they using something else here?
Pioneer, an older board.
Note that today, one of the recent options with several faster cores implementing RVA22 and RVV 1.0 is the better choice.
The Scalar Efficiency SIG has already been discussing bitfield insert and extract instructions.
We figured out yesterday [1] that the example in the article can already be done in four RISC-V instructions; it's just a bit trickier to come up with.
[1] https://www.reddit.com/r/RISCV/comments/1f1mnxf/box64_and_ri...
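For reference, here's what bitfield extract and insert compute, as a plain C sketch. It is not the four-instruction sequence from the linked thread, which isn't reproduced here. Extract needs no mask constant: shifting left and then right is the usual RISC-V idiom (`slli` followed by `srli`) when position and length are constants. Insert needs a mask, which is part of why it costs more instructions. The names and the 64-bit width are assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Extract len bits starting at bit pos, zero-extended.
 * Shift the field up to the top, then back down to bit 0. */
uint64_t bf_extract(uint64_t x, unsigned pos, unsigned len) {
    assert(len > 0 && pos + len <= 64);
    return (x << (64 - pos - len)) >> (64 - len);
}

/* Insert the low len bits of v into x at bit pos, keeping all other bits. */
uint64_t bf_insert(uint64_t x, uint64_t v, unsigned pos, unsigned len) {
    assert(len > 0 && pos + len <= 64);
    uint64_t field = (len == 64) ? UINT64_MAX : ((1ull << len) - 1);
    uint64_t mask = field << pos;
    return (x & ~mask) | ((v << pos) & mask);
}
```

For example, extracting 8 bits at position 8 from 0x12345678 yields 0x56, and inserting 0xAB there yields 0x1234AB78.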
The handling of misaligned loads/stores in RISC-V can also be considered a disappointment: https://github.com/riscv/riscv-isa-manual/issues/1611
It oozes a preference for hardware developers' convenience and "flexibility" over the practical guarantees that software developers need. It looks like the MIPS patent on misaligned load/store instructions played a negative role. The patent expired in 2019, but it seems we are stuck with the current status quo nevertheless.
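On the software side, the portable way to read a possibly misaligned value in C is `memcpy` into a local, since dereferencing a misaligned `uint32_t *` is undefined behavior. A minimal sketch with illustrative function names: the compiler decides how to lower the `memcpy`, and on targets where misaligned accesses aren't guaranteed to be fast (the situation the linked issue complains about), it typically has to fall back to byte loads much like the explicit version.

```c
#include <stdint.h>
#include <string.h>

/* Load a 32-bit value in host byte order from a possibly misaligned
 * address. memcpy is the well-defined idiom; the compiler picks the
 * instructions. */
uint32_t load_u32_unaligned(const void *p) {
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Explicit byte-wise load with a fixed little-endian byte order: what the
 * compiler has to emit when it can't rely on cheap misaligned loads. */
uint32_t load_le32_bytewise(const uint8_t *p) {
    return (uint32_t)p[0]
         | (uint32_t)p[1] << 8
         | (uint32_t)p[2] << 16
         | (uint32_t)p[3] << 24;
}
```

The byte-wise version costs several loads plus shifts and ORs, which is exactly the overhead that a firm "misaligned access is fast" guarantee would let software avoid.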
Another argument against the C extension is that it uses a big chunk of the opcode space, which may be better used for other extensions with 32-bit instructions.
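For scale: RISC-V determines instruction length from the lowest bits, and any first 16-bit parcel whose two lowest bits aren't `11` is a compressed instruction. Measured over all bit patterns, that gives 3/4 of the encoding space to the C extension, 7/32 (bits [1:0] = `11`, bits [4:2] ≠ `111`) to 32-bit instructions, and the remaining 1/32 to longer formats. A small C sketch of that length check (the function name is illustrative):

```c
#include <stdint.h>

/* Length in bytes of a RISC-V instruction, judged from its first 16-bit
 * parcel. Returns 0 for encodings longer than 32 bits, which this sketch
 * doesn't distinguish further. */
unsigned rv_insn_length(uint16_t parcel) {
    if ((parcel & 0x3) != 0x3)
        return 2;   /* bits [1:0] = 00, 01 or 10: compressed (C extension) */
    if ((parcel & 0x1c) != 0x1c)
        return 4;   /* bits [1:0] = 11 and bits [4:2] != 111: 32-bit */
    return 0;       /* bits [4:0] = 11111: longer formats */
}
```

Counting over all 65,536 possible parcels gives 49,152 compressed, 14,336 32-bit and 2,048 longer-format patterns.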
>2. nobody uses it on mips either, so it is likely of no use.
Sure, but at the time Rust and Zig didn't exist; both languages have a mode that detects integer overflow.
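For context: MIPS has a trapping `add` alongside the non-trapping `addu`, while RISC-V only has the wrapping add, so overflow checks cost extra instructions. The RISC-V spec suggests `add`, `slti`, `slt`, `bne` for a signed add. A C sketch of the same comparison follows, next to GCC/Clang's `__builtin_add_overflow`, which performs the same check. The function names are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

/* Signed add with an overflow check, mirroring the RISC-V spec's sequence:
 * the wrapped sum is less than a if and only if b is negative, unless the
 * add overflowed. */
bool add_overflows_riscv_style(int64_t a, int64_t b, int64_t *sum) {
    /* Wrap via unsigned arithmetic (signed overflow is UB in C). Converting
     * back to int64_t is implementation-defined before C23; GCC and Clang
     * wrap modulo 2^64. */
    *sum = (int64_t)((uint64_t)a + (uint64_t)b);
    return (b < 0) != (*sum < a);
}

/* The same check via the GCC/Clang builtin. */
bool add_overflows_builtin(int64_t a, int64_t b, int64_t *sum) {
    return __builtin_add_overflow(a, b, sum);
}
```

Either way, the check adds a compare and a branch after every checked add, whereas a trapping add would make the fast path free.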
You too can run Witcher 3 equally well on a minimal PC if you're willing to set the render resolution to 720p (540p undocked), drop the settings below minimum, and call ~30 FPS "well".
I want somebody to make a GPT fine-tune that specializes in converting instructions and writing tests. If you had it read all the x86 and RISC-V docs a bunch of times, a lot of this could be automated.
I would imagine that executable size increases, meaning the code has to be aggressively optimized for cache locality?
I would also imagine that some types of software, like games or web servers, are better suited to either CISC or RISC?