| > The only important differences in throughput between Intel and AMD
Not exactly related, but AMD also has a much better track record when it comes to speculative execution attacks. |
| Yeah, it's certainly possible that it's not double-pumping. It should be roughly possible to test by comparing latency with and without a vandpd inserted between two vpermd's (though then there are questions about bypass networks; and of course if we can't measure which method is used, it doesn't matter for us anyway); I don't have a Zen 4 to test on though.
But of note is that, at least in uops.info's data[0], there's one perf counter increment per instruction, and all four pipes get non-zero equally-distributed totals, which seems to me much simpler to achieve with double-pumping (though not impossible with splitting across ports; something like incrementing a random one. I'd expect biased results though). Then again, Agner says "512-bit vector instructions are executed with a single μop using two 256-bit pipes simultaneously". [0]: https://uops.info/html-tp/ZEN4/VPADDB_ZMM_ZMM_ZMM-Measuremen... |
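A minimal sketch of the latency probe described above, assuming GCC/Clang intrinsics rather than hand-written asm (the helper name, iteration count and the use of rdtsc are illustrative choices, not anything from uops.info or Agner; you'd want to check that the generated assembly really emits vpermd/vandpd on zmm registers, pin the core, and ideally read core-clock counters instead of the TSC):

```c
/* Build on an AVX-512 machine (e.g. Zen 4): gcc -O2 -mavx512f -mavx512dq probe.c */
#include <immintrin.h>
#include <stdint.h>
#include <stdio.h>
#include <x86intrin.h>

/* Dependent chain of vpermd, optionally with a vandpd wedged between each
   pair. The idea from the comment above: if 512-bit ops are double-pumped
   on one pipe, inserting the vandpd should change the chain latency
   differently than if each op is split across two 256-bit pipes with a
   bypass hop in between. */
static uint64_t chain(int with_and, long iters) {
    __m512i idx = _mm512_set1_epi32(0);
    __m512i v   = _mm512_set1_epi32(1);
    uint64_t t0 = __rdtsc();
    for (long i = 0; i < iters; i++) {
        v = _mm512_permutexvar_epi32(idx, v);            /* vpermd zmm */
        if (with_and)                                    /* vandpd zmm */
            v = _mm512_castpd_si512(_mm512_and_pd(
                    _mm512_castsi512_pd(v), _mm512_castsi512_pd(v)));
        v = _mm512_permutexvar_epi32(idx, v);            /* vpermd zmm */
    }
    uint64_t t1 = __rdtsc();
    volatile int sink = _mm512_reduce_add_epi32(v);      /* keep the chain alive */
    (void)sink;
    return t1 - t0;
}

int main(void) {
    const long N = 100000000L;
    printf("vpermd -> vpermd:           %llu ticks\n", (unsigned long long)chain(0, N));
    printf("vpermd -> vandpd -> vpermd: %llu ticks\n", (unsigned long long)chain(1, N));
    return 0;
}
```

Comparing ticks per iteration with and without the vandpd (alongside the per-pipe counters uops.info reports) would then at least hint at which scheme is in use.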
| You could put an 8700G in the same socket. The CPU isn't much faster but it has the new NPU for AI. I'm thinking about this upgrade to my 2400G but might want to wait for the new socket and DDR5. |
| I looked at upgrading my existing AMD-based system's RAM for this purpose, but found out my mobo/CPU only supports 128GB of RAM. Lots, but not as much as I had hoped I could shove in there. |
| Oh you're right. I didn't read the diagram correctly and the specs say 28 total lanes, 24 usable - probably because 4 go to the chipset. |
| Siena would make a very practical HEDT socket - it's basically half of a Bergamo, 6ch DDR5/96x pcie 5.0. It's sort of an unfortunate artifact of the way server platforms have gone that HEDT has fizzled out; they're mostly just too big, and it isn't that practical to fit them into commodity form-factors anymore, etc. A bigass Socket SP3 with 8ch was already quite big; now it's 12ch for SP5, and you have a slightly smaller one at SP6. But still, doing 1DPC in a commodity form factor is difficult - you really need an EEB sort of thing for things like the GENOAD8X, let alone 2 DIMMs per channel, and if you do a 24-stick board plus a socket you don't fit much else.
https://www.anandtech.com/show/20057/amd-releases-epyc-8004-...

2011/2011-3/2066 were actually a reasonable size. Like LGA3678 or whatever as a hobbyist thing doesn't seem practical (the W-3175X stuff) and that was also 6ch, and Epyc/TR are pretty big too, etc. There used to exist this size-class of socket that really no longer gets used; there aren't tons of commercial 3/4/6-channel products made anymore, and enthusiast form-factors are stuck in 1980 and don't permit the larger sockets to work that well.

The C266 being able to tap off IOs as SAS3/12gbps or pcie 4.0 slimsas is actually brilliant imo, you can run SAS drives in your homelab without a controller card etc. The Asrock Rack ones look sick, the EC266D4U2-2L2Q/E810 lets you basically pull all of the chipset IO off as 4x pcie 4.0x4 slimsas if you want. And actually you can technically use MCIO retimers to pull the pcie slots off - they had a weird topology where you got a physical slot off the m.2 lanes, to allow 4x bifurcated pcie 5.0x4 from the cpu. 8x nvme in a consumer board, half in a fast pcie 5.0 tier and half shared off the chipset.

https://www.asrockrack.com/general/productdetail.asp?Model=E...

Wish they'd do something similar with AMD, and MCIO preferably, like they did with the GENOAD8X. But beyond the adapter question, the "it speaks SAS" part is super useful for homelab stuff imo. AMD also really doesn't make that much use of the chipset - like, where are the x670E boards that use 2 chipsets and just sling it all off as oculink or w/e? Or mining-style board weird shit. Or forced-bifurcation lanes slung off the chipset into a x4x4x4x4 etc.

https://www.asrockrack.com/general/productdetail.asp?Model=G...

All-flash is here, all-nvme is here, you just frustratingly can't address that much of it per system without stepping up to server-class products etc. And that's supposed to be the whole point of the E series chipset, very frustrating. I can't think of many boards that feel like they justify the second chipset, and the ones that "try" feel like they're just there to say they're there. Oh wow, you put 14 usb 3.0 10gbps ports on it, ok. How about some thunderbolt instead etc (it's because that's actually expensive). Like tap those ports off in some way that's useful to people in 2024 and not just "16 sata" or "14 usb 3.0" or whatever.

M.2 NVMe is "the consumer interface" and it's unfortunately just about the most inconvenient choice for bulk storage etc. Give me the AMD version of that board where it's just "oops all MCIO" with x670e (we don't need usb4 on a server if it drives up cost). Or a miner-style board with infinite x4 slots linked to actual x4s. Or the supercarrier m.2 board with a ton of M.2 sticks standing vertically etc. Nobody does weird shit with what is, on paper, a shit ton of pcie lanes coming off the pair of chipsets. C'mon.

Super glad USB4 is a requirement for X870/X870E; thunderbolt shit is expensive but it'll come down with volume/multisourcing/etc, and it truly is like living in the future. I have done thunderbolt networking and moved data ssd to ssd at 1.5 GB/s. Enclosures are super useful for tinkering too now that bifurcation support on PEG lanes has gotten shitty and gpus keep getting bigger etc. An enclosure is also great for janitoring M.2 cards with a simple $8 adapter off amazon etc (they all work, it's a simple physical adapter). |
| It sounds like you need a desktop workstation with replaceable extension cards, and not a mostly immutable laptop, which has different strengths. |
| Ryzen doesn't even lead in MT on laptops.
With M4, they're likely to fall even farther behind. M4 Pro/Max is likely to arrive in Fall. AMD's Strix Point doesn't seem to have a release date. |
| As soon as someone puts 16GB of VRAM next to an SoC (Nvidia, soon), the gaming PC is dead. These hot 'n slow discrete components are fun to decorate with RGB, but they're yesterday's tech for _all_ segments. |
| And in a way the same applies to M3 vs M4 Geekbench scores. A few new instructions were added; aside from those, it's nowhere near the 25% improvement there either. |
| Yeah, I have several GT 710's which also came in a PCIe x1 variant so I could keep the x16 slot free for something better. Glad that's no longer needed - the built-in GPU is a legit good thing. |
| TBH, CPUs nowadays are mostly good enough for the consumer, even at mid or low tiers.
It's the GPUs that are just getting increasingly inaccessible, price-wise. |
| Yes - with more and more users moving to laptops and wanting a longer battery life, raw peak performance hasn't moved much in a decade.
A decade ago, Steam's hardware survey said 8GB was the most popular amount of RAM [1], and today the latest $1600 MacBook Pro comes with... 8GB of RAM. In some ways that's been a good thing - it used to be that software got more and more featureful/bloated and you needed a new computer every 3-5 years just to keep up.
[1] https://web.archive.org/web/20140228170316/http://store.stea... |
| To be fair, a decade ago gaming PCs came with 2GB to 4GB of VRAM. Today's gaming PCs come with 12GB to 20GB of VRAM. Most games don't demand a lot of system memory, so it makes sense that PC gamers would invest in other components.
You're also comparing Windows x86 gaming desktops from a decade ago with macOS Apple Silicon base-spec laptops today. Steam's recent hardware survey shows 16GB as the most popular amount of RAM [1]. Not the 5x increase we've seen in VRAM, but still substantial.
[1] https://store.steampowered.com/hwsurvey/Steam-Hardware-Softw... |
| These are decent GPUs for anything other than heavy gaming. I'm driving two 4k screens with it, and even for some light gaming (such as factorio) it's completely fine. |
| Sort of. The generation is RDNA2, but it's unlike all other RDNA2 chips because the focus is so much on energy efficiency (and the APU has its own codename, Van Gogh). |
| > The CPUs are also using the previous-gen graphics architecture, RDNA2
A faster GPU is reserved for the APUs. These graphics are just there for basic display support. |
| It's because x86 chips are no longer leading in the client space. ARM chips are - specifically, Apple's chips. Though Qualcomm has huge potential to leapfrog AMD/Intel chips in a few generations too. |
| [If you're a laptop user, scroll down the thread for laptop Rust compile times, M3 Pro looks great]
You're misguided. Apple has excellent notebook CPUs. Apple has great IPC. But AMD and Intel have easily faster CPUs.
https://opendata.blender.org/benchmarks/query/?compute_type=... (Blender Benchmark)
It depends on what you're doing. I'm a software developer using a compiler that 100%s all cores. I like fast multicore.
[Edit2] Compare to: a 7950X is $500, a very fast SSD is $400, fast 64GB is $200, and a very good board is $400, so I get a very fast dev machine for ~$1700 (0.329 p/$ vs. the mini's 0.077 p/$). [Edit] Made a c&p mistake, the mini has no Ultra. |
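For what it's worth, the p/$ figures above presumably just divide a benchmark score by the system price; with the ~$1700 build that works back to a score of roughly 560 points (which benchmark those points come from, and the mini's price, aren't stated here, so this is only a sanity check of the arithmetic):

$$\text{p/\$} = \frac{\text{score}}{\text{price}}, \qquad 0.329\ \text{p/\$} \times \$1700 \approx 559\ \text{points}$$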
| Geekbench has a separate page for each "instruction set".
For Apple you need to go to https://browser.geekbench.com/mac-benchmarks and then compare the numbers by hand, I assume. Though what I would love is compile time vs. $ (as mentioned, I'm a software developer). The 7950X is $500, a very fast SSD is $400, fast 64GB is $200, and a very good board is $400, so I get a very fast dev machine for ~$1700. |
| I compiled a few previously. Sorry for the formatting:
ASUS ROG Zephyrus G16 (2024)
Processor: Intel Core Ultra 9 185
Memory: 32GB
Cargo Build: 31.85 seconds
Cargo Build --Release: 1 minute 4 seconds

ASUS ROG Zephyrus G14 (2024)
Processor: AMD Ryzen 8945HS / Radeon 780M
Memory: 32GB
Cargo Build: 29.48 seconds
Cargo Build --Release: 34.78 seconds

ASUS ROG Strix Scar 18 (2024)
Processor: Intel Core i9 14900HX
Memory: 64GB
Cargo Build: 21.27 seconds
Cargo Build --Release: 28.69 seconds

Apple MacBook Pro (M3 Pro 11 core)
Processor: M3 Pro 11 core
Cargo Build: 13.70 seconds
Cargo Build --Release: 21.65 seconds

Apple MacBook Pro 16 (M3 Max)
Processor: M3 Max
Cargo Build: 12.70 seconds
Cargo Build --Release: 15.90 seconds

Firefox Mobile build:
M1 Air: 95 seconds
AMD 5900hx: 138 seconds
Source: https://youtu.be/QSPFx9R99-o?si=oG_nuV4oiMxjv4F-&t=505

Javascript builds
Here, Alex compares the M1 Air running Parallels emulating Linux vs native Linux on AMD Zen2 mobile. The M1 is still significantly faster.
https://youtu.be/tgS1P5bP7dA?si=Xz2JQmgoYp3IQGCX&t=183

Docker builds
Here, Alex runs Docker ARM64 vs AMD x86 images and the M1 Air built the image 2x faster than an AMD Zen2 mobile.
https://youtu.be/sWav0WuNMNs?si=IgxeMoJqpQaZv2nc&t=366

Anyways, Alex has a ton more videos on coding performance between Apple, Intel and AMD.

Lastly, this is not M1 vs Zen2 but it's M2 vs Zen4.
LLVM build test
M2 Max: 377 seconds
Ryzen 9 7940S: 826 seconds |
| @aurareturn really appreciate the comparison and your effort (upvoted). As I no longer use a laptop (don't need it, too expensive, breaks, no upgrades, but that is just me), browsing through that channel it looks like he focuses on laptops.
Would love to see a 7950X/64GB/SSD5 comparison (see https://www.octobench.com/ for SSD impact on Go compilation); perhaps he will create one in the future (channel bookmarked). But if I still needed to use a laptop, I would probably switch back to Apple (I have an iMac Pro standing on the shelf as decoration; it was my last Apple dev machine). The $5000 16 Pro looks great as a machine. When I was still working at eBay, the nice thing was that developers always got the max-specced machine back in the days - so that would probably be it. Real nice one.
[Edit] Someone suggested looking at Geekbench Clang, which brought some insights for my desktop usage (it looks like the top CPUs are more or less the same, ~15% difference). "Randomly" picking |
| I think it depends on what you do. If you write a database in Go, there is no problem with 5min compile time. If you write a web app, 10 sec compile times are already annoying. |
| Here's GB6: https://browser.geekbench.com/v6/cpu/compare/6339005?baselin...
Note: the M3 Max is a 40W CPU maximum, while the 7950X is a 230W CPU maximum; the 170W max AMD states is usually deceptive. Source for 7950X power consumption: https://www.anandtech.com/show/17641/lighter-touch-cpu-power....
Note that the M3 Max leads in ST in Cinebench 2024 and is 2-3x better in perf/watt. It does lose in MT in Cinebench 2024 but wins in GB6 MT.
Cinebench is usually x86-favored, as it favors AVX over NEON and has extremely long dependency chains, bottlenecked by caches and partly by memory. This is why you get a huge SMT yield from it and why it scales very highly if you throw lots of "weak" cores at it. This is why Cinebench is a poor CPU benchmark in general - the vast majority of applications do not behave like Cinebench. Geekbench and SPEC are more predictive of CPU speed. |
| In the end, what matters is real-world performance, and different workloads have different bottlenecks. For people who use Cinema 4D, Cinebench is the most accurate measurement of hardware capabilities they can get. It's very hard to generalize what will matter for the vast majority of people. I find it's best to look at benchmarks for the same applications, or similar workloads to what you'll be doing. Single-score benchmarks like Geekbench are a fun and quick way to get some general idea about CPU capabilities, but most of the time they don't match the specifics of real-world workloads.
Here's a content creation benchmark (note that for some tasks a GPU is also used): https://www.pugetsystems.com/labs/articles/mac-vs-pc-for-con... |
| Can you point me to a comparison site? I didn't find an M3/M2/7950/... comparison site for Chromium compile times :-(
(Even Phoronix is scarce here and mostly focuses on laptops - I have no laptop) |
| It's like I keep saying: the first Chinese manufacturer to churn out cheap SBCs with ServerReady support will make a killing as a true Pi killer. Anyone? Anyone? Pine64? Pine64? |
| Arguably that single company is ASML. There are more fabs (e.g. Intel), but AFAIK cutting-edge nodes all use ASML EUV lithography machines? |
| >Intel still fabs their own CPUs
Isn't Lunar Lake made by TSMC? Supposedly they have comparable efficiency to AMD/Apple/Qualcomm at the cost of making their fab business even less profitable |
| Might be worth considering a Ryzen 9 5900XT (just launched as well) for a drop in upgrade. Been running a 5950X since close to launch and still pretty happy with it. |
| The people who think PCs are already fast enough don't buy CPUs every year.
A 7950X in Eco mode is ridiculously capable for the power it pulls but that's less of a selling point. |
| Compared to Intel, AMD seems to bin their chips a lot less. Intel have their -T chips, which would be nifty if Intel weren't so far behind in terms of efficiency. |
| 4 DIMM slots do not equal quad-channel. I see in AMD's presentation that quad-channel is supported by chipsets, but I am not aware of a current AMD consumer/HEDT chip with 4 memory channels. |
| True, but the Threadripper requires the TRX50 platform, so I still don't understand what is operative about the statement that the X870E chipset supports quad channel memory. |
| I think for a while there, the only way to be able to use 128GB was to go TR4 or TRX... I kind of stopped looking for a while, but 100+ boards is certainly a nice change. |
| geizhals.de
Very good website to compare gadgets/electronics and look for good prices. It's for the German market though (and a small number of Austrian shops) |
AVX-512 in a single cycle vs. 2 cycles is big if the clock speed can be maintained anywhere near 5GHz. The doubling of L1 cache bandwidth is also interesting! Possibly needed to actually feed an AVX-512-rich instruction stream, I guess.
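As a rough way to see whether that extra L1 bandwidth can actually be used, here's a sketch of an L1-resident AVX-512 load throughput probe (the buffer size, rep count and use of rdtsc are assumptions on my part; the TSC counts reference clocks, not core clocks, so treat the output as a ballpark only):

```c
/* Build: gcc -O2 -mavx512f l1_bw.c */
#include <immintrin.h>
#include <stdint.h>
#include <stdio.h>
#include <x86intrin.h>

#define BUF_BYTES (16 * 1024)   /* comfortably inside a 32-48 KiB L1D */
#define REPS      200000L

int main(void) {
    static __attribute__((aligned(64))) char buf[BUF_BYTES];
    /* Two accumulators so the add dependency chains don't cap the loop
       at one load per cycle. */
    __m512i acc0 = _mm512_setzero_si512();
    __m512i acc1 = _mm512_setzero_si512();

    uint64_t t0 = __rdtsc();
    for (long r = 0; r < REPS; r++) {
        for (size_t i = 0; i < BUF_BYTES; i += 128) {
            /* two independent 64-byte loads per step */
            acc0 = _mm512_add_epi64(acc0, _mm512_load_si512((const void *)(buf + i)));
            acc1 = _mm512_add_epi64(acc1, _mm512_load_si512((const void *)(buf + i + 64)));
        }
    }
    uint64_t t1 = __rdtsc();

    /* keep the loads live so they aren't optimized away */
    volatile long long sink = _mm512_reduce_add_epi64(_mm512_add_epi64(acc0, acc1));
    (void)sink;

    double bytes = (double)REPS * (double)BUF_BYTES;
    printf("~%.1f bytes per TSC tick\n", bytes / (double)(t1 - t0));
    return 0;
}
```

If the L1 load path really doubles for 512-bit ops, a loop like this (further unrolled, with the frequency pinned) is where it should show up.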