Building Meta's GenAI infrastructure

danielhanchen · 2024-03-12T17:18:22

float8 got a mention! x2 more FLOPs! Also xformers has 2:4 sparsity support now so another x2? Is Llama3 gonna use like float8 + 2:4 sparsity for the MLP, so 4x H100 float16 FLOPs? Pytorch has fp8 experimental support, whilst attention is still complex to do in float8 due to precision issues, so maybe attention is in float16, and RoPE / layernorms in float16 / float32, whilst everything else is float8?

GamerAlias · 2024-03-12T17:29:29

I was thinking why is this one guy on HN so deeply interested and discussing technical details from a minor remark. Then I clocked the name. Great work on Gemma bugs

danielhanchen · 2024-03-12T17:59:19

Oh thanks :) I always like small details :)

andy99 · 2024-03-12T19:18:49

Is there float8 support in any common CPU intrinsics? It sounds interesting but curious what will be the impact if any on CPU inference.

ashvardanian · 2024-03-12T22:35:48

Nope. Moreover, simulating it even with AVX-512 is quite an experience. Been postponing it for 2 years now... But first of all, you need to choose the version of float8 you want to implement, as the standards differ between GPU vendors.

janwas · 2024-03-13T02:03:31

We use it in gemma.cpp [1]. This hybrid of E5M2 and E4M3 decodes to bf16 in ~14 instructions, so we can do that on the fly during dot products.

[1]: github.com/google/gemma.cpp

danielhanchen · 2024-03-13T02:35:24

Congratulations on gemma.cpp!!

teaearlgraycold · 2024-03-13T05:37:41

I’m curious if there’s a meaningful quality difference between float8 and some uint8 alternative (fixed precision or a look up table).

CraigJPerry · 2024-03-13T07:48:46

A LUT could be a significant performance penalty would it not? Instead of a float8 (potentially multiple in simd case) in a register, you’re now having to head out to at least L1 cache to dereference the value in the LUT.

Plain uint8 wouldn’t allow for the same accuracy range as float8 and it’s the accuracy not the precision (which uint would win for the largest values it can represent) that counts most.

danielhanchen · 2024-03-13T08:00:04

Oh oh was just gonna comment as well, but saw this! I think x86 has like pshufb for LUTs (used them like ages ago, but forgot now :() I think also some game (was it Spiderman) used loads of lookup tables.

The issue with LUTs is don't you have to update the LUT itself? You can select which memory address to load up, but the LUT itself has to be differentiable maybe? TBH I'm not an expert on LUTs.

On fixed point - similarly ye you have to fix the precision ranges as well, so again I'm unsure on how one changes the fixed point numbers over time. I'll have to read more on fixed point.

Maybe 1.58bit using (-1, 0, 1) which gets rid of multiplications and just additions might be more useful, although you'll only get a 2x FLOP boost since you still need fp8 or fp16 addition.

protomolecule · 2024-03-13T09:02:54

>I think x86 has like pshufb for LUTs

There is also VPERMI2B [0] which operates on a 128 byte LUT.

[0] https://en.wikichip.org/wiki/x86/avx512_vbmi

danielhanchen · 2024-03-13T09:12:23

Oh I forgot about that!! But ye LUTs are very interesting and fascinating :) One of the hidden gems of CPU optimizations :)

ipsum2 · 2024-03-12T19:44:05

You're still bounded by memory bandwidth, so adding multiples to FLOPs is not going to give you a good representation of overall speedup.

jabl · 2024-03-12T19:49:38

Well, those smaller floats require less BW to transfer back and forth as well. Perhaps not a reduction linear in the size of the float, as maybe smaller floats require more iterations and/or more nodes in the model graph to get an equivalent result.

But rest assured there's an improvement, it's not like people would be doing it if there wasn't any benefit!

andy99 · 2024-03-12T21:10:20

The impact on bandwidth is the main reason smaller is better I belive, certainly when it's the bottleneck. I'm only really familiar with CPU but with say FP16 you might convert back to FP32 when you're doing the actual multiplication (so conversion plus multiplication is actually slower) but because you're moving half the data in and off you still get a huge speedup.

danielhanchen · 2024-03-13T02:34:55

I can't remember some research paper somewhere even if you do float32 multiplications, but keep the data in bfloat16 by just simply truncating the lower mantissa bits, and doing packing, you still get speedups, since matrix multiplication is bound both by compute and cache access. If you can optimize on the cache side of things, speedups are definitely there.

danielhanchen · 2024-03-13T02:33:18

I'm not sure exactly on how NVIDIA calculates FLOPs, but I do know for Intel's FLOPs, it's calculated from how many FMA units, how many loads can be done in tandem, and what the throughput is. And ye fp8 requires 2x less space. Sparse 2:4 might be less pronounced, since the matrix first needs to be constructed on the fly, and there is like a small matrix of indicator values.

boywitharupee · 2024-03-13T01:16:59

care to explain why attention has precision issues with fp8?

danielhanchen · 2024-03-13T01:39:43

Oh so float8's L2 Norm from float32 is around I think 1e-4, whilst float16 is 1e-6. Sadly attention is quite sensitive. There are some hybrid methods which just before the attention kernel which is done in fp8, upcasts the Q and K from the RoPE kernel to become float16, then also leaves V to be in float8. Everything is done in fp8 on the fly, and the output is fp8. This makes errors go to 1e-6.

alecco · 2024-03-13T10:28:54

Yes, but it's a bit more complicated. There are 2 FP8 formats: E5M2 and E4M3.

E5M2 is like an IEEE 754. But to compensate the smaller exponent, "E4M3’s dynamic range is extended by not representing infinities and having only one mantissa bit-pattern for NaNs".

Some people reported E4M3 is better for the forward pass (small range, more precision) and E5M2 is better for the backward pass (bigger range, less precision). And most implementations have some sort of scaling or other math tricks to shrink the error.

[0] FP8 Formats for Deep Learning (Nvidia/ARM/Intel) https://arxiv.org/abs/2209.05433

danielhanchen · 2024-03-13T11:04:22

Fair points! Ye Pytorch's fp8 experimental support does scaling of the gradients. Interesting point on a larger range for the forward pass, and a small range for the gradients! I did not know that - so learnt something today!! Thanks! I'll definitely read that paper!

j45 · 2024-03-12T22:47:57

Is it safe to assume this is the same float16 that exists in Apple m2 chips but not m1?

j45 · 2024-03-13T01:17:57

Clarification: bfloat16

“bfloat16 data type and arithmetic instructions (AI and others)”

https://eclecticlight.co/2024/01/15/why-the-m2-is-more-advan...

dougdonohoe · 2024-03-12T23:35:06

Having lived through the dot-com era, I find the AI-era slightly dispiriting because of the sheer capital cost of training models. At the start of the dot-com era, anyone could spin up an e-commerce site with relatively little infrastructure costs. Now, it seems, only the hyper-scale companies can build these AI models. Meta, Google, Microsoft, Open-AI, etc.

herval · 2024-03-13T03:53:40

I’m not sure we went through the same dot-com era, but in my experience, it was extremely expensive to spin up anything. You’d have to run your own servers, buy your own T1 lines, develop with rudimentary cgi… it was a very expensive mess - just like AI today

Which gives me hope that - like the web - hardware will catch up and stuff will become more and more accessible with time

Jensson · 2024-03-13T11:08:40

> I’m not sure we went through the same dot-com era, but in my experience, it was extremely expensive to spin up anything. You’d have to run your own servers, buy your own T1 lines, develop with rudimentary cgi… it was a very expensive mess - just like AI today

To make your own competing LLM today you need hundreds of millions of dollars, the "very expensive" of this is on a whole different level. You could afford the things you talked about on a software engineering salary, it would be a lot of money for that engineer but at least he could do it, no way anyone but a billionaire could fund a new competing LLM today.

renegade-otter · 2024-03-13T00:01:01

Not everything has to be AI. You can run a small business infra for MUCH less than you did back then, especially if you adjust for inflation (!).

Training AI models costs a fortune, but so far it's been just front-loading costs in hopes of a windfall. We'll see what actually happens.

boringg · 2024-03-13T02:44:24

Front loading costs to eventually extract rents on usage with one hell of a capital wall protecting the assets.

Its easier to spin up a business for sure -- also easier to unwind it - there not as sticky as they used to be.

whatshisface · 2024-03-13T03:00:27

If the government can stay back far enough that more than one AI company can train their models, it will end up working like steel mills - barely enough profit to pay the massive cost of capital due to competition. If the government regulates the industry into a monopoly, all bets are off. Their investors are going to push hard for shutting the door behind them so watch out.

The only question is - what tactic? I don't really know, but one trick I am aware of is "specifying to the vendor." In other words, the introduction of regulatory requirements that are at every step in the process a description of the most favored vendor's product. As the favored players add more features, potentially safety features, those features are required in new regulations, using very specific descriptions that more or less mandate that you reproduce the existing technology, to use a software engineer's term, bug-for-bug. If your product is better in some ways but worse in others, you might have a chance in the market - but to no avail, if the regulations demand exactly the advantages of the established suppliers.

brookst · 2024-03-13T03:19:12

This is typically called a high fixed cost business, like airlines, hotels/apartments, SpaceX, etc.

The dream may be barriers to entry that allow high margins (“rents” if you prefer the prejudicial), but all too often these huge capital costs bankrupt the company and lose money for investors (see: WeWork, Magic Leap). It is high risk, high return. Which seems fair.

tdudhhu · 2024-03-13T10:41:48

As far as I know training is the main issue.

I don't know a lot about ML. Does anyone know if it is possible to keep training the system while it is running?

That would help a lot if you don't have the possibility to use huge training sets as a starting point.

xdeepak81 · 2024-03-13T11:20:30

Ads and Search engine uses a continuous incremental training to add the new relevant information.

andy99 · 2024-03-12T23:40:39

So far it's been pretty "democratic" - I feel in no way disadvantaged because I can't train a foundation model myself. Actually the ecosystem is a lot better than 25 years ago - there are open source (or source available) versions of basically everything you'd want to participate in modern AI/ML.

mewpmewp2 · 2024-03-12T23:57:44

But none of those are remotely as good as GPT4 for example.

to11mtm · 2024-03-13T00:07:15

Mixtral?

ametrau · 2024-03-13T01:04:52

Obviously not even close

rmbyrro · 2024-03-13T10:57:05

Fine-tuning is quite accessible for the average small business or hacker, though.

richardw · 2024-03-13T07:05:22

I find the market way more open and competitive than dot-com. Everyone is throwing up a chatbot or RAG solution. There are tradesmen and secretaries and infinite 19 year olds who are now able to wire together a no-code app or low-code bot and add value to real businesses. The hyper scalars are making some money but absolutely don't have this locked up. Any Groq or Mistral could wander in and eat their lunch, and we haven't really started the race yet. The next decade will be ridiculous.

danielhanchen · 2024-03-13T04:20:30

Another way to compete with the big tech incumbents is instead of hardware, try maths and software hacks to level the playing field! Training models is still black magic, so making it faster on the software side can solve the capital cost issue somewhat!

toxik · 2024-03-13T05:58:13

This kind of research is also incredibly capital intensive. You have to pay some of the smartest people around to work in it.

djhn · 2024-03-13T09:37:49

That's labour and human capital intensive, not capital intensive. And I don't mean this as a technically correct nitpick: in terms of economics it's more accurate to call it the exact opposite of capital intensive.

toxik · 2024-03-13T10:17:38

That’s a good point, I wanted to make the point that doing the research is also incredibly expensive because it requires some of the smartest people around, and the right background (and what even is that background?)

danielhanchen · 2024-03-13T11:11:10

Ye not a bad point - also agree with djhn on stuff.

It's true it'll still be relatively expensive - but I would propose its relatively inexpensive if people want to make it faster, and have the drive to do it :) On the other hand, capital expenditures requires large amounts of money, which also works.

I guess some general CUDA, some maths, knowing how to code transformers from scratch, some Operating systems and hardware knowledge, and the constant drive to read new research papers + wanting to make things better.

I just think as humans, if you have drive, we can do it no matter the contraints!

djhn · 2024-03-13T10:28:50

Yes, I agree with the general idea that it's not easy. Yet at least to some extent it might allow people and/or nations with (some degree of, relative) lack of capital but high levels of education and innovation to benefit and catch up.

danielmarkbruce · 2024-03-12T23:39:07

It's not quite the same thing. A model is just one part of a product. You can spin up a product with zero infra and calling APIs hosting models.

mindwok · 2024-03-13T02:04:47

We will probably get there, it's just going to take time for hardware supply chains to catch up. I feel it's more comparable to mainframe eras - it took time for general purpose computing to become commoditised.

hackerlight · 2024-03-13T05:30:28

Foundation models != application layer. The question is whether the application layer's lunch will be eaten by better foundation models.

islewis · 2024-03-12T16:41:29

I know we won't get it this from FB, but I'd be really interested to see how the relationship of compute power to engineering hours scales.

They mention custom building as much as they can. If FB magically has the option to 10x the compute power, would they need to re-engineer the whole stack? What about 100x? Is each of these re-writes just a re-write, or is it a whole order of magnitude more complex?

My technical understanding of what's under the hood of these clusters is pretty surface level- super curious if anyone with relevant experience has thoughts?

bilekas · 2024-03-12T16:50:03

I'm not 100% sure but I would.make an educated guess that that cluster in the first image for example is a sample of scalable clusters, so throwing more hardware at it could bring improvements but sooner or later the cost to improvements will call for an optimization or rewrite as you call it, so a bit of both usually. It seems a bit of a balancing act really!

jvalencia · 2024-03-12T23:22:05

The cost of training quickly outpaces the cost of development as context length increases. So hardware is cheap until it isn't anymore, by orders of magnitude.

samstave · 2024-03-12T23:27:58

But there is still significant cost in the physical buildouts of new pods/DCs, whatever and the human engineering hours to physically build, even though its a mix of resources across the vendors and FB? - it still would be interesting to know man hours into the physical build of the HW.

tintor · 2024-03-12T18:32:58

"just a re-write"

mirekrusin · 2024-03-12T19:04:18

...the idea is that at some point it "just re-writes" itself.

ametrau · 2024-03-13T01:06:32

The day after that, we have true AGI.

jvanderbot · 2024-03-12T19:34:15

So, I'd love to work on optimizing pipelines like this. How does one "get into" it? It seems a ML scientist with some C/C++ and infra knowledge just dips down into the system when required? Or is it CUDA/SIMD experts who move "up" into ML?

thegginthesky · 2024-03-12T23:43:27

I know someone who works on this in Meta. His resume is computer science heavy, with a masters in Machine Learning. On the previous experience side, before getting into Meta, he had about a decade working as a Software Engineer with Machine Learning system in multiple languages, such as Go, C++ and Python.

To get the job he applied for a spot I'm Software Engineer applied in Machine Learning, he went through the multiple step interview process, and then when he got the job he did a few weeks of training and interviewing teams. One of the teams in charge of optimizing ML code in Meta picked him up and now he works there.

Because of Meta's scale, optimizing code that saves a few ms or watts is a huge impact in the bottom line.

In sum:

- Get a formal education in the area - Get work experience somewhere - Apply for a big tech job in Software Engineer applied with ML - Hope they hire you and have a spot in one of the teams in charge of optimizing stuff

jvanderbot · 2024-03-13T00:52:35

This is helpful thank you. There's always some luck.

I have a PhD in CS, and lots of experience in optimization and some in throughput/speedups (in an amdahl sense) for planning problems. My biggest challenge is really getting something meaty with high constraints or large compute requirements. By the time I get a pipeline set up it's good enough and we move on. So it's tough to build up that skillset to get in the door where the big problems are.

KaiserPro · 2024-03-12T20:18:39

A lot of the optimisation at this level is getting data into the right place at the right time, without killing the network.

Its also a group effort to provide simple to use primitives that "normal" ML people can use, even if they've never used hyper scale clusters before.

So you need a good scheduler, that understand dependencies (no, the k8s scheduler(s) are shit for this, plus it wont scale past 1k nodes without eating all of your network bandwidth), then you need a dataloader that can provide the dataset access, then you need the IPC that allows sharing/joining of GPUs together.

all of that needs to be wrapped up into a python interface that fairly simple to use.

Oh and it needs to be secure, pass an FCC audit (ie you need to prove that no user data is being used) have a high utilisation efficiency and uptime.

the model stuff is the cherry on the top

claytonjy · 2024-03-12T23:15:39

can you say more about the network issues with thousands of k8s nodes? I'm regularly running 2-3000 nodes in a GKE cluster, majority have GPUs, is this something I need to be worrying about?

KaiserPro · 2024-03-13T10:40:50

Only if you are paying for the network bandwidth. for example if there are nodes spanning more than one zone, and you pay for that traffic, you might want to think about moving stuff to a single zone.

For other settings, moving to something like opencue might be better (caveats apply)

jvanderbot · 2024-03-12T21:10:58

Ok, but back to my main question, how do I get into this?

willsmith72 · 2024-03-12T21:38:40

It looks more like an infra problem than ML. "Software architect"s mixed with devops/infra/sre people

jvanderbot · 2024-03-12T21:48:00

Well since I'm not a ML engineer of any kind - that's good!

zooq_ai · 2024-03-12T22:24:28

at the end of the day, you are still moving, storing and manipulating 1's and 0's, whether you are a front end engineer or a backend engineer or systems engieer or an ML engineer or an infra engineer

elbear · 2024-03-13T06:48:25

yeah, but how do you get the hiring managers to see things in the same way? :)

zooq_ai · 2024-03-13T11:29:24

well at least I fit my resume to match the 'job description' because at the end of the day it's all hallucinations and 'real' software engineers that has core computer science skills can literally do anything

chillee · 2024-03-13T01:48:15

I work on PyTorch Compilers at Meta, and I think folks enter ML Systems from all directions :)

Some folks start with more familiarity in ML research and dip down as far as they need.

Other folks come from a traditional distributed systems/compilers/HPC background, and apply those skills to ML systems.

gajjanag · 2024-03-13T00:06:46

Our group works on some of this stuff at Meta, and we have a pretty good diversity of backgrounds - high performance computing (the bulk), computer systems, compilers, ML engineers, etc. We are hiring.

Feel free to DM me to learn more.

jvanderbot · 2024-03-13T00:54:24

I will, thank you. Any info is very helpful.

yalok · 2024-03-13T00:49:29

start with something small - take some kernel function in C, and try to optimize it for your laptops assembly SIMD instruction set.

fuddle · 2024-03-12T17:23:44

How much are they paying for H100's? If they are paying $10k: 350,000 NVIDIA H100 x $10k = $3.5b

trsohmers · 2024-03-12T18:22:46

Significantly more than that; MFN pricing for NVIDIA DGX H100 (which has been getting priority supply allocation, so many have been suckered into buying them in order to get fast delivery) is ~$309k, while a basically equivalent HGX H100 system is ~$250k, coming to a price per GPU at the full server level being ~$31.5k. With Meta’s custom OCP systems integrating the SXM baseboards from NVIDIA, my guess is that their cost per GPU would be in the ~$23-$25k range.

fuddle · 2024-03-12T19:16:18

350,000 NVIDIA H100 x $23k = $8b :0

verticalscaler · 2024-03-12T19:26:40

Wait till you find out how much they spent on VR.

It is a real loophole in the economy. If you're a trillion dollar company the market will insist you set such sums on fire just to be in the race for $current-hype. If they do it drives their market cap higher still and if they don't they risk being considered un-innovative and therefore doomed to irrelevancy and the market cap will spiral downwards.

Sort of reminds me of The Producers.

oblio · 2024-03-12T19:31:49

The thing is, this could be considered basic research, right? Basic research IS setting money on fire until (and if) that basic research turns into TCP/IP, Ethernet and the Internet.

verticalscaler · 2024-03-12T19:38:19

I wish.

Funnily enough Arpanet and all that Xerox stuff were like

Where as I think this more appropriately can be considered the meta PR budget. They simply can't not spend it, would look bad for Wall Street. Have to keep up with the herd.

throwaway2037 · 2024-03-13T05:45:55

    > Funnily enough Arpanet and all that Xerox stuff were like

That doesn't say much. The industry was in utter infancy. How much do you think it cost to move Ethernet from 100Mbit/sec to 1GBbit/sec to 10GB to 100GB to 400GB to 800GB? At least one or two orders of magnitude.

How about the cost to build a fab for the Intel 8088 versus a fab that produces 5nm chips running @ 5GHz. Again, at least one or two orders of magnitude.

lotsofpulp · 2024-03-12T22:17:09

> If you're a trillion dollar company the market will insist you set such sums on fire just to be in the race for $current-hype. If they do it drives their market cap higher still and if they don't they risk being considered un-innovative and therefore doomed to irrelevancy and the market cap will spiral downwards.

You don’t think earning increasing amounts of tens of billions of dollars in net income per year at some of the highest profit margins in the world at that size for 10+ years has anything to do with market cap?

verticalscaler · 2024-03-13T00:15:57

$1T Market Cap lets it be known it will invest $10B a year into $current-hype that will change everything. P/E loosens speculatively on sudden new unbounded potential, Market Cap $1.1T. Hype funded. PR as innovator cemented.

throwaway2037 · 2024-03-13T05:48:25

If you look at the R&D expenditure of Apple, it is mindboggling.

https://www.macrotrends.net/stocks/charts/AAPL/apple/researc...

Roughly 30B USD per year. And what are we getting? Slightly slimmer phones and 3500USD AR/VR headsets?

throwaway2037 · 2024-03-13T06:41:20

> Market Cap $1.1T. Hype funded.

I'm confused. How does your stock price, which determines market cat, affect your cashflow to fund R&D? It does not.

bigcat12345678 · 2024-03-13T04:07:40

Would you kindly provide sources to the numbers? What is MFN?

Thanks! (Your number is consistent with what I hear of, but I never managed to get solid sources to back them up)

vineyardmike · 2024-03-12T20:20:03

It’s often forgotten now, but just a few years NVidia was cancelled production batches and writing down inventory when the GPU shortage cleared. No one needed more GPUs. It also happens to be when Meta first announced they were going to increase CapEx spending on compute.

I’m guessing that Meta got a sweetheart deal to help take a lot of inventory for NVidia and make commitments for future purchases.

transcriptase · 2024-03-12T23:38:12

I don’t think it was that nobody needed GPUs. It was that nvidia thought they could get scalper margins by restricting supply after the shortage showed people were willing to pay scalper prices.

dekhn · 2024-03-12T19:01:01

That sounds like a reasonable budget for 3 years of hardware at a major AI company.

ZiiS · 2024-03-12T17:34:07

They may have to pay a premium to secure ~¼ of the output; certainly unlikely to be that steep a discount.

theptip · 2024-03-12T19:19:02

Semi analysis posted recently noting that Meta locked in these purchases a while ago; something like a year or more. So they probably didn’t pay today’s spot rate.

YetAnotherNick · 2024-03-12T17:44:36

> $3.5b

Which is a fourth of what they spent in VR/AR in a year. And Gen AI is something they could easily get more revenue as it has now become proven technology, and Meta could possibly leapfrog others because of the data moat.

dougb5 · 2024-03-12T18:38:18

Proven technology, maybe, but proven product-market fit for the kinds of things Facebook is using it for? Their linked blog about AI features gives examples "AI stickers" and image editing... cool, but are these potential multi-billion dollar lifts to their existing business? I guess I'm skeptical it's worthwhile unless they're able to unseat ChatGPT with a market-leading general purpose assistant.

pests · 2024-03-12T19:11:48

I have a few group chats just that devolve into hours of sending stickers or image generation back and forth, lately we've been "writing a book together" with @Meta AI as the ghost writer, and while it utterly sucks, its been a hilarious shared experience.

I don't think anyone else has gotten that group chat with AI thing so nailed.

TaylorAlexander · 2024-03-12T19:35:20

On the podcast TrashFuture, November Kelly recently described AI systems as “garbage dispensers” which is both a funny image (why would anyone make a garbage dispenser??) and an apt description. Certainly these tools have some utility, but there are a load of startups claiming to “democratize creativity” by allowing anyone to publish AI generated slop to major platforms. On the podcast this phrase was used during discussion of a website which lets you create AI generated music and push it to Spotify, a move which Spotify originally pushed back on but has now embraced. Garbage dispenser indeed.

YetAnotherNick · 2024-03-12T20:02:43

> unseat ChatGPT with a market-leading general purpose assistant.

It's not impossible. The prediction from many(not that I believe it) is that over long run modelling tricks would become common knowledge and only thing that matters is compute and data, both of which Meta has.

Also there could be a trend of LLMs for ads or feed recommendation in the future as they has large completely unstructured dataset per user across multiple sites.

cj · 2024-03-12T21:40:07

Compute, data, and most importantly distribution/users.

IMO standalone AI companies like OpenAI might be successful by providing infrastructure to other companies, but I can’t imagine ChatGPT remaining #1 many years from now.

The web is still trending towards being a walled garden. Maybe not right now, but long term I think people will use whatever AI is most convenient which probably will be AI built into a giant company with established user base (FB, GOOG, MSFT, and Apple if they ever get around to launching - would love Siri 2.0 if it meant not needing to open the ChatGPT iOS app)

NBJack · 2024-03-12T17:56:15

What moat exactly? Much of the user data they have access to is drying up due to new regulations, some of which prohibit IIRC direct use on models as well. I'm not even sure they can use historical data.

Meta certainly has an edge in engineer count, undoubtedly. But I'd say they really, really want the metaverse to succeed more to have their on walled garden (i.e. equivalent power of Apple and Google stores, etc.). There's a reason they gave a hard pass to a Google partnership.

Dr_Birdbrain · 2024-03-12T18:28:32

I think the raw text inside Facebook groups is at least as valuable as Reddit data. Even if demographics data is restricted under European law, the raw text of people interacting is quite valuable.

verticalscaler · 2024-03-12T19:30:36

Indeed, my deranged auntie posting on FB is approximately as valuable as my ADHD/PTSD quaranteeny nephew redditing.

fragmede · 2024-03-13T04:11:33

That ignores all the user groups that are on Facebook. From apartment communities aka Nextdoor to grief support counseling to the mindfulness therapy groups, there’s a wealth of user comments a tad bit higher than Uncle John’s racist rants.

calvinmorrison · 2024-03-12T18:53:34

facebooks downfall will be their lock in. every other social media platform lets you view a public profile, discussion groups etc. it's all locked inside facebook.

agar · 2024-03-13T00:32:05

> There's a reason they gave a hard pass to a Google partnership.

AIUI, Google required Meta to basically cede control of a partnered OS to them:

"After years of not focusing on VR or doing anything to support our work in the space, Google has been pitching AndroidXR to partners and suggesting, incredibly, that WE are the ones threatening to fragment the ecosystem when they are the ones who plan to do exactly that.

"We would love to partner with them. They could bring their apps to Quest today! They could bring the Play store (with its current economics for 2d apps) and add value to all their developers immediately, which is exactly the kind of open app ecosystem we want to see. We would be thrilled to have them. It would be a win for their developers and all consumers and we’ll keep pushing for it.

"Instead, they want us to agree to restrictive terms that require us to give up our freedom to innovate and build better experiences for people and developers—we’ve seen this play out before and we think we can do better this time around."

-- From Mark Bosworth

YetAnotherNick · 2024-03-12T19:48:53

> Much of the user data they have access to is drying up due to new regulations, some of which prohibit IIRC direct use on models as well.

Source would be appreciated, because this is opposite of obvious. Regulations against using public first party would be a big news and I haven't heard of anything like that. They use my data for recommending feed so why not for answering my question?

loeg · 2024-03-12T21:01:05

Yes, billions in GPU cap ex.

benreesman · 2024-03-12T21:42:56

I think it’s always useful to pay attention to the history on stuff like this and it’s a rare pleasure to be able to give some pointers in the literature along with some color to those interested from first-hand experience.

I’d point the interested at the DLRM paper [1]: that was just after I left and I’m sad I missed it. FB got into disagg racks and SDN and stuff fairly early, and we already had half-U dual-socket SKUs with the SSD and (increasingly) even DRAM elsewhere in the rack in 2018, but we were doing huge NNs for recommenders and rankers even for then. I don’t know if this is considered proprietary so I’ll play it safe and just say that a click-prediction model on IG Stories in 2018 was on the order of a modest but real LLM today (at FP32!).

The crazy part is they were HOGWILD trained on Intel AVX-2, which is just wild to think about. When I was screwing around with CUDA kernels we were time sharing NVIDIA dev boxes, typically 2-4 people doing CUDA were splitting up a single card as late as maybe 2016. I was managing what was called “IGML Infra” when I left and was on a first-name basis with the next-gen hardware people and any NVIDIA deal was still so closely guarded I didn’t hear more than rumors about GPUs for training let alone inference.

350k Hopper this year, Jesus. Say what you want about Meta but don’t say they can’t pour concrete and design SKUs on a dime: best damned infrastructure folks in the game pound-for-pound to this day.

The talk by Thomas “tnb” Bredillet in particular I’d recommend: one of the finest hackers, mathematicians, and humans I’ve ever had the pleasure to know.

[1] https://arxiv.org/pdf/1906.00091.pdf

[2] https://arxiv.org/pdf/2108.09373.pdf

[3] https://engineering.fb.com/2022/10/18/open-source/ocp-summit...

[4] https://youtu.be/lQlIwWVlPGo?si=rRbRUAXX7aM0UcVO

zone411 · 2024-03-12T17:56:53

Meta is still playing catch-up. Might be hard to believe but according to Reuters they've been trying to run AI workloads mostly on CPUs until 2022 and they had to pull the plug on the first iteration of their AI chip.

https://www.reuters.com/technology/inside-metas-scramble-cat...

axpy906 · 2024-03-12T22:44:00

Definitely has some pr buzz and flex in the article. Now I see why.

DEDLINE · 2024-03-12T16:56:33

I wonder if Meta would ever try to compete with AWS / MSFT / GOOG for AI workloads

lifeisstillgood · 2024-03-12T17:56:57

FB does not have the flywheel of running data centres - all three of those mentioned run hyper scale datacentres that they can then juice by “investing” billions in AI companies who then turn around and put those billions as revenue in the investors

OpenAI takes money from MSFT and buys Azure services

Anthropic takes Amazon money and buys AWS services (as do many robotics etc)

I am fairly sure it’s not illegal but it’s definitely low quality revenue

miohtama · 2024-03-12T22:35:07

Such barter deals were also popular during the 00s Internet Bubble.

Here more on the deals (2003):

https://www.cnet.com/tech/services-and-software/aol-saga-ope...

Popular names included AOL, Cisco, Yahoo, etc.

Not sure if Amazon’s term sheets driving high valuation are nothing but AWS credits (Amazon’s own license to print money).

woah · 2024-03-12T18:18:54

Sounds like it's free equity at the very least

lotsofpulp · 2024-03-12T22:22:19

How is it free equity? Spending money to invest it somewhere involves risks. You might recover some of it if the investment is valued by others, but there is no guarantee.

miohtama · 2024-03-12T22:37:52

You do not need cash in hands to invest. Instead, you print your own money (AWS credit) and use that to drive up the valuation, because this money costs you nothing today.

It might cost tomorrow though, when the company starts to use your services. However depending the deal structure they might not use all the credit, go belly up before credit is used or bought up by someone with real cash.

vineyardmike · 2024-03-12T20:21:44

NVidia also invests in their AI customers.

fikama · 2024-03-13T08:00:31

What do you mean? Could you elaborate please? Enumerate some deals so I could read more about it?

itslennysfault · 2024-03-12T20:48:17

Neither did AWS when they started. They were just building out data centers to run their little book website and decided to start selling the excess capacity. Meta could absolutely do the same, but in the short term, I think they find using that capacity more valuable than selling it.

otterley · 2024-03-12T21:42:51

> Neither did AWS when they started. They were just building out data centers to run their little book website and decided to start selling the excess capacity.

This is a myth. It simply isn't true. AWS was conceived as a greenfield business by its first CEO. Besides, S3 and SQS were the first AWS services; EC2 didn't appear till a few years later. And it wasn't built from excess Amazon server capacity; it was totally separate.

virtuallynathan · 2024-03-12T20:09:46

Facebook has more datacenter space and power than Amazon, Google, and Microsoft -- possibly more than Amazon and Microsoft combined...

jedberg · 2024-03-12T20:46:25

Unless you've worked at Amazon, Microsoft, Google, and Facebook, or a whole bunch of datacenter providers, I'm not sure how you could make that claim. They don't really share that information freely, even in their stock reports.

Heck I worked at Amazon and even then I couldn't tell you the total datacenter space, they don't even share it internally.

virtuallynathan · 2024-03-12T22:49:49

You can just map them all... I have. I also worked at AWS :)

chatmasta · 2024-03-13T00:37:17

This would be an interesting dataset to use for trading decisions (or sell to hedge funds).

But I wonder how much of their infrastructure is publicly mappable, compared to just the part of it that's exposed to the edge. (Can you map some internal instances in a VPC?)

That said, I'm sure there are a lot of side channels in the provisioning APIs, certificate logs, and other metadata that could paint a decently accurate picture of cloud sizes. It might not cover everything but it'd be good enough to track and measure a gradual expansion of capacity.

the-rc · 2024-03-13T01:26:08

Mapping as in.. drawing the outlines of buildings and computing the square footage yourself?

virtuallynathan · 2024-03-12T22:55:51

To date, facebook has built, or is building, 47,100,000 sq ft of space, totaling nearly $24bn in investment. Based on available/disclosed power numbers and extrapolating per sqft, I get something like 4770MW.

Last I updated my spreadsheet in 2019, Google had $17bn in investments across their datacenters, totaling 13,260,000 sq ft of datacenter space. Additional buildings have been built since then, but not to the scale of an additional 30mil sq ft.

Amazon operates ~80 datacenter buildings in Northern Virginia, each ~200,000 sq ft -- about 16,000,000sq ft total in that region, the other regions are much much smaller, perhaps another 4 mil sq ft. When I'm bored I'll go update all my maps and spreadsheets.

the-rc · 2024-03-13T01:23:53

Does the square footage take into account multiple floors? What's the source? It can be misleading, because you don't know the compute density of what's inside. Using just public data, power is a more accurate proxy. Until at least 5-6 years ago, Google was procuring more electricity than Amazon. Before that, it had a further advantage from lower PUE, but I bet the big names are all comparable on that front by now. Anyone that has worked at several of them can infer that FB is not the largest (but it's still huge).

As for the dollars, were they just in 2019 or cumulative? The Google ones seem low compared to numbers from earnings.

samstave · 2024-03-12T23:42:11

At this point Power Companies (ala PG&E, etc) should be investing in AI companies in a big way. THen they make money off the AI companies to build out power infra - and vice versa.

I am surprised we havent heard about private electrical grid built out by such companies.

Surely they all have some owned power generation, but then if they do, the local areas where they DO build out power plants - they should have to build capacity for the local area, mayhaps in exchange for the normal tax subsidies they seek for all these large capital projects.

Cant wait until we pods/clusters in orbit. With radioisotope batteries to power them along with the panels. (I wonder how close to a node a RI battery can be? Can each node have its own RI?) (sas they can produce upto "several KW" -- but I cant find a reliable source for max wattage of an RI...)

SpaceX should build an ISS module thats an AI DC cluster.

And have all the ISS technologies build its LLM there based on all the data they create?

VirusNewbie · 2024-03-12T23:29:19

But Google built data centers aren't the only data centers google is running their machine fleet in...

chatmasta · 2024-03-13T00:39:01

Yeah, Google buys servers in public datacenters like those from Equinix. One "region" needn't be one datacenter, and sometimes AWS and GCP will even have computers in the same facility. It's actually quite annoying that "region" is such an opaque construct and they don't have any clear way to identify what physical building is hosting the hardware you rent from them.

the-rc · 2024-03-13T01:12:10

Those are almost lost in the noise, compared to the big datacenters. (I've been inside two Atlanta facilities, one leased and one built from scratch, and the old Savvis one in Sunnyvale).

karmasimida · 2024-03-12T20:52:59

I don't think so, AWS hasn't disclosed this numbers, like datacenter spaces occupied, so how do you know.

virtuallynathan · 2024-03-12T22:49:01

I have mapped every AWS data center globally, and I worked at AWS.

Facebook publishes this data.

pgwhalen · 2024-03-12T21:23:31

I have zero evidence, but this seems extremely unlikely. Do you have more than zero evidence?

meiraleal · 2024-03-12T22:40:47

Meta can use all their datacenter space while Amazon, Google, and Microsoft datacenter space is mostly rented.

dsp · 2024-03-12T20:20:50

[citation needed]

rthnbgrredf · 2024-03-12T17:59:57

Meta could build their own cloud offering. But it would take years to match the current existing offerings of AWS, Azure and GCP in terms of scale and wide range of cloud solutions.

Cthulhu_ · 2024-03-12T19:39:42

And then there's sales. All of those three - and more you haven't considered, like the Chinese mega-IT companies - spend huge amounts on training, partnerships, consultancy, etc to get companies to use their services instead of their competitors. My current employer seems all-in on Azure, previous one was AWS.

There was one manager who worked at two large Dutch companies and sold AWS to them, as in, moving their entire IT, workloads and servers over to AWS. I wouldn't be surprised if there was a deal made there somewhere.

oblio · 2024-03-12T19:33:27

The real question is: why aren't they? They had the infrastructure needed to seed a cloud offering 10 years ago. Heck, if Oracle managed to be in 5th (6th? 7th?) place, Facebook for sure could have been a top 5 contender, at least.

krschultz · 2024-03-13T06:30:24

Because they make more money using their servers for their own products than they would renting them to other people. Meta has an operating margin of 41% AFTER they burn a ton on Reality Labs, while AWS has a 21% margin with more disciplined spending. Social media is a more profitable business than infrastructure.

elbear · 2024-03-13T07:00:05

Does Meta make money from anything other than ads? It's not a dismissive question. I'm curious if social media implies anything other than ads.

Thaxll · 2024-03-13T00:31:55

Because it's not their business, they're not good at it and probably the ROI is not worth it.

Also how exactly they would do it, they don't have enough infra for renting, they would need to x10 what they have now.

KaiserPro · 2024-03-12T19:43:49

because meta sucks at software, documentation and making sure end user products work in a supported way.

Offering reliable IaaS is super hard and capital intensive. Its also not profitable if you are perceived as shit.

logicchains · 2024-03-12T20:23:48

>because meta sucks at software

Google started a cloud and their user-facing software is atrocious. Compared e.g. Angular to React, Tensorflow to Pytorch.

negus · 2024-03-12T23:03:02

Why would you prefer Pytorch to Tensorflow/Keras?

chessgecko · 2024-03-12T23:39:29

Tensorflow and keras have gotten better, but pytorch historically had better flexibility than keras and was much easier to debug/develop in than tensorflow.

bionhoward · 2024-03-12T18:28:34

aww, those existing offerings are overcomplicated as hell, a fresh look could yield substantially simpler cloud developer experience and this would compete well against those other cloud offerings on simplicity alone

redleader55 · 2024-03-12T20:42:09

For consumers, AI could just be stateless "micro service". Meta already has enough surfaces where customers can interact with AI.

crowcroft · 2024-03-12T19:00:31

I think Meta have avoided doing this because it would complicate their business priorities. They don’t really do B2B.

carlossouza · 2024-03-12T20:45:25

What do you mean by “they don’t do B2B”? They sell ads to companies, don’t they?

elwell · 2024-03-12T17:19:59

> Meta’s long-term vision is to build artificial general intelligence (AGI)

valzam · 2024-03-13T00:23:18

Don't worry, this goal will change with the next hype cycle

latchkey · 2024-03-13T01:54:12

I pity the fools that think AI is just another internet hype cycle.

brookst · 2024-03-13T03:24:54

I’m old enough to remember the proud, defiant declarations that the internet was just a hype cycle.

bennyelv · 2024-03-13T06:28:28

Well it was wasn’t it? There was a massive boom where loads of companies over promised what they would achieve, followed by a crash when everyone realised lots of them couldn’t, followed by stability for the smaller number that could.

It was the very definition of a hype cycle as far as I can see. Hype cycle doesn’t mean “useless and will go away”, you have the second upward curve and then productivity.

https://en.m.wikipedia.org/wiki/Gartner_hype_cycle

latchkey · 2024-03-13T03:44:44

I got my first email in 1991 and started my first internet business in 1995 (a web dev shop). My entire life has been an endless hype cycle.

mjburgess · 2024-03-12T16:41:23

I'd be great if they could invest in an alternative to nvidia -- then, in one fell swoop, destroy the moats of everyone in the industry.

math_dandy · 2024-03-12T16:50:56

A company moving away from Nvidia/CUDA while the field is developing so rapidly would result in that company falling behind. When (if) the rate of progress in the AI space slows, then perhaps the big players will have the breathing room to consider rethinking foundational components of their infrastructure. But even at that point, their massive investment in Nvidia will likely render this impractical. Nvidia decisively won the AI hardware lottery, and that's why it's worth trillions.

whiplash451 · 2024-03-12T17:42:56

People said the same thing when tensorflow was all the rage and pytorch was a side project.

Granted, HW is much harder than SW, but I would not discount Meta's ability to displace NVIDIA entirely.

Cthulhu_ · 2024-03-12T19:41:13

I don't think they could; nvidia has tons of talent, Meta would have to steal that. Meta doesn't do anything in either consumer or datacenter hardware that isn't for themselves either.

Meta is a services company, their hardware is secondary and for their own usage.

Wazako · 2024-03-12T22:49:30

meta has the Quest. It's not so bad that they're looking to create an LPU for their headset to offer local play.

mjburgess · 2024-03-12T17:28:47

I'm more concerned to avoid nvidia (et al.) market domination, than chasing the top-edge of the genAI benefits sigmoid. This will prevent much broad-based innovation.

hx8 · 2024-03-12T17:55:34

This space is so compeitive, even if Nvidia is asleep at the wheel a competitor will come and push them before too long. AMD has a history of noticing when their competitors are going soft and rapidly being compeitive.

paxys · 2024-03-12T16:58:57

Except that "one fell swoop" would realistically be 20+ years of research and development from the top minds in the semiconductor industry.

logicchains · 2024-03-12T20:26:19

It's not the hardware keeping NVidia ahead, it's the software. Hardware-wise AMD is competitive with NVidia, but their lack of a competitive CUDA alternative is hurting adoption.

brucethemoose2 · 2024-03-12T17:11:30

Facebook very specifically bought and customized Intel SKUs tailored for AI workloads for some time.

John23832 · 2024-03-12T16:54:19

https://engineering.fb.com/2023/10/18/ml-applications/meta-a...

aeyes · 2024-03-12T16:59:56

Isn't Google trying to do this with their TPUs?

crakenzak · 2024-03-12T17:29:34

I still, for the life of me, can't understand why Google doesn't just start selling their TPUs to everyone. Nvidia wouldn't be anywhere near their size if they only made H100s available through their DGX cloud, which is what Google is doing only making TPUs available through Google Cloud.

Good hardware, good software support, and market is starving for performant competitors to the H100s (and soon B100s). Would sell like hotcakes.

aseipp · 2024-03-12T18:17:22

It is an absolutely massive amount of work to turn something designed for your custom software stack and data centers (custom rack designs, water cooling, etc) into a COTS product that is plug-and-play; not just technically but also things like sales, support, etc. You are introducing a massive amount of new problems to solve and pay for. And the in-house designs like TPUs (or Meta's accelerators) are cost effective in part because they don't do that stuff at all. They would not be as cheap per unit of work if they had to also pay off all that other stuff. They also have had a very strong demand for TPUs internally which takes priority over GCP.

dekhn · 2024-03-12T21:38:15

Do you mean, sell TPU hardware to other companies that would run it in their data centers? I can't imagine that would ever really work. The only reason TPUs work at Google is because they have huge teams across many different areas to keep them running (SRE, hardware repair, SWE, hardware infra) and it's coupled to the design of the data centers. To vend and externalize the software would require google to setup similar teams for external customers (well beyond what Google Cloud provides for TPUs today) just to eke out some margin of profit. Plus, there is a whole proprietary stack running under the hood that google wouldn't want to share with potential competitors.

Google used to sell a search appliance-in-a-box and eventually lost interest because hardware is so high-touch.

aeyes · 2024-03-12T22:38:55

> Google used to sell a search appliance-in-a-box and eventually lost interest because hardware is so high-touch.

We had a GSA for intranet search and other than the paint this was a standard Dell server. I remember not being impressed by what the GSA could do.

We also had Google Urchin for web analytics, it wasn't a hardware appliance but the product wasn't very impressive either. They then killed that and tried to get you onto Google Analytics.

They just didn't commit to these on premise enterprise products.

dekhn · 2024-03-12T23:08:52

The server may have been dell, but it included a full stack of google3 software including chubby the lockserver.

We had one at my company and it was widely loved- far better intranet search and domain-specific search for biotech.

ajcp · 2024-03-12T17:54:27

And undercut what they'd like to use as a huge motivator in people moving to GCP? Not likely. Even if they wanted to they can't keep up with their own internal demand.

Beyond that they might not be as stable or resilient outside of the closely curated confines of their own data-centers. In that case selling them would be more of an embarrassment.

htrp · 2024-03-12T18:05:16

>Beyond that they might not be as stable or resilient outside of the closely curated confines of their own data-centers. In that case selling them would be more of an embarrassment.

Once you go out of your heavily curated hardware stack, the headaches multiply exponentially.

qiine · 2024-03-12T17:40:28

Maybe selling hardware to customers worldwide + support like Nvidia does is actually not trivial ?

neuronexmachina · 2024-03-12T18:24:43

The impression I got from this thread yesterday is that Google's having difficulty keeping up with the heavy internal demand for TPUs: https://news.ycombinator.com/item?id=39670121

gingergoat · 2024-03-12T17:11:44

The article doesn't mention MTIA, meta's custom ASIC for training & inference acceleration. https://ai.meta.com/blog/meta-training-inference-accelerator...

I wonder if they will use it in RSC.

spencerchubb · 2024-03-13T01:29:32

All this compute and my Instagram Reels feed still isn't as good as my TikTok feed

zeroonetwothree · 2024-03-13T02:04:01

What does that have to do with Gen AI

lmm · 2024-03-13T03:24:19

If Gen AI doesn't have anything to do with "Meta"'s actual business then WTF are they setting all this money on fire for?

spencerchubb · 2024-03-13T02:21:13

GenAI infra is the same as regular AI infra. They used GenAI in the title because it's a buzzword.

ipsum2 · 2024-03-13T02:32:33

Not really. Ranking and recommendation models require different infrastructure than LLMs. The models are generally smaller and require more data processing before training.

refulgentis · 2024-03-13T02:41:09

Yeah, no.

wseqyrku · 2024-03-12T18:41:19

> Commitment to open AI innovation

I see what you did there, Meta.

owenpalmer · 2024-03-12T19:23:03

Haha, I noticed that too xD

latchkey · 2024-03-12T17:21:40

> we have successfully used both RoCE and InfiniBand clusters for large, GenAI workloads (including our ongoing training of Llama 3 on our RoCE cluster) without any network bottlenecks.

Interesting dig on IB. RoCE is the right solution since it is open standards and more importantly, available without a 52+ week lead time.

loeg · 2024-03-12T21:03:02

Yeah, and RoCE isn't single vendor. I'm not sure IB scales to the relevant cluster sizes, either.

anonymousDan · 2024-03-12T22:07:10

Is NVLink just not scalable enough here?

loeg · 2024-03-12T22:51:17

I don't know. I haven't actually worked with IB in this specific space (or since before Nvidia acquired MLNX). My experience with RoCE/IB was for storage cluster backend in the late 2010s.

seydor · 2024-03-12T19:34:51

This is great news for Nvidia and their stock, but are they sure the LLMs and image models will scale indefinitely? nature and biology has a preference for sigmoids. What if we find out that AGI requries different kinds of cpu capabilities

jiggawatts · 2024-03-12T22:40:40

If anything, NVIDIA H100 GPUs are too general purpose! The optimal compute for AI training would be more specialised, but then would be efficient at only one NN architecture. Until we know what the best architecture is, the general purpose clusters remain a good strategy.

alexsereno · 2024-03-12T16:31:27

Honestly Meta is consistently one of the better companies at releasing tech stack info or just open sourcing, these kinds of articles are super fun

rshm · 2024-03-12T16:40:05

I think some elements of this stack might flow into the open compute.

adamnemecek · 2024-03-12T16:35:30

Do you find this informative?

alexsereno · 2024-03-12T18:18:22

Yes of course - it depends on what lens though. If you mean "I'm learning to build better from this" then no, but its very informative on Meta's own goals and mindset as well as real numbers that allow comparison to investment in other areas, etc. Also the point was mostly that Meta does publish a lot in the open - including actual open source tech stacks etc. They're reasonably good actors in this specific domain.

pinko · 2024-03-12T19:56:38

The link mentions "our internal job scheduler" and how they had to optimize it for this work -- does anyone know what this job scheduler is called, or how it works?

KaiserPro · 2024-03-12T20:20:57

it might be twine: https://www.usenix.org/system/files/osdi20-tang.pdf

but I suspect its not that, because Twine is optimised for services rather than batch processing, and doesn't really have the concept of priorities.

radicality · 2024-03-12T22:28:06

I would think it’s probably that. Also, has this been renamed to Twine from Tupperware?

mrkramer · 2024-03-12T19:02:33

"Share this: Hacker News" Noice

BonoboIO · 2024-03-12T19:14:57

I thought at first "what are you talking about", when i check my uBlock filters. Was blocking the whole "Share this" content section.

Sharing on Hacker News ... they now their audience.

mrkramer · 2024-03-12T19:53:36

I also use uBlock but my filters are the default ones and I saw it without any problem but tbh this is the first time that I saw some post on the Web have HN as a share option or the first time that I was surprised seeing it. Maybe it has something to do with Google ranking "trusted human information and knowledge" higher than "non-human" information and knowledge[0] or simply some Meta software engineer loves and uses HN so s/he decided to include HN as well, idk.

[0] https://news.ycombinator.com/item?id=39423949

zerop · 2024-03-12T17:17:49

> At Meta, we handle hundreds of trillions of AI model executions per day

Such a large number, makes sense?

GeneralMayhem · 2024-03-12T17:43:58

Sure. 100T/day * 1day/86400sec ~= 1B/sec. They're probably considering at least a few hundred candidates per impression, and every impression is going to go through _at least_ two models (relevance and pCTR/revenue), so you could get there just with online serving at 5Mqps, which is plausible. But they're also going to be doing a lot of stuff in batch - spam predictions, ad budget forecasts, etc - so that every candidate actually runs through four or five different models, and every actual impression could do more than that.

sangnoir · 2024-03-12T18:13:13

How many ads does Meta serve a day, and how many AI model executions are done for each one? Repeat the same for stories, post and comment recommendations on Facebook and Instagram, and you have very big numbers. To that, Add VR, internal modeling and other backoffice/ offline analyses over billions of users and you'll easily get into the trillions.

dakiol · 2024-03-12T18:11:49

What's an "AI model execution"? When I ask something to ChatGPT and it answers to me, does that count as 1 "AI model execution" for OpenAI?

（评论） (comments)

（评论）
(comments)