Mathematical Foundations of Reinforcement Learning

eachro · 2025-03-11T06:07:57 1741673277

During the OpenAI Gym era of RL, one of the great selling points was that RL was very approachable for a newcomer: the Gym environments were small and tractable enough that a hobbyist could learn a little RL, try it out on CartPole, and see how it performed. Are there similarly tractable RL tasks or learning environments with LLMs? From the outside, my impression is that you need some insane GPU access to even start messing around with these models. Is there something one can do on a normal MacBook Air, for instance, in this LLM x RL domain?

al_th · 2025-03-11T08:16:46 1741681006

This is entirely doable.

I'm absolutely not versed in RL, but I wanted to understand GRPO, the RL algorithm behind DeepSeek's latest model.

I started from a very simple LLM, inspired by Andrej Karpathy's "GPT from scratch" video (https://www.youtube.com/watch?v=kCc8FmEb1nY). Then I added the GRPO algorithm on top of that, which in itself is very simple.

I made a GitHub repo if you want to try it out: https://github.com/Al-th/grpo_experiment

363849473754 · 2025-03-11T12:10:11 1741695011

GRPO project is neat. Would you be willing to do a Karpathy-style explainer, breaking down the algorithm from scratch? It’s hard to understand on its own without prior background knowledge.

currymj · 2025-03-11T15:40:44 1741707644

Look for materials on PPO, which should be widespread since it is the most popular RL algorithm. GRPO works on the same principles; it just estimates certain quantities (the baseline used to compute advantages) from groups of samples rather than training an auxiliary neural network (a value function, or critic) to estimate them.
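To make "estimates from samples" concrete, here is a minimal Python sketch, not taken from the grpo_experiment repo or the DeepSeek paper. For a group of completions sampled from the same prompt, each completion is scored relative to its own group (reward minus group mean, divided by group standard deviation), and that advantage is plugged into a PPO-style clipped surrogate. The function names, the binary rewards and the clip value 0.2 are illustrative assumptions; real implementations work per token and usually add a KL penalty against a reference model.

```python
import math
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantage: normalize each reward against its own group.

    The group mean stands in for the learned value function (critic)
    that PPO would train; no extra network is needed.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def clipped_surrogate(logp_new, logp_old, advantages, clip=0.2):
    """PPO-style clipped objective (to be maximized), averaged over samples."""
    terms = []
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(lp_new - lp_old)               # pi_new(a) / pi_old(a)
        clipped = max(min(ratio, 1 + clip), 1 - clip)   # clamp to [1-clip, 1+clip]
        terms.append(min(ratio * adv, clipped * adv))   # pessimistic of the two
    return mean(terms)


# Hypothetical group: 4 completions for one prompt, reward 1 if correct.
rewards = [1.0, 0.0, 0.0, 1.0]
adv = group_relative_advantages(rewards)
print([round(a, 3) for a in adv])  # [1.0, -1.0, -1.0, 1.0]
```

If every completion in a group receives the same reward, all advantages come out as zero, so that group contributes no gradient.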

zqy123007 · 2025-03-11T00:51:02 1741654262

The 6-lecture series on the Foundations of Deep RL by Pieter Abbeel is also highly recommended. It gives a very good overview and intuition: https://youtu.be/2GwBez0D20A

dualofdual · 2025-03-10T19:21:17 1741634477

The best lectures on Reinforcement Learning and related topics are by Dimitris Bertsekas: https://web.mit.edu/dimitrib/www/home.html

rybthrow2 · 2025-03-10T19:32:16 1741635136

The lectures by David Silver of DeepMind (of AlphaGo fame) are also good: https://www.youtube.com/watch?v=2pWv7GOvuf0

esafak · 2025-03-10T19:44:56 1741635896

His books tend to be dry and geared towards researchers, in my opinion. He has a new one on RL: https://web.mit.edu/dimitrib/www/RLCOURSECOMPLETE%202ndEDITI...

joe_lin · 2025-03-10T20:59:42 1741640382

I'm looking for content (I'm a researcher myself), mainly on the application side. Should I start with this one, or with something else?

Very curious about RL for LLMs for example (using data from real use).

esafak · 2025-03-10T22:13:24 1741644804

I have not read it but it looks like a comprehensive reference. For a more applied treatment see Foundations of Deep Reinforcement Learning. https://slm-lab.gitbook.io/slm-lab/publications-and-talks/in...

Neither covers LLMs. I don't follow the literature closely, so I can only suggest you read papers: https://github.com/WindyLab/LLM-RL-Papers

richard___ · 2025-03-10T22:46:17 1741646777

No. They are outdated and focused on strange things. You won't understand PPO from his textbooks.

cplat · 2025-03-11T04:02:21 1741665741

Which aspects? Foundational textbooks would focus on principles, not necessarily implementations, and don't go "outdated" the same way a snippet does.

forkerenok · 2025-03-10T20:17:27 1741637847

Would you mind explicitly indicating whether you have reviewed the submitted material? And if so, why is it inferior to the material you linked?

Not trying to catch you, genuine interest.

lemonlym · 2025-03-10T20:32:31 1741638751

Another great resource on RL is Mykel Kochenderfer's suite of textbooks: https://algorithmsbook.com/

noobly · 2025-03-10T20:50:59 1741639859

Are these books all about RL? I’ve got the decision-making one; I didn’t think the other had anything to do with RL.

jvanderbot · 2025-03-10T23:03:17 1741647797

He (author) has a strong proclivity for policy-based planning, shall we say.

jgord · 2025-03-11T00:34:12 1741653252

Highly recommended. Even the main contents diagram is a great visual overview of RL in general, as is the 30-minute intro YouTube video.

I'm expecting to see a lot of hyper growth startups using RL to solve a real-world problem in engineering / logistics / medicine.

LLMs currently attract all the hype for good reasons, but I'm surprised VCs don't seem to be looking at RL companies specifically.

RiDiracTid · 2025-03-11T13:53:44 1741701224

RL is definitely really cool but I heavily doubt that we're gonna see 'hyper growth' from RL outside of the context of maybe training reasoning LLMs.

During the period from ~2012-2019 of AI research, DeepMind (the undisputed leader in money and talent) went all in on RL to solve problems, and while they did lots of interesting and useful work, nothing they produced was so extraordinary or revolutionary that it massively accelerated the field or amounted to some sort of crazy breakthrough.

Their over-focus on RL instead of transformers/LLMs is what allowed OpenAI to surprise everyone and overtake DeepMind.

Yes, RL is a useful tool, but outside the context of training LLMs for reasoning there isn't really any breakthrough that makes it more than an interesting tool for certain situations.

almostgotcaught · 2025-03-11T04:45:30 1741668330

> I'm expecting to see a lot of hyper growth startups using RL to solve a real-world problem in engineering / logistics / medicine

I love when people on HN make market predictions based on how revolutionary they think something is. I guess startup people think they're also VC people.

FYI, Sutton's book came out in 1999; none of this is revolutionary anymore, and yet I don't see any "hyper growth". The reason is exactly that while you can train these models to play Super Mario, you cannot use them to solve real world problems.

https://www.google.com/books/edition/Reinforcement_Learning/...

jgord · 2025-03-11T06:35:53 1741674953

Sure, and neural networks came out a very long time ago, but are now arguably approaching usefulness in LLMs.

Perhaps that's because it takes a while for the ideas to get polished/weeded and diffuse into the engineer zeitgeist, or it could be that compute / GPUs are now powerful enough to run at the scale needed.

Re: "RL cannot be used to solve real world problems", I would argue that these are useful real-world problems:

  - predicting protein folding structure from amino acid sequence
  - stabilizing high-temperature fusion plasma
  - improving weather forecasting efficiency
  - improving DeepSeek's recent LLM

I'm currently using RL techniques to find 3D geometry (pipes, beams, walls) in point clouds. It is of practical benefit, as a lot of this is done manually, ballpark $5Bn/yr.

But I concede I cannot point to a plethora of small startups using RL for these real-world problems... yet.

This is a prediction, and I could be wrong in many ways, not least if LLMs digest RL in full, learn to express their logical reasoning, approach AGI, and use RL internally, thereby subsuming and automating the use of RL.

Are VCs better at predicting the future? I guess that is their job, and they have money on the line... but I think even they would admit they need a large portfolio to capture the unicorns.

VCs probably get a less detailed tech view than founders, but the large number of pitches they review should give them a noisy but wider overview of the whole bleeding edge of innovation.

I think startup founders are in the same future-prediction business, and arguably have more skin in the game.

Predictions would be pretty useless if they weren't somewhat controversial; a prediction we all agree on doesn't say much. Come back and chastise me if we don't see more RL startups in 12 months' time!

almostgotcaught · 2025-03-11T07:30:27 1741678227

> Come back and chastise me if we don't see more RL startups in 12 months' time!

1999 was 26 years ago, but yeah, sure, this is the year they finally take off.

> Perhaps that's because it takes a while for the ideas to get polished/weeded and diffuse into the engineer zeitgeist, or it could be that compute / GPUs are now powerful enough to run at the scale needed.

Or perhaps it could be that you're wrong and they're useless? Nah that couldn't be it.

bitvoid · 2025-03-12T05:58:52 1741759132

And 1967 was 58 years ago, which was when the first deep neural network was trained with stochastic gradient descent. Yet DNNs didn't take off until the 2010s, when the hardware became powerful enough and data plentiful enough to train and use them practically.

auggierose · 2025-03-11T12:10:51 1741695051

I think you GOT caught here. That's why you don't respond to the Nobel Prize-winning example of RL.

bglazer · 2025-03-11T13:08:05 1741698485

Are we talking about AlphaFold? It did not use RL, right?

auggierose · 2025-03-11T13:21:10 1741699270

I think it does: https://juanraul8.github.io/master-praktikum/

smokel · 2025-03-11T06:08:02 1741673282

Reinforcement learning is hard to apply to real-world problems, but one cannot deny the success that a company such as OpenAI has.

bitvoid · 2025-03-11T13:54:03 1741701243

> you cannot use them to solve real world problems

Don't Waymo and other self-driving systems use reinforcement learning? I thought it was used in robotics as well (e.g., bipedal and quadrupedal locomotion).

currymj · 2025-03-11T21:07:20 1741727240

generally you are right in spirit.

however multi-armed bandit algorithms are highly useful in practice. these are a special case of RL (RL with one state, essentially).

there are even some extensions of applied bandit algorithms to "true RL", e.g. for recommender systems that want to consider history.

this is the place to look for real-world applications of RL.
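A minimal sketch of "RL with one state", assuming a hypothetical 3-armed Bernoulli bandit and an epsilon-greedy agent (the arm probabilities, epsilon=0.1 and the function name are illustrative choices, not from any particular library). Because there are no state transitions, each action's value estimate is just a running mean of the rewards that action has produced.

```python
import random


def epsilon_greedy_bandit(true_probs, steps=5000, epsilon=0.1, seed=0):
    """Run an epsilon-greedy agent on Bernoulli arms.

    Returns (q, n): the estimated value of each arm and how often it was pulled.
    """
    rng = random.Random(seed)
    k = len(true_probs)
    q = [0.0] * k  # value estimate per arm (action)
    n = [0] * k    # pull count per arm
    for _ in range(steps):
        if rng.random() < epsilon:
            a = rng.randrange(k)                   # explore: random arm
        else:
            a = max(range(k), key=lambda i: q[i])  # exploit: current best arm
        reward = 1.0 if rng.random() < true_probs[a] else 0.0
        n[a] += 1
        q[a] += (reward - q[a]) / n[a]             # incremental sample mean
    return q, n


q, n = epsilon_greedy_bandit([0.2, 0.5, 0.8])
print(n)  # pulls should concentrate on the last (best) arm
```

Epsilon-greedy is the simplest exploration rule; UCB and Thompson sampling are common alternatives in applied bandit systems.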

also RL uses importance-sampling estimators of the gradient. these sometimes show up in other applications though not framed as "RL".
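A minimal sketch of an importance-sampling gradient estimator on a one-state problem, assuming a softmax policy over three actions with hypothetical rewards. Actions are drawn from a behavior policy `b`, and each score-function (REINFORCE-style) term is reweighted by `pi(a) / b(a)`, so the average estimates the gradient of expected reward under `pi`; this is the same probability ratio that PPO-style objectives clip. The exact gradient is computed alongside for comparison.

```python
import math
import random


def softmax(logits):
    m = max(logits)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def exact_gradient(theta, rewards):
    """Gradient of J(theta) = sum_a pi(a) r(a); component j is pi_j (r_j - J)."""
    pi = softmax(theta)
    J = sum(p * r for p, r in zip(pi, rewards))
    return [pi[j] * (rewards[j] - J) for j in range(len(theta))]


def importance_sampled_gradient(theta, behavior_probs, rewards, n_samples=100_000, seed=0):
    """Estimate grad J(theta) using actions sampled from a different policy b."""
    rng = random.Random(seed)
    pi = softmax(theta)
    k = len(theta)
    grad = [0.0] * k
    actions = rng.choices(range(k), weights=behavior_probs, k=n_samples)
    for a in actions:
        w = pi[a] / behavior_probs[a]                # importance weight
        for j in range(k):
            score = (1.0 if j == a else 0.0) - pi[j]  # d/dtheta_j of log pi(a)
            grad[j] += w * rewards[a] * score
    return [g / n_samples for g in grad]


theta = [0.0, 0.5, -0.5]   # hypothetical policy logits
rewards = [1.0, 0.0, 2.0]  # hypothetical per-action rewards
b = [1 / 3, 1 / 3, 1 / 3]  # uniform behavior policy
print(exact_gradient(theta, rewards))
print(importance_sampled_gradient(theta, b, rewards))
```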

CamperBob2 · 2025-03-11T15:41:01 1741707661

"FYI Maxwell's paper came out in 1865 and now it's 1896 and Marconi's radio, which he invented a whole year ago, still doesn't pick up anything but buzzes and static. The reason is exactly because while you can manipulate the electromagnetic field with current fluctuations, you cannot use it to solve real world problems."

kristjansson · 2025-03-10T20:43:45 1741639425

Also worth mentioning Murphy's WIP textbook[0] focused entirely on RL, which is an outgrowth of his excellent ML textbooks.

[0]: https://arxiv.org/abs/2412.05265

ivanbelenky · 2025-03-10T21:50:03 1741643403

Awesome resource. In case anyone is interested, I implemented most of Sutton's book here: https://github.com/ivanbelenky/RL

Iwan-Zotow · 2025-03-11T03:05:52 1741662352

Thanks, looks good

hazrmard · 2025-03-11T17:24:34 1741713874

Thank you. This is great. I also appreciated the linked code for MinRL (https://github.com/10-OASIS-01/minrl).

Having done research in RL, a big problem with incremental research was to reproduce comparative works, and to validate my own contributions. A simple library like this, with built-in tools for visualization and a gridworld sandbox where I can validate just by observation, is very helpful!
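As an illustration of validating "just by observation", here is a minimal sketch of value iteration on a hypothetical 4x4 deterministic gridworld. It is not MinRL's API; the grid size, the step cost of -1 and gamma=1 are assumptions chosen so that the correct answer is obvious: every cell's value should equal minus its Manhattan distance to the goal.

```python
def value_iteration(n=4, gamma=1.0, step_reward=-1.0, tol=1e-9):
    """Value iteration on an n x n deterministic gridworld.

    The goal (bottom-right) is terminal with value 0, every move earns
    step_reward, and moving into a wall leaves the agent in place.
    """
    goal = (n - 1, n - 1)
    moves = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
    V = {(r, c): 0.0 for r in range(n) for c in range(n)}
    while True:
        delta = 0.0
        for s in V:
            if s == goal:
                continue
            best = float("-inf")
            for dr, dc in moves:
                r, c = s[0] + dr, s[1] + dc
                nxt = (r, c) if 0 <= r < n and 0 <= c < n else s
                # Bellman optimality backup: best one-step reward plus next value
                best = max(best, step_reward + gamma * V[nxt])
            delta = max(delta, abs(best - V[s]))
            V[s] = best  # in-place (Gauss-Seidel style) update
        if delta < tol:
            return V


V = value_iteration()
for r in range(4):
    # each printed cell should be minus its Manhattan distance to the goal
    print(" ".join(f"{V[(r, c)]:5.1f}" for c in range(4)))
```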

Culonavirus · 2025-03-11T08:14:53 1741680893

> This book, however, requires the reader to have some knowledge of probability theory and linear algebra.

This is so funny to me. I see it often and I'm always like "yeah, right, some knowledge"... These statements always need to be taken with a grain of salt and an understanding that math nerds wrote them. Average programmers with average math skills (like me) beware ;)

sigmoid10 · 2025-03-11T08:29:02 1741681742

This usually means that average CS or EE university level students should be able to easily follow it even if they have never touched the topic. It's far below the level of math and physics degrees, but still somewhat above what you could expect from an average self taught programmer.

Culonavirus · 2025-03-11T21:10:46 1741727446

I'm not even self-taught. It's just that when I was studying (CS degree, about 15 years ago) we did have mandatory linear algebra, graph theory, and statistics courses, etc., but we never *actually* used any of that in practice; it was all algo this, big O that, data structures, design patterns, languages, compilers, SQL, etc. Now that I think about it, pretty much the only course where we had to use some linear algebra was the 3D rendering one...

And then you work on .net/java/sql/server crap for a decade and you forget even the little math you used to know :D

monadicmonad · 2025-03-10T23:53:44 1741650824

I don't know how to go from understanding this material to having a job in the field. Just stuck as a SWE for now.

godelski · 2025-03-11T00:13:30 1741652010

  - Do you understand the material?
  - Can you utilize your understanding to build successful models/algorithms?

If the answer is yes to both, do some projects, put them on your github, and update your resume. You might need to take a job at a lower position first, but you can jump from there. But I want to make sure that the answer is "yes" to both and note that it is easy to think you understand something without actually understanding it. Importantly we must recognize that everyone has a different level of sufficient knowledge where they are comfortable saying that they "understand" a topic. One person might say they don't and be more knowledgeable than someone that says they do. But demonstration of the knowledge levels is at least a decent proxy for determining this.

A way I like to gauge someone's understanding of things is by getting them to explain the limitations. These are often less explicitly stated when learning, and a deeper understanding is acquired through experience and, most importantly, reflection on that experience. This is an underutilized tactic, but it is very effective. If you can't do this, the good news is that starting now will only accelerate your understanding :)

varelaseb · 2025-03-11T00:18:36 1741652316

Just a random thought:

Understanding the limitations is a complicated thing in tech. You can finagle most systems into doing mostly anything, as inefficient as that may prove to be.

The question then becomes: up to what point is it a "reasonably better than most others" solution? And that's a question of an understanding of a field, not a space in the field.

godelski · 2025-03-11T02:59:06 1741661946

  > is a complicated thing in tech

That's the point. Understanding complex things is what experts are supposed to do.

  > You can finagle most systems into doing mostly anything

"most" is doing a lot of heavy lifting here and I think the point you're making isn't discrediting my point. Sure you can hamfist a lot of things into working but an expert should know when to use better tools. Being able to identify what would end up as a very hacky solution from one paradigm but could be efficient and/or elegant in another is what an expert should be able to identify. Essentially, are they able to reduce technical debt even before that debt is taken on?

  > an understanding of a field, not a space in the field.

Would you mind clarifying the difference? I agree these are different things, but I'm not sure why understanding the limitations would imply not having narrower domain knowledge. Sure, in ML knowing the advantages of convolutions over transformers and vice versa is good. But if you're working on LLMs, ViTs, or anything else, it is still good to know what the limitations of transformer models are, and specifically what attention can and cannot do. We should be able to get more and more narrow too. An expert will be able to understand the nuances of specific evaluation methods: metrics, measures, datasets, and other forms of analysis. Being able to discuss nuance and detail is how you determine if someone has expertise or not. IME it tends to be pretty easy to identify experts (even in other fields) by how readily and how often they discuss nuances.

CamperBob2 · 2025-03-11T15:43:25 1741707805

Step 1: Build something cool with it.

shidoshi · 2025-03-11T14:25:57 1741703157

Amazing resource. Highly recommended for both content and approachability.

CaffeineLD50 · 2025-03-11T02:10:57 1741659057

And if you want to understand the theory of Skinner's Verbal Behavior, check out:

https://bfskinner.org/wp-content/uploads/2020/11/978_0_99645...
