(Comments)

Original link: https://news.ycombinator.com/item?id=39443965

After analyzing and synthesizing various articles and resources, here is a summarized version: In artificial intelligence, machine learning (ML), and natural language processing (NLP), long short-term memory (LSTM) neural networks became increasingly popular for modeling sequential data in tasks such as speech recognition, language translation, and stock-market prediction. These recurrent neural networks process sequences efficiently by considering past inputs while weighing recent ones, but they require large datasets for training and can suffer from vanishing and exploding gradients. More recently, the Transformer architecture, built around self-attention mechanisms, has emerged as the dominant paradigm for NLP tasks. Self-attention captures dependencies across all positions in an input sentence, leading to superior performance compared to RNNs. Breakthrough models such as GPT (Generative Pretrained Transformer) have achieved state-of-the-art results with unprecedented capacity and scalability through massive unsupervised pretraining on large corpora of text. Nevertheless, building a transformer model involves several complex aspects, such as training a deep language model, masking input features during training, decoupling supervision signals with joint teacher and student loss functions, and dealing with challenges like overfitting and generalization. Overall, deep learning and NLP continue to evolve rapidly, requiring continuous research to push the boundaries further. As cutting-edge techniques like GPT and BERT demonstrate, the possibilities for novel methods and applications appear endless and will shape the landscape of AI for years to come.
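
As a loose illustration of the self-attention mechanism the summary refers to, here is a minimal sketch of scaled dot-product attention in plain NumPy; the shapes, weight matrices, and function name are illustrative, not taken from any particular model.

  import numpy as np

  def self_attention(x, Wq, Wk, Wv):
      # x: (seq_len, d_model); Wq/Wk/Wv: (d_model, d_head)
      q, k, v = x @ Wq, x @ Wk, x @ Wv
      scores = q @ k.T / np.sqrt(k.shape[-1])           # pairwise position-to-position scores
      weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
      weights /= weights.sum(axis=-1, keepdims=True)    # softmax over positions
      return weights @ v                                # (seq_len, d_head)

  x = np.random.randn(5, 8)
  Wq, Wk, Wv = (np.random.randn(8, 4) for _ in range(3))
  print(self_attention(x, Wq, Wk, Wv).shape)            # (5, 4)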

Related articles

Original article
Let's Build the GPT Tokenizer [video] (youtube.com)
556 points by davidbarker 17 hours ago | 38 comments

No wonder GPT does so horribly on anything involving spelling, or the exact specifications of letters.

To fix it, I'd throw a few gigabytes of synthetic data into the training mix before fine-tuning, covering the alphabets of all the relevant languages, things like:

  A is an upper case a
  a is a lower case A
  the sequence of numbers is 0 1 2 3 4 5 6 7 8 9 10 11 12
  0 + 1 = 1
  1 + 1 = 2
etc.
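
A rough sketch of how that kind of synthetic spelling and arithmetic data could be generated; the file name, line templates, and counts below are made up for illustration, not anything OpenAI is known to use.

  import random
  import string

  lines = []
  # Spelling facts: upper/lower-case pairs for the Latin alphabet.
  for ch in string.ascii_uppercase:
      lines.append(f"{ch} is an upper case {ch.lower()}")
      lines.append(f"{ch.lower()} is a lower case {ch}")
  # Simple arithmetic facts.
  for _ in range(1000):
      a, b = random.randint(0, 99), random.randint(0, 99)
      lines.append(f"{a} + {b} = {a + b}")

  with open("synthetic_spelling_math.txt", "w") as f:
      f.write("\n".join(lines))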

It still amazes me that Word2Vec is as useful as it is, let alone LLMs. The structure inherent in language really does convey far more meaning than we assume. We're like fish, not being aware of water, when we use language.
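
For anyone who wants to see that structure directly, here is a hedged example using pretrained word vectors via gensim's downloader (assumes gensim 4.x and network access; these happen to be GloVe vectors, but the same vector arithmetic works with Word2Vec embeddings):

  import gensim.downloader as api

  vectors = api.load("glove-wiki-gigaword-50")  # small pretrained KeyedVectors (~66 MB download)
  # Vector arithmetic recovers analogies surprisingly well:
  print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=3))
  print(vectors.similarity("cat", "dog"))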



We know OpenAI trains on significant amounts of synthetic data, they probably have something like this.


Andrej's video on building nanoGPT is an excellent tutorial on all of the steps involved in a modern LLM.




His earlier videos on micrograd and makemore are a gold mine as well.
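
For a taste of what micrograd covers, here is a tiny scalar autograd sketch in that spirit (illustrative only, not the actual micrograd code): each value records how to push gradients back to its inputs, and backward() applies the chain rule in reverse topological order.

  class Value:
      def __init__(self, data, children=()):
          self.data = data
          self.grad = 0.0
          self._children = children
          self._grad_fn = None  # distributes self.grad to children

      def __add__(self, other):
          out = Value(self.data + other.data, (self, other))
          def grad_fn():
              self.grad += out.grad
              other.grad += out.grad
          out._grad_fn = grad_fn
          return out

      def __mul__(self, other):
          out = Value(self.data * other.data, (self, other))
          def grad_fn():
              self.grad += other.data * out.grad
              other.grad += self.data * out.grad
          out._grad_fn = grad_fn
          return out

      def backward(self):
          # Build topological order, then apply the chain rule from the output back.
          topo, visited = [], set()
          def build(v):
              if v not in visited:
                  visited.add(v)
                  for c in v._children:
                      build(c)
                  topo.append(v)
          build(self)
          self.grad = 1.0
          for v in reversed(topo):
              if v._grad_fn:
                  v._grad_fn()

  x, y = Value(2.0), Value(3.0)
  z = x * y + x
  z.backward()
  print(z.data, x.grad, y.grad)  # 8.0 4.0 2.0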


It’s pretty wild how little discussion there's been about this core component of these models. It's as if this aspect of their development has been solved. Basically all NLP publications today take these BPE tokens as a starting point, and if they are mentioned at all, it's in passing.

https://blog.seanbethard.net/meanings-are-tiktokens-in-space
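
For reference, the BPE training loop those tokens come from is conceptually small. Here is a minimal sketch (names and the merge count are illustrative, not the exact code from the video): start from raw bytes, repeatedly find the most frequent adjacent pair, and replace it with a new token id.

  from collections import Counter

  def get_pair_counts(ids):
      # Count how often each adjacent pair of token ids occurs.
      return Counter(zip(ids, ids[1:]))

  def merge(ids, pair, new_id):
      # Replace every occurrence of `pair` in `ids` with the new token id.
      out, i = [], 0
      while i < len(ids):
          if i < len(ids) - 1 and (ids[i], ids[i + 1]) == pair:
              out.append(new_id)
              i += 2
          else:
              out.append(ids[i])
              i += 1
      return out

  text = "aaabdaaabac"
  ids = list(text.encode("utf-8"))   # start from raw bytes, ids 0..255
  merges = {}
  for step in range(3):              # number of merges = target vocab size - 256
      pair = get_pair_counts(ids).most_common(1)[0][0]
      new_id = 256 + step
      ids = merge(ids, pair, new_id)
      merges[pair] = new_id
  print(ids, merges)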



Very grateful that he puts out this kind of education. The one nit I have is that he didn't explain the more abstract questions at the beginning, which left a bit of a bad taste, I guess. I hope I am not being disrespectful.


I can't recommend enough the whole series, zero to hero: https://karpathy.ai/zero-to-hero.html

No metaphors trying to explain "complex" ideas, making them seem scary and overly complex. Instead, hands-on implementations with accompanying explanations, where you can actually understand the ideas and see how simple they are.

Steeper learning curve at first, but it is much more satisfying, and you actually earn the ability to reason about this stuff instead of writing over-the-top influencer BS.



One thing I like about that zero-to-hero series is how he almost never handwaves over seemingly minor details.

Definitely recommend watching those videos and doing the exercises, if you have any interest in how LLMs work.



Thanks for this link - I have some free time coming up, and this seems like a great use of it!


A noob question: do you all intend to work on LLMs, or are you watching the content out of curiosity? I am asking how anyone like me, a software generalist, can make use of this amazing content. Anyone with insights on how to transition from a generalist backend engineer to an AI engineer? Or is it a niche where the only path is the route of a PhD …


Speaking for myself, and except for just being curious, it's mostly for similar reasons as to why you'd want to read, for example, CLRS, even though you'll probably never implement an algorithm like that in a real production environment yourself. It's not so much about learning how, but rather why, because it'll help you answer your why's in the future (not that the how can't also be important, of course).


I was not really interested in LLMs till a month back. I had an earlier product where I wanted a no-code app for business insights on any data source. Plug in MySQL, PostgreSQL, APIs like Stripe, Salesforce, Shopify, even CSV files, and it would generate queries from the user's GUI interactions. Like Airtable, but for your own data sources. I was generating SQL, including JOINs, or HTTPS API calls.

Then I abandoned it in 2021. This year, it struck me that LLMs would be great for inferring business insights from the schema. I could create reports and dashboards automatically, surfacing critical action points straight from the schema/data and from users chatting with the app.

So for the last couple of weeks, I have been building it, running tests on LLMs (CodeLlama, Zephyr, Mistral, Llama 2, Claude and ChatGPT). The results are quite good. There is a lot of tech that I need to handle: schema analysis, SQL or API calls, and the whole UI. But without LLMs, there was no clear way for me to infer business insights from schema + user chats.

To me, this is not a niche anymore, now that I have found a problem I already wanted to tackle.
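
A rough sketch of the schema-to-SQL flow described above; the schema, prompt wording, and call_llm helper are hypothetical stand-ins for whichever model API is used (ChatGPT, Claude, a local Mistral, etc.), not a real library call.

  SCHEMA = """
  orders(id INT, customer_id INT, total NUMERIC, created_at TIMESTAMP)
  customers(id INT, name TEXT, country TEXT)
  """

  def build_prompt(schema: str, question: str) -> str:
      return (
          "You are a SQL assistant. Given this schema:\n"
          f"{schema}\n"
          f"Write a single PostgreSQL query answering: {question}\n"
          "Return only SQL."
      )

  def call_llm(prompt: str) -> str:
      # Hypothetical: swap in your provider's client here.
      raise NotImplementedError

  if __name__ == "__main__":
      prompt = build_prompt(SCHEMA, "monthly revenue by country for 2023")
      print(prompt)            # inspect what the model will see
      # sql = call_llm(prompt) # then validate/execute the returned SQL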



Just a guess, but understanding how LLMs are built may also help you if you want to fine-tune a model. Someone who knows more may confirm or contradict this.


The best thing is that I know Andrej reads all these comments. Hi Andrej! This is your calling. Miss you though!


There should be awards for this type of content. Andrew Ng series and Karpathy series as first inductees to the hall of fame.


His video on Backpropagation was a revelation to me :

https://www.youtube.com/watch?v=q8SA3rM6ckI



Probably more coming soon, given he just left OpenAI to pursue other things.


Even if you pay, it is hard to get content of such high quality!


I've been learning a few new CS things recently, and honestly I mostly find an inverse correlation between cost and quality.

There are books from O'Reilly and paid MOOC courses that are just padded with lots of unnecessary text or silly "concept definition" quizzes to make them seem worth the price.

And there are excellent free YT video lectures, free books or blog posts.

Andrej's YT videos are one great example. https://course.fast.ai is another.



It's not only about the cost, though. There's an inverse correlation with the glossiness of the content as well.

If the web page/content is too polished, it's most likely optimized for wooing users.

Unlike a lot of the examples I gave in the sibling comments, where the optimization comes only from love for the topic being discussed.



  There's an inverse correlation with the glossiness of the content as well.

This is probably due to survivorship bias. Sites that have poor content and poor visual appeal (glossiness) never get on your radar.

i.e. Berkson's Paradox: https://en.wikipedia.org/wiki/Berkson%27s_paradox



Full ACK. I have also grown weary of paid course offerings, because many of the ones I have checked out were basically low quality or shallow.


There are some extremely good CS textbooks which cost money. That being said, many good ML/AI texts are free. But it's not easy reading.


> And there are excellent free YT video lectures, free books or blog posts.

There's also a tremendous amount of extremely low quality YouTube and blog content.



Sure. I don't claim the free content is all good.

But from my limited sample size, the best free content is better than the best paid content.



Do you have recommendations for other high quality courses teaching CS things?


- Operating Systems: Three Easy Pieces (https://pages.cs.wisc.edu/~remzi/OSTEP) is incredible for learning OS internals

- Beej's networking guide is the best thing for network-layer stuff: https://beej.us/guide/

- Explained from First Principles is great too: https://explained-from-first-principles.com/

- Pintos from Stanford: https://web.stanford.edu/class/cs140/projects/pintos/pintos_...



Wow. Thanks for sharing. I had no idea that Professor Remzi and his wife Andrea wrote a book on Operating Systems. I loved his class (took it almost 22 years ago.) Will have to check his book out.


I can highly recommend CS50 from Harvard (https://www.youtube.com/@cs50). Even after being involved in tech for 25+ years, I learnt a lot from just the first lecture alone.

Disclosure: Professor Malan is a friend of mine, but I was a fan of CS50 long before that!



Replying to bookmark (hoard) all the thread links for later.

Fellow hackers might also enjoy:

https://www.nand2tetris.org/



Build an 8-bit computer from scratch https://eater.net/8bit/ https://www.youtube.com/playlist?list=PLowKtXNTBypGqImE405J2...

Andreas Kling. OS hacking: Making the system boot with 256MB RAM https://www.youtube.com/watch?v=rapB5s0W5uk

MIT 6.006 Introduction to Algorithms, Spring 2020 https://www.youtube.com/playlist?list=PLUl4u3cNGP63EdVPNLG3T...

MIT 6.824: Distributed Systems https://www.youtube.com/@6.824

MIT 6.172 Performance Engineering of Software Systems, Fall 2018 https://www.youtube.com/playlist?list=PLUl4u3cNGP63VIBQVWguX...

CalTech cs124 Operating Systems https://duckduckgo.com/?t=ffab&q=caltech+cs124&ia=web

try searching here at HN for recommendations https://hn.algolia.com



Thank you a ton for the links.


nand2tetris: https://www.nand2tetris.org/

I like the book better than the online course.



His previous video on LLM transformer foundations is extremely useful.


Had to double check my playback speed - he talks like a 1.25x playback speaker sounds.


Let him teach; we don't need one more rapper.


> you see when it's a space egg, it's a single token

I'm not sure if the crew of the Nostromo would agree ;)
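
For anyone who wants to poke at that themselves, a quick check with OpenAI's tiktoken library (whether a given string maps to a single token depends on the encoding you pick):

  import tiktoken

  enc = tiktoken.get_encoding("cl100k_base")  # GPT-4-era encoding
  for s in ["egg", " egg", "Egg", "EGG"]:
      print(repr(s), enc.encode(s))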









