Show HN: Learn LLMs LeetCode Style

Original link: https://github.com/Exorust/TorchLeet

TorchLeet provides PyTorch practice problems organized into a "Question Set" and an "LLM Set" to strengthen deep-learning skills. The Question Set ranges from beginner to advanced topics such as tensors, autograd, CNNs, and GANs. The LLM Set focuses on implementing large language models from scratch, covering attention mechanisms, embeddings, and advanced LLM techniques such as quantization and reinforcement learning. Each problem provides an incomplete code block with "#TODO" comments for hands-on practice, making it well suited to guided learning; solutions are also provided for comparison. The project keeps problems in a "questions/" directory and solutions in "solutions/". Users are encouraged to contribute by adding new, well-documented problems or improving existing ones while following the established project structure.

This Hacker News thread discusses "Learn LLMs LeetCode Style", a GitHub project that aims to teach LLM concepts through a LeetCode-like problem-solving approach. Users praise the idea but note differences from LeetCode: the problems are more open-ended, which can be both a pro and a con. One suggestion is to add reproducible data-generation functions and clear evaluation metrics to each exercise to ensure code quality. The author, Exorust, acknowledges that GPT was used in generating the project and plans to add a disclosure. This sparked a debate about the appropriateness of using LLMs to create learning resources, alongside advice against using them to solve the problems. Other suggestions include publishing the prompts used to generate the questions, for transparency. Some users also asked for recommendations on learning lower-level ML tooling such as PyTorch and CUDA.

Original text

TorchLeet is broken into two sets of questions:

  1. Question Set: A collection of PyTorch practice problems, ranging from basic to hard, designed to enhance your skills in deep learning and PyTorch.
  2. LLM Set: A new set of questions focused on understanding and implementing Large Language Models (LLMs) from scratch, including attention mechanisms, embeddings, and more.

Note

Avoid using GPT. Try to solve these problems on your own. The goal is to learn and understand PyTorch concepts deeply.

Mostly for beginners to get started with PyTorch.

  1. Implement linear regression (Solution)
  2. Write a custom Dataset and Dataloader to load from a CSV file (Solution)
  3. Write a custom activation function (Simple) (Solution)
  4. Implement Custom Loss Function (Huber Loss) (Solution)
  5. Implement a Deep Neural Network (Solution)
  6. Visualize Training Progress with TensorBoard in PyTorch (Solution)
  7. Save and Load Your PyTorch Model (Solution)
  8. Implement Softmax function from scratch
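As a taste of the beginner set, problem 8 (softmax from scratch) can be sketched in plain Python; a real solution would operate on PyTorch tensors, but the numerics are identical. Subtracting the maximum before exponentiating is the standard trick for numerical stability:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)                        # subtract the max to avoid overflow in exp
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([1.0, 2.0, 3.0])
print(probs)   # probabilities sum to 1; the largest logit gets the largest mass
```

Without the max subtraction, `softmax([1000.0, 1000.0])` would overflow in `math.exp`; with it, it cleanly returns `[0.5, 0.5]`.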

Recommended for those who have a basic understanding of PyTorch and want to practice their skills.

  1. Implement a CNN on CIFAR-10 (Solution)
  2. Implement an RNN from Scratch (Solution)
  3. Use torchvision.transforms to apply data augmentation (Solution)
  4. Add a benchmark to your PyTorch code (Solution)
  5. Train an autoencoder for anomaly detection (Solution)
  6. Quantize your language model (Solution)
  7. Implement Mixed Precision Training using torch.cuda.amp (Solution)
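Problem 4 in this set ("Add a benchmark to your PyTorch code") can be approached with a small timing helper. This is a framework-agnostic sketch using only the standard library; for GPU code you would additionally need to call `torch.cuda.synchronize()` before reading the clock, since CUDA kernels launch asynchronously:

```python
import time
from functools import wraps

def benchmark(n_runs=5):
    """Decorator: run a function n_runs times and report the best wall-clock time."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            best = float("inf")
            result = None
            for _ in range(n_runs):
                start = time.perf_counter()
                result = fn(*args, **kwargs)
                best = min(best, time.perf_counter() - start)
            print(f"{fn.__name__}: best of {n_runs} runs = {best * 1e3:.3f} ms")
            return result
        return wrapper
    return decorator

@benchmark(n_runs=3)
def matmul_naive(a, b):
    # toy workload: naive matrix multiply on nested lists
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

out = matmul_naive([[1, 2], [3, 4]], [[5, 6], [7, 8]])
```

Reporting the best of several runs (rather than the mean) filters out warm-up and scheduler noise, which is why `timeit` does the same.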

These problems are designed to challenge your understanding of PyTorch and deep learning concepts. They require you to implement things from scratch or apply advanced techniques.

  1. Implement parameter initialization for a CNN (Solution)
  2. Implement a CNN from Scratch
  3. Implement an LSTM from Scratch (Solution)
  4. Implement AlexNet from scratch
  5. Build a Dense Retrieval System using PyTorch
  6. Implement KNN from scratch in PyTorch
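Problem 6 (KNN from scratch) reduces to computing pairwise distances and taking a majority vote among the k closest points. A plain-Python sketch; a PyTorch version would replace the loop with `torch.cdist` and `torch.topk`:

```python
import math
from collections import Counter

def knn_predict(train_x, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    dists = [
        (math.dist(x, query), y)       # Euclidean distance to each training point
        for x, y in zip(train_x, train_y)
    ]
    dists.sort(key=lambda d: d[0])     # k smallest distances first
    votes = Counter(y for _, y in dists[:k])
    return votes.most_common(1)[0][0]

train_x = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.9)]
train_y = ["a", "a", "b", "b"]
print(knn_predict(train_x, train_y, (0.2, 0.1)))   # → "a"
```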

These problems are for advanced users who want to push their PyTorch skills to the limit. They involve complex architectures, custom layers, and advanced techniques.

  1. Write a custom Autograd function for activation (SILU) (Solution)
  2. Write a Neural Style Transfer
  3. Build a Graph Neural Network (GNN) from scratch
  4. Build a Graph Convolutional Network (GCN) from scratch
  5. Write a Transformer (Solution)
  6. Write a GAN (Solution)
  7. Write Sequence-to-Sequence with Attention (Solution)
  8. Enable distributed training in PyTorch (DistributedDataParallel)
  9. Work with Sparse Tensors
  10. Add Grad-CAM/SHAP to explain the model (Solution)
  11. Linear Probe on CLIP Features
  12. Add Cross Modal Embedding Visualization to CLIP (t-SNE/UMAP)
  13. Implement a Vision Transformer
  14. Implement a Variational Autoencoder
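Problem 1 in this set asks for a custom autograd function for SiLU. The core of any custom `torch.autograd.Function` is the forward value and the analytic backward; both can be checked in plain Python against a central finite difference before wiring them into PyTorch. With s(x) = sigmoid(x), SiLU(x) = x·s(x) and SiLU′(x) = s(x)·(1 + x·(1 − s(x))):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def silu(x):
    """SiLU (a.k.a. swish): x * sigmoid(x) — what forward() would compute."""
    return x * sigmoid(x)

def silu_grad(x):
    """Analytic derivative — what backward() would multiply grad_output by."""
    s = sigmoid(x)
    return s * (1.0 + x * (1.0 - s))

# sanity-check the analytic gradient against central finite differences
for x in (-2.0, -0.5, 0.0, 1.0, 3.0):
    h = 1e-6
    numeric = (silu(x + h) - silu(x - h)) / (2 * h)
    assert abs(numeric - silu_grad(x)) < 1e-5
```

This is exactly the check `torch.autograd.gradcheck` automates for a real `Function` subclass.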

An all new set of questions to help you understand and implement Large Language Models from scratch.

Each question is designed to take you one step closer to building your own LLM.

  1. Implement KL Divergence Loss
  2. Implement RMS Norm
  3. Implement Byte Pair Encoding from Scratch (Solution)
  4. Create a RAG Search of Embeddings from a set of Reviews
  5. Implement Predictive Prefill with Speculative Decoding
  6. Implement Attention from Scratch (Solution)
  7. Implement Multi-Head Attention from Scratch (Solution)
  8. Implement Grouped Query Attention from Scratch (Solution)
  9. Implement KV Cache in Multi-Head Attention from Scratch
  10. Implement Sinusoidal Embeddings (Solution)
  11. Implement RoPE Embeddings (Solution)
  12. Implement SmolLM from Scratch (Solution)
  13. Implement Quantization of Models
    1. GPTQ
  14. Implement Beam Search atop LLM for decoding
  15. Implement Top K Sampling atop LLM for decoding
  16. Implement Top p Sampling atop LLM for decoding
  17. Implement Temperature Sampling atop LLM for decoding
  18. Implement LoRA on a layer of an LLM
    1. QLoRA
  19. Mix two models to create a mixture of Experts
  20. Apply SFT on SmolLM
  21. Apply RLHF on SmolLM
  22. Implement DPO based RLHF
  23. Add continuous batching to your LLM
  24. Chunk Textual Data for Dense Passage Retrieval
  25. Implement Large scale Training => 5D Parallelism
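From the LLM set, RMSNorm (problem 2) is one of the smallest building blocks: divide the activations by their root-mean-square and rescale with a learned per-feature gain. A pure-Python sketch over a single vector; a PyTorch version would vectorize this over the last dimension of a tensor, with `gain` as an `nn.Parameter`:

```python
import math

def rms_norm(xs, gain=None, eps=1e-8):
    """RMSNorm: x_i / sqrt(mean(x^2) + eps) * g_i.
    Unlike LayerNorm, there is no mean-centering and no bias."""
    if gain is None:
        gain = [1.0] * len(xs)
    rms = math.sqrt(sum(x * x for x in xs) / len(xs) + eps)
    return [g * x / rms for g, x in zip(gain, xs)]

out = rms_norm([3.0, 4.0])   # rms = sqrt((9 + 16) / 2) ≈ 3.5355
print(out)
```

After normalization (with unit gain) the output vector itself has RMS ≈ 1, which is the invariant the layer exists to enforce.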

What's cool? 🚀

  • Diverse Questions: Covers beginner to advanced PyTorch concepts (e.g., tensors, autograd, CNNs, GANs, and more).
  • Guided Learning: Includes incomplete code blocks (... and #TODO) for hands-on practice, along with answers.

Folder structure

  • <E/M/H><ID>/: Easy/Medium/Hard plus the question ID.
  • <E/M/H><ID>/qname.ipynb: The question file with incomplete code blocks.
  • <E/M/H><ID>/qname_SOLN.ipynb: The corresponding solution file.

How to use

  1. Navigate to questions/ and pick a problem.
  2. Fill in the missing code blocks (...) and address the #TODO comments.
  3. Test your solution and compare it with the corresponding file in solutions/.
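For illustration, a question's incomplete block and a filled-in answer might look like the following. This is a hypothetical plain-Python rendering of the Huber-loss problem from the beginner set; the actual notebooks work on PyTorch tensors:

```python
# --- roughly as it appears in the question notebook ---
# def huber_loss(pred, target, delta=1.0):
#     # TODO: return 0.5*err**2 when |err| <= delta,
#     #       else delta*(|err| - 0.5*delta)
#     ...

# --- one possible completed solution ---
def huber_loss(pred, target, delta=1.0):
    err = abs(pred - target)
    if err <= delta:
        return 0.5 * err ** 2             # quadratic region near zero
    return delta * (err - 0.5 * delta)    # linear region for large errors

print(huber_loss(2.0, 2.5))   # small error, quadratic: 0.5 * 0.5**2 = 0.125
print(huber_loss(0.0, 3.0))   # large error, linear: 1.0 * (3.0 - 0.5) = 2.5
```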

Happy Learning! 🚀

Feel free to contribute by adding new questions or improving existing ones. Ensure that new problems are well-documented and follow the project structure. Submit a PR and tag the authors.

Stargazers over time
