(comments)
Original link: https://news.ycombinator.com/item?id=38020109
To fine-tune a pretrained model like text-embedding-ada-002, or to fine-tune an LLM, you generally need a larger GPU, depending on the model architecture and the amount of data you need to process. As a rule, larger datasets require more compute. Here is an estimate:

- text-embedding-ada-002 uses roughly 10 billion parameters and has been tested successfully on an Nvidia V100 with a batch size of 16k. In a practical setup, a GPU such as a Quadro RTX 5000, Quadro RTX 6000, or Tesla T4 should be sufficient.
- Fine-tuning an LSTM- or Transformer-based LLM requires a lot of memory to store intermediate activations during inference or training. That means you need a card with more memory, such as a V100 PCIe passthrough or an A100 PCIe passthrough combined with a Quadro RTX 6000 or 5000.

Here are some references that may give you further insight:

- A performance benchmark of NLP processing on the Intel Broadwell architecture (https://software.intel.com/content/www/us/en/attachments/white_paper-314433.pdf) provides insight into model size, parameter count, and hardware-accelerator requirements.
- NVIDIA's guide to optimizing DL pipelines for hardware accelerators (https://developer.nvidia.com/optimize-dl-training-inference-gpus-nvidias-accelerated-compute-technology) gives recommendations for optimizing DL pipelines on Nvidia's hardware stack.

Additionally, consider reading "NLP: An Introduction to Natural Language Processing" for practical advice on selecting machine learning algorithms that suit your specific task or set of tasks. Good luck!
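A rough way to sanity-check GPU suggestions like these is to estimate fine-tuning memory from the parameter count. The sketch below is a back-of-the-envelope calculation of my own, not something from the comment above; it assumes full fine-tuning with the Adam optimizer in mixed precision (fp16 weights/gradients, fp32 master weights and optimizer states) and ignores activation memory, which depends on batch size and sequence length.

```python
def finetune_memory_gb(num_params: float) -> float:
    """Back-of-the-envelope GPU memory estimate for full fine-tuning
    with Adam in mixed precision (an assumption, not a measurement).

    Per parameter:
      - fp16 weights:             2 bytes
      - fp16 gradients:           2 bytes
      - fp32 master weights:      4 bytes
      - fp32 Adam moments (m, v): 8 bytes
    Activation memory is excluded; it scales with batch size and
    sequence length and can dominate for long contexts.
    """
    bytes_per_param = 2 + 2 + 4 + 8
    return num_params * bytes_per_param / 1024**3

# Example: a 10B-parameter model needs on the order of 150 GB just for
# weights and optimizer state, before any activations are counted.
for n in (1e8, 1e9, 1e10):
    print(f"{n:.0e} params -> ~{finetune_memory_gb(n):.0f} GB")
```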
The 8k context window is new, but isn't the 512 token limitation a soft limit anyway? I'm pretty sure I can stuff bigger documents into BGE for example.
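As an illustration of that "soft limit": with the sentence-transformers library (an assumption on my part; the commenter doesn't say which stack they use), longer inputs are accepted without error but silently truncated to the model's max_seq_length, so "stuffing" a big document in works while only its first 512 tokens actually get embedded.

```python
# Minimal sketch, assuming sentence-transformers and the
# BAAI/bge-base-en-v1.5 checkpoint (not specified in the comment).
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-base-en-v1.5")
print(model.max_seq_length)  # 512 for the BGE base models

# A document far longer than 512 tokens is accepted without error...
long_doc = "Lorem ipsum dolor sit amet. " * 2000
embedding = model.encode(long_doc, normalize_embeddings=True)
print(embedding.shape)  # (768,)

# ...because the tokenizer truncates to max_seq_length, so only the
# beginning of the document contributes to the embedding. You can raise
# model.max_seq_length, but the model was only trained with 512-token
# positions, so quality beyond that is not guaranteed.
```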
Furthermore, I think that most (all?) benchmarks in the MTEB leaderboard deal with very small documents. So there is nothing here that validates how well this model does on larger documents. If anything, I'd pick a higher ranking model because I put little trust in one that only ranks 17th on small documents. Should I expect it to magically get better when the documents get larger?
Plus, you can expect that this model was designed to perform well on the datasets in MTEB while the OpenAI model probably wasn't.
Many also stated that 8k-context embeddings will not be very useful in most situations.
When would anyone use this model?