Cohere Launches Embed 4

simonw · 2025-04-15T16:31:06 1744734666

I have huge respect for Cohere and this embedding model looks like it could be best-in-class, but I find it hard to commit to a proprietary embedding model that's only available via an API when there are such good open weight models available.

I really like the approach Nomic take: their most recent models are available via their API or as open weights for non-commercial use only (unless you buy a license). They later relicense their older models under Apache 2.0 licenses.

This gives me confidence that I can continue to use my calculated vectors in the future even if Nomic's model is no longer available because I can run the local one instead.

Nomic Embed Vision 1.5 for example started out as CC-BY-NC-4.0 but was later relicensed to Apache 2.0: https://www.nomic.ai/blog/posts/nomic-embed-vision

lukebuehler · 2025-04-15T16:28:36 1744734516

I just started to look into multi-modal embedding models recently, and I was surprised how few options there are.

For example, Google's model only supports 30 text tokens [1]!!

This is definitely a welcome addition.

Any pointers to similarly powerful embedding models? I'm looking specifically for text and images. I wish there were also one that could do audio and video, but I don't think that exists.

[1] https://cloud.google.com/vertex-ai/generative-ai/docs/embedd...

moojacob · 2025-04-15T16:30:07 1744734607

Seems to under-perform voyage-3-large on the same benchmark. At the same time, I'm unsure how useful benchmarks are for embeddings.

moralestapia · 2025-04-15T16:18:30 1744733910

A bit expensive but the benchmarks look quite good!
