Okmain: How to pick an OK main colour of an image

Original link: https://dgroshev.com/blog/okmain/

## Okmain: a better colour for the back of your cards

Many apps use a single colour for the back of a card, derived from its image — typically by resizing the image down to a single pixel and using that pixel's colour. However, this often produces dull, muddy colours. To fix this, the author built **Okmain**, a Rust library (with a Python wrapper) that extracts a visually representative colour from an image.

Okmain improves on naive resizing by **clustering colours** (K-means, up to four clusters) in the **Oklab colour space** — a perceptually uniform space that avoids the muddiness of sRGB averages. It then sorts the clusters by pixel count, position within the image (weighted towards the centre), and colour **chroma** (saturation).

Performance matters, so Okmain downsamples the image and leans on auto-vectorisation. The author also experimented with LLM agents during development, finding them helpful for first drafts and debugging, but ultimately hand-polished the critical, performance-sensitive code.

Okmain extracts a dominant colour from a multi-megapixel image in around 100 ms and is available on crates.io and PyPI.


Original article

Your app has a card with an image. You want the back of the card to be a solid colour that is somewhat representative of the image and also visually pleasant. How would you do that?

A company I consult for did that by resizing the entire image to 1x1 (a single pixel) and using the colour of the pixel. This is a super popular approach! However, the colours were often dull and muddy even when the original image had vivid colours. It irked me, so I spent a weekend searching for prior art and trying a few tricks to do better. Then, I wrote a library. Inspired by Oklab's naming, it's called Okmain because it looks for an OK main colour:

Comparison of okmain to 1x1 pixel resizing

Here are the tricks I came up with:

  • colour clustering
  • Oklab colour calculations
  • chroma + position cluster sorting

The rest is just implementing the tricks in Rust, writing a Python wrapper, making everything fast and robust, writing documentation, releasing to crates.io and PyPI, and writing this blogpost. Easy!

Colour clustering

Most images have multiple clusters of colours, so simply averaging all colours into one doesn't work well. Take this image: while the green of the field and the blue of the sky are beautiful colours, simply averaging the colours produces a much less exciting colour (source):

A photo of a pasture with a solid block of colour to the left of it

Instead, we can find groups of similar colours and average inside the group. K-means is a well-known algorithm for exactly that. We can run it on all pixels, clustering their colours and ignoring the pixel positions (for now).

For Okmain, I decided to only allow up to four clusters. In my testing, it was enough for decent quality, and limiting the number was handy to make clustering more performant. We will come to that later.
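As a sketch of the idea — plain Lloyd's k-means over colour triples, ignoring pixel positions. This is an illustration of the technique, not Okmain's actual implementation:

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means over colour triples (pixel positions ignored)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        buckets = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            buckets[j].append(p)
        # Move each centroid to the mean of its bucket.
        for j, bucket in enumerate(buckets):
            if bucket:
                centroids[j] = tuple(sum(c) / len(bucket) for c in zip(*bucket))
    return centroids
```

Running this on the colour triples of an image yields up to `k` representative centroids, one per colour cluster.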

Here's how the image looks after colour clustering (including the extracted groups as swatches):

A photo of a pasture with extracted centroids and a version with clustered colours

Note, however, that not all images have four meaningfully distinct clusters. Picking the number of clusters in a general case is a non-trivial problem. However, with just four colours it's simple enough to check if all clusters have different enough colours and re-run with fewer clusters if some clusters are too similar. Here's an example of an image with three distinct clusters (source):

A photo of a red moon over dark blue sea
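The "re-run with fewer clusters" check can be sketched as a pairwise distance test over the centroids. The threshold value below is a made-up illustration, not Okmain's actual cutoff:

```python
def too_similar(centroids, min_dist=0.1):
    """Return True if any two cluster centroids are closer than min_dist
    (e.g. Euclidean distance in Oklab), suggesting clustering should be
    re-run with fewer clusters."""
    for i in range(len(centroids)):
        for j in range(i + 1, len(centroids)):
            d = sum((a - b) ** 2
                    for a, b in zip(centroids[i], centroids[j])) ** 0.5
            if d < min_dist:
                return True
    return False
```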

Oklab

Another reason for the muddiness is the resizing library operating directly on sRGB colours.

In either clustering or resizing, colours need to be averaged. In a naïve implementation, this is done in the same colour space the image is in, which is most likely to be sRGB: red, green, and blue subpixel values with gamma correction applied. This is not ideal for two reasons.

First, gamma correction is non-linear, and applying linear operations over the correction leads to incorrect results.

Second, perceived colour intensity is also non-linear, which is why a sweep through all colours without correcting for perceptual differences produces vertical stripes in the gradient (source):

A comparison of HSV and Oklch gradients
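The first problem — linear maths over gamma-encoded values — is easy to demonstrate with a toy sketch using the standard sRGB transfer function. A naive 50/50 average of black and white gives 0.5, while averaging in linear light gives a noticeably lighter value:

```python
def srgb_to_linear(c):
    """Undo the sRGB transfer function (c in 0..1)."""
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

def linear_to_srgb(c):
    """Re-apply the sRGB transfer function."""
    return c * 12.92 if c <= 0.0031308 else 1.055 * c ** (1 / 2.4) - 0.055

naive = (0.0 + 1.0) / 2  # averaging gamma-encoded values: 0.5
correct = linear_to_srgb((srgb_to_linear(0.0) + srgb_to_linear(1.0)) / 2)
# correct is roughly 0.735 — the physically accurate mix of black and white light
```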

To solve both problems at once, Okmain operates in the Oklab colour space.

The result of averaging colours in Oklab is smoother mixing with fewer muddy browns:

Comparison of mixing colours in sRGB and Oklab

Here, a pixel at (X, Y) is a mix of colour X (from the top gradient) with colour Y (from the left gradient). In the top right triangle, I'm mixing the two colours in sRGB. In the bottom left, the colours are first transformed to Oklab, mixed, and then transformed back to sRGB. The sRGB triangle is visibly less smooth, with too many muddy browns in green+yellow and blue+orange areas.

This over-representation is what skews sRGB-averaged "main colours" towards unattractive, dirty-looking colours.
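For reference, converting a linear-sRGB colour into Oklab uses Björn Ottosson's published matrices. A minimal sketch, assuming in-gamut, already-linearised input (Okmain's actual implementation may differ):

```python
def linear_srgb_to_oklab(r, g, b):
    """Linear sRGB -> Oklab, using Ottosson's published coefficients.
    Assumes in-gamut input, so the intermediate LMS values are non-negative."""
    # Linear sRGB to an LMS-like cone response.
    l = 0.4122214708 * r + 0.5363325363 * g + 0.0514459929 * b
    m = 0.2119034982 * r + 0.6806995451 * g + 0.1073969566 * b
    s = 0.0883024619 * r + 0.2817188376 * g + 0.6299787005 * b
    # Non-linearity: cube root.
    l_, m_, s_ = l ** (1 / 3), m ** (1 / 3), s ** (1 / 3)
    # Final linear map to L (lightness), a, b (opponent axes).
    L = 0.2104542553 * l_ + 0.7936177850 * m_ - 0.0040720468 * s_
    a = 1.9779984951 * l_ - 2.4285922050 * m_ + 0.4505937099 * s_
    b = 0.0259040371 * l_ + 0.7827717662 * m_ - 0.8086757660 * s_
    return L, a, b
```

Averaging (L, a, b) triples in this space, then converting back, gives the smoother mixing shown above.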

Cluster sorting

After colours are clustered in Oklab, the clusters need to be sorted by their visual prominence. After all, the user likely wants the most prominent, dominant colour, not just four colours with no indication of which one dominates.

I came up with three heuristics for how prominent a cluster is:

  • how many pixels are in the cluster?
  • how central are those pixels?
  • how visually prominent is the colour in itself?

Okmain combines the first two heuristics into one and calculates the number of pixels per cluster, discounting pixels that are closer to the periphery using a mask that looks like this (by default):

Distance mask applied to pixels

Intuitively, pixels that are closer to the centre of the image are more prominent, but only to an extent. If a pixel is central enough, it doesn't matter where it is exactly.

This weighting ensures that the most prominent colour on an image like this is the foreground green and not the background grey. The swatches are sorted top-down, the most prominent at the top (source):

An image of a green pendant on the grey background
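The exact mask isn't specified beyond the picture above; one plausible sketch has a flat central plateau with a linear falloff towards the corners (the `plateau` parameter and the falloff shape are my assumptions, not Okmain's):

```python
def centre_weight(x, y, w, h, plateau=0.5):
    """Hypothetical centre weighting: full weight inside a central
    plateau, falling off linearly towards the image corners."""
    # Normalised distance from the image centre; 1.0 at a corner.
    dx = (x - (w - 1) / 2) / (w / 2)
    dy = (y - (h - 1) / 2) / (h / 2)
    d = (dx * dx + dy * dy) ** 0.5 / 2 ** 0.5
    if d <= plateau:
        return 1.0
    return max(0.0, 1.0 - (d - plateau) / (1.0 - plateau))
```

A cluster's score would then be the sum of these weights over its pixels rather than a raw pixel count.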

Finally, Okmain tries to guess how visually prominent a particular colour is. This is tricky because prominence depends on how much a colour contrasts with other colours. However, using Oklab chroma (saturation) as a proxy for prominence seems to help on my test set, so it's now a factor in Okmain.
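In Oklab, chroma is simply the distance from the neutral (grey) axis in the a/b plane. A hypothetical combined score — the `chroma_boost` weighting is an illustration, not Okmain's actual formula — might look like:

```python
def chroma(a, b):
    """Oklab chroma: distance from the neutral axis."""
    return (a * a + b * b) ** 0.5

def prominence(weighted_pixel_count, centroid_a, centroid_b, chroma_boost=1.0):
    """Hypothetical combined score: centre-weighted pixel mass,
    boosted for more saturated cluster colours."""
    return weighted_pixel_count * (1.0 + chroma_boost * chroma(centroid_a, centroid_b))
```

Under this scoring, a slightly smaller but saturated cluster can outrank a larger grey one.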

Performance

I wanted Okmain to not just produce nice colours but also be reasonably fast, ideally comparable to a simple 1x1 resize. I spent some time optimising it.

The simplest optimisation is to reduce the amount of data. Okmain downsamples the image by a power of two until the total number of pixels is below 250,000, simply averaging pixel values in Oklab. This also helps to remove noise and "invisible colours": on a photo of an old painting, paint cracks can create their own colour cluster, but ideally they should be ignored.
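The power-of-two factor can be sketched as a simple halving loop. The 250,000-pixel budget is from the text; the exact way Okmain picks the factor is my assumption:

```python
def downsample_factor(width, height, max_pixels=250_000):
    """Smallest power-of-two factor that brings the pixel count
    under the budget."""
    factor = 1
    while (width // factor) * (height // factor) > max_pixels:
        factor *= 2
    return factor
```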

The downsampling is also an opportunity to de-interleave the pixels from an RGBRGBRGB… array into a structure-of-arrays (three separate arrays of L, a, and b floats), which helps to make a lot of downstream code trivially auto-vectorisable. Having a low fixed number of clusters that fits into a SIMD register (f32x4) seems to help, too.
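De-interleaving itself is simple; in Python terms (a sketch of the idea, not the Rust code):

```python
def deinterleave(pixels):
    """Split an interleaved RGBRGBRGB... sequence into three separate
    per-channel lists — a structure-of-arrays layout, where each channel
    is contiguous and easy for the compiler to vectorise over."""
    return pixels[0::3], pixels[1::3], pixels[2::3]
```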

One of the biggest hurdles for auto-vectorisation is Rust's insistence on correct floating point math. This is a great default, but there's no way to opt out of it on stable Rust yet.

Another complexity is runtime dispatch based on the available instruction set. Rust defaults to a very conservative SIMD instruction set (SSE2), and it's a correct solution for a library like Okmain. However, AVX2 seems to help even for auto-vectorisation, so eventually I'll just add a dispatch with something like target_feature_dispatch. For now, Okmain is fast enough, extracting dominant colours from multi-megapixel images in around 100ms.

Initially, I implemented mini-batch k-means clustering, but for this particular case it proved slower after accounting for the sampling step. The entire dataset is small enough to fit into the cache, so going through the full dataset is quicker than having to first extract a mini-batch with unpredictable branching, even if the mini-batch itself is much smaller. K-means++ initialisation, on the other hand, helps a lot, despite the upfront cost of picking good starting points.
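For reference, k-means++ seeding picks each new centroid with probability proportional to its squared distance from the nearest already-chosen centroid, which spreads the starting points out. A minimal sketch of the standard algorithm:

```python
import random

def kmeans_pp_init(points, k, seed=0):
    """k-means++ seeding: first centroid uniformly at random, each
    subsequent one drawn with probability proportional to the squared
    distance from the nearest centroid chosen so far."""
    rng = random.Random(seed)
    centroids = [rng.choice(points)]
    for _ in range(k - 1):
        d2 = [min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids)
              for p in points]
        total = sum(d2)
        if total == 0:
            centroids.append(rng.choice(points))
            continue
        # Weighted draw: walk the cumulative distances until we pass r.
        r = rng.random() * total
        acc = 0.0
        for p, d in zip(points, d2):
            acc += d
            if acc >= r:
                centroids.append(p)
                break
        else:
            centroids.append(points[-1])
    return centroids
```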

A tangent on LLMs

I was curious how LLM agents would work on this project. It felt like a good fit for agentic development: a small, well-constrained problem, greenfield development, and a lot of pre-existing data in the training set since k-means is a very popular algorithm. Armed with Opus (4.5 and 4.6) and sprites.dev for sandboxed, accept-everything autonomous development, I tried to retrace Mitchell Hashimoto's steps. The results are mixed, but I learned a lot.

With a good explanation and the planning mode, the very first version was ready really quickly. Unfortunately, it was subtly wrong in several places, and the code was awkward and hard to read. Additionally, I tried to make the code autovectorisation-friendly, and Opus seems confidently wrong about autovectorisation more often than it's right. Closing the loop with cargo asm helped, but the loop ate tokens frighteningly fast, and Opus was still struggling to be both idiomatic and verifiably vectorised.

After a few evenings and many tokens of trying to make Opus write as cleanly as I wanted, I gave up and rewrote the most crucial parts from scratch. In my opinion, the manual rewrite is cleaner and clearer, and this is a part where readability matters, since it's the hottest part of the library.

It seems that even frontier LLMs are struggling with intentful abstraction. LLMs split things out in the most mechanical way possible, instead of trying to communicate the intent with how things are split.

On the other hand, with the core API settled, Opus saved me a lot of time working autonomously on "debug" binaries that are easy to read through and don't need to be developed any further. I suppose that's exactly what Mitchell meant by "outsourcing slam dunks" — this works very well.

Throughout this experience, Sprites' stability was a thorn in my side. The UX and the idea are great when it works, but I had my sprite slow down to a crawl every few days. Once it went completely down and was unconnectable for most of the day. I hope fly.io folks make Sprites more stable. It's a super convenient way to run agents.

Good for now

I'm pretty satisfied with how this project turned out. You all got a decent library, and I learned more about k-means, SIMD, releasing mixed Python/Rust libraries, productive greenfield LLM use, and general performance.

Now go and extract all the main colours!

P.S. / Shameless plug: are you a manager at Apple London? Let's talk.

