I Stored a Website in a Favicon

Original link: https://www.timwehrle.de/blog/i-stored-a-website-in-a-favicon/

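After experimenting with hiding data in a hardware register, the author explored another unconventional storage medium: the favicon. By treating the image's RGB pixel values as raw bytes, the author managed to encode an entire HTML document into a 9x9-pixel icon. The process involves converting the HTML text into bytes, prepending a length header, and mapping those bytes directly onto the color channels of the image's pixels. The resulting icon looks like visual noise, but it contains the page's complete source code. To extract the data, the site uses a JavaScript bootstrap loader that draws the icon onto an HTML canvas, reads the pixel values, and reconstructs the original text. Although the author admits the approach is impractical and depends on a loader script, the project is a creative exploration of the limits of data storage. It highlights a principle: digital assets, whatever their intended purpose, are ultimately just collections of bytes that can be repurposed for storing data. You can view the experiment and its source code through the links the author provides.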

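A recent Hacker News discussion focused on an inventive but controversial technique: storing data in favicons. While the original post detailed an experimental implementation, commenters quickly turned to the approach's broader implications. The discussion centered on two themes: practicality and security risk. On the functional side, users pointed out potential use cases, such as storing profile data directly in a URL or an icon. The security implications, however, are hard to ignore: because browsers often cache favicons for a long time (even in private browsing mode), the technique could be used for browser fingerprinting, cross-profile tracking, or circumventing cookie-based privacy restrictions. The consensus was that while hiding data in a small image file is a clever trick, it also opens a covert channel for tracking users and evading modern privacy protections.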

Original article

A while ago I wrote about storing two bytes inside my mouse's DPI register.

It wasn't useful. It wasn't practical. But it did something unfortunate to my brain.

Once you've successfully hidden data somewhere it doesn't belong, you start looking at everything as potential storage.

A monitor is storage.

A keyboard is storage.

A BIOS splash screen is (maybe) storage.

A favicon is storage.

And yes, here we are.

Every website has a favicon. It's that little icon in your browser tab. Usually you upload it once and then never think about it again. But. A favicon is just an image. An image is just pixels. And pixels are just bytes.

So of course I wondered if I could store something inside one.

The idea

My first thought was steganography.

Steganography is basically about hiding data in an image without making it obvious. You take a perfectly normal photograph and modify a few bits so it secretly contains a message.

But the favicon itself (at least in my demo) doesn't need to look like an icon, so there's no need to hide anything. It could become pure storage.

Every pixel has red, green and blue values, one byte each in an 8-bit image. That's three bytes per pixel.

If I wanted to store text, I could just take the UTF-8 bytes of the text and write them directly into the RGB channels.

The browser doesn't care what those bytes represent. To the browser they're colors. To me they're HTML.

Building a favicon website

I started with a tiny HTML payload along these lines:

<h1>Website in a Favicon</h1> 
<p>Everything you're reading right now was decoded from favicon pixels.</p>

The process is pretty straightforward.

First I convert the HTML into bytes using TextEncoder.

Then I prepend four bytes containing the payload length.

The length header is important because the image itself may contain unused pixels at the end. If there's no length value, there's no way to know where the real payload stops.
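A minimal TypeScript sketch of these two steps, encoding with `TextEncoder` and prepending the length. The post doesn't say how the four length bytes are ordered, so the big-endian unsigned 32-bit header (written with `DataView.setUint32`, which defaults to big-endian) is an assumption, and `buildPayload` is a hypothetical name rather than the project's code.

```typescript
// Sketch: build the byte array that will be written into the favicon's pixels.
// Assumption: the 4-byte length header is big-endian; the post doesn't specify.
function buildPayload(html: string): Uint8Array {
  const body = new TextEncoder().encode(html); // UTF-8 bytes of the HTML
  const out = new Uint8Array(4 + body.length);
  // setUint32 writes big-endian unless a third `true` argument is passed.
  new DataView(out.buffer).setUint32(0, body.length);
  out.set(body, 4); // the payload starts right after the header
  return out;
}

const bytes = buildPayload("<h1>Hi</h1>");
console.log(bytes.length); // 15 (4 header bytes + 11 payload bytes)
console.log(Array.from(bytes.subarray(0, 4)).join(",")); // 0,0,0,11
```

Note that the header stores the byte length, not the character count: `buildPayload("ü")` gets a header of 2, because "ü" takes two bytes in UTF-8.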

Once I have the byte array, I start filling pixels.

The first byte becomes the red channel of the first pixel.

The second byte becomes the green channel.

The third byte becomes the blue channel.

Then the next pixel. And the next. And the next…

Eventually the entire HTML document exists as colored pixels.

The resulting image looks like visual noise.
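The pixel-filling steps above can be sketched like this. The sketch writes straight into the RGBA layout that canvas `ImageData` uses, so each pixel also carries an alpha byte. Fixing alpha at 255 (fully opaque) is an assumption: canvases may store colors with premultiplied alpha, so translucent pixels are not guaranteed to read back with identical RGB values. `bytesToRgba` is a hypothetical name, and leftover channels in unused pixels simply stay 0.

```typescript
// Sketch: spread a byte stream over pixels, 3 bytes (R, G, B) per pixel,
// using the RGBA layout of canvas ImageData (4 entries per pixel).
function bytesToRgba(bytes: Uint8Array, width: number, height: number): Uint8ClampedArray {
  const pixelCount = width * height;
  if (bytes.length > pixelCount * 3) {
    throw new Error(`need ${Math.ceil(bytes.length / 3)} pixels, image has ${pixelCount}`);
  }
  const rgba = new Uint8ClampedArray(pixelCount * 4); // starts as all zeros
  bytes.forEach((b, i) => {
    const pixel = Math.floor(i / 3); // which pixel this byte lands in
    const channel = i % 3;           // 0 = red, 1 = green, 2 = blue
    rgba[pixel * 4 + channel] = b;
  });
  for (let p = 0; p < pixelCount; p++) {
    rgba[p * 4 + 3] = 255; // alpha: fully opaque (an assumption)
  }
  return rgba;
}

console.log(Array.from(bytesToRgba(new Uint8Array([1, 2, 3, 4]), 2, 1)).join(","));
// 1,2,3,255,4,0,0,255
```

In a browser, the result could be wrapped with `new ImageData(rgba, width, height)`, drawn with `putImageData`, and exported with `canvas.toBlob(callback, "image/png")`. PNG matters here because it is lossless; a lossy format like JPEG would scramble the bytes.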

Very small

What surprised me most wasn't that it worked, to be honest. It was how small the resulting image was.

The payload ended up being 208 bytes.

Adding the 4-byte header brings the total to 212 bytes.

Since every pixel stores three bytes, I needed:

212 bytes total
71 pixels (212 / 3 = 70.67, rounded up)
A square image large enough to contain them

The smallest square that works is 9x9 pixels (8x8 would give only 64).

That's only 81 pixels.

The final stats looked like this:

Payload: 208 bytes
Image size: 9x9 pixels
Capacity: 239 bytes (81 pixels at 3 bytes each, minus the 4-byte header)
Used: 87% (208 of 239 bytes)
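The sizing arithmetic can be checked with a small function. This is a sketch (the name `squareFor` and the default 4-byte header parameter are hypothetical) that reproduces the numbers above for a 208-byte payload; "capacity" here means the payload bytes left once the header is subtracted, which is how 81 pixels become 239 bytes.

```typescript
// Sketch of the sizing arithmetic: bytes -> pixels -> smallest square image.
function squareFor(payloadBytes: number, headerBytes = 4) {
  const total = payloadBytes + headerBytes;       // 208 + 4 = 212
  const pixels = Math.ceil(total / 3);            // 3 bytes per pixel: 70.67 -> 71
  const side = Math.ceil(Math.sqrt(pixels));      // sqrt(71) = 8.43 -> 9 (8x8 = 64 is too small)
  const capacity = side * side * 3 - headerBytes; // 81 * 3 - 4 = 239 payload bytes
  const usedPercent = Math.round((payloadBytes / capacity) * 100); // 208 / 239 -> 87
  return { total, pixels, side, capacity, usedPercent };
}

const stats = squareFor(208);
console.log(`${stats.pixels} pixels -> ${stats.side}x${stats.side}, capacity ${stats.capacity} bytes, ${stats.usedPercent}% used`);
// 71 pixels -> 9x9, capacity 239 bytes, 87% used
```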

Somehow a whole little website (okay, HTML with some styling) fits inside an image that's smaller than a standard 16x16 favicon.

Reading the website back out

Storing data is only half the problem. The other half is getting it back.

Browsers already have everything needed for this.

The favicon gets loaded as an image.
The image gets drawn onto a canvas.
The canvas API lets JavaScript read every pixel.
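On the reading side, the canvas call that exposes pixels is `getImageData`, which returns RGBA values, four bytes per pixel. The sketch below drops the alpha bytes to get back the plain R, G, B stream. To keep it runnable outside a browser, it accepts any object with a `getImageData` method; the `PixelReader` interface and `readRgbBytes` are hypothetical names, and a real `CanvasRenderingContext2D` fits the interface structurally.

```typescript
// Hypothetical minimal interface: any object with getImageData.
// A real CanvasRenderingContext2D satisfies it structurally.
interface PixelReader {
  getImageData(sx: number, sy: number, sw: number, sh: number): { data: Uint8ClampedArray };
}

// Read every pixel and keep only R, G, B (dropping alpha), in row-major order.
function readRgbBytes(ctx: PixelReader, width: number, height: number): Uint8Array {
  const rgba = ctx.getImageData(0, 0, width, height).data; // 4 entries per pixel
  const rgb = new Uint8Array(width * height * 3);
  for (let p = 0; p < width * height; p++) {
    rgb.set(rgba.subarray(p * 4, p * 4 + 3), p * 3); // copy R, G, B of pixel p
  }
  return rgb;
}

// Stand-in for a canvas holding a 2x1 image, so the sketch runs in Node.
const fakeCtx: PixelReader = {
  getImageData: () => ({ data: new Uint8ClampedArray([10, 20, 30, 255, 40, 50, 60, 255]) }),
};
console.log(Array.from(readRgbBytes(fakeCtx, 2, 1)).join(",")); // 10,20,30,40,50,60
```

In a page, `ctx` would come from a canvas sized to the favicon, after the icon is loaded into an `Image` and painted with `ctx.drawImage(img, 0, 0)`. The favicon has to be same-origin (or served with suitable CORS headers); otherwise the canvas is tainted and `getImageData` throws a `SecurityError`.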

Once I have the pixel data, I simply reverse the process.

Read the RGB values.
Reconstruct the byte array.
Read the first four bytes to determine the payload length.
Extract the payload.
Decode the UTF-8 text.

At that point I have the original HTML again.

The browser read a website out of its own favicon.
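The decoding steps listed above map onto a short function. This is a sketch, not the project's code: `decodePayload` is a hypothetical name, it takes RGB bytes already stripped of alpha, and it assumes a big-endian 4-byte length header, which the post doesn't specify. Anything after the declared length, such as zero padding in unused pixels, is ignored.

```typescript
// Sketch: recover the HTML from RGB bytes read out of the favicon.
// Assumption: the first 4 bytes are a big-endian payload length.
function decodePayload(rgb: Uint8Array): string {
  if (rgb.length < 4) throw new Error("too few bytes for a length header");
  const view = new DataView(rgb.buffer, rgb.byteOffset, rgb.byteLength);
  const length = view.getUint32(0); // big-endian by default
  if (4 + length > rgb.length) {
    throw new Error(`header says ${length} bytes, image holds ${rgb.length - 4}`);
  }
  // Decode only the declared payload; trailing padding is ignored.
  return new TextDecoder().decode(rgb.subarray(4, 4 + length)); // UTF-8 is the default
}

// "hi" (0x68 0x69) behind a length header of 2, followed by unused zero bytes.
const sample = new Uint8Array([0, 0, 0, 2, 0x68, 0x69, 0, 0, 0]);
console.log(decodePayload(sample)); // hi
```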

The important catch

The favicon doesn't actually contain the whole website itself.

It contains the content of a website.

You still need a tiny bootstrap loader to decode the image.

Without the JavaScript, the favicon is just a PNG (one that happens to contain your website content).

To demonstrate this, the site includes a "Render Website" button. It reads the favicon, decodes the HTML, and replaces the page with the reconstructed content.

Is this useful?

No, of course not.

The amount of data you can store is tiny. The page needs JavaScript to bootstrap itself. There are dozens of better ways to distribute a small HTML document.

But in the end it's about testing the boundaries, right?

A favicon feels like a very specific thing. It's supposed to be an icon.

But in the end it can just be a PNG.

And a PNG file is basically just bytes.

And this is probably the smallest website I've built…

Here is the link to the site: https://www.timwehrle.de/labs/favicon-site/

And if you want to see how it works: https://github.com/timwehrle/favicon