Base64 Encoding, Explained

Original link: https://www.akshaykhot.com/base64-encoding-explained/

In computing, Base64 is a technique for converting binary data into a text format that can be transmitted electronically. It represents data using a "safe" set of 64 printable ASCII characters, so even older systems can handle it without misinterpreting it. The raw binary data is divided into six-bit groups, and each group is converted into a single character of the Base64 alphabet; the encoding has long been used for email attachments, especially those containing non-US-ASCII content. The encoded output is about one-third larger than the raw binary, since every 3 bytes become 4 characters; what it buys is safe storage, transmission, and processing in text-only contexts. By default, Base64 appends equal signs ('=') so that the output length is a multiple of 4; the padding only records how many bytes the final group held, and no data is lost. Examples where Base64 comes in handy include embedding image data in data URLs, sending data over channels designed for text (such as email), and making files that are normally binary (such as PDF documents or videos) manageable with text tools, to mention a few scenarios. Popular programming languages (such as Ruby, PHP, and JavaScript) support it directly, and widely used tools (such as cat, vi, emacs, and bash) can all handle the resulting text. Overall, Base64 provides a convenient mechanism for storing, exchanging, and processing data across a wide variety of platforms and devices. More in-depth resources are available in the relevant RFC documents.

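Different text encodings represent the same characters as different binary values. That does not mean all text is already "binary" that we can transmit or manipulate directly: the bytes first have to be interpreted as text before anything else can be done with them. When we talk about "text", we mean a sequence of characters represented in a particular encoding. Those characters are converted to bytes for transmission or storage, and our interpretation of the bytes is what lets us restore the original text after receiving or loading it. We could technically define text as a synonym for one specific binary format (for example, 7-bit ASCII), but that causes needless confusion wherever other encodings may be in use. Handling text in many encodings is harder still, especially with less common encodings that lack proper support, as in older programming environments. Most "non-text" binary data, on the other hand, is simply a stream of bytes with no inherent meaning beyond its underlying structure. Whether we perceive some data as text or as binary therefore depends on how we interact with it, not on any inherent relationship between the data and text: bytes only become text once they are interpreted in a specific encoding. In short, when discussing text, or ways of transmitting or manipulating it, it is important to consider the binary formats the text may exist or travel in, and the mapping between each text encoding and its bytes, rather than treating any one encoding as something completely separate from binary data. It should also be noted that, although text can certainly be embedded in a binary format, most text is transmitted as follows: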

Original article

When you're programming, it's easy to get by with a superficial understanding of many things. You can easily fool yourself into thinking that you are programming when you are blindly copying and pasting code from Stack Overflow or some random article you stumbled upon.

Base64 encoding was one of these topics that had been bugging me for a while. I often came across Base64 encoded images or URLs, and had no idea whatsoever what it meant or why it was even used. Finally, I decided to do some research to fill that knowledge gap, and spent a Sunday reading RFC 4648 (my idea of a fun weekend).

What follows is everything I learned about Base64 encoding.

💡 Update: This article got to the front page of Hacker News and a ton of interesting discussion followed. Check it out.

What is Base64 Encoding?

Base64 encoding takes binary data and converts it into text, specifically ASCII text. The resulting text contains only the letters A-Z and a-z, the digits 0-9, and the symbols + and / (plus = when padding is needed).

As there are 26 letters in the alphabet, we have 26 + 26 + 10 + 2 = 64 characters. Hence this encoding is named Base64. These 64 characters are considered "safe", that is, they cannot be misinterpreted by legacy computers and programs, unlike characters such as <, >, \n and many others.

💡 Here's what the text "Ruby on Rails" looks like when Base64 encoded: UnVieSBvbiBSYWlscw==.

It's important to remember that we are not encrypting the text here. Given Base64 encoded data, it's very easy to convert it back (decode) to the original text. We are only changing the representation of the data, i.e. encoding.

In essence, Base64 encoding uses a specific, reduced set of characters to encode binary data, to protect against data corruption.

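The Base64 Alphabet

[Image: the Base64 alphabet table, which maps the values 0 to 25 to A-Z, 26 to 51 to a-z, 52 to 61 to 0-9, 62 to + and 63 to /]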

As there are only 64 characters available to encode into, we can represent each of them using only 6 bits, because 2^6 = 64. Every Base64 digit represents 6 bits of data. There are 8 bits in a byte, and the least common multiple of 8 and 6 is 24. So 24 bits, or 3 bytes, can be represented using four 6-bit Base64 digits.
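
The arithmetic above can be checked with a few lines of Ruby. This is only a sketch: the constant names BASE64_ALPHABET and BITS_PER_GROUP are made up for the illustration and are not part of any library.

```ruby
# The 64-character alphabet from RFC 4648, built from its four ranges.
# BASE64_ALPHABET is a name chosen for this sketch, not a library constant.
BASE64_ALPHABET =
  (("A".."Z").to_a + ("a".."z").to_a + ("0".."9").to_a + ["+", "/"]).freeze

puts BASE64_ALPHABET.size   # 64 symbols, so each one carries 6 bits
puts BASE64_ALPHABET[0]     # value 0 maps to "A"
puts BASE64_ALPHABET[63]    # value 63 maps to "/"

# 8-bit bytes and 6-bit digits line up every lcm(8, 6) = 24 bits.
BITS_PER_GROUP = 8.lcm(6)
puts BITS_PER_GROUP / 8     # 3 input bytes per group
puts BITS_PER_GROUP / 6     # 4 Base64 digits per group
```

Indexing into the array is all the "alphabet lookup" amounts to: the 6-bit value is the position, and the character at that position is the Base64 digit.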

(If that last paragraph totally went over your head, don't worry. Hopefully it should be clear by the end of this post.)

Why Base64?

You must have included an image in your HTML document using the <img> tag. Did you know you can embed the image data directly into the HTML without linking to the external image file? Data URLs let you do this, and they use Base64 encoded text to embed files inline.



data:[<media type>][;charset=<charset>][;base64],<data>
Historically it has been used to encode binary data in email messages where the email server might modify line-endings. A more modern example is the use of Base64 encoding to embed image data directly in HTML source code. Here it is necessary to encode the data to avoid characters like '<' and '>' being interpreted as tags.
From: Why do we use Base64?
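
To make the data URL idea concrete, here is a minimal Ruby sketch that turns raw bytes into a data URL. The helper name data_url is hypothetical, and the 8-byte PNG file signature stands in for a real image file, so treat this as an illustration rather than a complete image pipeline.

```ruby
require "base64"

# Build a data URL from raw bytes. `data_url` is a hypothetical helper
# written for this sketch; it is not part of Ruby's standard library.
def data_url(bytes, media_type)
  # strict_encode64 emits no line breaks, which is what a URL needs.
  "data:#{media_type};base64,#{Base64.strict_encode64(bytes)}"
end

# The 8-byte PNG file signature stands in for a real image file here.
png_signature = [0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A].pack("C*")

puts data_url(png_signature, "image/png")
# data:image/png;base64,iVBORw0KGgo=
```

With a real file you would pass File.binread(path) as the bytes and place the resulting string in the src attribute of the image tag.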

Another common use case is when we have to store or transmit binary data over a medium that was designed to handle text, such as US-ASCII data. Encoding ensures the data remains unchanged during transport. Base64 can also be used for passing data in URLs when that data includes non-URL friendly characters.
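
One caveat for URLs: the standard alphabet itself contains + and /, which have special meaning in URLs. RFC 4648 defines a URL-safe variant that substitutes - and _ for them. The sketch below compares the two using Ruby's Base64 module; the sample bytes are chosen only because they produce both problematic characters.

```ruby
require "base64"

# Three bytes picked so the standard output contains both "+" and "/".
TRICKY_BYTES = [0xFB, 0xEF, 0xFF].pack("C*")

standard = Base64.strict_encode64(TRICKY_BYTES)
url_safe = Base64.urlsafe_encode64(TRICKY_BYTES)

puts standard   # ++//
puts url_safe   # --__

# The URL-safe form decodes back to the same bytes.
puts Base64.urlsafe_decode64(url_safe) == TRICKY_BYTES   # true
```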

Base encoding is also used in many applications simply because it makes it possible to manipulate objects with text editors.

You can also transfer files as text, using Base64 encoding. First, get the file's bytes and encode them as Base64. Then transfer the Base64 encoded string, and then decode it back to the original file content on the receiving side.
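
Here is one way that round trip might look in Ruby. The helpers encode_file and decode_to_file are hypothetical names for this sketch, and the "transfer" step is just a local variable standing in for whatever text channel you use.

```ruby
require "base64"
require "tmpdir"

# Sender side: read the file's raw bytes and encode them as text.
def encode_file(path)
  Base64.strict_encode64(File.binread(path))
end

# Receiver side: decode the text and write the original bytes back out.
def decode_to_file(encoded, path)
  File.binwrite(path, Base64.strict_decode64(encoded))
end

Dir.mktmpdir do |dir|
  source = File.join(dir, "source.bin")
  copy   = File.join(dir, "copy.bin")

  # Every possible byte value, including ones that are not printable text.
  File.binwrite(source, (0..255).to_a.pack("C*"))

  text = encode_file(source)   # plain ASCII, safe to paste or send
  decode_to_file(text, copy)

  puts text.ascii_only?                             # true
  puts File.binread(copy) == File.binread(source)   # true
end
```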

Let's take a deeper look into this algorithm in the next section.

Base64 Encoding Algorithm

Here's the simple algorithm that converts some text into Base64.

  1. Convert the text to its binary representation.
  2. Divide the bits into groups of 6 bits each.
  3. Convert each group to a decimal number from 0-63. It cannot be greater than 63 as there are only 6 bits in each group.
  4. Convert this decimal number to the equivalent Base64 character using the Base64 alphabet.

That's it. You have a Base64 encoded string. If the final group has fewer than 6 bits, fill it with zero bits on the right, then append = or == as padding so that the length of the encoded string is a multiple of 4.
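
The four steps, plus the padding rule, can be sketched in plain Ruby. Note that naive_base64 and NAIVE_ALPHABET are names invented for this example; it is a teaching sketch that favors readability over speed, not how Ruby's own Base64 module is implemented.

```ruby
NAIVE_ALPHABET =
  (("A".."Z").to_a + ("a".."z").to_a + ("0".."9").to_a + ["+", "/"]).freeze

# `naive_base64` is a teaching sketch, not a replacement for Ruby's Base64 module.
def naive_base64(data)
  # Step 1: bytes -> one long string of "0" and "1" characters.
  bits = data.unpack1("B*")

  # Step 2: split into 6-bit groups; zero-fill the last group if it is short.
  groups = bits.scan(/.{1,6}/).map { |group| group.ljust(6, "0") }

  # Steps 3 and 4: binary -> number (0..63) -> alphabet character.
  encoded = groups.map { |group| NAIVE_ALPHABET[group.to_i(2)] }.join

  # Padding: add "=" until the output length is a multiple of 4.
  encoded + "=" * ((4 - encoded.length % 4) % 4)
end

puts naive_base64("Ruby on Rails")   # UnVieSBvbiBSYWlscw==
```

The 13 bytes of "Ruby on Rails" leave one byte over after four full 3-byte groups, which is why the result ends in two padding characters.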

Sounds confusing? Don't worry, the following example should make it pretty clear. Let's convert my name "Akshay" to its Base64 equivalent string.

  • Convert the text "Akshay" to binary by first converting each character to its corresponding ASCII number and then converting that decimal number to binary (or just use this tool):
01000001 01101011 01110011 01101000 01100001 01111001

   A        k        s        h        a        y
  • Divide the bits into groups of 6 bits:
010000 010110 101101 110011 011010 000110 000101 111001
  • Convert each group to a decimal number between 0 and 63:
010000 010110 101101 110011 011010 000110 000101 111001
  
  16     22     45     51     26     6      5      57
  • Now use the Base64 alphabet (see above image) to convert each decimal number to its Base64 representation:
16  22  45  51  26  6  5  57

Q   W   t   z   a   G  F  5

And we're done. The name "Akshay" is represented in Base64 as QWtzaGF5.
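
If you want to reproduce the intermediate values from this walkthrough, a short Ruby sketch like the following should do it. The helper six_bit_groups is a hypothetical name for this example; pack("m0") is core Ruby's way of producing Base64 without line breaks and is used here only to confirm the hand-computed result.

```ruby
# `six_bit_groups` is a hypothetical helper for this walkthrough.
def six_bit_groups(text)
  bits = text.bytes.map { |byte| byte.to_s(2).rjust(8, "0") }.join
  # The last group is zero-filled on the right when it has fewer than 6 bits.
  bits.scan(/.{1,6}/).map { |group| group.ljust(6, "0") }
end

groups = six_bit_groups("Akshay")
puts groups.join(" ")                          # the eight 6-bit groups
puts groups.map { |g| g.to_i(2) }.join(" ")    # 16 22 45 51 26 6 5 57

# Confirm the hand-computed result with core Ruby.
puts ["Akshay"].pack("m0")                     # QWtzaGF5

# Five bytes are not a multiple of three, so padding appears.
puts ["Aksha"].pack("m0")                      # QWtzaGE=
```

"Akshay" happens to be exactly 6 bytes, or two full 24-bit groups, which is why no padding was needed in the example.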

At first glance, the benefit of Base64 encoding is not quite obvious. What exactly did we achieve by converting "Akshay" to "QWtzaGF5"?

Imagine, instead of my name, you had an image or some other file (PDF, text, video, anything, really), and you wanted to store it as text. You could read its raw bytes and Base64 encode them to get the corresponding ASCII text.

Now you could send or store that text anywhere and anyhow you like, without worrying that some legacy device, protocol or software will misinterpret the raw binary data and corrupt your file. Makes sense?
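
This safety has a cost in size: since every 3 bytes become 4 characters, the encoded text is roughly one-third larger than the original data. A small sketch of that calculation, where base64_length is a hypothetical helper name:

```ruby
# Padded Base64 length for a given number of input bytes.
# `base64_length` is a hypothetical helper for this sketch.
def base64_length(byte_count)
  groups = (byte_count + 2) / 3   # number of 3-byte groups, rounded up
  groups * 4                      # each group becomes 4 characters
end

[6, 13, 1_000_000].each do |size|
  puts "#{size} bytes -> #{base64_length(size)} characters"
end
```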

How to Encode and Decode Base64

Most programming languages have support for encoding and decoding data to and from the Base64 format.

Here is the Ruby code that takes some text as input and converts it into a Base64 encoded string.

require "base64"

encoded = Base64.encode64("Ruby on Rails")  # "UnVieSBvbiBSYWlscw==\n"

decoded = Base64.decode64(encoded)  # "Ruby on Rails"

Here's the equivalent program in C#, my second-most favorite language:

public static string ToBase64(string value)
{
    // Base64 works on bytes, so turn the text into bytes first.
    byte[] bytes = System.Text.Encoding.UTF8.GetBytes(value);

    return System.Convert.ToBase64String(bytes);
}

public static string FromBase64(string encoded)
{
    byte[] data = System.Convert.FromBase64String(encoded);

    // Decode with the same text encoding that ToBase64 used.
    return System.Text.Encoding.UTF8.GetString(data);
}

PHP makes it very simple with its base64_encode and base64_decode top-level functions.

Similarly, in JavaScript, use the btoa() and atob() functions to encode and decode the text.

const text = "Ruby on Rails"
btoa(text) // "UnVieSBvbiBSYWlscw=="

const encoded_text = "UnVieSBvbiBSYWlscw=="
atob(encoded_text)  // "Ruby on Rails"

What's more, most Unix-like systems ship with a base64 command-line tool. Try this in your terminal:

$ echo "akshay" | base64
YWtzaGF5Cg==

$ echo "YWtzaGF5Cg==" | base64 -d
akshay

Note that echo appends a newline to its output, and that newline gets encoded too, which is why the result ends in Cg==. To encode only the six letters, use printf "akshay" | base64, which prints YWtzaGF5.

That's a wrap. I hope you found this article helpful and you learned something new. If you are interested in learning more, I highly recommend you read RFC 4648, which describes the Base64 encoding in detail.
As always, if you have any questions or feedback, didn't understand something, or found a mistake, please leave a comment below or send me an email. I reply to all emails I get from developers, and I look forward to hearing from you.
If you'd like to receive future articles directly in your email, please subscribe to my blog. If you're already a subscriber, thank you.
