This time I played with Base64 encoding. It is a bit easier to understand than UTF-8.
I read about it on https://www.lifewire.com/ in the article "How Base64 Encoding Works", and started experimenting with examples in an online encoder. (Decoding is the reverse process, though it is not demonstrated in this post.)
I will show you four examples.
First of all, as its name implies, Base64 is all about 64 different characters. Since 2^6 = 64, each character carries exactly 6 bits, so basically Base64 is a 6-bit encoding, can I say so?
Let me quote the original paragraph:
QUOTE
;The 64 characters (hence the name Base64) are 10 digits,
;26 lowercase characters, 26 uppercase characters as well
;as the Plus sign (+) and the Forward Slash (/).
;There is also a 65th character known as a pad, which is
;the Equal sign (=). This character is used when the last
;segment of binary data doesn't contain a full 6 bits.
This is the encoding table, from 0 to 63:
 0-25   A to Z   (A=0, B=1, ..., Z=25)
26-51   a to z   (a=26, b=27, ..., z=51)
52-61   0 to 9   (0=52, 1=53, ..., 9=61)
62      +
63      /
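To make the table concrete, here is a small Python sketch (my own illustration, not from the Lifewire article). It builds the same 64-character alphabet in index order using only the standard `string` module. The name `ALPHABET` is my own choice.

```python
import string

# Index 0-25: A-Z, 26-51: a-z, 52-61: 0-9, 62: '+', 63: '/'
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

# Print a few index -> character lookups from the table
for index in (0, 24, 35, 62, 63):
    print(index, ALPHABET[index])
```

Looking up a character is then just indexing the string with the 6-bit value.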

QUOTE
;To ensure the encoded data can be properly printed and does
;not exceed any mail server's line length limit, newline
;characters are inserted to keep line lengths below 76 characters.
This is optional, as seen from the online encoder.
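Python's standard `base64` module happens to show both behaviours. `b64encode` returns one unbroken line, while `encodebytes` follows the MIME convention and inserts a newline after every 76 output characters. Here is a quick sketch with 100 bytes of arbitrary sample data (the data is my own choice):

```python
import base64

data = bytes(range(100))             # 100 arbitrary sample bytes

plain = base64.b64encode(data)       # one long line, no newlines
wrapped = base64.encodebytes(data)   # MIME style: newline after every 76 output chars

print(len(plain))                    # 136 characters: 34 groups x 4
for line in wrapped.splitlines():
    print(len(line))                 # 76, then 60
```

Removing the newlines from the wrapped version gives back exactly the same text as the unwrapped one, so the line breaks really are only cosmetic.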

QUOTE
;At the end of the encoding process, there might be a problem.
;If the size of the original data in bytes is a multiple of three,
;everything works fine. If it is not, there may be empty bytes.
;For proper encoding, exactly 3-bytes of binary data is needed.
;The solution is to append enough bytes with a value of 0 to
;create a 3-byte group. Two such values are appended if the data
;needs one extra byte of data, one is appended for two extra bytes.
This is the most tricky part of the process.
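The zero-byte method in this quote can be tried directly. Below is a minimal Python sketch of it (the helper name `encode_by_zero_padding` is my own). It appends zero bytes until the input is a multiple of 3, encodes that, and then turns the characters that came only from the added zero bytes into "=". It relies on `base64.b64encode` for the full groups, so it is a check of the idea rather than a from-scratch encoder.

```python
import base64

def encode_by_zero_padding(data: bytes) -> str:
    # How many zero bytes complete the last 3-byte group (0, 1 or 2)
    missing = (3 - len(data) % 3) % 3
    padded = data + b"\x00" * missing
    # padded is a multiple of 3 bytes, so its encoding has no '='
    encoded = base64.b64encode(padded).decode("ascii")
    # The last `missing` characters come only from the added zero bytes,
    # so they are replaced by '=' pad characters
    if missing:
        encoded = encoded[:-missing] + "=" * missing
    return encoded

print(encode_by_zero_padding(b"abcd"))  # YWJjZA==
```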
--
So now let's start with examples!
abc (input)
97,98,99 (corresponding ASCII value)
0110 0001, 0110 0010, 0110 0011 (the binary representation of ASCII value)
011000 010110 001001 100011 (regrouped into 6-bit blocks)
24,22,9,35 (decimal value of each 6-bit group, used as an index into the table)
YWJj (output)
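The five lines above can be reproduced step by step in Python. This is a sketch that works on bit strings for readability, assuming plain ASCII input; the variable names are my own.

```python
import string

# Standard Base64 alphabet: 0-25 A-Z, 26-51 a-z, 52-61 0-9, 62 '+', 63 '/'
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

text = "abc"
codes = [ord(ch) for ch in text]                          # ASCII values: 97, 98, 99
bits = "".join(format(c, "08b") for c in codes)           # 24 bits, 8 per character
groups = [bits[i:i + 6] for i in range(0, len(bits), 6)]  # four 6-bit blocks
indices = [int(g, 2) for g in groups]                     # 24, 22, 9, 35
encoded = "".join(ALPHABET[i] for i in indices)           # "YWJj"

print(codes, groups, indices, encoded)
```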
Notice how "abc" becomes "YWJj"? a = 97, b = 98 and c = 99 are each 8-bit values, so 3 x 8 = 24 bits. That is a perfect match for 6-bit grouping, since 24 / 6 = 4, which is why a 3-character input becomes a 4-character output (Base64 encoded).
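The same 3-in, 4-out arithmetic gives the output length for any input size: every started 3-byte group becomes 4 characters, padding included. Here is a small Python sketch of that formula (the function name is my own), checked against the standard `base64` module:

```python
import base64

def encoded_length(n: int) -> int:
    # Round n up to whole 3-byte groups, then 4 output characters per group
    return 4 * ((n + 2) // 3)

for text in (b"a", b"abc", b"abcd", b"abcde"):
    print(text, encoded_length(len(text)), len(base64.b64encode(text)))
```

For the four inputs in this post the formula gives 4, 4, 8 and 8 characters.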
I think the most difficult part when programming this yourself (assuming you don't use a library API) is the 6-bit grouping.
After grouping, it is easy to look up 24 = Y, 22 = W, 9 = J and 35 = j in the table (the index starts from 0).
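One common way to do the 6-bit grouping without strings is with bit shifts. The sketch below is my own illustration; `split_group` is a hypothetical helper name. It packs three bytes into one 24-bit integer and then cuts out four 6-bit pieces with `>>` and the mask `0x3F` (binary 111111).

```python
def split_group(b0: int, b1: int, b2: int):
    # Pack three 8-bit bytes into one 24-bit integer
    n = (b0 << 16) | (b1 << 8) | b2
    # Take 6 bits at a time, starting from the most significant end
    return [(n >> 18) & 0x3F, (n >> 12) & 0x3F, (n >> 6) & 0x3F, n & 0x3F]

print(split_group(97, 98, 99))  # [24, 22, 9, 35]
```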
Now let's go to the tricky part: what if the input does not fit exactly into 6-bit groups?
abcd
97,98,99,100
0110 0001, 0110 0010, 0110 0011, 0110 0100
011000 010110 001001 100011 011001 00[00][00]
24,22,9,35,25,0
YWJjZA==
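The bit-string trace can be extended to inputs that do not fill the last group. Here is a minimal Python sketch, assuming ASCII input (the function name `trace` is my own). It appends zero bits until the bit count is a multiple of 6, which is the [00] in the examples, and then adds "=" until the output length is a multiple of 4.

```python
import string

ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

def trace(text: str):
    bits = "".join(format(ord(ch), "08b") for ch in text)
    # Append zero bits until the length is a multiple of 6
    bits += "0" * (-len(bits) % 6)
    groups = [bits[i:i + 6] for i in range(0, len(bits), 6)]
    indices = [int(g, 2) for g in groups]
    chars = "".join(ALPHABET[i] for i in indices)
    # Add '=' until the output length is a multiple of 4
    return groups, indices, chars + "=" * (-len(chars) % 4)

print(trace("abcd"))
```

The same function also reproduces the "abcde" and "a" examples.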
This one is the same as the first example, except that I appended a "d" to the input string. I believe you understand most of it, except the padding (the 65th character, "=").
Let me quote again the paragraph from the beginning of this post:
QUOTE
;For proper encoding, exactly 3-bytes of binary data is needed.
;The solution is to append enough bytes with a value of 0 to
;create a 3-byte group. Two such values are appended if the data
;needs one extra byte of data, one is appended for two extra bytes.
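In 00[00][00], the bracketed [00][00] are four zero bits appended to make the last block a complete 6-bit group (000000 = 0 = A). The "=" signs are pad characters added to the output: the last 3-byte group holds only one byte ("d"), two bytes short of a full group, so two "=" are added.
Hence the two "=" equal signs at the end of the output.
The example below requires just one pad character: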
abcde
97,98,99,100,101
0110 0001, 0110 0010, 0110 0011, 0110 0100, 0110 0101
011000 010110 001001 100011 011001 000110 0101[00]
24,22,9,35,25,6,20
YWJjZGU=
As you can see, only one [00] (two zero bits) is needed this time to make the last block a complete 6-bit group, and only one "=" is added.
What if we encode only a single 8-bit character, "a"? Since it must be split into 6-bit blocks, four zero bits (0000) are needed to complete the last block.
a
97
0110 0001
011000 01[00][00]
24,16
YQ==
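As a rule of thumb, every two zero bits appended correspond to one "=" equal sign character: four zero bits give "==", two give "=". (Strictly speaking, "=" pads the output to a multiple of 4 characters, one "=" for each byte missing from the last 3-byte group.)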
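The rule of thumb can be checked over many input lengths by comparing the number of appended zero bits with the number of "=" characters that Python's `base64.b64encode` produces. This is a checking sketch of my own (`padding_counts` is a hypothetical helper name):

```python
import base64

def padding_counts(n: int):
    """Return (zero bits appended, '=' characters) for an n-byte input."""
    zero_bits = -(8 * n) % 6                             # bits needed to finish the last 6-bit group
    pad_chars = base64.b64encode(bytes(n)).count(b"=")   # '=' in the real output
    return zero_bits, pad_chars

for n in range(1, 7):
    print(n, padding_counts(n))
```

For lengths 1 to 6 this prints (4, 2), (2, 1), (0, 0), and then the same three pairs again. So the zero bits are always exactly twice the "=" count, which is why the rule of thumb works.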
So, "a" becomes "YQ==" (Base64 encoded). Wonderful, isn't it?
Are my four examples clear? I learned a lot from them myself.
You can try writing a program without library functions to experiment with it yourself. I will leave Base64 decoding for you to explore; I believe it is just the reverse of encoding.
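If you want a starting point, here is one possible from-scratch encoder in Python. It is a sketch of my own, not the only way to do it, and `b64_encode_raw` is a hypothetical name. It handles the input in 3-byte groups, fills a short last group with zero bytes, splits each group into four 6-bit indices with shifts, and replaces the characters that came only from zero padding with "=". The standard `base64` module is imported only to compare results.

```python
import base64  # only used for the comparison at the bottom
import string

ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

def b64_encode_raw(data: bytes) -> str:
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        missing = 3 - len(chunk)                                # 0, 1 or 2 bytes short
        n = int.from_bytes(chunk + b"\x00" * missing, "big")   # 24-bit value
        chars = [ALPHABET[(n >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        if missing:
            chars[-missing:] = ["="] * missing                 # these came only from zero padding
        out.extend(chars)
    return "".join(out)

# Compare with the standard library on the four examples from this post
for text in (b"abc", b"abcd", b"abcde", b"a"):
    print(text, b64_encode_raw(text), base64.b64encode(text).decode("ascii"))
```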
Hope you enjoy this post.
Corrections are welcome if there are mistakes in my explanation above.
This post has been edited by MatQuasar: Jul 24 2023, 07:59 PM
Jul 24 2023, 07:38 PM