 Playing with Base64 Encoding (with examples), When 'abc' becomes 'YWJj'

TSMatQuasar
post Jul 24 2023, 07:38 PM, updated 3y ago

Casual
***
Validating
329 posts

Joined: Jun 2023
Since I came to understand UTF-8 (my old forum post, by FlierMate), I haven't researched any other encoding.

This time I am playing with Base64 encoding. It is a bit easier to understand than UTF-8.

I read the article "How Base64 Encoding Works" on https://www.lifewire.com/ and started experimenting with examples in an online encoder. (Decoding is the reverse process, though it is not demonstrated in this post.)

I will show you four examples.

First of all, as its name implies, Base64 is all about 64 different characters, and 2^6 = 64, so each Base64 character carries 6 bits. Basically Base64 is a 6-bit encoding, can I say so?

Let me quote the original paragraph:

QUOTE
;The 64 characters (hence the name Base64) are 10 digits,
;26 lowercase characters, 26 uppercase characters as well
;as the Plus sign (+) and the Forward Slash (/).
;There is also a 65th character known as a pad, which is
;the Equal sign (=). This character is used when the last
;segment of binary data doesn't contain a full 6 bits.
This is the encoding table, from 0 to 63:
user posted image
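
In case the table image does not load, here is a minimal sketch that rebuilds the standard alphabet in Python. The name ALPHABET is my own choice for illustration; the ordering (A-Z, a-z, 0-9, then + and /) is the standard one described in the quote.

```python
import string

# Standard Base64 alphabet, index 0 to 63:
# A-Z (0-25), a-z (26-51), 0-9 (52-61), '+' (62), '/' (63).
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

# Print the boundaries of each range to check the layout.
for index in (0, 25, 26, 51, 52, 61, 62, 63):
    print(index, ALPHABET[index])
```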

QUOTE
;To ensure the encoded data can be properly printed and does
;not exceed any mail server's line length limit, newline
;characters are inserted to keep line lengths below 76 characters.
This is optional, as seen in the online encoder:

user posted image
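
The same choice exists in Python's standard library, which I use here only as an illustration: base64.b64encode returns one long line, while base64.encodebytes inserts a newline after every 76 output characters (MIME style). The sample data is made up.

```python
import base64

data = bytes(range(256))  # arbitrary sample data, 256 bytes

one_line = base64.b64encode(data)    # no line breaks at all
wrapped = base64.encodebytes(data)   # newline after every 76 output characters

print(len(one_line))
for line in wrapped.splitlines():
    print(len(line), line.decode("ascii"))
```

Removing the newlines from the wrapped form gives back the one-line form, so the line breaks carry no data.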

QUOTE
;At the end of the encoding process, there might be a problem.
;If the size of the original data in bytes is a multiple of three,
;everything works fine. If it is not, there may be empty bytes.
;For proper encoding, exactly 3-bytes of binary data is needed.
;The solution is to append enough bytes with a value of 0 to
;create a 3-byte group. Two such values are appended if the data
;needs one extra byte of data, one is appended for two extra bytes.
This is the most tricky part, notice the highlighted text.

--

So now let's start with examples!

abc (input)
97,98,99 (corresponding ASCII value)
0110 0001, 0110 0010, 0110 0011 (the binary representation of ASCII value)
011000 010110 001001 100011 (grouped into 6-bit block)
24,22,9,35 (Base64 table index for each group)
YWJj (output)


Notice how "abc" becomes "YWJj"? a = 97, b = 98, c = 99, and each is an 8-bit value, so 8 x 3 = 24 bits. That is a perfect match for 6-bit grouping, since 24 / 6 = 4, which is why 3 input characters become 4 output characters (Base64 encoded).

I think the most difficult part when programming this (assuming you don't use a library API) is the 6-bit grouping.
After grouping, it is easy to look up 24 = Y, 22 = W, 9 = J, and 35 = j in the table (the index starts from 0).
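
The grouping itself can be sketched with shifts and masks. This is just one possible way to do it, for a full 3-byte group only; encode_triplet and ALPHABET are names I made up for illustration.

```python
import string

ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

def encode_triplet(chunk: bytes) -> str:
    """Encode exactly 3 bytes into 4 Base64 characters (no padding case)."""
    if len(chunk) != 3:
        raise ValueError("this sketch only handles a full 3-byte group")
    # Join the three 8-bit values into one 24-bit integer.
    n = (chunk[0] << 16) | (chunk[1] << 8) | chunk[2]
    # Cut out four 6-bit groups, most significant first.
    indices = [(n >> shift) & 0b111111 for shift in (18, 12, 6, 0)]
    return "".join(ALPHABET[i] for i in indices)

print(encode_triplet(b"abc"))  # YWJj
```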

Now let's go to the tricky part: what if the input does not fit exactly into 6-bit groups?

abcd
97,98,99,100
0110 0001, 0110 0010, 0110 0011, 0110 0100
011000 010110 001001 100011 011001 00[00][00]
24,22,9,35,25,0
YWJjZA==


This one is the same as the first example, except I appended a "d" to the input string. I believe you understand most of it, except the padding (using the 65th character, "=").

Let me quote again part of the original paragraph you read at the beginning of this post:

QUOTE
;For proper encoding, exactly 3-bytes of binary data is needed.
;The solution is to append enough bytes with a value of 0 to
;create a 3-byte group. Two such values are appended if the data
;needs one extra byte of data, one is appended for two extra bytes.


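The 00[00][00] means four zero bits are appended to make the last block a complete 6-bit group. Each bracketed [00] matches one "=" in the output: the input is two bytes short of a full 3-byte group, so the output is padded with two "=" to reach a full 4 characters.
Hence the two "=" equal sign characters at the end of the output.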

The example below requires just one "=" pad character:

abcde
97,98,99,100,101
0110 0001, 0110 0010, 0110 0011, 0110 0100, 0110 0101
011000 010110 001001 100011 011001 000110 0101[00]
24,22,9,35,25,6,20
YWJjZGU=


As you see, only one [00] is needed to make the last block a complete 6-bit group, so the output ends with only one "=".

What if we encode only a single 8-bit character "a"? Since it must be grouped into 6-bit blocks, four zeros (0000) are needed to make the last block a full 6 bits.

a
97
0110 0001
011000 01[00][00]
24,16
YQ==


Remember, every two zero bits appended correspond to one "=" equal sign character (each "=" stands for one byte missing from the last 3-byte group).

So, "a" becomes "YQ==" (Base64 encoded), wonderful isn't it?
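
To check all four examples at once, here is a minimal encoder sketch with padding, compared against Python's own base64 module. The function name b64_encode and the variable names are mine; it follows the "append zero bytes, then replace the extra characters with =" idea from the quoted article, and it does no line wrapping.

```python
import base64
import string

ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

def b64_encode(data: bytes) -> str:
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        missing = 3 - len(chunk)  # 0, 1 or 2 bytes short of a full group
        # Append zero bytes to get a full 24-bit value.
        n = int.from_bytes(chunk + b"\x00" * missing, "big")
        chars = [ALPHABET[(n >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        # Characters that come only from the appended zero bytes become '='.
        if missing:
            chars[-missing:] = "=" * missing
        out.extend(chars)
    return "".join(out)

for text in (b"abc", b"abcd", b"abcde", b"a"):
    mine = b64_encode(text)
    reference = base64.b64encode(text).decode("ascii")
    print(text, mine, mine == reference)
```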

Am I clear with my four examples? I learned from them myself too.

You can try writing a program with your own routine (no library) to experiment with it yourself. I will let you explore how Base64 decoding works; I believe it is just the reverse of encoding.
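
As a starting point for that exploration, here is a rough decoding sketch. It assumes clean input (standard alphabet, no newlines, no validation), and b64_decode is a name I made up. It turns each character back into 6 bits, then regroups the bits into 8-bit bytes and drops the leftover zero bits.

```python
import string

ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

def b64_decode(text: str) -> bytes:
    # The '=' characters carry no data, so drop them first.
    stripped = text.rstrip("=")
    # Each character becomes its 6-bit table index.
    bits = "".join(format(ALPHABET.index(c), "06b") for c in stripped)
    # Ignore the zero bits that were appended to fill the last 6-bit group.
    usable = len(bits) - len(bits) % 8
    return bytes(int(bits[i:i + 8], 2) for i in range(0, usable, 8))

for encoded in ("YWJj", "YWJjZA==", "YWJjZGU=", "YQ=="):
    print(encoded, b64_decode(encoded))
```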

Hope you enjoy this post.

Corrections are welcome if there are mistakes in my explanation above.

This post has been edited by MatQuasar: Jul 24 2023, 07:59 PM
jibpek
post Jul 24 2023, 08:04 PM

Enthusiast
*****
Junior Member
710 posts

Joined: Jul 2012
padding is optional.
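
A small sketch of what that can look like in practice, assuming Python's standard base64 module (which expects padded input when decoding): the "=" characters can be stripped after encoding and restored from the length before decoding. decode_unpadded is a hypothetical helper name.

```python
import base64

def decode_unpadded(text: str) -> bytes:
    # Restore '=' until the length is a multiple of 4, then decode as usual.
    padded = text + "=" * (-len(text) % 4)
    return base64.b64decode(padded)

unpadded = base64.b64encode(b"a").decode("ascii").rstrip("=")
print(unpadded)                   # YQ
print(decode_unpadded(unpadded))  # b'a'
```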

b64 url safe version not mentioned.

TS rookie
flashang
post Jul 25 2023, 09:06 AM

Casual
***
Junior Member
355 posts

Joined: Aug 2021


QUOTE(jibpek @ Jul 24 2023, 08:04 PM)
padding is optional.

b64 url safe version not mentioned.

TS rookie
*
The general strategy is to choose 64 characters that are common to most encodings and that are also printable.

For base64url (the URL- and filename-safe variant), replace "+" with "-" and "/" with "_".
But "-" and "_" may confuse readers in print media.
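
A quick illustration using Python's standard base64 module; the three sample bytes are picked by me so that the output contains both "+" and "/".

```python
import base64

data = bytes([0xFB, 0xFF, 0xBF])  # sample bytes that map to indices 62, 63, 62, 63

standard = base64.b64encode(data).decode("ascii")
url_safe = base64.urlsafe_b64encode(data).decode("ascii")

print(standard)  # +/+/
print(url_safe)  # -_-_
```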

Full document :
Base64 - Wikipedia
https://en.wikipedia.org/wiki/Base64#Variants_summary_table




This post has been edited by flashang: Jul 25 2023, 09:07 AM
iammyself
post Jul 28 2023, 07:42 PM

Getting Started
**
Junior Member
238 posts

Joined: May 2011
Nice writeup.

This post has been edited by iammyself: Jul 28 2023, 07:43 PM
TSMatQuasar
post Jul 28 2023, 08:11 PM

Casual
***
Validating
329 posts

Joined: Jun 2023
QUOTE(iammyself @ Jul 28 2023, 07:42 PM)
Nice writeup.
*
Thanks for your support!

 
