In this article I take a quick look at the Base64 binary-to-text encoding system, and see how to encode and decode in Python.
Why binary to text encoding?
Back in the day, there was a need to transmit binary data, such as images, over dial-up communications lines. The problem with sending binary data is that the protocol can end up treating certain binary codes as control codes. For example, 00000010 might be interpreted as the STX control code, when it's actually just binary data in an image. The solution was to convert binary data into safe text. Safe text consists of the ASCII characters A-Z, a-z, 0-9, +, / and =. These are standard printable ASCII characters and are never interpreted by mistake as control codes.
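To make the problem concrete, here is a minimal sketch that encodes the single byte 00000010 (the STX control code) with Python's standard base64 module, and checks that every character of the result comes from the safe set. The names SAFE, raw and encoded are my own, chosen just for illustration.

```python
import base64
import string

# The 'safe' characters: A-Z, a-z, 0-9, + and /, plus = for padding
SAFE = (string.ascii_uppercase + string.ascii_lowercase
        + string.digits + "+/=").encode("ascii")

raw = bytes([0b00000010])        # one byte that is also the STX control code
encoded = base64.b64encode(raw)

print(encoded)                          # b'Ag=='
print(all(c in SAFE for c in encoded))  # True
```

The control code never appears in the encoded output, only printable characters do.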
What is Base64 encoding?
Base64 is a 6-bit code where each of the 'safe' ASCII characters has a corresponding 6-bit binary code associated with it. Six bits means there are 64 entries in the encoding table, where 0, or 000000, is A and 63, or 111111, is /. The = character is used for output padding, and does not have a position in the table.
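The table itself is easy to rebuild in Python. This is only a sketch of the standard Base64 alphabet, and TABLE is a name I've made up for it.

```python
import string

# The 64-entry table: A-Z (0-25), a-z (26-51), 0-9 (52-61), + (62), / (63)
TABLE = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

print(len(TABLE))                        # 64
print(TABLE[0b000000])                   # A
print(TABLE[0b111111])                   # /
print(format(TABLE.index("Q"), "06b"))   # 010000
```

Note that = is deliberately left out, as it is only used for padding.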
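Let's imagine I just want to encode A, which has an ASCII binary value of 01000001 (0x41). I'm actually going to encode 6 bits at a time, so the first 6 bits, 010000, give Q and then I still have another 01 to encode. I'll need to append four extra 0s there to get another 6 bits, 010000, which is another Q. So far I've got QQ, which is 12 bits of encoded characters (2 * 6). Adding two padding characters brings this to 4 * 6 = 24 bits, a multiple of 8, for an output of QQ==.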
By way of another example, let's encode AA. This would be 010000 010100 0001. The first 6 bits would be Q. The next 6 bits would be U. Only 4 bits remain, 0001, and appending two 0s to make it 6 bits gives 000100, which is E. This gives us a total of 18 bits of encoded characters (3 * 6). Adding one padding character would give us another 6, and 4 * 6 is 24 bits, a multiple of 8, so the output is QUE=.
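The bit-by-bit process above can be sketched in Python. The function encode_by_hand is a hypothetical helper I've written for illustration, working on a string of 0s and 1s for readability; it is not how the base64 module is implemented. The last line checks it against the standard library.

```python
import base64
import string

TABLE = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

def encode_by_hand(data: bytes) -> str:
    """Hypothetical helper: Base64-encode data via a string of bits."""
    # Write every input byte out as 8 bits
    bits = "".join(format(byte, "08b") for byte in data)
    # Append extra 0s so the bits split evenly into 6-bit groups
    bits += "0" * (-len(bits) % 6)
    # Look up each 6-bit group in the table
    chars = [TABLE[int(bits[i:i + 6], 2)] for i in range(0, len(bits), 6)]
    # Pad the output with = up to a multiple of 4 characters
    chars.append("=" * (-len(chars) % 4))
    return "".join(chars)

print(encode_by_hand(b"A"))    # QQ==
print(encode_by_hand(b"AA"))   # QUE=
print(encode_by_hand(b"AAA"))  # QUFB
print(encode_by_hand(b"AA") == base64.b64encode(b"AA").decode("ascii"))  # True
```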
Further, an input of AAA would not require any padding characters as it would encode to 24 bits (QUFB is 4 * 6 = 24 bits). AAABBB would not require any padding either (QUFBQkJC is 8 * 6 = 48 bits). Basically, if the input is a multiple of 3 characters, you won't require padding. For example, AAABBBCCC would not require padding. AAABBBCCCD is short two characters to make a multiple of 3, so would require two padding characters for an output of QUFBQkJCQ0NDRA==.
You can test this encoding (and decoding) and the output padding process here.
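If you'd rather check the padding rule locally, a short loop over the example inputs does the job. This is a minimal sketch using the standard base64 module; the results dictionary is just my own way of collecting the output.

```python
import base64

results = {}
for text in ["A", "AA", "AAA", "AAABBB", "AAABBBCCC", "AAABBBCCCD"]:
    encoded = base64.b64encode(text.encode("ascii")).decode("ascii")
    # Count the = characters at the end of the output
    padding = len(encoded) - len(encoded.rstrip("="))
    results[text] = (encoded, padding)
    print(f"{text:<10} -> {encoded:<16} padding: {padding}")

# Decoding reverses the process, padding included
print(base64.b64decode("QUFBQkJCQ0NDRA=="))  # b'AAABBBCCCD'
```

Inputs whose length is a multiple of 3 come out with no padding, and the rest are padded with one or two = characters.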
How to encode and decode in Python
The following code shows a simple example. You read an image file as binary data. You then encode it, and decode it, and write the decoded data out to a file. You can then compare the output file to the input file to make sure they are the same. You can also optionally print out the encoded string if you want.
```python
import base64

# Base64 encodes binary data as an ASCII string
fn1 = "timmy.jpg"
fn2 = "test.jpg"

# Open the input file in binary mode and read in its bytes
with open(fn1, "rb") as f:
    data = f.read()

# Encode the picture as Base64 (b64encode returns a bytes object)
e = base64.b64encode(data)
# print(e)

# Decode the Base64 data back into the original bytes
d = base64.b64decode(e)

# Write the decoded bytes to the test file
with open(fn2, "wb") as f:
    f.write(d)

# Make sure you check the test file now displays correctly
```
You'll notice how easy Python makes it to both encode and decode.
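Rather than comparing the two files by eye, you can let Python do it. The sketch below is one way to go about it, assuming filecmp from the standard library is acceptable for the comparison. The helper copy_via_base64 and the file names are hypothetical, and a temporary file of random bytes stands in for the image so the example runs anywhere.

```python
import base64
import filecmp
import os
import tempfile

def copy_via_base64(src: str, dst: str) -> None:
    """Hypothetical helper: read src, Base64 encode and decode it, write dst."""
    with open(src, "rb") as f:
        data = f.read()
    decoded = base64.b64decode(base64.b64encode(data))
    with open(dst, "wb") as f:
        f.write(decoded)

with tempfile.TemporaryDirectory() as tmp_dir:
    src = os.path.join(tmp_dir, "input.bin")
    dst = os.path.join(tmp_dir, "output.bin")
    # Stand-in for an image: 1000 random bytes
    with open(src, "wb") as f:
        f.write(os.urandom(1000))
    copy_via_base64(src, dst)
    # shallow=False compares the file contents, not just size and timestamps
    same = filecmp.cmp(src, dst, shallow=False)

print(same)  # True
```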
Base64 is common where you need to send binary data over systems designed primarily to handle text. An example of this is an email system. Here, binary attachments have to be encoded as text for transmission. Any attached files such as images will be Base64 encoded, and sent as text.
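You can see this happening with Python's own email package. The following is a minimal sketch, not a full mail-sending example: the message text and file name are made up, and fake_image is a stand-in for real image bytes. It relies on EmailMessage.add_attachment using Base64 as the transfer encoding for binary content, which is its default behaviour.

```python
import base64
from email.message import EmailMessage

# Stand-in for the bytes of an image file (not a real JPEG)
fake_image = bytes(range(256))

msg = EmailMessage()
msg.set_content("See the attached picture.")
msg.add_attachment(fake_image, maintype="image", subtype="jpeg",
                   filename="timmy.jpg")

# Pull the attachment back out and look at how it is carried
part = next(msg.iter_attachments())
print(part["Content-Transfer-Encoding"])          # base64
payload = part.get_payload()                      # Base64 text, wrapped over lines
print(base64.b64decode(payload) == fake_image)    # True
```

Printing msg.as_string() shows the whole message, with the attachment appearing as lines of Base64 text.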
In this article I took a very quick look at Base64 encoding, and saw how you can encode and decode Base64 using Python.