Base64 Encoding Guide: What It Is and How It Works

Base64 shows up in JWT tokens, HTTP headers, CSS data URIs, email attachments, and Kubernetes secrets — yet many developers use it without fully understanding how the encoding works. This guide explains the Base64 algorithm from the ground up: how 3 bytes become 4 characters, why the encoded output is always larger than the input, what the padding equals signs mean, and how Base64url differs from standard Base64. By the end you will have a solid mental model for any Base64 situation you encounter.

The Base64 Algorithm Step by Step

Base64 works by taking raw bytes and representing every 6 bits as one of 64 printable characters. Here is the step-by-step process.

Step 1: Concatenate the bytes. Take the input bytes and lay their bits out in sequence. For example, the three ASCII characters M, a, and n have byte values 77, 97, and 110, which in binary are 01001101, 01100001, and 01101110. Concatenated, that is 24 bits: 010011010110000101101110.

Step 2: Split into 6-bit groups. Divide the 24 bits into four groups of 6: 010011, 010110, 000101, 101110. In decimal, those are 19, 22, 5, 46.

Step 3: Map to the Base64 alphabet. The standard Base64 alphabet assigns A to 0, B to 1, and so on up to Z for 25, then a–z for 26–51, 0–9 for 52–61, + for 62, and / for 63. Applying the mapping: 19→T, 22→W, 5→F, 46→u. So Man encodes to TWFu.

Step 4: Handle padding. Because Base64 processes 3 bytes at a time, input whose length is not a multiple of 3 bytes needs padding. The leftover bits are filled out with zero bits to complete the last 6-bit group. If one byte remains, it yields two characters followed by two = padding characters. If two bytes remain, they yield three characters followed by one =. The padding tells the decoder how many real bytes the final 4-character group holds.

The result is that every full 3 bytes of input produce exactly 4 Base64 characters. Because of that fixed 4:3 ratio, Base64-encoded data is about 33% larger than the original, and slightly more when the last group is padded.
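The four steps can be sketched directly in Python. This is a minimal illustration that favors readability over speed; the function name encode_base64 and the bit-string approach are choices made for this example, and real code should simply call the standard library's base64.b64encode.

```python
import base64

# The standard Base64 alphabet: index 0 is "A", index 63 is "/".
ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)


def encode_base64(data: bytes) -> str:
    """Illustrative encoder that follows the four steps literally."""
    # Step 1: lay all input bits out in sequence as a string of 0s and 1s.
    bits = "".join(f"{byte:08b}" for byte in data)
    # Fill the last group with zero bits so the length is a multiple of 6.
    bits += "0" * (-len(bits) % 6)
    # Steps 2 and 3: split into 6-bit groups and map each to the alphabet.
    chars = [ALPHABET[int(bits[i:i + 6], 2)] for i in range(0, len(bits), 6)]
    # Step 4: append "=" until the output length is a multiple of 4.
    chars.append("=" * (-len(chars) % 4))
    return "".join(chars)


if __name__ == "__main__":
    for sample in (b"Man", b"Ma", b"M"):
        # Print the hand-rolled result next to the standard library's.
        print(sample, encode_base64(sample), base64.b64encode(sample).decode())
```

Running the script prints each sample alongside the standard library's result, which makes it easy to see how one or two leftover bytes turn into == or = padding.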

The Base64 Character Alphabet and Its Variants

The standard Base64 alphabet (defined in RFC 4648) uses 65 characters: 26 uppercase letters, 26 lowercase letters, 10 digits, plus (+), slash (/), and equals (=) for padding. This set was chosen because all 65 characters are safe in most text contexts and were widely supported by early computing systems.

However, + and / cause problems in specific contexts. In URLs, + is interpreted as a space in query strings, and / separates path segments. A Base64 string used as a URL parameter or path component would be misinterpreted or need percent-encoding, which defeats the purpose of using Base64 for compactness. In filenames, / is the directory separator on Unix systems and is not allowed in Windows filenames either, so a Base64 string containing it is invalid as a filename.

Base64url addresses this by replacing + with - (hyphen) and / with _ (underscore). Both characters are safe in URLs and filenames. Base64url also typically omits padding, since the length of the encoded string already implies how much padding would be needed.

Another variant is MIME Base64, defined in RFC 2045. It uses the standard alphabet but limits encoded lines to at most 76 characters, each ending in a CRLF line break. The limit exists because early mail systems could not reliably handle long lines. When decoding MIME Base64, strip the line breaks first unless you know your decoder ignores them; a strict decoder will reject them.

You will encounter each variant in different contexts: standard Base64 for binary data in JSON and config files, Base64url for JWTs and OAuth tokens, and MIME Base64 in raw email source.
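The three variants can be compared with Python's standard base64 module. This is a sketch for illustration: the sample bytes are picked so that the output contains + and /, and the helper name b64url_decode_unpadded is hypothetical. Note that base64.encodebytes ends each line with a bare \n rather than CRLF, so it only approximates MIME output.

```python
import base64

# Three bytes chosen so that every 6-bit group maps to "+" or "/".
data = bytes([0xFB, 0xFF, 0xBF])

standard = base64.b64encode(data).decode("ascii")
urlsafe = base64.urlsafe_b64encode(data).decode("ascii")
print(standard, urlsafe)


def b64url_decode_unpadded(text: str) -> bytes:
    """Decode Base64url text whose trailing '=' padding was omitted."""
    # The missing padding is implied by the length, so restore it first.
    padding = "=" * (-len(text) % 4)
    return base64.urlsafe_b64decode(text + padding)


# Line-wrapped output: at most 76 characters per line.
mime = base64.encodebytes(bytes(100))
print(max(len(line) for line in mime.splitlines()))

# Strip all whitespace before handing the text to a strict decoder.
unwrapped = b"".join(mime.split())
restored = base64.b64decode(unwrapped, validate=True)
```

Passing validate=True makes Python's decoder reject characters outside the alphabet; with the default setting it silently skips them, which is why line breaks often go unnoticed in Python but fail in stricter decoders.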

Where You Will Encounter Base64 in Practice

Base64 is so widely used that you likely interact with it every day without noticing. Here is a map of the most common locations.

JWT (JSON Web Tokens): the header and payload of a signed JWT are Base64url-encoded JSON objects. Paste one into a decoder and you will immediately see the algorithm, claims, and expiry. The signature is the third segment. It is the Base64url-encoded bytes of the cryptographic signature and is not human-readable after decoding.

HTTP Basic Authentication: the Authorization header for Basic Auth looks like Authorization: Basic dXNlcjpwYXNzd29yZA==. Decode that string and you get user:password in plain text. This is why Basic Auth must always be used over HTTPS.

Data URIs in HTML and CSS: an <img> tag with src="data:image/svg+xml;base64,..." embeds the entire image inline. CSS background-image properties work the same way. This eliminates an HTTP request per image but increases the size of the HTML or CSS.

Kubernetes Secrets: values in the data field of a Kubernetes Secret object are Base64-encoded, not encrypted. Running kubectl get secret mysecret -o yaml shows the encoded values; decode them to see the actual credentials. Remember this when auditing cluster access.

Email attachments: raw email source (visible when you view message source in a mail client) shows each attachment as a block of MIME Base64 following a Content-Transfer-Encoding: base64 header. Decode the block to recover the original attachment.

X.509 certificates in PEM format: PEM files are just Base64-encoded DER certificates wrapped in -----BEGIN CERTIFICATE----- and -----END CERTIFICATE----- lines. The Base64 inside is the actual certificate binary data.
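Two of these cases are easy to reproduce with the standard library. The sketch below decodes the Basic Auth value quoted above and one JWT segment. The function names are hypothetical, the sample header is built on the spot rather than taken from a real token, and nothing here verifies a signature.

```python
import base64
import json


def decode_basic_auth(header_value: str) -> tuple[str, str]:
    """Split a 'Basic <token>' Authorization value into (user, password)."""
    scheme, _, token = header_value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic Authorization value")
    # The token is standard Base64 of "user:password".
    user, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return user, password


def decode_jwt_segment(segment: str) -> dict:
    """Decode one Base64url JWT segment (header or payload) into a dict."""
    # JWT segments omit '=' padding, so restore it before decoding.
    padded = segment + "=" * (-len(segment) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


if __name__ == "__main__":
    print(decode_basic_auth("Basic dXNlcjpwYXNzd29yZA=="))

    # Build an illustrative header segment the way a JWT library would.
    header_json = json.dumps({"alg": "HS256", "typ": "JWT"}).encode("utf-8")
    segment = base64.urlsafe_b64encode(header_json).rstrip(b"=").decode("ascii")
    print(segment, decode_jwt_segment(segment))
```

Decoding only reveals what the token claims. Trusting those claims still requires checking the signature with a proper JWT library.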

Encoding vs Compression vs Encryption: Key Distinctions

Base64 is frequently confused with compression and encryption. Understanding the distinctions prevents security mistakes and helps you choose the right tool.

Encoding transforms data between representations without any intent to hide it. Base64 encoding is fully reversible with no key. Anyone who sees the output can recover the input. The purpose is compatibility (making binary data safe for text-only channels), not secrecy.

Compression reduces data size by finding and eliminating redundancy. Formats like gzip, Zstandard, and Brotli are compression algorithms. Base64 does the opposite: it increases size by about 33%. You will sometimes see data that is both compressed and Base64-encoded: gzip the data first (to reduce size), then Base64-encode it (to make it text-safe). The two operations serve different purposes and are applied in sequence.

Encryption scrambles data so that only someone with the correct key can recover the original. AES, RSA, and ChaCha20 are encryption algorithms. Encrypted data is often then Base64-encoded for transmission, which leads to the common but dangerous misconception that Base64-encoded data is somehow protected. It is not.

A helpful mental model: encryption is a lock, compression is a zip, and encoding is a translator. You might zip something to save space, lock it, and then translate it into a format a postal service can handle, in that order. Compression has to come before encryption because well-encrypted data looks random and no longer compresses. Base64 is only the translator step.
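The compress-then-encode sequence can be sketched with Python's gzip and base64 modules. The sample payload is an assumption chosen to be highly repetitive so that the effect is visible; real savings depend entirely on the data.

```python
import base64
import gzip

# Highly repetitive sample data, so gzip has plenty of redundancy to remove.
original = b"hello base64 " * 200

# Encoding alone makes the data larger.
encoded_only = base64.b64encode(original)

# Compress first (to shrink), then encode (to make it text-safe).
compressed_then_encoded = base64.b64encode(gzip.compress(original))

# Reverse the steps in the opposite order to recover the input.
restored = gzip.decompress(base64.b64decode(compressed_then_encoded))

print(len(original), len(encoded_only), len(compressed_then_encoded))
```

Neither step adds any secrecy: anyone holding the output can run the last two calls and read the original.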

Frequently Asked Questions

Why is Base64-encoded data always larger than the original?
Because Base64 represents every 3 bytes of binary data as 4 ASCII characters. That is a 4:3 ratio, which works out to a size increase of about 33%, or slightly more when the final group is padded. Additionally, MIME Base64 adds a CRLF line break at least every 76 characters, adding a further small overhead. This size cost is unavoidable: it is the price of converting binary data into a printable, text-safe format. If size is a concern, compress the data before encoding it.
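The padded output length follows directly from the 4:3 ratio: round the byte count up to the next multiple of 3, then multiply by 4/3. A small sketch, with padded_base64_length as a hypothetical helper name, checked against the standard library:

```python
import base64


def padded_base64_length(n_bytes: int) -> int:
    """Length of standard, padded Base64 output for n_bytes of input."""
    # (n_bytes + 2) // 3 is the number of 3-byte groups, rounded up.
    return 4 * ((n_bytes + 2) // 3)


if __name__ == "__main__":
    for n in (1, 2, 3, 10, 300):
        actual = len(base64.b64encode(bytes(n)))
        print(n, padded_base64_length(n), actual)
```

For very small inputs the relative overhead is much higher than 33%: a single byte becomes 4 characters.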
Can I Base64-encode any type of file?
Yes. Base64 operates on raw bytes and has no knowledge of file type. You can encode images, PDFs, ZIP archives, audio files, executables, or any other binary format. The browser-based WikiPlus Base64 tool uses the FileReader API to read the file locally and produces the encoded output without uploading anything. When decoding binary files, the tool will offer the result as a download rather than trying to display it as text.
How do I know if a string is Base64-encoded?
Look for these characteristics: the string contains only A–Z, a–z, 0–9, +, /, and optionally trailing = or == padding. Its length is a multiple of 4 (including padding). Base64url strings use - and _ instead of + and / and may omit padding. Note that a short string of random alphanumeric characters can look like Base64 by coincidence, so always attempt to decode it and check whether the output is meaningful rather than relying on visual inspection alone.
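These checks can be turned into a rough heuristic. The sketch below covers standard, padded Base64 only; looks_like_base64 is a hypothetical name, and a True result means the string is well-formed, not that it was actually produced by a Base64 encoder.

```python
import base64
import binascii
import re

# Alphabet characters followed by at most two trailing '=' signs.
_STANDARD = re.compile(r"[A-Za-z0-9+/]*={0,2}")


def looks_like_base64(text: str) -> bool:
    """Heuristic check for well-formed standard (padded) Base64 text."""
    # Padded Base64 is never empty here and always a multiple of 4 long.
    if len(text) == 0 or len(text) % 4 != 0:
        return False
    if not _STANDARD.fullmatch(text):
        return False
    try:
        # validate=True rejects characters outside the alphabet.
        base64.b64decode(text, validate=True)
    except binascii.Error:
        return False
    return True


if __name__ == "__main__":
    for candidate in ("TWFu", "TQ==", "hello", "abcd", "a-b_"):
        print(candidate, looks_like_base64(candidate))
```

Ordinary words whose length happens to be a multiple of 4 will pass, which is why inspecting the decoded output remains the real test.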