Have you ever wondered how computers understand text? ๐Ÿค”

Onafowokan Testament
2 min readJan 27, 2025

--

You were probably told in primary school that computers only understand two values: 0 and 1. So how does it actually understand what youโ€™re saying? Is it magic?

๐—Ÿ๐—ฒ๐˜ ๐—บ๐—ฒ ๐—ฒ๐˜…๐—ฝ๐—น๐—ฎ๐—ถ๐—ป ๐—ถ๐—ป ๐—ฎ ๐˜„๐—ฎ๐˜† ๐˜†๐—ผ๐˜‚ ๐˜„๐—ถ๐—น๐—น ๐˜‚๐—ป๐—ฑ๐—ฒ๐—ฟ๐˜€๐˜๐—ฎ๐—ป๐—ฑ

Computers indeed only understand 0 and 1. To bridge the gap between human language and machine language, researchers developed something called ๐—ฒ๐—ป๐—ฐ๐—ผ๐—ฑ๐—ถ๐—ป๐—ด. Think of it like this: if you see โ€œ01,โ€ it could represent an โ€œAโ€ โ€” similar to the substitutions youโ€™ve done in math.

One of the earliest encoding systems was called ๐—”๐—ฆ๐—–๐—œ๐—œ(American Standard Code for Information Interchange). ASCII mapped English letters, numbers, and a few symbols to binary codes. For example, โ€œAโ€ was encoded as `1000001`.

Photo by Markus Spiske on Unsplash

However, ASCII had its limitations. First, it only supported the English alphabet and couldnโ€™t handle characters from other languages. Second, ASCII used 7 bits for encoding, leaving 1 bit unused in an 8-bit byte(kinda waste of memory). While this extra bit was later used for โ€œextended ASCII,โ€ it still didnโ€™t address the need for global language support.

To solve these issues, researchers now developed ๐—จ๐—ป๐—ถ๐—ฐ๐—ผ๐—ฑ๐—ฒ, which assigns a unique code point to every character across all languages. Unicode is not an encoding system โ€” itโ€™s more like a map that ensures every character in every language has a unique identifier.

But to make these Unicode code points readable by computers, they needed to be encoded into binary. Thatโ€™s where ๐—จ๐—ง๐—™-๐Ÿด comes in.

Photo by Christian Wiediger on Unsplash

๐—จ๐—ง๐—™-๐Ÿด is an encoding system that efficiently converts Unicode characters into binary. Itโ€™s designed to be flexible:
- It uses 1 byte (8 bits) for common characters like English letters.
- It can use up to 4 bytes for more complex characters, such as emojis or non-Latin scripts.

๐—ค๐˜‚๐—ถ๐—ฐ๐—ธ ๐—•๐—ฟ๐—ฒ๐—ฎ๐—ธ: Think of a byte as 8 wires bundled together, where each wire can either be on (1) or off (0). The combination of on/off signals creates a unique binary representation of a character.

The clever part is that the first few bits in a byte indicate how many bytes are needed for the character. For example, an English letter like โ€œAโ€ requires only 1 byte, while a character like โ€œ๐Ÿ˜Šโ€ requires 4 bytes.

And thatโ€™s why today you can send texts, emojis, and messages in any language to your friends โ€” thanks to encoding systems like UTF-8. Amazing, isnโ€™t it?

01010100 01001000 01000001 01001110 01001011 00100000 01011001 01001111 01010101

(reply in binary in the comment section๐Ÿ˜‚)

--

--

Onafowokan Testament
Onafowokan Testament

No responses yet