Have you ever wondered how computers understand text? 🤔

2 min read · Jan 27, 2025

You were probably told in primary school that computers only understand two values: 0 and 1. So how do they actually understand what you’re saying? Is it magic?

𝗟𝗲𝘁 𝗺𝗲 𝗲𝘅𝗽𝗹𝗮𝗶𝗻 𝗶𝗻 𝗮 𝘄𝗮𝘆 𝘆𝗼𝘂 𝘄𝗶𝗹𝗹 𝘂𝗻𝗱𝗲𝗿𝘀𝘁𝗮𝗻𝗱

Computers do indeed only work with 0s and 1s. To bridge the gap between human language and those bits, researchers developed something called 𝗲𝗻𝗰𝗼𝗱𝗶𝗻𝗴: an agreed-upon table that maps each character to a pattern of bits. Think of it like this: if everyone agrees that “01” stands for “A,” then whenever you see “01” you read an “A,” similar to the substitutions you’ve done in math.

One of the earliest widely used encoding systems was called 𝗔𝗦𝗖𝗜𝗜 (American Standard Code for Information Interchange). ASCII mapped English letters, digits, punctuation, and a set of control codes to 7-bit binary codes. For example, “A” is code 65, which is encoded as `1000001`.
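
To see that mapping for yourself, here is a minimal Python sketch. The helper name `ascii_bits` is made up for this post; it only relies on Python's built-in `ord` and `format`.

```python
def ascii_bits(ch: str) -> str:
    """Return the 7-bit ASCII pattern for a single character."""
    code = ord(ch)  # the character's numeric code, e.g. 65 for "A"
    if code > 127:
        raise ValueError(f"{ch!r} is not an ASCII character")
    return format(code, "07b")  # zero-padded to 7 binary digits


print(ascii_bits("A"))  # 1000001
print(ascii_bits("a"))  # 1100001
print(ascii_bits("0"))  # 0110000
```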

However, ASCII had its limitations. First, it only supported the English alphabet and couldn’t handle characters from other languages. Second, ASCII used 7 bits for encoding, leaving 1 bit unused in an 8-bit byte (kind of a waste of memory). While this extra bit was later used for various “extended ASCII” sets, each adding up to 128 more characters, that still didn’t address the need for global language support.
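
You can watch ASCII hit that wall in Python. This is just a small sketch (the function name `fits_in_ascii` is my own): it tries to encode some text as ASCII and reports whether that worked.

```python
def fits_in_ascii(text: str) -> bool:
    """Return True if every character in text can be encoded as ASCII."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        # Raised when a character has no ASCII code (anything above 127).
        return False


print(fits_in_ascii("hello"))  # True
print(fits_in_ascii("café"))   # False
print(fits_in_ascii("😊"))     # False
```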

To solve these issues, researchers later developed 𝗨𝗻𝗶𝗰𝗼𝗱𝗲, which aims to assign a unique number, called a code point, to every character across all languages. That table of code points is not, by itself, a way of storing text as bytes. It’s more like a map that ensures every character in every language has a unique identifier.
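
Here is a quick way to peek at that map in Python. The helper `code_point` is a name I picked for illustration; it formats the number returned by the built-in `ord` in the usual `U+XXXX` style.

```python
def code_point(ch: str) -> str:
    """Return a character's Unicode code point in U+XXXX notation."""
    return f"U+{ord(ch):04X}"


for ch in ["A", "é", "€", "😊"]:
    print(ch, code_point(ch))
# A U+0041
# é U+00E9
# € U+20AC
# 😊 U+1F60A
```

Notice that these are just numbers. Nothing here says how many bytes each one takes up in memory yet.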

But to make these Unicode code points readable by computers, they needed to be encoded into binary. That’s where 𝗨𝗧𝗙-𝟴 comes in.

𝗨𝗧𝗙-𝟴 is an encoding system that efficiently converts Unicode characters into binary. It’s designed to be flexible:
- It uses 1 byte (8 bits) for the original ASCII characters, such as English letters, so plain ASCII text is already valid UTF-8.
- It uses 2 to 4 bytes for everything else: typically 2 or 3 bytes for non-Latin scripts and 4 bytes for most emojis.
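
A small Python sketch makes the variable length visible. The helper `utf8_bytes` is hypothetical; it uses the standard `str.encode("utf-8")` and prints each resulting byte as 8 bits.

```python
def utf8_bytes(ch: str) -> list:
    """Return the UTF-8 bytes of a character as 8-bit binary strings."""
    return [format(b, "08b") for b in ch.encode("utf-8")]


for ch in ["A", "é", "€", "😊"]:
    encoded = utf8_bytes(ch)
    print(ch, len(encoded), " ".join(encoded))
# A 1 01000001
# é 2 11000011 10101001
# € 3 11100010 10000010 10101100
# 😊 4 11110000 10011111 10011000 10001010
```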

𝗤𝘂𝗶𝗰𝗸 𝗕𝗿𝗲𝗮𝗸: Think of a byte as 8 wires bundled together, where each wire can either be on (1) or off (0). The combination of on/off signals creates a unique binary representation of a character.

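The clever part is that the leading bits of a character’s first byte indicate how many bytes the character uses: a first byte starting with 0 stands alone, while first bytes starting with 110, 1110, or 11110 announce 2, 3, or 4 bytes. Every follow-up byte starts with 10. For example, an English letter like “A” requires only 1 byte, while a character like “😊” requires 4 bytes.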
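
To make that rule concrete, here is a rough sketch of how a decoder could read the length from the first byte. The function `sequence_length` is my own illustration, not how any particular library does it, and it skips the extra validity checks a real decoder needs.

```python
def sequence_length(first_byte: int) -> int:
    """Guess how many bytes a UTF-8 character uses from its first byte."""
    if first_byte >> 7 == 0b0:      # 0xxxxxxx: single-byte (ASCII) character
        return 1
    if first_byte >> 5 == 0b110:    # 110xxxxx: start of a 2-byte character
        return 2
    if first_byte >> 4 == 0b1110:   # 1110xxxx: start of a 3-byte character
        return 3
    if first_byte >> 3 == 0b11110:  # 11110xxx: start of a 4-byte character
        return 4
    # 10xxxxxx is a follow-up byte, so it cannot start a character.
    raise ValueError("not a valid first byte")


print(sequence_length("A".encode("utf-8")[0]))   # 1
print(sequence_length("😊".encode("utf-8")[0]))  # 4
```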

And that’s why today you can send texts, emojis, and messages in any language to your friends — thanks to encoding systems like UTF-8. Amazing, isn’t it?

01010100 01001000 01000001 01001110 01001011 00100000 01011001 01001111 01010101

(reply in binary in the comment section😂)
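
If you want to write your reply without counting bits by hand, a tiny Python sketch like this one can help. The names `to_binary` and `from_binary` are made up for this post, and the sketch assumes each byte is written as 8 bits separated by spaces.

```python
def to_binary(text: str) -> str:
    """Encode text as UTF-8 and write each byte as 8 bits."""
    return " ".join(format(b, "08b") for b in text.encode("utf-8"))


def from_binary(bits: str) -> str:
    """Turn space-separated 8-bit groups back into text."""
    return bytes(int(chunk, 2) for chunk in bits.split()).decode("utf-8")


print(to_binary("HI"))                   # 01001000 01001001
print(from_binary("01001000 01001001"))  # HI
```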

Written by Onafowokan Testament
