Have you ever wondered how computers understand text? 🤔
You were probably told in primary school that computers only understand two values: 0 and 1. So how do they actually understand what you’re saying? Is it magic?
𝗟𝗲𝘁 𝗺𝗲 𝗲𝘅𝗽𝗹𝗮𝗶𝗻 𝗶𝗻 𝗮 𝘄𝗮𝘆 𝘆𝗼𝘂 𝘄𝗶𝗹𝗹 𝘂𝗻𝗱𝗲𝗿𝘀𝘁𝗮𝗻𝗱
Computers indeed only understand 0 and 1. To bridge the gap between human language and machine language, researchers developed something called 𝗲𝗻𝗰𝗼𝗱𝗶𝗻𝗴: an agreed-upon table that maps each character to a pattern of bits. Think of it like this: if you see “01,” it could represent an “A”, similar to the substitutions you’ve done in math.
One of the earliest encoding systems was called 𝗔𝗦𝗖𝗜𝗜 (American Standard Code for Information Interchange). ASCII mapped English letters, digits, and a few symbols to 7-bit binary codes. For example, “A” was encoded as `1000001` (65 in decimal).
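You can check this mapping yourself. Here is a minimal Python sketch (the helper name `ascii_bits` is made up for illustration) that uses the built-in `ord` to look up a character's number and prints it as 7 bits:

```python
def ascii_bits(ch: str) -> str:
    """Return the 7-bit ASCII pattern for a single character."""
    code = ord(ch)  # numeric code of the character, e.g. 65 for "A"
    if code > 127:
        raise ValueError(f"{ch!r} is not an ASCII character")
    return format(code, "07b")  # zero-padded, 7 binary digits


print(ascii_bits("A"))  # 1000001
print(ascii_bits("a"))  # 1100001
print(ascii_bits("0"))  # 0110000
```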
However, ASCII had its limitations. First, it only supported the English alphabet and couldn’t handle characters from other languages. Second, ASCII used 7 bits for encoding, leaving 1 bit unused in an 8-bit byte (kind of a waste of memory). While this extra bit was later used for various “extended ASCII” character sets, these were incompatible with one another and still didn’t address the need for global language support.
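The first limitation is easy to reproduce. In this small Python sketch (the function name `fits_in_ascii` is just an illustrative choice), encoding with the `"ascii"` codec fails as soon as the text contains a character outside the English-centric ASCII table:

```python
def fits_in_ascii(text: str) -> bool:
    """Return True if every character of text has an ASCII code."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        # Raised when a character has no slot in the 128-entry ASCII table.
        return False


print(fits_in_ascii("hello"))   # True
print(fits_in_ascii("café"))    # False
print(fits_in_ascii("日本語"))  # False
```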
To solve these issues, the 𝗨𝗻𝗶𝗰𝗼𝗱𝗲 standard was developed. It aims to assign a unique number, called a code point, to every character in every writing system. Unicode by itself is not an encoding: it’s more like a map that gives every character a unique identifier, without saying how that identifier is stored as bytes.
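To make the "map" idea concrete, here is a short Python sketch that prints the code point of a few characters in the usual `U+XXXX` notation. The helper name `code_point` is an assumption for this example; `ord` is Python's built-in lookup.

```python
def code_point(ch: str) -> str:
    """Format the Unicode code point of one character as U+XXXX."""
    return f"U+{ord(ch):04X}"  # at least 4 hex digits, upper case


for ch in ["A", "é", "€", "😀"]:
    print(ch, code_point(ch))
# A U+0041
# é U+00E9
# € U+20AC
# 😀 U+1F600
```

Notice that these are only numbers so far. Nothing here says how many bytes each one takes in memory.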
But to make these Unicode code points readable by computers, they needed to be encoded into binary. That’s where 𝗨𝗧𝗙-𝟴 comes in.
𝗨𝗧𝗙-𝟴 is an encoding system that efficiently converts Unicode code points into binary. It’s designed to be flexible:
- It uses 1 byte (8 bits) for ASCII characters such as English letters and digits, so plain ASCII text is already valid UTF-8.
- It uses 2 to 4 bytes for everything else: most non-Latin scripts take 2 or 3 bytes per character, and most emojis take 4.
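To see the 1-to-4-byte range in practice, this Python sketch encodes a few sample characters with the standard `str.encode("utf-8")` call and counts the resulting bytes. The helper name `utf8_length` and the sample characters are my own picks for illustration.

```python
def utf8_length(ch: str) -> int:
    """Number of bytes UTF-8 needs for the given text."""
    return len(ch.encode("utf-8"))


for ch in ["A", "é", "€", "😀"]:
    encoded = ch.encode("utf-8")
    print(ch, utf8_length(ch), encoded.hex(" "))
# A 1 41
# é 2 c3 a9
# € 3 e2 82 ac
# 😀 4 f0 9f 98 80
```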
𝗤𝘂𝗶𝗰𝗸 𝗕𝗿𝗲𝗮𝗸: Think of a byte as 8 wires bundled together, where each wire can either be on (1) or off (0). Eight wires give 256 possible on/off combinations, and each combination can stand for a different value, such as a character.
The clever part is that the leading bits of the first byte indicate how many bytes the character uses: `0` means 1 byte, `110` means 2, `1110` means 3, and `11110` means 4. Every following byte starts with `10`. For example, an English letter like “A” requires only 1 byte, while an emoji like “😀” requires 4 bytes.
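Here is a rough Python sketch of that idea. It looks only at the first byte and checks the standard UTF-8 prefixes (`0`, `110`, `1110`, `11110`) to work out the length. The function name `utf8_sequence_length` is hypothetical, and a real decoder does more validation than this.

```python
def utf8_sequence_length(first_byte: int) -> int:
    """Guess how many bytes a UTF-8 character uses from its first byte."""
    if first_byte >> 7 == 0b0:      # 0xxxxxxx: plain ASCII
        return 1
    if first_byte >> 5 == 0b110:    # 110xxxxx: start of a 2-byte character
        return 2
    if first_byte >> 4 == 0b1110:   # 1110xxxx: start of a 3-byte character
        return 3
    if first_byte >> 3 == 0b11110:  # 11110xxx: start of a 4-byte character
        return 4
    # 10xxxxxx is a continuation byte; anything else is not valid here.
    raise ValueError("not a valid first byte")


for ch in ["A", "😀"]:
    data = ch.encode("utf-8")
    bits = " ".join(format(b, "08b") for b in data)
    print(ch, utf8_sequence_length(data[0]), bits)
# A 1 01000001
# 😀 4 11110000 10011111 10011000 10000000
```

In the emoji's output you can spot the `11110` prefix on the first byte and the `10` prefix on the three bytes that follow.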
And that’s why today you can send texts, emojis, and messages in any language to your friends, thanks to encoding systems like UTF-8. Amazing, isn’t it?
01010100 01001000 01000001 01001110 01001011 00100000 01011001 01001111 01010101
(Reply in binary in the comment section.)
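If you would rather not convert by hand, here is one possible Python sketch for going back and forth between text and space-separated 8-bit groups. The function names `text_to_binary` and `binary_to_text` are made up for this example, and it assumes UTF-8 on both sides.

```python
def text_to_binary(text: str) -> str:
    """Encode text as UTF-8 and show each byte as 8 binary digits."""
    return " ".join(format(byte, "08b") for byte in text.encode("utf-8"))


def binary_to_text(bits: str) -> str:
    """Turn space-separated 8-bit groups back into text."""
    return bytes(int(group, 2) for group in bits.split()).decode("utf-8")


print(text_to_binary("HI"))                 # 01001000 01001001
print(binary_to_text("01001000 01001001"))  # HI
```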