October 8, 2014

Unicode

This is a continuation of the story about the Unicode standard. Read The ASCII table and The 8th bit if you haven’t already.

Last time I finished with a horrible story of overlapping encoding standards and general confusion in the software world. It was a world where a French scientist was unable to talk with a German friend.

The hero who solved all of these problems was the Unicode standard. It started with a simple idea: collect all the symbols, create a big table, and assign a number to each of them. Oh, and try to be compatible with existing standards as much as possible.
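
The "big table" idea is easy to poke at from code. Here is a minimal sketch in Python 3 (my choice of language for illustration, the post itself does not use one). The built-ins `ord` and `chr` convert between a symbol and its number, usually called a code point; `code_point` is just a helper name I made up for the example.

```python
def code_point(symbol: str) -> str:
    """Return the Unicode number of a single character in U+XXXX notation."""
    return f"U+{ord(symbol):04X}"


# A few symbols from different corners of the table.
for symbol in ["A", "ß", "Ж", "♫"]:
    print(symbol, ord(symbol), code_point(symbol))

# The mapping works in both directions: chr() turns a number back into its symbol.
assert chr(ord("♫")) == "♫"
```

Note that "A" keeps the number 65 it already had in ASCII, which is the compatibility part of the idea.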

When I said all the symbols, I really meant all the symbols. Musical notes and emoji characters are only some of the bizarre examples, and even Klingon was proposed for inclusion, although that proposal was rejected. Collecting all these symbols is much easier said than done, so there is a consortium, the Unicode Consortium, which is responsible for the whole standard.

Representation

There are various ways to represent Unicode characters in different contexts. Here is a list of common representations for the symbol below
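
Common representations can also be generated from code. The sketch below assumes the snowman (U+2603) as the example symbol, which is my pick for illustration and not necessarily the symbol this post was written around. It uses only the Python 3 standard library.

```python
import unicodedata
from urllib.parse import quote

symbol = "\u2603"  # assumed example symbol: the snowman
number = ord(symbol)

representations = {
    "Name": unicodedata.name(symbol),
    "Code point": f"U+{number:04X}",
    "HTML entity (decimal)": f"&#{number};",
    "HTML entity (hex)": f"&#x{number:X};",
    # The bytes that actually end up in a UTF-8 encoded file.
    "UTF-8 bytes": " ".join(f"{b:02X}" for b in symbol.encode("utf-8")),
    # Percent-encoding of the UTF-8 bytes, as used in URLs.
    "URL encoded": quote(symbol),
    # The escape sequence you would type in Python source code.
    "Python escape": symbol.encode("unicode_escape").decode("ascii"),
}

for label, value in representations.items():
    print(f"{label}: {value}")
```

Every entry describes the same character. Only the context (HTML, a URL, source code, raw bytes) changes.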

Interesting facts

The Unicode standard can be viewed as an extension of the ASCII table because they are identical in the first 128 entries
There are more than 110,000 characters in Unicode as of version 7.0
Scripts from fictional stories have been proposed for it (Klingon was rejected; Tolkien's Tengwar and Cirth are still only proposals)
It can be encoded with various encodings such as UTF-8 and UTF-16
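
Two of these facts can be checked directly. This is a rough sketch in Python 3, where `encoded_sizes` is a hypothetical helper name of mine. It first confirms that the first 128 entries encode to the same bytes in ASCII and UTF-8, then compares how many bytes UTF-8 and UTF-16 need for a few characters.

```python
# The first 128 code points produce identical bytes in ASCII and UTF-8.
for number in range(128):
    character = chr(number)
    assert character.encode("ascii") == character.encode("utf-8")


def encoded_sizes(text: str) -> dict:
    """Return the number of bytes needed to store text in UTF-8 and UTF-16."""
    return {
        "utf-8": len(text.encode("utf-8")),
        # "utf-16-le" is used so that no byte order mark is added to the count.
        "utf-16": len(text.encode("utf-16-le")),
    }


# "A", sharp s, a musical note, and an emoji outside the first 65,536 code points.
for sample in ["A", "\u00df", "\u266b", "\U0001F600"]:
    print(f"U+{ord(sample):04X}", encoded_sizes(sample))
```

The same number from the table can take one to four bytes in UTF-8 and two or four bytes in UTF-16, which is why the encoding has to be agreed on separately from the table itself.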

And to finish, here is the list of all the Unicode characters.

Happy encoding!
