Friday, July 27, 2007

EDSK ... about Unicode and character encodings

Every developer should know about Unicode and character encodings.

This is a complicated subject but it's a very important one. It's also a subject that I expect to return to several times.

“If you are a programmer working in 200[7] and you don’t know the basics of characters, character sets, encodings, and Unicode, and I catch you, I’m going to punish you by making you peel onions for six months in a submarine.” Joel Spolsky

Joel is famous for his site. If you haven't read it, I recommend you do.

Of particular relevance today is the article:
The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)

If you haven't read this article before, read it now. If you have, read it again. A little refresher is rarely a bad thing.

But why do you need to know this?

Well, at some point you are going to have to know:
  • Why some fonts show characters differently.
  • Why characters with diacritics don't always display (correctly).
  • How you display text from languages that use alphabets other than the English one (A to Z, no diacritics).
  • How you create characters with multiple diacritics.
  • What UTF-8 & UTF-16 really mean.
  • How you display text from languages which read from right to left.
  • How you display text from the Chinese, Japanese and Korean languages.
  • How you should display text that contains words from different languages.
This list could easily be a lot longer, and as you start to investigate the subject you'll soon see how.
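To make a couple of those points concrete (diacritics, and what UTF-8 and UTF-16 really mean), here is a minimal Python sketch. It is only an illustration: the helper name `encoded_lengths` is made up for this example, and the sample characters are my own choices. It shows that the same text takes a different number of bytes in each encoding, and that an accented letter can be stored either as one code point or as a base letter plus a combining mark.

```python
import unicodedata


def encoded_lengths(text):
    """Return (code points, UTF-8 bytes, UTF-16 bytes) for text.

    Hypothetical helper for illustration only. "utf-16-le" is used so
    that no byte order mark is added to the UTF-16 byte count.
    """
    return (
        len(text),
        len(text.encode("utf-8")),
        len(text.encode("utf-16-le")),
    )


# The same visible character, stored two different ways.
precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE, one code point
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT

print(encoded_lengths("A"))            # (1, 1, 2)
print(encoded_lengths(precomposed))    # (1, 2, 2)
print(encoded_lengths(decomposed))     # (2, 3, 4)
print(encoded_lengths("\U0001F600"))   # (1, 4, 4), outside the Basic Multilingual Plane

# The two forms look identical on screen but do not compare equal
# until they are normalized to the same form.
print(precomposed == decomposed)                                # False
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
```

Even this small example shows why "one character equals one byte" is an unsafe assumption, and why two strings that look the same may need normalizing before you compare them.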
The most important thing to take from this is that you will not be able to get through the rest of your career in software development assuming that the only letters you will ever need are A through Z. As the world gets smaller and international boundaries blur, especially online, there will be more and more need to know this.
This is a complex subject. Start learning about it now and you'll have a head start when you have to implement related functionality.

