May 6, 2008

Unicode

Unicode is finally becoming the norm, which is great. Last December the web reached a new milestone: for the first time, Unicode was the most frequent encoding found on web pages, overtaking both ASCII and the Western European encodings.

Mark Davis, Google's Senior International Software Architect, posted on the Official Google Blog that Unicode is now surpassing the other encodings (ASCII, etc.) on the web. This is a great thing as the web becomes more multilingual.

"Web pages can use a variety of different character encodings, like ASCII, Latin-1, or Windows 1252, or Unicode. Most encodings can only represent a few languages, but Unicode will handle anything from Chinese to French to Arabic. We have long used Unicode as the internal format for all the text we search: any other encoding is first converted to Unicode for processing. So we regularly update to each new version of Unicode (and relevant related standards like CLDR and BCP 47) to make sure we are current."
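The "convert to Unicode first" step described in the quote can be illustrated with a minimal Python sketch. This is not Google's pipeline; the helper name `to_unicode` and the sample bytes are assumptions made up for illustration, and the sketch assumes the page's declared encoding is already known.

```python
def to_unicode(raw: bytes, declared_encoding: str) -> str:
    """Decode bytes in a declared encoding into a Unicode string.

    Hypothetical helper: bytes that are invalid in the declared
    encoding become U+FFFD instead of raising an error.
    """
    return raw.decode(declared_encoding, errors="replace")


# The same French word as it would be stored under three encodings.
page_bytes = {
    "latin-1": b"Fran\xe7ais",
    "cp1252": b"Fran\xe7ais",
    "utf-8": b"Fran\xc3\xa7ais",
}

# After decoding, every variant is the same Unicode text,
# so downstream processing only has to handle one representation.
decoded = {enc: to_unicode(raw, enc) for enc, raw in page_bytes.items()}

for enc, text in decoded.items():
    print(f"{enc:8} -> {text}")
```

Once everything is decoded, the three differently encoded pages compare equal as text, which is the practical benefit of normalizing to Unicode internally.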

[Chart: Uptick in native Unicode webpages]

"You can see a long-term decline in pages encoded in ASCII (unaccented letters A through Z). More recently, there's been a significant drop in the use of encodings covering only Western European letters (ASCII and a few accented letters like Ä, Ç, and Ø). We're seeing similar declines in other language-specific encodings. Unicode, on the other hand, is showing a sharp increase in usage."
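The difference in coverage between these encodings is easy to check. The following is a rough sketch, assuming UTF-8 stands in for "Unicode" as the on-the-wire encoding and Latin-1 for the Western European encodings; the function name `can_encode` and the sample strings are hypothetical.

```python
def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character in text exists in the encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


sample_texts = ["Hello", "ÄÇØ", "中文", "عربي"]
encodings = ["ascii", "latin-1", "utf-8"]

# For each sample, list the encodings able to represent it.
coverage = {
    text: [enc for enc in encodings if can_encode(text, enc)]
    for text in sample_texts
}

for text, supported in coverage.items():
    print(f"{text!r}: {', '.join(supported)}")
```

Only the UTF-8 column handles all four samples, which is why a multilingual site or catalog ends up needing Unicode.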

Of course, Unicode is a very important part of any ILS/OPAC and user experience strategy.

Stephen

Posted by stephen at May 6, 2008 5:24 PM
