Architecture FAQ for Localization and Globalization: Part I

Introduction

Discussions of software architecture mostly focus on loose coupling, scalability, performance and so on. Many architects forget that one of the important aspects of software is making the application globalized. Depending on the project, some applications may not need multi-language websites, but I am sure many will. So in this article we will go through a series of FAQs that will give you a quick start on making an application multi-lingual.

What is Unicode and why was it introduced?

To understand the concept of Unicode, we need to step back a little and understand ASCII and the ANSI code pages built on top of it. ASCII (pronounced "ask-ee") stands for American Standard Code for Information Interchange. ASCII itself defines 128 characters (7 bits), but every character is stored in one byte (8 bits), and a byte can hold 256 (2^8) distinct values. Before Unicode came into the picture, programmers used code pages to represent characters in different languages. A code page is a different interpretation of this 8-bit set: code pages keep the first 128 values for English (ASCII) characters, and the remaining 128 values are tailored for a specific language.

Below is a pictorial representation of the same. 

Figure 14.1:- Code page in action
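The code-page problem can be reproduced with Python's built-in codecs, used here purely as an illustration: a single byte, 0xE1, decodes to a different character depending on whether it is read with Windows-1252 (Western European) or Windows-1253 (Greek), while the lower 128 ASCII values agree.

```python
# The same byte value means different characters under different code pages.
raw = bytes([0xE1])

western = raw.decode("cp1252")  # Windows-1252 (Western European)
greek = raw.decode("cp1253")    # Windows-1253 (Greek)

print(western, hex(ord(western)))  # á 0xe1
print(greek, hex(ord(greek)))      # α 0x3b1

# The lower 128 values are plain ASCII in both code pages.
assert b"A".decode("cp1252") == b"A".decode("cp1253") == "A"
```

A text file written on a Greek machine and opened with a Western code page therefore shows the wrong letters, even though not a single byte has changed.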

The code page approach has the following disadvantages:-

  • Some languages, such as Chinese, have more than 5,000 characters, which are difficult to represent with only a 128-character set.
  • Only two languages can be supported at one time: as noted above, 128 values go to English and the remaining 128 to the other language.
  • The end client must have the same code page installed.
  • Code representation changes according to the operating system and language used, which means the same character can be represented by different numbers on different operating systems.

To solve all of the above problems, Unicode was introduced. In its original design, Unicode represents each character with 2 bytes, i.e. 16 bits, which gives 2^16 = 65,536 characters: enough for the characters of most of the world's languages. Further, by using surrogate pairs you get about 1 million (1,048,576) additional characters, enough to include rare and historic scripts as well.

Code-page representations varied according to operating system and language. Unicode, however, assigns a unique number (a code point) to every character, irrespective of language or operating system, which makes the programmer's life much easier when developing internationally compatible applications.
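The numbers quoted above can be checked with a few lines of Python, used only as an illustration. A code point is fixed by the Unicode standard, so Greek alpha is U+03B1 on every platform, and the surrogate range adds exactly 1,048,576 code points on top of the 65,536 reachable with 16 bits.

```python
import unicodedata

# One character, one number: the code point does not depend on OS or code page.
alpha = "\u03b1"
print(f"U+{ord(alpha):04X}", unicodedata.name(alpha))  # U+03B1 GREEK SMALL LETTER ALPHA

# 16 bits cover the Basic Multilingual Plane (BMP).
bmp_size = 2 ** 16
# Code points above the BMP run from U+10000 to U+10FFFF; UTF-16 reaches them via surrogates.
supplementary = 0x10FFFF - 0x10000 + 1

print(bmp_size)       # 65536
print(supplementary)  # 1048576
```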

Does .NET support Unicode, and how do you know it does?

Yes, .NET definitely supports Unicode. Check sizeof(char) and you will see 2 bytes. A single ASCII character needs only 8 bits, but because .NET supports Unicode, the char type uses 16 bits (one UTF-16 code unit) to store it.
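Because a .NET char is one 16-bit UTF-16 code unit, characters beyond the first 65,536 need two chars, called a surrogate pair, and String.Length counts both. The Python sketch below is an illustration, not .NET code: it computes a surrogate pair by hand with the standard UTF-16 formula and cross-checks it against Python's encoder. The helper names to_surrogate_pair and utf16_units are hypothetical.

```python
def to_surrogate_pair(code_point: int) -> tuple[int, int]:
    """Split a supplementary code point (U+10000..U+10FFFF) into UTF-16 surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only supplementary code points need a surrogate pair")
    offset = code_point - 0x10000       # 20-bit value
    high = 0xD800 + (offset >> 10)      # top 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)     # bottom 10 bits -> low surrogate
    return high, low


def utf16_units(text: str) -> int:
    """Number of 16-bit code units (the size of one .NET char) needed for text."""
    return len(text.encode("utf-16-le")) // 2


emoji = "\U0001F600"
high, low = to_surrogate_pair(ord(emoji))
print(hex(high), hex(low))  # 0xd83d 0xde00
print(utf16_units("A"))     # 1
print(utf16_units(emoji))   # 2

# Cross-check the hand-computed pair against Python's UTF-16 encoder.
assert emoji.encode("utf-16-be") == bytes([high >> 8, high & 0xFF, low >> 8, low & 0xFF])
```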

What is the difference between localization and globalization?

Below are the definitions, taken from the Microsoft glossary.

Globalization: - The process of developing a program core whose features and code design are not solely based on a single language or locale.

Localization: - The process of adapting a program for a specific local market, which includes translating the user interface, resizing dialog boxes, customizing features (if necessary), and testing results to ensure that the program still works.

You can think of globalization as a set of architecture decisions, while localization is adapting your content to a local market. The globalization phase occurs before the localization phase.

What architecture decisions should you consider while planning international software?

Note: - Many programmers think it is only a matter of converting text from one language to another. It is a very wrong assumption that software is localized just by translating its strings, and an interviewer will definitely be disappointed by such an answer. So let's look at the design considerations to take into account when we design software for a global audience.

  • Avoid hard-coding strings in the project. Read every displayed string, from labels to error messages, from a resource file.
  • String length is also of prime importance. When English text is translated into other languages, it often grows by 30 to 40%. For instance, you can see from the figure below how the Hindi text has increased compared to the English text.

Figure 14.2: - Text length increase

So design all your labels and message boxes so that they can adjust to changes in text size. Do not crowd all your fields onto one screen, or you will definitely end up with text length issues. Leave some room for expansion.
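One way to act on the 30 to 40% rule of thumb is to reserve room for each label based on its English length. The sketch below uses hypothetical helpers, not an API; the 1.4 factor and the use of character count as a stand-in for rendered width are assumptions, since real layout depends on the font and script.

```python
import math

# Assumption: plan for the upper end of the 30-40% growth range.
EXPANSION_FACTOR = 1.4


def label_budget(english: str, factor: float = EXPANSION_FACTOR) -> int:
    """Characters to reserve for a label, given its English text."""
    return math.ceil(len(english) * factor)


def overflows(english: str, translated: str) -> bool:
    """True if a translation will not fit in the space reserved for it."""
    return len(translated) > label_budget(english)


print(label_budget("Password"))          # 12
print(overflows("Password", "Kennwort"))  # False (German)
print(overflows("OK", "Aceptar"))         # True (Spanish)
```

The last line shows why a fixed percentage is only a starting point: very short labels can grow far more than 40% once translated.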

  • The decimal separator varies from locale to locale. For instance, 2,512.80 in the United States is written 2.512,80 in Greece or Germany. Yes, you guessed right: in much of continental Europe the decimal separator is a "," (comma).
  • Calendars change from country to country. The Gregorian calendar is certainly the most widely used, but there are others, such as the Hebrew, Islamic and Chinese calendars, and they differ greatly. For instance, Nepal follows the Nepali (Bikram Sambat) calendar, which is about 56.7 years ahead of the Gregorian calendar. So users expect dates to be displayed according to their cultural settings.
  • Sort order is affected by language. You can see from the figure below that Hindi and English have different sorting orders.


    Figure 14.3: - Different sorting order according to locale
  • Time formats vary from locale to locale. For instance, 8 PM as typically written in India is written 20:00 in much of Europe, where the 24-hour clock is the norm and AM/PM is rarely used.
  • If you are using custom fonts, bundle them in the resource file. You can then load the fonts from the resource file rather than asking the user to install them on his PC.
  • Keyboard layouts change according to locale and region, so be careful when designing shortcut keys. Function keys are present on almost all keyboards, so you may want to consider them for shortcut keys. Below is a sample Hindi keyboard. If you define CTRL + V as the shortcut for paste, it can create confusion for Hindi users on the keyboard below.


    Figure 14.4: - Localized Hindi keyboard


Courtesy: - Image taken from http://www-306.ibm.com/
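The number, time and sort-order points above can be illustrated with a small Python sketch. The CultureRules table and the helper functions are hypothetical and hand-written for three cultures; a real application should take these rules from the platform (in .NET, CultureInfo and its NumberFormatInfo, DateTimeFormatInfo and CompareInfo objects). The German-style sort key is a crude approximation that strips accents and is not a real collation algorithm.

```python
import unicodedata
from dataclasses import dataclass


@dataclass(frozen=True)
class CultureRules:
    """Hypothetical, hand-written subset of what a real culture database provides."""
    decimal_sep: str
    group_sep: str
    use_24_hour: bool


# Illustrative values only.
CULTURES = {
    "en-US": CultureRules(".", ",", False),
    "de-DE": CultureRules(",", ".", True),
    "el-GR": CultureRules(",", ".", True),
}


def format_number(value: float, culture: str) -> str:
    rules = CULTURES[culture]
    text = f"{value:,.2f}"  # start from "2,512.80" style
    # Swap separators through a placeholder so they do not overwrite each other.
    return (text.replace(",", "\0")
                .replace(".", rules.decimal_sep)
                .replace("\0", rules.group_sep))


def format_time(hour: int, minute: int, culture: str) -> str:
    if CULTURES[culture].use_24_hour:
        return f"{hour:02d}:{minute:02d}"
    suffix = "AM" if hour < 12 else "PM"
    return f"{(hour % 12) or 12}:{minute:02d} {suffix}"


def german_like_key(word: str) -> tuple[str, str]:
    """Primary key: word with accents removed; tie-break: the original word."""
    decomposed = unicodedata.normalize("NFD", word)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return base.casefold(), word


print(format_number(2512.80, "en-US"))  # 2,512.80
print(format_number(2512.80, "de-DE"))  # 2.512,80
print(format_time(20, 0, "en-US"))      # 8:00 PM
print(format_time(20, 0, "el-GR"))      # 20:00

words = ["Zebra", "Äpfel", "Apfel"]
print(sorted(words))                       # ['Apfel', 'Zebra', 'Äpfel'] (code-point order)
print(sorted(words, key=german_like_key))  # ['Apfel', 'Äpfel', 'Zebra']
```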

So you can see from the above points that making software adapt to global cultures is not only about string translation; it goes much further than that.

How do we get the current culture of the environment in Windows and ASP.NET?

"CultureInfo.CurrentCulture" returns the current culture of the environment. For instance, if your machine is set to Hindi (India), it will return "hi-IN". Keep in mind that "CurrentCulture" only gives you the culture of the environment your application is running on. For a Windows application that works fine. In ASP.NET 2.0, however, the application runs on the server, and we need to know what culture the end user has.

On a truly international website, different users can log in with different cultures. For instance, you can see from the figure below that different users are logging in with different regional settings. The client browser sends this information to the server in the Accept-Language request header; for instance, a Korean user's browser will send "ko" (or "ko-KR"). We can read these values using "Request.UserLanguages".

Figure 14.5: - Different users logging in from different countries

Regional settings are defined in the user's browser as shown below. Click on Tools - Internet Options - Languages. You can then add languages in the language preference box and use "Move up" and "Move down" to set their priority. In the figure below, four languages are defined, with Hindi at the top priority. "Request.UserLanguages" returns an array of strings in the order defined in the browser's language preference tab.

Figure 14.6: - Setting language preferences in browser

Below is a code snippet showing how we can display the user languages. The first figure shows the code that uses "Request.UserLanguages"; the second figure shows its output.

Figure 14.7: - Request.UserLanguages in action 

Figure 14.8: - Output from Request.UserLanguages

One thing to note is the "q" value, which stands for quality factor. In the figure above, the quality factors mean the following:-

"I prefer Hindi, but will accept English US (with 80% comprehension) or Greek (with 50% comprehension) or French (with 30 % comprehension)."

For non-native English speakers, here is the meaning of "comprehension":

It is the process of understanding and constructing meaning from a piece of text.

Comprehension here is from the end user's perspective: it says how well the browser user will understand content in that language. For instance, in the example above the user understands English with 80% comprehension. (Strictly speaking, q is a relative preference weight between 0 and 1; reading it as "comprehension" is simply a helpful way to think about it.)
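What the browser actually sends is an Accept-Language header. Below is a minimal Python sketch of parsing one and ordering it by q value. parse_accept_language is a hypothetical helper, the sample header is written to match the preferences described above, and the parser ignores some edge cases of the full HTTP grammar.

```python
def parse_accept_language(header: str) -> list[tuple[str, float]]:
    """Return (language tag, q) pairs, most preferred first."""
    entries = []
    for part in header.split(","):
        tag, *params = [piece.strip() for piece in part.split(";")]
        if not tag:
            continue
        q = 1.0  # a missing q means full preference
        for param in params:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed weight as "not acceptable"
        if q > 0:  # q=0 explicitly means "not acceptable"
            entries.append((tag, q))
    # Stable sort: languages with equal q keep their header order.
    entries.sort(key=lambda entry: entry[1], reverse=True)
    return entries


print(parse_accept_language("hi,en-us;q=0.8,el;q=0.5,fr;q=0.3"))
# [('hi', 1.0), ('en-us', 0.8), ('el', 0.5), ('fr', 0.3)]
```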

Note: - You can find the sample that displays the user languages in the "Globalization" folder. Run "DisplayAllLanguageSettings.aspx" and see the output. The source is written in VS.NET 2005, so if you try to open it in VS.NET 2003 you may get errors.

What are the important namespaces for localization and globalization?

There are two important namespaces:-

  • 'System.Globalization' - This namespace contains classes that define culture-related information, including the language, the country/region, the calendars in use, the format patterns for dates, currency and numbers, and the sort order for strings.
  • 'System.Resources' - This namespace provides classes and interfaces that allow developers to create, store, and manage various culture-specific resources used in an application. With this namespace, you can read a resource file and display its contents according to the user's culture.

What are resource files and how do we generate resource files?

Resource files are files that contain program resources. Many programmers think resource files are only for storing strings. However, you can also store bitmaps, icons, fonts and WAV files in resource files.

To generate a resource file, click on Tools - Generate Local Resource as shown in the figure below. Keep the page in Design view, or you will not see the option. Once you generate the resource file, you will see the .resx file in Solution Explorer under the App_LocalResources folder.

Figure 14.9: - Generating resource files using IDE

If you open the resource file, you will see that it basically contains keys and a value for each key.

Figure 14.10: - Resource file in action

In the figure above, the key is basically the object name; you can see that Label1 has a value stored in the resource file.
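Under the hood, a .resx file is XML: each key is the name attribute of a data element and its text is the value child. The Python sketch below reads such a file into a dictionary. The trimmed sample content and the key names (Label1Resource1.Text and so on) are illustrative assumptions, and read_resx is a hypothetical helper; in .NET you would normally let ResourceManager or the ASP.NET resource provider do this for you.

```python
import xml.etree.ElementTree as ET

# A trimmed-down .resx; real files also carry <resheader> and schema elements.
RESX = """<root>
  <data name="Label1Resource1.Text" xml:space="preserve">
    <value>User Id</value>
  </data>
  <data name="Label2Resource1.Text" xml:space="preserve">
    <value>Password</value>
  </data>
</root>"""


def read_resx(xml_text: str) -> dict[str, str]:
    """Map each <data name="..."> key to the text of its <value> child."""
    root = ET.fromstring(xml_text)
    return {
        data.get("name"): data.findtext("value", default="")
        for data in root.iter("data")
    }


print(read_resx(RESX)["Label1Resource1.Text"])  # User Id
```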

Can resource files be in any format other than .resx?

Yes, they can also be .txt files containing name/value pairs, which the Resource File Generator (resgen.exe) can compile into binary .resources files. For instance, below is a simple .txt file with values.

Lbluserid = User Id
LblPassword = Password
CmdSubmitPassword = Submit
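The name/value format is simple enough to parse by hand. The hypothetical helper below treats lines starting with ";" or "#" as comments, which is an assumption about the format; in a real .NET project you would compile the file with resgen.exe rather than parse it yourself.

```python
def read_text_resources(text: str) -> dict[str, str]:
    """Parse 'name = value' lines; blank lines and ;/# comment lines are skipped."""
    resources = {}
    for line_no, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith((";", "#")):
            continue
        name, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"line {line_no}: expected name = value")
        resources[name.strip()] = value.strip()
    return resources


SAMPLE = """
Lbluserid = User Id
LblPassword = Password
CmdSubmitPassword = Submit
"""
print(read_text_resources(SAMPLE)["LblPassword"])  # Password
```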

How are resource files actually used in a project?

How can we use Culture Auto in a project?

Note: - We have talked a lot of theory; it is time to see something in action. Let's make a small project to understand how we can implement this. In the Globalization folder you can run "LoginScreen.aspx" to see it in practice. The explanation follows.

We will make a simple login screen that works in both English and Greek. The login screen will display English settings when an English user logs in and Greek settings when a Greek user logs in. Here are the steps.

Figure 14.11: - Culture Auto in action

In the figure above, you can see the login page. You can find it on the CD as "LoginScreen.aspx". It is a simple page with two labels and two text boxes. The label values, "User ID" and "Password", should change according to the regional settings in the browser. The steps are as follows:-

  • Create two resource files as shown below, one for Greek and one for English. Three values are defined: "Userid", "Password" and the main title of the page. The other important thing to note is the file naming convention: you need to tag the file name with the language code. As the figure below shows, the resource file name has three parts: the file name, the language code and the resource file extension. In this sample we demonstrate English and Greek, so we tagged the Greek file with the "el" language code.


    Figure 14.12: - Resource file naming conventions

    Below are the two resource files:


    Figure 14.13: - Greek and English resource files
  • Once you have defined your resource files, set the two page attributes "UICulture=Auto" and "Culture=Auto". See the figure "Culture Auto in action" above.
  • As a final step, define the resource key at the UI object level (the meta:resourcekey attribute). You can see a sample in the figure "Culture Auto in action".


    Figure 14.14: - Login screen according to settings

Compile and run the project, then check the output after changing the regional settings for each language. You should see different outputs as shown in the figure above. Everything works without a single line of code; that is the magic of the "UICulture=Auto" attribute.
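Roughly speaking, Culture=Auto and UICulture=Auto take the browser's first preferred language and use it as the page's culture; the resource lookup then falls back from the most specific file to less specific ones (for example el-GR, then el, then the neutral default file). The sketch below models that lookup in Python. The file names, the AVAILABLE set and both helper functions are hypothetical, and splitting the culture name on "-" is a simplification of the real .NET fallback rules.

```python
# Hypothetical App_LocalResources contents for the sample page.
AVAILABLE = {
    "LoginScreen.aspx.resx",     # neutral/default (English)
    "LoginScreen.aspx.el.resx",  # Greek
}


def candidate_files(page: str, culture: str) -> list[str]:
    """Resource file names to try, most specific first: el-GR -> el -> default."""
    names = []
    parts = culture.split("-") if culture else []
    while parts:
        names.append(f"{page}.{'-'.join(parts)}.resx")
        parts.pop()
    names.append(f"{page}.resx")
    return names


def resolve_resource_file(page: str, user_languages: list[str]) -> str:
    # "Auto": take the browser's first (preferred) language, dropping any ;q= suffix.
    culture = user_languages[0].split(";")[0].strip() if user_languages else ""
    for name in candidate_files(page, culture):
        if name in AVAILABLE:
            return name
    raise FileNotFoundError(page)


print(resolve_resource_file("LoginScreen.aspx", ["el-GR", "en-US;q=0.8"]))
# LoginScreen.aspx.el.resx
print(resolve_resource_file("LoginScreen.aspx", ["en-US"]))
# LoginScreen.aspx.resx
```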

Note: - You can get the above source code in the "Globalization" folder; see the "LoginScreen.aspx" page.

Note: - In the next section we will answer three questions in one go.

