Mastering phpMyAdmin 3.3.x for Effective MySQL Management

Character sets, collations, and language

A character set describes how symbols for a specific language or dialect are encoded. A collation contains rules to compare and sort the characters of a character set. The character set used to store our data may differ from the one used to display it, so the data must be converted from one to the other.

Since MySQL 4.1.x, the MySQL server does the character recoding work for us. Also, MySQL enables us to indicate the character set and collation for each database, each table, and even each field. A default character set for a database applies to each of its tables, unless it's overridden at the table level. The same principle applies to every field.
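The three levels can be illustrated with plain MySQL statements. The sketch below only prints them from PHP rather than running them against a server; the database, table, and field names (marc_book, author, name, phone) are hypothetical, and the character sets were picked arbitrarily for the example.

```php
<?php
// Hypothetical names throughout; the statements are printed, not executed.
$statements = array(
    // Database-level default: inherited by every table created in it.
    "CREATE DATABASE marc_book DEFAULT CHARACTER SET utf8 COLLATE utf8_unicode_ci",
    // The table-level default overrides the database default for this table,
    // and the 'phone' field in turn overrides the table default.
    "CREATE TABLE marc_book.author (
        name  VARCHAR(30),
        phone VARCHAR(20) CHARACTER SET ascii COLLATE ascii_general_ci
    ) DEFAULT CHARACTER SET latin1 COLLATE latin1_swedish_ci",
);

foreach ($statements as $sql) {
    echo $sql, ";\n\n";
}
```

Here the name field has no setting of its own, so it would take the table's latin1 default, while phone keeps its explicit ascii setting.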

Collations

When strings have to be compared and sorted, precise rules must be followed by the system (MySQL in this case). For example, is "A" equivalent to "a"? Is "André" equivalent to "Andre"? A set of these rules is called a collation.

A proper choice of collation is important for obtaining the intended results when searching data (for example, from phpMyAdmin's Search page), and also when sorting data.

For an introduction to collations, see http://dev.mysql.com/doc/mysql/en/Charset-general.htm, and for a more technical explanation of the algorithms involved, refer to http://www.unicode.org/reports/tr10/.

Unicode and UTF-8

Unicode is an industry standard designed to allow text and symbols [...] to be consistently represented and manipulated by computers.

For more information, visit http://en.wikipedia.org/wiki/Unicode and http://www.unicode.org.

Unicode currently supports more than 600 languages, which is its main advantage over other character sets available with ISO or Windows. This is especially important with a multi-language product such as phpMyAdmin.

To represent or encode these Unicode characters, many Unicode Transformation Formats (UTF) exist. A popular transformation format is UTF-8, which uses one to four octets per character. For more details, visit http://en.wikipedia.org/wiki/UTF-8.
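The variable length of UTF-8 is easy to check from PHP. The following is a minimal sketch, assuming strlen() counts bytes (that is, mbstring function overloading is not enabled); the four sample characters are arbitrary and are written as byte escapes so that the result does not depend on how the script file itself is saved.

```php
<?php
// Each key is the UTF-8 byte sequence of one character.
$samples = array(
    'A'                => 'U+0041 LATIN CAPITAL LETTER A',
    "\xC3\xA9"         => 'U+00E9 LATIN SMALL LETTER E WITH ACUTE',
    "\xE2\x82\xAC"     => 'U+20AC EURO SIGN',
    "\xF0\x9D\x84\x9E" => 'U+1D11E MUSICAL SYMBOL G CLEF',
);

$octets = array();
foreach ($samples as $char => $name) {
    // strlen() returns the number of bytes, not the number of characters.
    $octets[$name] = strlen($char);
    echo $name, ': ', $octets[$name], " octet(s)\n";
}
```

The four characters take one, two, three, and four octets respectively.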

Note that the browser must support UTF-8 (as most current browsers do). The phpMyAdmin distribution kit includes a UTF-8 version of every language file in the lang subdirectory.

Selecting languages

A Language selector appears on the login panel (if any) and on the home page. The default behavior of phpMyAdmin is to use the language defined in our browser's preferences, if there is a corresponding language file for this version.

The default language, used when the program cannot detect one, is defined in config.inc.php by the $cfg['DefaultLang'] parameter, which is set to 'en-utf-8'. This value can be changed. The possible values for language names are defined as an array in the libraries/select_lang.lib.php script.
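As a sketch, falling back to French instead of English would look like this in config.inc.php. The value 'fr-utf-8' is only an example and has to match one of the names defined in libraries/select_lang.lib.php.

```php
// Fallback used only when no language can be detected from the browser.
$cfg['DefaultLang'] = 'fr-utf-8';
```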

Even if the default language is defined, each user (especially on a multi-user installation) can choose his or her preferred language from the selector. The user's choice will be remembered in a cookie whenever possible.

We can also force a single language by setting the $cfg['Lang'] parameter to a value such as 'fr-utf-8'. Starting with version 2.7.0, another parameter, $cfg['FilterLanguages'], is available. Suppose we want to shorten the list of available languages to English and Français (French), as these are the only ones our users need. We do this by building a regular expression that selects the languages to display, based on their ISO 639 codes. To continue with our example, we would use:

$cfg['FilterLanguages'] = '^(fr|en)';

In this expression, the caret (^) means "starting with" and the pipe (|) means "or". The expression therefore restricts the list to languages whose ISO codes start with fr or en.

By default, this parameter is empty, meaning that no filter is applied to the list of available languages.
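To see which entries such an expression keeps, it can be tried against a sample list. This is only an illustration using PHP's preg_grep(): the language names below are a hypothetical subset, and phpMyAdmin's own matching code may differ in detail.

```php
<?php
$cfg['FilterLanguages'] = '^(fr|en)';

// Hypothetical subset of the names defined in libraries/select_lang.lib.php.
$available = array('de-utf-8', 'en-utf-8', 'es-utf-8', 'fr-utf-8');

// Wrap the expression in delimiters and keep only the matching names.
$visible = array_values(
    preg_grep('/' . $cfg['FilterLanguages'] . '/', $available)
);

print_r($visible); // en-utf-8 and fr-utf-8 remain
```

Because the expression is anchored with the caret, a name that merely contains "en" somewhere in the middle would not be kept.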

The small information icon beside Language gives access to phpMyAdmin's translator page, which lists, by language, the official translator and the contact information. This way, we can reach the translator for corrections, or to offer help with untranslated messages.

Effective character sets and collations

On the home page, we can see the MySQL charset information and a MySQL connection collation selector. Here is the MySQL charset information:

[Screenshot: MySQL charset information on the home page]

The character set information (as seen here after MySQL charset) comes directly from the $charset variable located in the language file that corresponds to the currently-selected language. It's used to generate HTML information, which tells the browser what the page's character set is.
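As a rough illustration only (this is not phpMyAdmin's actual code), a value such as 'utf-8' held in $charset could reach the browser through a tag like the one built here. The variable $meta is introduced just for this example.

```php
<?php
$charset = 'utf-8'; // example value, as a UTF-8 language file might define it

// Build the tag that tells the browser how the page is encoded.
$meta = '<meta http-equiv="Content-Type" content="text/html; charset='
      . $charset . '" />';

echo $meta, "\n";
```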

We can also choose which character set and collation will be used for our connection to the MySQL server using the MySQL connection collation dialog. This choice is passed to the MySQL server. MySQL then converts the characters it sends to our browser into this character set, and interprets what it receives from the browser according to the same character set. Remember that every table and field carries character set information describing how its data is encoded.

Normally, the default value should work. However, if we are entering some characters using a different character set, we can choose the proper character set in this dialog.

The following parameter defines both the default connection collation and the character set:

$cfg['DefaultConnectionCollation'] = 'utf8_unicode_ci';