Text Encoding and Character Sets II

Content of the lesson:

  • Notepad
  • PSPad
  • Additional Text Editors
  • Conversion of Character Sets
  • Online Tools
  • Using Character Sets on the Web

Notepad

Notepad offers only basic options for choosing the character set. You can choose from ANSI, two Unicode variants ("Unicode" and "Unicode big endian", i.e. UTF-16 in little- and big-endian byte order) and UTF-8. With UTF-8, the character set is unambiguous. ANSI is less obvious, because the actual character set depends on the language settings of the operating system. If Czech is set, Windows-1250 is used.
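The following Python sketch shows what the choice means in practice. It encodes one Czech sentence with each of Notepad's options and prints the size and the first bytes. The sketch makes three assumptions. "ANSI" on a Czech-locale Windows is taken to correspond to Python's `cp1250` codec. "Unicode" and "Unicode big endian" are taken to be UTF-16 LE and BE. The byte-order mark that Notepad may write at the start of the file is left out.

```python
# Sketch: byte size of the same Czech text in the encodings Notepad offers.
# Assumed mapping of Notepad's menu items to Python codec names; the
# byte-order mark (BOM) that Notepad may prepend is ignored here.
NOTEPAD_ENCODINGS = {
    "ANSI (Czech locale)": "cp1250",
    "Unicode": "utf-16-le",
    "Unicode big endian": "utf-16-be",
    "UTF-8": "utf-8",
}

SAMPLE = "Příliš žluťoučký kůň"


def encoded_sizes(text: str) -> dict:
    """Return {menu label: number of bytes} for each encoding above."""
    return {label: len(text.encode(codec))
            for label, codec in NOTEPAD_ENCODINGS.items()}


if __name__ == "__main__":
    for label, size in encoded_sizes(SAMPLE).items():
        head = SAMPLE.encode(NOTEPAD_ENCODINGS[label])[:6].hex(" ")
        print(f"{label:<20} {size:>3} bytes, starts with {head}")
```

The 20 characters take 20 bytes in Windows-1250 and 40 bytes in either UTF-16 variant. In UTF-8 they take 29 bytes, because each of the nine accented letters needs two bytes.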

PSPad

More advanced editors offer more detailed character set settings. One very popular example is PSPad (http://www.pspad.com/cz/). Besides writing plain (unformatted) text, this editor can be used to edit source code in various programming languages or to create HTML pages.

A great advantage is that, unlike Notepad, it can save documents in ISO 8859-2 and in many other character sets.
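Does it matter whether a Czech file is saved as ISO 8859-2 or as Windows-1250? The two sets agree on most Czech letters but not on all of them. The Python sketch below uses the standard `iso8859_2` and `cp1250` codecs to list the Czech letters whose byte values differ. It also shows what happens when an ISO 8859-2 byte is read as Windows-1250. The letter list is written out by hand for illustration only.

```python
# Sketch: which Czech letters are stored differently in ISO 8859-2 and
# Windows-1250? Uses only Python's built-in codecs.
CZECH_LETTERS = "áčďéěíňóřšťúůýžÁČĎÉĚÍŇÓŘŠŤÚŮÝŽ"


def differing_letters(enc_a: str = "iso8859_2", enc_b: str = "cp1250") -> list:
    """Return (letter, hex byte in enc_a, hex byte in enc_b) for letters whose codes differ."""
    result = []
    for ch in CZECH_LETTERS:
        a, b = ch.encode(enc_a), ch.encode(enc_b)
        if a != b:
            result.append((ch, a.hex(), b.hex()))
    return result


if __name__ == "__main__":
    for ch, a, b in differing_letters():
        print(f"{ch}: ISO 8859-2 = 0x{a}, Windows-1250 = 0x{b}")
    # An ISO 8859-2 "š" (0xB9) read as Windows-1250 turns into "ą".
    print("š".encode("iso8859_2").decode("cp1250"))
```

Only š, ť, ž and their capitals differ. All other Czech letters have the same codes in both sets, so a file with the wrong label is often readable except for these letters.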

Additional Text Editors

Besides the popular PSPad, there are many other text editors that let you set the character encoding, for example Notepad2 or Notepad++ (http://notepad-plus.sourceforge.net/uk/site.htm), which is shown in the following image.

[Image: Notepad++]

Conversion of Character Sets

Software

If you need to convert a file to another encoding, you can use one of many applications, most of which are available for free on the Internet. A good example is the application "Conversion of Czech" (http://www.pokluda.com/FreewareCz.aspx), which is easy to use. You choose the source and the target file (data can also be loaded from the clipboard or the result saved to it), set the encoding of the source and of the target, and click the button "Proveď konverzi" ("Perform conversion"). The application supports most of the well-known Czech character sets (Windows-1250, Kód kamenických, CP852, ISO 8859-2 and others). Its one disadvantage is that it does not support Unicode encodings, i.e. UTF-8 and UTF-16.
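What such a converter does can be sketched in a few lines of Python: decode the bytes using the source encoding, then encode the text using the target encoding. The function name `convert_file` is hypothetical. Python's built-in codecs cover Windows-1250 (`cp1250`), CP852 (`cp852`) and ISO 8859-2 (`iso8859_2`), but, as far as we know, not the Kamenický encoding. The example also shows why the source encoding must be right. Windows-1250 text read as Windows-1251 (Cyrillic) turns "Kód kamenických" into "Kуd kamenickэch".

```python
import tempfile
from pathlib import Path


def convert_file(src: Path, dst: Path, src_encoding: str, dst_encoding: str) -> None:
    """Re-encode a text file (hypothetical helper).

    Raw bytes are read and written, so line endings are kept as they are.
    Both steps are strict: invalid input bytes or characters missing in
    the target set raise a UnicodeError.
    """
    text = src.read_bytes().decode(src_encoding)
    dst.write_bytes(text.encode(dst_encoding))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src, dst = Path(tmp, "in.txt"), Path(tmp, "out.txt")
        src.write_bytes("Kód kamenických".encode("cp1250"))
        convert_file(src, dst, "cp1250", "cp852")
        print(dst.read_bytes().decode("cp852"))   # Kód kamenických

        # Choosing the wrong source encoding (Cyrillic Windows-1251) garbles the text:
        print(src.read_bytes().decode("cp1251"))  # Kуd kamenickэch
```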

Another program, this one shareware, is Prekodér (http://zmsoft.cz/prekoder/index.html). It can be used free of charge for one month and, unlike the previous application, it can also convert text to UTF-8.
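Converting to UTF-8 follows the same decode-then-encode pattern. One detail a converter may let you choose is whether the output starts with a byte-order mark (BOM, the bytes EF BB BF). In Python this is the difference between the `utf-8` and `utf-8-sig` codecs. The helper `to_utf8` below is a hypothetical name used only for illustration.

```python
def to_utf8(data: bytes, src_encoding: str, bom: bool = False) -> bytes:
    """Re-encode bytes from src_encoding to UTF-8, optionally with a BOM."""
    text = data.decode(src_encoding)
    # "utf-8-sig" writes the BOM EF BB BF before the text; "utf-8" does not.
    return text.encode("utf-8-sig" if bom else "utf-8")


if __name__ == "__main__":
    legacy = "kůň".encode("cp1250")                  # b'k\xf9\xf2'
    print(to_utf8(legacy, "cp1250"))                  # b'k\xc5\xaf\xc5\x88'
    print(to_utf8(legacy, "cp1250", bom=True)[:3])    # b'\xef\xbb\xbf'
```

Programs that do not expect a BOM may display it as stray characters such as "ï»¿". Whether to include one therefore depends on where the file will be used.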

The programs above focus on Czech character sets. Sometimes, however, you may need to convert a document to some other character set. For such a conversion you can use Character Set Converter (http://www.kalytta.com/tools.php), which offers a long list of character sets; note, however, that it is shareware.
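Converting to a character set that was not designed for Czech can lose characters. Before converting, it may be worth checking which characters the target set cannot represent. The Python sketch below does this with the standard codecs. `unencodable_chars` is a hypothetical helper name, and ISO 8859-1 (Western European) is just an example target.

```python
def unencodable_chars(text: str, encoding: str) -> list:
    """Return the distinct characters of text that encoding cannot represent,
    in order of first appearance."""
    missing = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            if ch not in missing:
                missing.append(ch)
    return missing


if __name__ == "__main__":
    sample = "Příliš žluťoučký kůň"
    print(unencodable_chars(sample, "iso8859_1"))  # ['ř', 'š', 'ž', 'ť', 'č', 'ů', 'ň']
    print(unencodable_chars(sample, "cp1250"))     # []
    # errors="replace" substitutes "?" for characters that have no mapping.
    print(sample.encode("iso8859_1", errors="replace"))
```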

Online Tools

Besides the applications for converting diacritics described above, you can also use online tools to convert text to a different character set. A good example is the application on the motobit.com website (http://www.motobit.com/util/charset-codepage-conversion.asp), which converts text to various character sets and lets you save the result to a file.

There are many other converters, for example http://kanjidict.stc.cx/recode.php, which offers only basic conversion options compared with the previous one.

Using Character Sets on the Web

Character sets also have to be dealt with when creating web pages. The document you create can be saved in any of the character sets mentioned above.

Besides saving the document in a suitable character set, you also have to tell the browser which character set was used. If you do not, one of the following two things will happen:

  • The default encoding or the last used character set will be used.
  • The browser will try to detect the character set automatically.

In the first case, there is a risk that the document was actually created in a different character set. In the second case, the browser may detect the character set incorrectly and the text can become unreadable, as illustrated in the following image.

[Image: text displayed with a wrongly detected character set]
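Automatic detection can go wrong because the bytes of a file do not say which character set they use, so the browser has to guess. The deliberately naive Python sketch below tries UTF-8 first and otherwise falls back to a default set. Windows-1252 is used here as an assumed Western default. Real browsers use far more elaborate heuristics, and the function `guess_decode` is hypothetical.

```python
def guess_decode(data: bytes, fallback: str = "cp1252") -> tuple:
    """Naively guess the encoding of data and decode it.

    Returns (text, encoding used). A UTF-8 BOM or valid UTF-8 wins;
    anything else is decoded with the fallback encoding.
    """
    if data.startswith(b"\xef\xbb\xbf"):
        return data[3:].decode("utf-8", errors="replace"), "utf-8"
    try:
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        return data.decode(fallback, errors="replace"), fallback


if __name__ == "__main__":
    print(guess_decode("kůň".encode("utf-8")))   # ('kůň', 'utf-8')
    # Windows-1250 bytes are not valid UTF-8, so the Western fallback is used
    # and the Czech letters come out wrong:
    print(guess_decode("kůň".encode("cp1250")))  # ('kùò', 'cp1252')
```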

To make sure the browser uses the correct character set, insert the following META tag into the head of the document:

<meta http-equiv="content-type" content="text/html;charset=znakova_sada" />

Replace the text znakova_sada (Czech for "character set") with the name of your chosen character set. For Czech documents you can use, for example, one of the following names: iso-8859-2, windows-1250, UTF-8, …
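The declared name and the encoding actually used when saving must match. The Python sketch below illustrates this with the hypothetical helpers `write_page` and `declared_charset`. It builds a page containing the META tag, saves the file in the same character set, and reads the declaration back with a regular expression. The regex is a simplification and is not how browsers parse the tag. The page content is not HTML-escaped, so the sketch is for illustration only.

```python
import re
import tempfile
from pathlib import Path

TEMPLATE = (
    "<html>\n<head>\n"
    '<meta http-equiv="content-type" content="text/html;charset={charset}" />\n'
    "<title>{title}</title>\n</head>\n<body>\n<p>{body}</p>\n</body>\n</html>\n"
)

# Finds charset=... anywhere in the bytes (simplified, case-insensitive).
CHARSET_RE = re.compile(rb"charset=([A-Za-z0-9_.:-]+)", re.IGNORECASE)


def write_page(path: Path, title: str, body: str, charset: str) -> None:
    """Save the page in exactly the character set its META tag declares."""
    html = TEMPLATE.format(charset=charset, title=title, body=body)
    path.write_bytes(html.encode(charset))


def declared_charset(data: bytes):
    """Return the first charset=... name found (lower-cased), or None."""
    match = CHARSET_RE.search(data)
    return match.group(1).decode("ascii").lower() if match else None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        page = Path(tmp, "index.html")
        write_page(page, "Kůň", "Příliš žluťoučký kůň", "windows-1250")
        data = page.read_bytes()
        charset = declared_charset(data)
        print(charset)                         # windows-1250
        print(data.decode(charset).count("ž"))  # 1
```

Python accepts the names iso-8859-2, windows-1250 and UTF-8 from the paragraph above as codec names. The same string can therefore serve both for the declaration and for saving the file.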

Questions

  1. Which tools for changing the text encoding do you know?
  2. Show how to change the text encoding using an online application.
  3. Show how to change the text encoding inside a web browser.
webdesign, xhtml, css, php - Mgr. Michal Mikláš