Markup Languages

Content of the lesson:

  • Development of Markup Languages
  • Characteristic of Markup Languages
  • Division of Markup Languages
  • Examples of Markup Languages

Development of Markup Languages

Markup languages are nowadays used especially in the business sphere (B2B, business-to-business). The original motivation for developing them was the need for standardization and uniformity when exchanging information.

The boom in computing led to fierce competition among platforms, operating systems and, of course, application software, especially office software. Every system uses its own data format, so systems are generally unable to communicate with each other and exchange information. For example, when Microsoft releases a new version of its office package, it usually introduces a new file format that other applications cannot open. The solution is a standard storage format: any program that supports the format can open the file.

However, the main players in the software market stand behind their closed document formats. They pursue marketing goals rather than trying to improve the situation, because closed formats only strengthen their market position. This trend has not changed so far and is maintained especially by Microsoft, the leader in operating systems and office software.

Companies with large archives of tens of thousands of documents have practically no possibility of changing their software. Incompatibility also has to be dealt with between individual versions of the most widespread office suite, MS Office.

[Image: binary form of a file in the docx format]

Each application typically saves data in its own closed format, stored in binary form that is not human-readable. The advantage is that the resulting file is usually smaller. Efforts to change this appeared early on. One of the first ancestors of XML was GML (Generalized Markup Language), used by IBM to store legal documents. This language dates back to the late 1960s. SGML (Standard Generalized Markup Language) was standardized in 1986, but it is so complex that its complexity prevented it from spreading widely.

HTML (HyperText Markup Language), which is used to create web pages, was later derived from SGML.

HTML spread thanks to its simplicity. Its definition contains only a limited set of elements that describe the structure of a document, such as headings of various levels, paragraphs, blocks of text, bold text and so on. This satisfied the requirement for a uniform format that is portable and allows text to be shared between organizations and platforms. However, HTML lets you define not only the structure but also the appearance of items, so it mixes content and presentation (meaning and appearance). A typical example of this problem is using tables for page layout. On top of that, HTML is too simple, with a fixed, limited set of tags. This motivated the creation of XML (Extensible Markup Language), another derivative of SGML.

XML describes only the logical structure and contains no presentation tags (appearance). Instead of saying only how a block of text or data should look (bigger or smaller), it lets you state what the block means.

Example of XML code - a simple list (catalogue) of CDs
<CATALOG>
  <CD>
    <TITLE>Songs of Distant Earth</TITLE>
    <ARTIST>Mike Oldfield</ARTIST>
    <COUNTRY>GB</COUNTRY>
    <COMPANY>Warner</COMPANY>
    <PRICE>499</PRICE>
    <YEAR>1994</YEAR>
  </CD>
  <CD>
    <TITLE>Seascapes</TITLE>
    <ARTIST>Michael Jones</ARTIST>
    <COUNTRY>USA</COUNTRY>
    <COMPANY>Narada</COMPANY>
    <PRICE>399</PRICE>
    <YEAR>1988</YEAR>
  </CD>
</CATALOG>
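
Because the catalogue carries its own logical structure, a program can read it without knowing anything about its appearance. The following is a minimal sketch in Python using the standard xml.etree.ElementTree module; the names CATALOG_XML and read_catalog are illustrative assumptions, not part of the lesson.

```python
import xml.etree.ElementTree as ET

# The CD catalogue from the example above, embedded as a string.
CATALOG_XML = """\
<CATALOG>
  <CD>
    <TITLE>Songs of Distant Earth</TITLE>
    <ARTIST>Mike Oldfield</ARTIST>
    <COUNTRY>GB</COUNTRY>
    <COMPANY>Warner</COMPANY>
    <PRICE>499</PRICE>
    <YEAR>1994</YEAR>
  </CD>
  <CD>
    <TITLE>Seascapes</TITLE>
    <ARTIST>Michael Jones</ARTIST>
    <COUNTRY>USA</COUNTRY>
    <COMPANY>Narada</COMPANY>
    <PRICE>399</PRICE>
    <YEAR>1988</YEAR>
  </CD>
</CATALOG>
"""


def read_catalog(xml_text):
    """Return one dict per <CD> element, keyed by the child tag names."""
    root = ET.fromstring(xml_text)
    return [{child.tag: child.text for child in cd} for cd in root.findall("CD")]


cds = read_catalog(CATALOG_XML)
total_price = sum(int(cd["PRICE"]) for cd in cds)

for cd in cds:
    print(cd["ARTIST"], "-", cd["TITLE"], "(" + cd["YEAR"] + ")")
print("Total price:", total_price)
```

The program never looks at layout; it relies only on the tag names, which is what the logical meaning of a block amounts to in practice.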

Characteristic of Markup Languages

A markup language is a language whose source text contains both the text itself and instructions for processing it. The instructions are usually written as commands or tags. The source is usually a plain text (ASCII) file, so it can be edited with the simplest editors, such as Notepad in MS Windows or a basic text editor in Unix.

A typical feature of markup languages is characters with special meanings, which are used to build control constructs: commands and tags. For example, in XML, today the most important representative of markup languages, the characters "less than" (<) and "greater than" (>) are special because they begin and end every tag. The text between them is treated as an instruction for the processing program. Here is an example of source code:

Example of notation in a markup language
<h1>Main Headline</h1>
<p>Text of the first paragraph with a <em>highlighted</em> word.</p>

The result of the notation in HTML:

Main headline

Text of the first paragraph with a highlighted word.

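The pair of tags inside the code, the opening <h1> and the closing </h1>, marks the headline, and so on. Because several characters have a special meaning, a separate construct is needed to insert them into the text as ordinary characters; in HTML and XML, for example, < is written as &lt; and > as &gt;.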
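
In HTML and XML the construction for inserting special characters is the character entity reference. As a small sketch, Python's standard html module can convert text in both directions; the sample string raw is made up for illustration.

```python
from html import escape, unescape

# Text that contains characters with a special meaning in HTML/XML.
raw = "if a < b & b > c"

# escape() replaces the special characters with entity references.
encoded = escape(raw)
print(encoded)

# unescape() turns the entity references back into plain characters.
print(unescape(encoded) == raw)
```

The encoded form can be placed between tags without being mistaken for markup.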

Traditional representatives of markup languages are text-formatting tools: the programs nroff, troff and others under the Unix operating system, or the typesetting system TeX. PostScript can also be regarded as a markup language.

The main advantage of markup languages is that editing them requires no special software. Specialized programs, or extensions for more powerful editors, can make editing easier and faster. Documents in these languages can easily be generated by a machine. Tools for processing them are often free and open source. On the other hand, using them requires some specific knowledge, although it is not hard to learn. Unlike with WYSIWYG tools, you cannot just sit down and experiment with tags without any knowledge of the markup language.
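
To illustrate how easily such documents can be generated by a machine, here is a minimal sketch that builds a small XML document from ordinary Python data with the standard xml.etree.ElementTree module. The function name build_catalog and the sample record are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET


def build_catalog(records):
    """Build a <CATALOG> document with one <CD> element per record (a dict)."""
    root = ET.Element("CATALOG")
    for record in records:
        cd = ET.SubElement(root, "CD")
        for tag, value in record.items():
            # Each dictionary key becomes a child element of <CD>.
            ET.SubElement(cd, tag).text = str(value)
    # encoding="unicode" returns a str instead of bytes.
    return ET.tostring(root, encoding="unicode")


xml_text = build_catalog(
    [{"TITLE": "Seascapes", "ARTIST": "Michael Jones", "YEAR": 1988}]
)
print(xml_text)
```

The library takes care of the angle brackets and of escaping special characters, so the program only supplies names and values.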

Division of Markup Languages

Markup languages can be divided into two basic groups:

Descriptive Languages

Their constructs describe what kind of information a document contains. Typical representatives are XML and HTML: you can mark a headline, individual paragraphs or a link to another website. The processing program can handle this information as it wants, for example by displaying the final document.

Procedural Languages

They also contain procedural instructions at the level of a programming language, typically some form of memory or variables and the means to assign and use their values. Procedural languages also let you describe the visual characteristics of the output in detail, so the user can control the appearance of the final document very precisely. TeX and PostScript belong to this group. To demonstrate the expressive power of TeX: although it is a typesetting program, it has been used to write an interpreter of the BASIC language.

Examples of Markup Languages

HTML

HTML is probably the best-known markup language in the world because it is used to create practically all websites (either as the skeleton for other languages or for whole pages).

The abbreviation HTML stands for HyperText Markup Language, a markup language for hypertext. After a long development, version 4.01 was followed by HTML5, which brought significant improvements. The HTML source of any page can be viewed in a browser using the View Source command.

Source code in HTML
<html>
 <!-- this is a comment -->
 <head>
  <meta charset="utf-8">
  <title>Title of the page</title>
 </head>

 <!-- body of the document -->
 <body>
  <h1>Headline</h1>
  <p>This is the body of the document.</p>
 </body>
</html>

You can see from the source code above that most commands (tags) come in pairs, but this is not a requirement (the meta tag, for instance, has no closing counterpart). A paired tag begins with <tag> and ends with </tag>; the content is written between them.

The example contains the paired tag title; its content is shown as the title of the whole page in your browser. Notice that the whole page is enclosed in the html tag, which consists of a head part (styles, page title, metadata, scripts and more are defined here) and a body part (the content goes here: text, images, videos, ...).

Text enclosed between <!-- and --> is treated as a comment: it remains in the source code but is not rendered on the page.
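
The way a processing program distinguishes tags, content and comments can be sketched with Python's standard html.parser module. The class name PageInspector and the embedded copy of the page (PAGE) are assumptions for illustration; a real browser does far more than this.

```python
from html.parser import HTMLParser

# A copy of the example page, embedded as a string.
PAGE = """\
<html>
 <!-- this is a comment -->
 <head>
  <meta charset="utf-8">
  <title>Title of the page</title>
 </head>
 <!-- body of the document -->
 <body>
  <h1>Headline</h1>
  <p>This is the body of the document.</p>
 </body>
</html>
"""


class PageInspector(HTMLParser):
    """Collect the page title and all comments while parsing."""

    def __init__(self):
        super().__init__()
        self.in_title = False  # True while we are between <title> and </title>
        self.title = ""
        self.comments = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        # Text content is reported here; keep only the text inside <title>.
        if self.in_title:
            self.title += data

    def handle_comment(self, data):
        # Comments stay in the source but are reported separately from content.
        self.comments.append(data.strip())


inspector = PageInspector()
inspector.feed(PAGE)
inspector.close()

print("Title:", inspector.title)
print("Comments:", inspector.comments)
```

The parser reports the unpaired meta tag only through handle_starttag, which matches the observation that not every tag has a closing counterpart.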

XML

XML is an abbreviation for Extensible Markup Language, which was developed and standardized by the W3C consortium. It is used to serialize data exchanged between programs and services, and it is supported by many tools and programming languages.

The language is designed for exchanging data between applications and for publishing documents, because it describes the structure and meaning of the individual parts of the content and does not deal with their appearance.

XML file
<?xml version="1.0" encoding="UTF-8" ?>
<!-- Note - more recipes should be added. -->
<recipe name="bread" time_for_preparing="5 min" time_for_cooking="3 hrs">
  <title>Easy bread</title>
  <ingredient amount="3" unit="cups">Flour</ingredient>
  <ingredient amount="0.25" unit="ounces">Yeast</ingredient>
  <ingredient amount="1.5" unit="cups">Hot water</ingredient>
  <ingredient amount="1" unit="teaspoon">Salt</ingredient>
  <instructions>
    <step>Mix all ingredients together and knead the dough.</step>
    <step>Cover with a cloth and leave for an hour in a warm room.</step>
    <step>Knead again, place on a baking sheet and bake in an oven.</step>
  </instructions>
</recipe>

You can see that the structure is similar to HTML: individual items are enclosed between tags. The first line declares the XML version and the character encoding, which other programs need in order to process the document correctly.

Next comes a tag named recipe, which contains one title, several ingredients and then a set of instructions consisting of several steps. A great advantage is that this notation can be read by most programs, and you can add any number of items without having to change the program that processes them.

Notice also the so-called attributes attached to the ingredients: an attribute is written inside the angle brackets of the opening tag, and its value is enclosed in quotation marks. Each ingredient has an amount attribute with the corresponding value, and the unit is added in the same way.
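
Reading attributes from a program might look like the following sketch, again using Python's standard xml.etree.ElementTree. The embedded RECIPE_XML is a shortened copy of the recipe and assumes decimal points in the amounts so that float() can parse them; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the recipe; amounts use a decimal point (an assumption).
RECIPE_XML = """\
<recipe name="bread">
  <title>Easy bread</title>
  <ingredient amount="3" unit="cups">Flour</ingredient>
  <ingredient amount="0.25" unit="ounces">Yeast</ingredient>
  <ingredient amount="1" unit="teaspoon">Salt</ingredient>
</recipe>
"""

recipe = ET.fromstring(RECIPE_XML)

# Element.get() returns the value of an attribute; .text is the element content.
ingredients = [
    (item.text, float(item.get("amount")), item.get("unit"))
    for item in recipe.findall("ingredient")
]

print("Recipe:", recipe.get("name"))
for name, amount, unit in ingredients:
    print(f"{amount:g} {unit} of {name}")
```

Adding another ingredient element to the XML requires no change to this code, since the loop simply processes every ingredient it finds.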

The resulting XML can be displayed in a browser like this:

[Image: the XML file displayed in a browser]

TeX

TeX is a program for computer typesetting. It was created by Professor Donald Ervin Knuth, who in the 1970s was dissatisfied with how a publishing company typeset his textbooks (there were many mistakes, especially in mathematical formulas, and the typesetting was poor). He released it for free use by others.

TeX is popular especially in mathematics, physics and computer science. It is generally considered the best tool for typesetting complex formulas.

There are several macro packages (sets of commands) for TeX; the best known is LaTeX, which is also free.

File in LaTeX
\documentclass{article} 
\pagestyle{empty}
\usepackage{czech} 
\begin{document}

{\bf Zadání:} % "Zadání" is Czech for "Problem"
$$\log_3 \sqrt{3+x} + \log_3 \sqrt{x+4} = \log_3 \sqrt{2} + \log_3 \sqrt{7x+1}$$

\end{document}

You can recognize some of the document's content, but it is not like a document in Microsoft Word. All parts of the document are described by commands, which are later processed by the TeX compiler.

The whole content of a LaTeX document is placed between the commands \begin{document} and \end{document}. The preamble before them gives the compiler several details: the type of document, the page style and the Czech language support package.

The command \bf switches to bold type, which is used here for a heading; the heading text is enclosed in braces. Then comes a mathematical formula enclosed between $$ characters (the dollar sign is said to have been chosen because typesetting mathematics used to be very expensive). The formula contains several square roots (\sqrt), logarithms with base 3 (\log_3) and simple additions.

The result of this code can be seen in the following image:

[Image: the typeset formula]

PostScript

PostScript is a programming language for describing the graphical appearance of printable documents, introduced in 1984 by Adobe Systems Incorporated. Its main advantage is that it is device independent. It is regarded as a standard for higher-end printers. Thanks to its broad capabilities it is also used as a format for storing images.

A PostScript file is a sequence of commands that lay out the page using coordinates. The resulting document can be much larger, but its whole content is reduced to simple commands.

Document converted to PostScript
newpath
100 200 moveto
200 250 lineto
100 300 lineto
closepath
gsave
0.5 setgray
fill
grestore
4 setlinewidth
0.75 setgray
stroke

The document printed from these commands:

[Image: the shape drawn by the commands above]

Source and more examples: Paul Bourke.

You can see that the final shape is described using elementary commands such as "100 200 moveto" (move the current point to the given coordinates), "200 250 lineto" (draw a line from the previous point to the given point) and so on.
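
PostScript writes the operands before the operator: numbers are pushed onto a stack and a command such as moveto takes them from there. The following toy sketch in Python shows the idea for just four path operators; it is an illustration written for this lesson, not a PostScript implementation, and the names PROGRAM and trace_path are assumptions.

```python
# The path-building part of the example, embedded as a string.
PROGRAM = """
newpath
100 200 moveto
200 250 lineto
100 300 lineto
closepath
"""


def trace_path(program):
    """Interpret newpath, moveto, lineto and closepath; return visited points.

    Numbers are pushed onto an operand stack. An operator pops the
    operands it needs, which is why they are written before it.
    """
    stack = []  # operand stack
    path = []   # points of the current path
    for token in program.split():
        if token == "newpath":
            path = []
        elif token in ("moveto", "lineto"):
            y = stack.pop()  # the last pushed number is y
            x = stack.pop()
            path.append((x, y))
        elif token == "closepath":
            if path:
                path.append(path[0])  # return to the starting point
        else:
            stack.append(float(token))
    return path


points = trace_path(PROGRAM)
print(points)
```

The sketch ignores painting operators such as fill and stroke; it only records where the current point travels.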

XAML

XAML stands for Extensible Application Markup Language. It is a language developed by Microsoft for use in technologies such as .NET Framework 3.0 and 4.0, Silverlight, WPF and more. It is also a markup language; in Silverlight, for example, it describes the appearance and content of a document.

Part of XAML document
<LinearGradientBrush>
  <LinearGradientBrush.GradientStops>
  <!-- no explicit new GradientStopCollection, parser knows how to find or create -->
    <GradientStop Offset="0.0" Color="Red" />
    <GradientStop Offset="1.0" Color="Blue" />
  </LinearGradientBrush.GradientStops>
</LinearGradientBrush>

This fragment can be used as the background of any element on the page (for example a panel or a button). It defines a linear gradient with the given colors and their offsets. A browser that supports Silverlight will draw the element with the correct background from this code (HTML 4 itself does not allow you to draw gradients as a background).

The resulting background can be seen in the following image (the image quality is reduced by the conversion to JPG, which loses colors):

[Image: linear gradient from red to blue]

Several examples of markup languages were taken from http://www.wikipedia.org/.

Questions

  1. What is a markup language?
  2. Which markup languages do you know?
  3. What is the purpose of markup languages?
webdesign, xhtml, css, php - Mgr. Michal Mikláš