Principle of Markup Languages and Basic Structure of (X)HTML Document

Content of the lesson:

  • Markup Languages - History
  • Markup Languages - Standards
  • SGML (Standard Generalized Markup Language)
  • HTML (HyperText Markup Language)
  • XML (eXtensible Markup Language)
  • Development of (X)HTML
  • Structure of (X)HTML page

Markup Languages - History

Let us start with a short history of web standards:

Theodor Nelson began linking documents on computers in the 1960s and coined the term hypertext. From there it was a short step to the abbreviation HTML, which stands for HyperText Markup Language.

SGML (Standard Generalized Markup Language) was created in 1986 and is defined by the ISO 8879 standard. This general markup language allows us to define our own markup languages on the basis of a document type definition (DTD). The HTML language is one of the SGML applications; each HTML version uses tags in the way described in its DTD.

The HTML development was described by Pavel Mikle in the publication Dynamic HTML (Unis Publishing 1997):

1989
  • Tim Berners-Lee came up with the project of creating a distributed hypertext system
  • the WWW project was started in the laboratories at CERN (Switzerland)
1991, 1992
  • the first informal specification of HTML was released
  • the first browser was released to public
1993
  • there were around 50 servers around the world
  • the first graphical browser, NCSA Mosaic for the X Window environment, was released
  • the preview of HTML 2.0
1994
  • the first international conference about WWW system
  • the author of the Mosaic program founded Mosaic Communications Corp.; the company released a new browser, Netscape
  • in September the World Wide Web Consortium (hereafter W3C) was established
  • the French institute INRIA took over the development of WWW from CERN
1995
  • WWW spreads quickly
  • official specification of HTML 2.0 (INRIA)
  • Netscape releases an unofficial extension of HTML known as HTML 3.0
1996
  • the official specification of HTML 3.2 (W3C); it is poorer compared to HTML 3.0
  • Microsoft releases its first free browser, Internet Explorer 3.0; the first support for CSS
  • the specification of CSS level 1
1997
  • the specification of HTML 4.0 (W3C): frames and floating newly added, forms upgraded, tables added, better script support, new inline elements (including ABBR, not yet supported by IE at the time of writing)
  • Microsoft releases Internet Explorer 4.0; its target is DHTML
1998
  • Microsoft releases Internet Explorer 5.0
  • the first XML 1.0 specification
  • the specification of CSS level 2
1999
  • the specification of HTML 4.01
2000
  • the first specification of XHTML 1.0
  • the specification of XML 1.0 (Second edition)
2001
  • Microsoft releases Internet Explorer 6
  • the specification of XHTML 1.1
2002
  • reformulation of XHTML 1.0
2003
  • draft of XHTML 2.0
  • draft of CSS level 2.1 revision
  • draft of CSS level 3

Markup Languages - Standards

SGML (Standard Generalized Markup Language)

This is a universal markup language that allows us to define other markup languages as its subsets. SGML is a complex language offering many markup options, but its complexity has prevented it from becoming widely used.

HTML (HyperText Markup Language)

HTML became very popular in its first version because of its simplicity. HTML focused on describing the structure of a document, without any visual or graphic interpretation.

HTML code sample
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
<!-- this is a comment -->
<head>
<title>The title of the page</title>
</head>

<!-- the document body -->
<body>
<h1>Headline</h1>
<p>This is the body</p>
</body>
</html>

While HTML was being developed, web authors and browser makers pressed for new features, so tags for appearance and interactivity were added. The main mistake came when tables started being used for design: a table should present tabular data, not create the layout of a website. As a result, the simple markup language turned into a tool for building presentations, and tags began to make up the main part of the content (HTML 3.2 from 1996). Structure disappeared along with readability, which had been the main goal of HTML, and chaos took over the internet.

XML (eXtensible Markup Language)

It is a general markup language that was developed and standardized by W3C. It allows us to create specific markup languages for any purpose.

This language is mainly intended for transferring data between applications and for publishing documents. It allows us to describe the structure of a document in terms of the content of its parts. It does not deal with the appearance of the document or its parts. The presentation of the document is achieved by adding styles. Another option is to use styles to transform the document into another type of document. You can also transform it into another XML structure.

The original presentation language, HTML, was no longer satisfactory because of the complexity it had accumulated through extensions. The XML language has no predefined marks (tags, element names) and its syntax is much stricter than the syntax of HTML.

XML code sample - a simple list (catalogue) of CDs
<CATALOG>
<CD>
<TITLE>Songs of Distant Earth</TITLE>
<ARTIST>Mike Oldfield</ARTIST>
<COUNTRY>GB</COUNTRY>
<COMPANY>Warner</COMPANY>
<PRICE>499</PRICE>
<YEAR>1994</YEAR>
</CD>
<CD>
<TITLE>Seascapes</TITLE>
<ARTIST>Michael Jones</ARTIST>
<COUNTRY>USA</COUNTRY>
<COMPANY>Narada</COMPANY>
<PRICE>399</PRICE>
<YEAR>1988</YEAR>
</CD>
</CATALOG>
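
As a minimal sketch of how an application might read such a catalogue, the following Python snippet uses the standard library module xml.etree.ElementTree. The shortened inline document, the variable names and the helper is_well_formed are assumptions made for illustration only. The last part shows what the stricter syntax means in practice: an XML parser rejects a document with a missing closing tag instead of guessing what the author meant.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the CD catalogue (illustrative data only).
CATALOG_XML = """
<CATALOG>
  <CD>
    <TITLE>Songs of Distant Earth</TITLE>
    <ARTIST>Mike Oldfield</ARTIST>
    <PRICE>499</PRICE>
  </CD>
  <CD>
    <TITLE>Seascapes</TITLE>
    <ARTIST>Michael Jones</ARTIST>
    <PRICE>399</PRICE>
  </CD>
</CATALOG>
"""

root = ET.fromstring(CATALOG_XML)

# The element names carry the meaning; the application decides how to use them.
titles = [cd.findtext("TITLE") for cd in root.findall("CD")]
total_price = sum(int(cd.findtext("PRICE")) for cd in root.findall("CD"))


def is_well_formed(text):
    """Return True if the text parses as XML, False otherwise (hypothetical helper)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# The TITLE element is never closed, which is an error in XML.
broken = "<CATALOG><CD><TITLE>Seascapes</CD></CATALOG>"

print(titles)
print(total_price)
print(is_well_formed(CATALOG_XML), is_well_formed(broken))
```

Because XML has no predefined tags, the names CATALOG, CD, TITLE and PRICE mean something only to the program that reads them; the parser itself checks nothing beyond well-formedness.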

XHTML (eXtensible Hypertext Markup Language)

At the turn of the century HTML needed a stricter syntax. In 2000 the first XHTML specification was created (XHTML is a markup language for creating hypertext documents in the WWW environment, developed by W3C). Developers thought that it would be the successor to HTML, and the development of HTML ended with version 4.01. In 2007 a new working group was formed and began creating a new version of HTML, version 5.0. XHTML is being developed in parallel; the next version should be 2.0.

CSS (Cascading Style Sheets)

This is a language created for describing the appearance of pages written in HTML, XHTML or XML.

The language was proposed by the W3C; the author of the first proposal was Håkon Wium Lie. Two versions of the CSS specification have been released, CSS1 and CSS2. The CSS 2.1 revision is being finished and the next version will be CSS3. The main purpose is to let developers separate the appearance of a document from its structure. HTML was meant to allow this, but inadequate standards and rivalry between browser makers led it in a different direction. Older versions of HTML have elements that describe not only the content but also the appearance. This is undesirable because it makes searching and processing documents harder.

CSS code sample - rules for formatting a website
/* CSS Document */
* { padding: 0; margin: 0; }
body {
  background-image: url(../images/design/background/bodybackground.jpg);
  /* background-image: url(../images/design/background/bodybackground_autumn.jpg); */
  /* background-image: url(../images/design/background/bodybackground_winter.jpg); */
  background-repeat: no-repeat;
  background-position: center top; /* horizontally centered, aligned to the top */
  background-color: #133366;
  font-family: Tahoma, Verdana, Arial, sans-serif;
  /* font-family: Georgia, "Times New Roman", Times, serif; */
  text-align: center; /* needed for Internet Explorer: center here, then align left again */
  font-size: 62.5%; /* Richard Rutter's trick: 62.5% of 16px is 10px, so 1em = 10px */
}
strong { font-weight: bold; }
/* mark links to PDF files with an icon */
a[href$=".pdf"] {
  background: url(../images/design/other/pdf-icon.gif) no-repeat right top;
  padding-right: 1.3em;
}

Development of (X)HTML

The HTML language was proposed in 1990 (together with the HTTP protocol used to transfer it over networks). The first HTML version is called HTML 0.9 and it has no support for graphics (only text and links).

Recall that HTML is a markup language: its principle is to mark up a text and state what it means. For example, HTML has a <p> tag, which marks the beginning of a paragraph. If you write "<p>This is a paragraph.</p>", you state that the sentence "This is a paragraph." forms a paragraph and should be rendered as one. How is a paragraph rendered? It starts on a new line after the preceding text and is displayed as a block.

Another example is the <h1> tag, which marks a headline. If you write "<h1>This is a headline</h1>", the marked text "This is a headline" appears in the browser larger and in bold.

HTML code sample
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
<!-- this is a comment -->
<head>
<title>The title of the page</title>
</head>

<!-- the document body -->
<body>
<h1>Headline</h1>
<p>This is the body</p>
</body>
</html>

From the two samples you can see that HTML tags are what lets a computer recognize the meaning of a text. Without these tags it is not possible to tell a headline from a paragraph. Using the tags, the browser can display the final document as intended, provided the author marked up the text properly.

Every HTML version is a set of tags (marks) that mark up the parts of a web page. This set evolved over the years, with tags continually being added and removed.
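
To illustrate how a program can tell a headline from a paragraph once the text is marked up, here is a small sketch using Python's standard html.parser module. The class name TextByTag and the sample string are assumptions made for this illustration; real browsers are far more elaborate, and this sketch ignores nesting beyond the innermost open tag.

```python
from html.parser import HTMLParser


class TextByTag(HTMLParser):
    """Collect text fragments grouped by the tag that directly encloses them."""

    def __init__(self):
        super().__init__()
        self.current = None  # name of the most recently opened tag
        self.found = {}      # tag name -> list of text fragments

    def handle_starttag(self, tag, attrs):
        self.current = tag

    def handle_endtag(self, tag):
        self.current = None

    def handle_data(self, data):
        text = data.strip()
        if self.current and text:
            self.found.setdefault(self.current, []).append(text)


page = "<body><h1>This is a headline</h1><p>This is a paragraph.</p></body>"

parser = TextByTag()
parser.feed(page)
parser.close()

print(parser.found)
```

Without the <h1> and <p> tags the parser would see only one undivided run of text, which is exactly the problem that markup solves.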

HTML 2.0 was created in 1994, when the Netscape company released the Netscape Navigator (NN) browser, which contained many extra tags that were not part of the current HTML version. These tags let developers create interactive websites with new possibilities (text colors etc.), and such websites began to appear, but they worked only in the Netscape browser. This moment is regarded as the start of the browser wars.

Over the next two years Netscape Navigator spread and became the most widely used browser in the world. In 1996 Microsoft released Internet Explorer, which had its own extended set of HTML tags. The WWW reached a point where developers had to write several versions of a page, one for each browser. Because browsers made mistakes when rendering HTML pages and each had its own tags, the only solution was to define and enforce HTML as a standard. Browser makers were forced to follow the HTML standards developed by W3C, which is led by Tim Berners-Lee, the author of the WWW. On the W3C website you can view the Web Architecture section and its subsection All Standards and Drafts and see, for example, the definition of the latest HTML standard, HTML 4.01, or the newer XHTML 1.0 standard.

One of the following standards is usually used nowadays when creating a web site:

  • XHTML 1.0 Transitional - the transition from HTML 4.01 - a lenient standard that allows many old and deprecated HTML tags (see the differences between HTML4 and XHTML).
  • XHTML 1.0 Strict - a strict version of the previous one (the letter X before HTML indicates the connection to XML).

Individual Work

Try to find information on the W3C website about the differences between HTML4 and XHTML 1.0. Assume that you have written websites in HTML4 so far and now want to move to the XHTML standard. What should you do to prepare your code for XHTML?

Structure of (X)HTML page

If you want to write an HTML page using the XHTML 1.0 standard, you should follow the appropriate specification. Create a .html or .htm file and put the HTML code inside. You can use programs such as Adobe Dreamweaver, PSPad, Microsoft Visual Web Developer or Notepad, but writing more complex websites in Notepad is too laborious (more information is available in the lesson on software for creating websites).

You can find the definition of an XHTML document on the W3C website:

  • The root element has to be html (recall the role of the root element in markup languages and in XML).
  • The root element has to contain an xmlns declaration of the namespace (see http://www.linuxzone.cz/index.phtml?idc=280&ids=2 for more information).
  • A DOCTYPE declaration has to be present.
A possible xmlns declaration sample
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
Possible DOCTYPE declaration samples
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"       
	"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">    
    
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"       
	"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">    
    
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN"
	"http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">  

Short summary: the DOCTYPE and xmlns declarations tell the browser which version of the (X)HTML language the page uses (the version names, such as XHTML 1.0 Strict, appear inside the DOCTYPE samples above). The browser needs this information to know how to display the page.
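
The idea of reading the version from the DOCTYPE can be sketched in Python with the standard html.parser module. This is only an illustration under stated assumptions: the class DoctypeReader, the function detect_version and the list of recognized names are hypothetical, and real browsers use their own, more involved rules when they inspect the DOCTYPE.

```python
from html.parser import HTMLParser

# Version names that this sketch recognizes (an assumed, incomplete list).
KNOWN_VERSIONS = (
    "XHTML 1.0 Strict",
    "XHTML 1.0 Transitional",
    "XHTML 1.0 Frameset",
    "HTML 4.01",
)


class DoctypeReader(HTMLParser):
    """Remember the DOCTYPE declaration of a page, if there is one."""

    def __init__(self):
        super().__init__()
        self.doctype = None

    def handle_decl(self, decl):
        # decl is the declaration text without the leading "<!" and trailing ">".
        if decl.lower().startswith("doctype"):
            self.doctype = decl


def detect_version(page):
    """Return the (X)HTML version named in the DOCTYPE, or an 'unknown' label."""
    reader = DoctypeReader()
    reader.feed(page)
    reader.close()
    if reader.doctype is None:
        return "unknown (no DOCTYPE)"
    for name in KNOWN_VERSIONS:
        if name in reader.doctype:
            return name
    return "unknown"


strict_page = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"\n'
    '\t"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">\n'
    '<html xmlns="http://www.w3.org/1999/xhtml"></html>'
)

print(detect_version(strict_page))
print(detect_version("<html></html>"))
```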

The following image shows a new HTML file created in Adobe Dreamweaver. All the HTML code in the file was generated automatically. The DOCTYPE can be chosen when creating a new file, so you can select the (X)HTML version you want to use. You do not have to remember these declarations, but you should be able to use them. This also shows why Notepad is not a good choice for creating a website: it generates no code automatically.

[Image: sample of the header generated by Adobe Dreamweaver]

You can see the basic structure of the HTML page in the image:

  • DOCTYPE.
  • The html element, which is closed at the end with the </html> tag.
  • The <head> element, which starts the head of the page (the head contains basic information and links).
  • A meta tag with information about the encoding (charset) of the page. We will discuss this topic in one of the following lessons.
  • The page title inside the <title> element.
  • The body of the page between the <body> and </body> tags. Everything the user will see is written here.
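
The structure listed above can be checked with a short Python sketch. The page text below is an assumed minimal example (in particular, the exact meta tag is a guess at what an editor might generate, not taken from the lesson), and describe_structure is a hypothetical helper. Because XHTML is XML, the standard xml.etree.ElementTree module can parse the page and confirm that the expected parts are present.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Assumed minimal XHTML 1.0 Strict page, for illustration only.
PAGE = """<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>The title of the page</title>
</head>
<body>
<h1>Headline</h1>
<p>This is the body</p>
</body>
</html>
"""


def describe_structure(page):
    """Parse the page as XML and report which basic parts it contains."""
    root = ET.fromstring(page)
    ns = {"x": XHTML_NS}
    return {
        # ElementTree writes namespaced names as {namespace}localname.
        "root": root.tag,
        "has_head": root.find("x:head", ns) is not None,
        "title": root.findtext("x:head/x:title", namespaces=ns),
        "has_body": root.find("x:body", ns) is not None,
    }


info = describe_structure(PAGE)
print(info)
```

The xmlns declaration is the reason every element name has to be looked up together with the XHTML namespace in this sketch.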

Questions

  1. What do you know about markup languages?
  2. What do you know about SGML?
  3. What do you know about HTML?
  4. What do you know about XML?
  5. What do you know about XHTML?
  6. What is CSS?
  7. What is an HTML standard?
  8. How do the (X)HTML versions differ?
  9. What does the DOCTYPE in an HTML page do?
  10. Describe the basic structure of an HTML page.
webdesign, xhtml, css, php - Mgr. Michal Mikláš