Unix Tools


Understanding HTML Codes

Article by Lee Asher

HTML is a relatively simple language. In some places it is almost completely readable and understandable, but that doesn't stop people from having problems with it. Why is that? It's mainly because, while the HTML tags themselves are easy, creating an HTML document that works as intended on a web server requires you to know a few extra things that aren't often explained.

Here, then, is a quick guide to understanding those parts of HTML that they just don't tell you about in the books. It is meant to work as a check-list for those writing HTML documents - perhaps with emphasis on all those doing it for the first time.

[Most of the below requires a fairly advanced understanding of HTML and web pages. If you're using any webpage builder at all then much of it will be handled automatically. -UT]

Understanding Doctypes.

It isn't often noted that valid HTML documents don't actually start with the <html> tag - they have one extra line before it. This is the doctype declaration, and it must be present right at the top of your document for it to be valid HTML.

There are really only two doctypes that you need to know about. The HTML4 doctype looks like this:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">

The XHTML one looks like this:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

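These are the versions of the doctypes that are a little more forgiving - if you're a purist, you can use the strict ones instead. For XHTML, change the words 'Transitional' and 'transitional' to 'Strict' and 'strict'. For HTML4, remove the word 'Transitional' from the first string (leaving "-//W3C//DTD HTML 4.01//EN") and change 'loose' to 'strict'.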

But what is the doctype for? Well, its purpose is simple enough: it tells web browsers exactly what version of HTML your page was written in, to help them to interpret it correctly. If the browser can't quite understand the page then you get missing or overlapping sections and other strange errors.
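As a rough illustration of the "right at the top" rule, the check below looks for a doctype at the start of a document. It is only a sketch: the function name has_doctype and the two sample documents are made up for this example, and it does not validate the doctype itself, only that one is present before anything else.

```python
def has_doctype(html_text: str) -> bool:
    """Return True if the document begins with a doctype declaration."""
    # Leading whitespace is tolerated here, and the keyword is matched
    # case-insensitively (<!DOCTYPE and <!doctype both count).
    return html_text.lstrip().lower().startswith("<!doctype")


# Hypothetical sample documents, one with the HTML4 doctype and one without.
with_doctype = (
    '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" '
    '"http://www.w3.org/TR/html4/loose.dtd">\n'
    "<html><head><title>Example</title></head><body></body></html>"
)
without_doctype = "<html><head><title>Example</title></head><body></body></html>"

print(has_doctype(with_doctype))
print(has_doctype(without_doctype))
```

A check like this could be run over every page of a site before uploading, to catch files where the doctype was forgotten.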

Understand HTTP Errors.

A truly shocking number of people writing HTML pages don't know how HTTP works - and they quickly run into trouble because of it. HTTP is the way a web browser communicates with a web server, and this communication includes information about your pages, such as cookies.

You don't need to worry too much about the internals of HTTP, but it's worth knowing that it works by the browser sending a request to the server for a certain page, and the server then responding with a numeric status code that says how the request went, usually followed by the content itself.

Your website should be set up to handle error codes well. For example, a 404 error (page not found, as you would get from unixtools.com/no_page) should show a page with links to the most useful parts of your site. Other common status codes include:

200 - OK.

301 - Page moved permanently.

403 - Forbidden (no authorisation to access).

500 - Internal server error.

For more information, visit www.w3.org/protocols.
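If you want to see the official wording for the codes listed above, Python's standard library carries a table of them. The sketch below simply looks each code up; the helper is_error is a name invented for this example, and treating 400-599 as "errors" follows the usual convention that 4xx codes are client errors and 5xx codes are server errors.

```python
from http import HTTPStatus


def is_error(code: int) -> bool:
    """Return True for client (4xx) and server (5xx) error codes."""
    return 400 <= code <= 599


# The codes mentioned in the article, with their standard reason phrases.
for code in (200, 301, 403, 404, 500):
    status = HTTPStatus(code)
    kind = "error" if is_error(code) else "not an error"
    print(code, "-", status.phrase, f"({kind})")
```

Note that 200 and 301 are not errors at all: the first means the request succeeded, and the second tells the browser to look for the page at a new address.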

Understand MIME Types.

MIME types are another part of the HTTP headers - an important one. Sent by the server in the content-type header, they tell the browser what kind of file it is about to receive. Browsers don't rely on HTML files ending in .html, JPEG images ending in .jpeg, and so on: they rely on the content-type header. If you don't know about this, you can have problems if you need to configure your server to send anything unusual.

[Server configuration is not something that most of us will ever have to worry about, unless we become our own web hosts. -UT]

Here are some common MIME types:

text/html - HTML.

text/css - CSS.

text/plain - plain text.

image/gif - GIF image.

image/jpeg - JPEG image.

image/png - PNG image.

audio/mpeg - MP3 audio file.

application/x-shockwave-flash - Flash movie.
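To make the idea concrete, here is a minimal sketch of the kind of lookup a web server performs when it chooses a content-type header for a file. The MIME types are the ones listed above; the file extensions paired with them, the function name content_type_for, and the fallback of application/octet-stream for unknown files are assumptions made for this example, not something a particular server is guaranteed to do.

```python
import os

# MIME types from the list above, keyed by a typical file extension.
# The extensions are illustrative assumptions; real servers are configurable.
CONTENT_TYPES = {
    ".html": "text/html",
    ".css": "text/css",
    ".txt": "text/plain",
    ".gif": "image/gif",
    ".jpeg": "image/jpeg",
    ".jpg": "image/jpeg",
    ".png": "image/png",
    ".mp3": "audio/mpeg",
    ".swf": "application/x-shockwave-flash",
}


def content_type_for(filename: str, default: str = "application/octet-stream") -> str:
    """Pick a content-type value for a file name, falling back to a default."""
    extension = os.path.splitext(filename)[1].lower()
    return CONTENT_TYPES.get(extension, default)


for name in ("index.html", "default.css", "photo.JPG", "archive.xyz"):
    print(name, "->", content_type_for(name))
```

The point is that the extension only matters on the server side, as an input to this lookup. The browser sees only the resulting header.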

Understand Link Paths.

One of the hardest things to understand about HTML is all the different things that you can put in an 'href' attribute. Abbreviated URLs are created using the path rules of old text-based operating systems, and there are plenty of people writing HTML today who are completely unfamiliar with these rules.

Here are some examples. For each one, the assumption is that the link is on a page at http://www.example.com/example1/example1.html.

- links to http://www.example.com/example1/example2.html
- links to http://www.example.com/example1/example2.html
- links to http://www.example.com/example2.html
- links to http://www.example.com/example2.html
- links to http://www.example.com/
- links to http://www.example.com/example1

To put it simply, one dot means "in the folder we're in now", while two dots means "in the folder above the one we're in now". This can get confusing fast - just look at the difference one dot can make! Be careful with it.

[UT comment: Ok, the above was pretty poorly written. If you want to link to a page in the same folder you can use the abbreviated URL or the full URL. Let's say the page you want to link to is this page, html-codes2.html. You can do it like this:

abbreviated: <a href="html-codes2.html">HTML Codes</a> or you can use this:
full URL: <a href="http://www.unixtools.com/html-codes2.html">HTML Codes</a>

If the file you're linking to is one folder level higher, then you would use this:
<a href="../PageName.html">

If the file is in another folder, you could do this:
<a href="../NameOfOtherFolder/PageName.html">

It's probably best to always use the full URL when making links to pages on the same site. When linking to a page on another site you have to use the full URL. -UT]
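One way to check what an abbreviated URL will turn into is to resolve it the same way a browser does. The sketch below uses Python's standard urljoin function with the page address used in the examples above. The href values in EXAMPLE_HREFS are illustrative assumptions chosen to show the one-dot, two-dot and leading-slash rules; they are not taken from the original list.

```python
from urllib.parse import urljoin

# The page that contains the links, as in the examples above.
BASE = "http://www.example.com/example1/example1.html"

# Hypothetical href values covering the common cases.
EXAMPLE_HREFS = [
    "example2.html",     # same folder
    "./example2.html",   # same folder, written with one dot
    "../example2.html",  # one folder up, written with two dots
    "/example2.html",    # starts from the top of the site
    "/",                 # the top of the site itself
    "./",                # the current folder itself
]


def resolve(href: str, base: str = BASE) -> str:
    """Resolve an href against the page it appears on."""
    return urljoin(base, href)


for href in EXAMPLE_HREFS:
    print(f"{href!r:22} -> {resolve(href)}")
```

Running the hrefs from your own pages through a helper like this is a quick way to spot a link with one dot too many or too few.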

Understand How to Insert Things That Aren't HTML.

One of the most common HTML questions is how to insert things like JavaScript and CSS into an HTML document. This is one of the easiest questions to answer: you simply use the link and script tags, like this:

<link rel="stylesheet" type="text/css" href="default.css" />

[Any site that uses CSS will have a statement like the above. -UT]

<script src="scriptname.js" type="text/javascript" language="javascript"></script>
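As a checklist aid, the sketch below scans a page for the two tags shown above and lists the external files they pull in. It uses Python's standard html.parser module; the class name ExternalResourceFinder and the sample page are invented for this example, and the check for rel="stylesheet" is a simplification that ignores other kinds of link tags.

```python
from html.parser import HTMLParser


class ExternalResourceFinder(HTMLParser):
    """Collect stylesheet and script files referenced by a page."""

    def __init__(self):
        super().__init__()
        self.resources = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "link" and attributes.get("rel") == "stylesheet":
            self.resources.append(("css", attributes.get("href")))
        elif tag == "script" and "src" in attributes:
            self.resources.append(("javascript", attributes["src"]))


# Hypothetical page using the same two tags as the article.
page = """<html><head>
<link rel="stylesheet" type="text/css" href="default.css" />
<script src="scriptname.js" type="text/javascript"></script>
</head><body></body></html>"""

finder = ExternalResourceFinder()
finder.feed(page)
print(finder.resources)
```

Each file found this way is fetched with its own HTTP request, so it needs to exist on the server and be sent with the right content type.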

For example, Google AdSense ad code is JavaScript that is inserted into a page in this way.

About the Author

Information supplied and written by Lee Asher of Eclipse Domain Services, Domain Names, Hosting, Traffic and Email Solutions.