Anybody who has ever attempted to make a web page knows what tag soup is. It’s the innocent crime that most of us commit without meaning to. An ill-structured, invalid HTML file is basically tag soup: since such a file cannot be relied on to render the same way everywhere, it’s nothing but a soup of tags. Early HTML standards did not say how browsers should handle invalid markup, so each browser recovered from errors in its own way, and most of us ended up writing invalid HTML files, or tag soup, without noticing. Today’s web browsers include what we call a tag soup parser, which can detect and parse even an invalid HTML file.
A tag soup essentially refers to mistakes such as:
- Incomplete tags
- Mismatched tags
- Improper styling (files lacking proper indentation)
- Incorrect use of escape characters
- Use of proprietary HTML extensions
Here is a small example of a very unhealthy and painful tag soup:
<html>
<head>
<title>Lets nurture</title
<body>
<h1> This is the case of mismatched tags</h2>
<img src="letsnurture.jpg", border="5>
</body>
</html>
As a general practice, on receiving unexpected output from a file like that, one would go back to the HTML file, manually scan the document to find the invalid or missing tags, correct them and try again. But with long HTML files, or sites made up of many pages (which is usually the case), this is a rather long and tiring process.
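The manual scan described above can be partly automated. Here is a minimal sketch using only Python’s standard `html.parser` module. The `TagChecker` class and `check` function are hypothetical names made up for this illustration, and the check is deliberately simplistic (it only tracks opening and closing tags), so treat it as a starting point rather than a real validator.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag in HTML.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "source", "track", "wbr"}


class TagChecker(HTMLParser):
    """Hypothetical helper: records stray closing tags and unclosed tags."""

    def __init__(self):
        super().__init__()
        self.open_tags = []   # stack of (tag name, line where it was opened)
        self.problems = []    # human-readable findings

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_ELEMENTS:
            self.open_tags.append((tag, self.getpos()[0]))

    def handle_endtag(self, tag):
        line = self.getpos()[0]
        if tag not in [name for name, _ in self.open_tags]:
            self.problems.append(
                f"line {line}: </{tag}> has no matching opening tag")
            return
        # Pop back to the matching opener, reporting anything left open.
        while self.open_tags:
            name, opened = self.open_tags.pop()
            if name == tag:
                break
            self.problems.append(f"line {opened}: <{name}> is never closed")


def check(html_text):
    """Return a list of problems found in html_text (empty if none)."""
    checker = TagChecker()
    checker.feed(html_text)
    checker.close()
    # Whatever is still on the stack at the end was never closed.
    for name, opened in checker.open_tags:
        checker.problems.append(f"line {opened}: <{name}> is never closed")
    return checker.problems


sample = "<body>\n<h1>This is the case of mismatched tags</h2>\n</body>"
for problem in check(sample):
    print(problem)
```

For the three-line sample, the checker flags the stray `</h2>` and the `<h1>` that is never closed, both on line 2.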
HTML5 comes to the rescue at this point. Web browsers that are HTML5 compatible are able to handle tag soup. HTML5 is backward compatible: in addition to supporting HTML4 markup, it also brings many new features. Most importantly, HTML5 lays down rules for parsing HTML, including how to recover from invalid markup, which earlier versions of the standard did not.
Apart from HTML5, there are several tools that help fix tag soup. Let’s look at each of them briefly:
HTML Tidy
HTML Tidy, developed by Dave Raggett of the World Wide Web Consortium (W3C), is a library with its source code written in ANSI C. It is available for a number of platforms. Fixes provided by HTML Tidy include:
- Correcting missing or mismatched tags
- Adding missing items such as quotation marks and escape characters
- Providing proper styling and indentation for HTML files
- Reporting use of proprietary HTML extensions
TagSoup
TagSoup is a Java library that parses HTML files. Although it is not as efficient as HTML Tidy, it corrects the HTML on the fly as it parses. It guarantees well-structured results: tags will wind up properly nested, default attributes will appear appropriately, and so on. It is free and open source.
Beautiful Soup
It is a Python library that turns an invalid HTML file into a parse tree. The Beautiful Soup constructor takes an XML or HTML document in the form of a string (or an open file-like object). It parses the document and creates a corresponding data structure in memory. If you give Beautiful Soup a perfectly formed document, the parsed data structure looks just like the original document. But if there’s something wrong with the document, Beautiful Soup uses heuristics to figure out a reasonable structure for the data.
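To make this concrete, here is a small sketch of handing a mismatched heading to Beautiful Soup. It assumes the third-party `beautifulsoup4` package is installed and uses Python’s built-in `html.parser` as the backend. Other backends (such as lxml or html5lib) may repair the same input differently, so the exact tree is not guaranteed to match across setups.

```python
from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

# The closing tag does not match the opening one.
broken = "<h1>This is the case of mismatched tags</h2>"

# "html.parser" selects Python's built-in parser as the backend.
soup = BeautifulSoup(broken, "html.parser")

# Even though the input is invalid, the parse tree can still be navigated.
print(soup.h1.get_text())

# Serializing the tree writes the heading back out with a matching close tag.
print(str(soup))
```

With this backend the stray `</h2>` is dropped and the heading is closed as an `<h1>` when the tree is written back out.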
After proper use of any of the above solutions, one should have a properly structured file such as the following:
<html>
<head>
<title>Lets nurture</title>
</head>
<body>
<h1> This is the case of mismatched tags</h1>
<img src="letsnurture.jpg" border="5">
</body>
</html>
So go ahead and drink a healthy tag soup.
Let’s discuss tag soup and other aspects of website design. Leave us a message on our Facebook page – LetsNurture.