Who can be blamed for invalid HTML? The coder who created the markup? The browser that has an "anything goes" mentality when rendering HTML? The editor that doesn't flag improper tag usage? Or the specification itself, for being too strict?
We can point the finger anywhere, but in the end, the basic rules aren't that hard to follow, and the coder shouldn't be allowed too much slack. The deeper problem, though, is that writing HTML doesn't require any formal training, and there really aren't any great books or resources out there that explain the rules in a way a beginner can understand.
If a young driver blows through a stop sign and the police car sitting at the intersection just lets it go, that driver might assume it's okay, especially if they never attended driving school. The web browser is essentially the HTML police, and it doesn't do much in the way of enforcement! Without any training, the HTML coder will learn from what works in the browser rather than from what the specification actually allows.
I don't intend to walk through every rule in the HTML specification (I don't pretend to know all of them!), so I'll present one mistake that I think is very common among novice coders: misuse of the <a> element. My hope is that this example will inspire readers to scrutinize their markup more closely and to understand the importance of running it through an HTML validator (such as validator.w3.org).
Wrapping it all in an anchor tag, is it okay?
The short answer is: in HTML 5, yes; otherwise, no. Believe it or not, HTML 5 allows the inline <a> element to "wrap" a block-level element such as a <div> or <p>, as long as the wrapped content contains no interactive elements such as other links or buttons. Does this make coding easier? Sure, but it adds an exception to the age-old rule that an inline element cannot contain block-level elements, and exceptions like this are what bloat specifications beyond the average coder's comprehension.
If you're a seasoned web developer and haven't actually read the new HTML 5 spec (c'mon, it's a nail-biter), you might find this hard to believe, so I urge you to confirm it for yourself.
Copy and paste the following HTML into the direct-input form at http://validator.w3.org/:
<!DOCTYPE html>
<html>
<head>
<title>Test</title>
</head>
<body>
<a href=""><div></div></a>
</body>
</html>
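Switch the Doctype between HTML 4 (for example, <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">) and HTML 5 (<!DOCTYPE html>), and notice how HTML 4 puts the smack down while HTML 5 gives you the thumbs up.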
Until the day comes that the web is HTML 4-free, it might be wise to stick to the HTML 4 rule and not wrap any block-level elements in an <a>. Most of us are no doubt still supporting legacy HTML 4 sites, and it's hard to keep track of which Doctype grants this new freedom as we switch between projects.
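If you maintain sites on both sides of that line, it can help to scan a page for anchors that would trip the HTML 4 rule. Here's a minimal sketch in plain JavaScript; findAnchorsWrappingBlocks and BLOCK_TAGS are hypothetical names of my own, and the tag list is an assumed subset of HTML 4's block-level elements, not the complete list. It only reads tagName and children, so it works on DOM elements as well as on plain objects shaped the same way.

```javascript
// Assumption: a representative subset of HTML 4 block-level elements,
// not the complete list from the HTML 4.01 DTD.
const BLOCK_TAGS = new Set([
  'address', 'blockquote', 'div', 'dl', 'fieldset', 'form',
  'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'hr', 'ol', 'p', 'pre', 'table', 'ul'
]);

// Hypothetical helper: returns every <a> under `root` that has a
// block-level descendant (which HTML 4 forbids). Reads only `tagName`
// and `children`, so DOM elements and plain objects both work.
function findAnchorsWrappingBlocks(root) {
  const offenders = [];

  // True if any descendant of `node` is a block-level element
  function hasBlockDescendant(node) {
    return Array.from(node.children || []).some(function (child) {
      return BLOCK_TAGS.has(child.tagName.toLowerCase()) || hasBlockDescendant(child);
    });
  }

  // Depth-first walk, collecting offending anchors in document order
  function walk(node) {
    if (node.tagName && node.tagName.toLowerCase() === 'a' && hasBlockDescendant(node)) {
      offenders.push(node);
    }
    Array.from(node.children || []).forEach(walk);
  }

  walk(root);
  return offenders;
}
```

In a browser console, something like findAnchorsWrappingBlocks(document.body) would list the offending links. Keep in mind that it inspects the DOM the browser built rather than your source file, so the validator remains the real authority.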
Granted, wrapping each individual inline element in its own <a> element can be annoying and adds an awful lot of extra markup, especially when they all point to the same resource. One trick I like to employ is letting JavaScript/jQuery handle the additional linking.
HTML
<div class="product">
<a href="http://www.mysite.com">A link!</a>
<div class="image-wrapper">
<img src="/src/example.png" alt="" />
</div>
</div>
jQuery
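// Make each .product block behave like its first direct-child link
$('.product').click(function () {
  var link = $(this).children('a').first().attr('href');
  if (link) { // skip blocks that have no link
    location.href = link;
  }
});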
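For comparison, here's a rough equivalent without jQuery. It's a sketch assuming a browser that supports Element.closest(); findProductHref is a hypothetical helper name that mirrors $(this).children('a').first().attr('href'). It uses a single delegated listener on the document, and it lets clicks that land on the link itself go through normally instead of navigating twice.

```javascript
// Hypothetical helper: href of the first <a> that is a direct child of
// `product`, or null if there is none.
function findProductHref(product) {
  for (const child of Array.from(product.children)) {
    if (child.tagName.toLowerCase() === 'a') {
      return child.getAttribute('href');
    }
  }
  return null;
}

// Browser wiring: one delegated click listener for every .product block.
if (typeof document !== 'undefined') {
  document.addEventListener('click', function (event) {
    // Clicks on (or inside) a real link are left to the browser
    if (event.target.closest('a')) return;
    const product = event.target.closest('.product');
    if (!product) return;
    const href = findProductHref(product);
    if (href) {
      window.location.href = href;
    }
  });
}
```

Delegating from the document means blocks added to the page later are covered too, and the markup stays exactly as valid as it was before the script ran.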
The example above keeps the page compliant with the HTML 4 specification, because the script only attaches a click handler; it doesn't actually alter the DOM.
Here's an example where the HTML itself complies with XHTML Strict, but when altered with jQuery, the DOM no longer complies with the spec:
<a href="http://www.mysite.com">A link!</a>
$('a').attr('target', '_blank');
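In this case, we're adding the target attribute to the <a> element, which XHTML 1.0 Strict does not allow. A validator only sees the markup the server sends, so it won't catch this; the violation exists only in the live DOM after the script runs.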
Why did they do it?
I have tremendous respect for the W3C, or for any group that authors a technical specification, for that matter. Those guys are smarter than I am, so I'm reluctant to criticize too much of their work. Still, it makes me wonder: why allow such use of the <a> element all of a sudden? To return to the police car analogy: if everyone is blowing through stop signs, do we just make it legal to roll through them? Is it possible that the W3C was tired of being the bad guy when it comes to seemingly trivial validation errors?
Perhaps they considered alternatives, such as a block-level link element. <adiv> might have been a fine addition to the spec, certainly more useful than the new HTML 5 <mark> tag!
It's probably clear to you now how easy it can be to fill an entire book just on the HTML basics. Every once in a while, take a quick glance at the HTML spec and try to learn something new. Remember, your markup is available to everyone, simply by viewing the source within a web browser. So, put your best foot forward!