Jeff Cutsinger → Semantic Nonsense

  1. Adventures in HTML5

    So, I’ve been toying with using HTML5 on my site. I figured that, because this is a spec built with backwards compatibility in mind, the transition would be smooth, right? Well, mostly. It was pretty easy to make my site conforming. I restructured my markup a little to throw in some of the new bits like header and nav. Here’s what it looks like (comments added):

    
    <!doctype html> <!-- Love the new doctype -->
    <html lang='en'> <!-- Language codes are a good thing -->
    	<!-- No head tag. Got a problem with that? -->
    	<title>Jeff Cutsinger</title>
    	
    	<header>
    		<h1>Jeff Cutsinger</h1>
    		
    		<div><a href="#navigation" id="nav-link"
    			title="Skip to Navigation">N</a>
    		</div>
    	</header>
    
    	<!-- Yes, the navigation skip link is broken.
    	    This is a WIP, though -->
    	<nav>
    		<ul>
    			<li><a href='news/'>Weblog</a></li>
    			<li><a href='sitemap'>Site Map</a></li>
    			<li><a href='search'>Search</a></li>
    		</ul>
    	</nav>
    

    I’ve left bits out, but these are the important ones. As you may have noticed, I have a little script that makes the navigation pop up when you mouse over the nav link. It’s nice and unobtrusive in my opinion, but the changes broke it. Meanwhile, I’m switching to base2, because it’s the bomb. So I write:

    
    	var navLink = document.getElementById('nav-link');
    	var navigation = document.matchSingle('nav ul');
    	
    	var ns = navigation.style;
    

    Great. So it grabs the ul child of the nav element and screws with its style (actual manipulations not shown). But Firebug comes back with an error: it turns out that document.matchSingle('nav ul') is actually null. What? After some digging, I see in Firebug that the DOM is not at all what I expected.

    What‽ What‽ You’re flipping kidding me, right‽ You have got to be kidding me! You’re telling me that the h1 is a sibling of header instead of its child‽

    Alright. Crazy behavior from browsers is to be expected. Still, this doesn’t bode well for HTML5. It appears as though its new elements and backwards compatibility don’t mix. That is, you can’t reliably (or at least trivially) script HTML5 documents in existing browsers, except for Opera (and maybe Safari, which I can’t get to work on my machine). I really hope there’s something I’m missing.
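
    If the flattening described above is what is happening, one possible workaround is to stop relying on the nesting altogether. The sketch below is hypothetical: findNavList is an invented name, it uses plain DOM calls rather than base2, and it has not been tested against any particular browser. It looks for a ul inside nav first and, assuming the parser left nav empty and hoisted its children out as following siblings, falls back to the first element that follows nav.

```javascript
// Hypothetical helper (not part of base2): find the navigation list whether
// or not the browser's parser kept the <ul> nested inside <nav>.
function findNavList(doc) {
  var nav = doc.getElementsByTagName('nav')[0];
  if (!nav) {
    return null; // no <nav> element at all
  }

  // Well-behaved parser: the <ul> is a descendant of <nav>.
  var nested = nav.getElementsByTagName('ul')[0];
  if (nested) {
    return nested;
  }

  // Flattened DOM (assumed): <nav> is empty and its former children follow
  // it as siblings. Skip whitespace text and comment nodes, then accept the
  // next element only if it is a UL.
  var node = nav.nextSibling;
  while (node && node.nodeType !== 1) {
    node = node.nextSibling;
  }
  return node && node.nodeName.toLowerCase() === 'ul' ? node : null;
}

// Usage in the page script, in place of document.matchSingle('nav ul'):
// var navigation = findNavList(document);
```

    Checking only the first element after nav, rather than scanning the whole page, keeps the fallback from grabbing some unrelated list further down.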

  2. Site Update

    I’ll just come out and admit it. I’m pathological when it comes to my site. If I spent half as much time on content as I spend upgrading it technologically, I’d have a lot more traffic. But I’m pathological.

    I’ve built an ad hoc PHP-based (I know, “ad hoc” and “PHP-based” are practically redundant) CMS for my static content. Its features include:

    • Advanced caching. The pages themselves support ETag-based conditional GET (I might add Last-Modified at some point, but for now, ETags will have to do). External resources are cached via a far-future Expires date. If I need to update an external resource, I bump its “version”, which changes the URI.
    • Speaking of URIs, they are now pathologically clean. I’ve made it so resources that aren’t collections don’t end in slashes. The old URIs now redirect to the new ones.
    • And speaking of redirection, any variance in the URI requested will cause a redirect to a canonical version of the URI. I do mean any variance. If your query string parameters are out of order, it will redirect with the correct order (for that matter, if you use a parameter that is unrecognized, it will redirect with that parameter removed).
    • Pages now respond with 405 Method Not Allowed for any verb they don’t recognize. This is, by default, anything other than GET and HEAD.
    • Besides the obvious template changes, there are some niceties under the hood, like all my scripts using type='application/ecmascript' instead of type='text/javascript' (this is also the media type my script resources send). It’s true that the browser that shall not be named doesn’t recognize this media type and thus ignores the scripts. It’s also true that I really could not care less.
    • Also, my site is no longer valid HTML. You have no idea how much this pains me, to the roots of my soul. I am going to kill myself. My site is no longer usable in any browser, brutalizes the disabled, and eats tiny fuzzy bunnies live. Darfur, move over. Iraq? Ha! jeff.cutsinger.org is no longer valid. Woe to you, world, for I am your undoer.
    • I’ve been using the time element in place of the abbr design pattern in my microformats, because the abbr design pattern is broken.
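
    The caching, canonical-URI, and 405 behaviour in the list above can be sketched as a single request handler. This is a hypothetical illustration in JavaScript, not the site’s actual PHP code: handleRequest, canonicalQuery, and the page object (its params, methods, and body fields) are invented names; a short FNV-1a hash stands in for whatever the real CMS uses to compute ETags; and using 301 for the canonicalizing redirect is an assumption.

```javascript
// Hypothetical sketch of the request handling described above; not the
// site's actual PHP CMS. All names here are invented for illustration.

// 32-bit FNV-1a over UTF-16 code units; a stand-in for a real content hash.
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return (hash >>> 0).toString(16);
}

// Drop unrecognized parameters and put the rest in the page's declared order.
// `query` is the raw query string, e.g. "?b=2&a=1" or "".
function canonicalQuery(query, knownParams) {
  const keyOf = (pair) => pair.split('=')[0];
  const kept = query
    .replace(/^\?/, '')
    .split('&')
    .filter((pair) => pair && knownParams.includes(keyOf(pair)))
    .sort((x, y) => knownParams.indexOf(keyOf(x)) - knownParams.indexOf(keyOf(y)));
  return kept.length ? '?' + kept.join('&') : '';
}

function handleRequest(req, page) {
  // Anything other than the allowed verbs (GET and HEAD by default) gets a 405.
  const allowed = page.methods || ['GET', 'HEAD'];
  if (!allowed.includes(req.method)) {
    return { status: 405, headers: { Allow: allowed.join(', ') } };
  }

  // Any variance from the canonical query string redirects to the canonical one.
  const canonical = canonicalQuery(req.query, page.params || []);
  if (canonical !== req.query) {
    return { status: 301, headers: { Location: req.path + canonical } };
  }

  // ETag-based conditional GET: answer 304 if the client's copy is current.
  const etag = '"' + fnv1a(page.body) + '"';
  const ifNoneMatch = req.headers['if-none-match'];
  if (ifNoneMatch && ifNoneMatch.split(',').some((t) => t.trim() === etag || t.trim() === '*')) {
    return { status: 304, headers: { ETag: etag } };
  }
  return {
    status: 200,
    headers: { ETag: etag },
    body: req.method === 'HEAD' ? '' : page.body,
  };
}
```

    Only the query string is canonicalized here; the no-trailing-slash rule for non-collection resources would be a similar comparison on req.path before the ETag check.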

    So you see, I’m pathological. Guess what? I’m still not happy with my site.

    1. I want to get rid of PHP. I’m not just talking about my site; I’m talking about the whole world. But my site is probably the best place to start. While I’m at it, I wish I could lose mod_rewrite.
    2. I’d like to support pingback on all of my pages.
    3. As mentioned above, I might like Last-Modified support.
    4. I’m thinking about throwing caution to the winds and just using HTML5.
    5. 1812 is pretty cool, but I’m working on my own replacement.
    6. I wonder how much work it would take to use FastCGI.
    7. Maybe I’ll actually add some real content one of these days.
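
    For item 3, Last-Modified support would pair a Last-Modified response header with a check of the client’s If-Modified-Since request header. A minimal sketch in JavaScript with invented names (lastModifiedHeader, notModifiedSince), assuming the CMS can report a modification time for each page. HTTP dates only have one-second resolution, so both sides are truncated to whole seconds before comparing.

```javascript
// Hypothetical helpers for Last-Modified / If-Modified-Since handling.

// Date.prototype.toUTCString() yields the date form HTTP uses,
// e.g. "Thu, 01 Feb 2007 12:00:00 GMT".
function lastModifiedHeader(modified) {
  return modified.toUTCString();
}

// True when the client's copy (per If-Modified-Since) is still current,
// i.e. the server could answer 304 Not Modified.
function notModifiedSince(modified, ifModifiedSince) {
  if (!ifModifiedSince) {
    return false;
  }
  const since = Date.parse(ifModifiedSince);
  if (Number.isNaN(since)) {
    return false; // unparseable date: ignore it and send the full response
  }
  // HTTP dates have one-second resolution; compare whole seconds.
  return Math.floor(modified.getTime() / 1000) <= Math.floor(since / 1000);
}
```

    If ETags stay in use as well, note that RFC 7232 says a server must ignore If-Modified-Since when the request also carries If-None-Match.
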
  3. On Presentational Elements

    Another lively debate we are seeing on the public-html mailing list is about semantic vs. presentational markup. Now, it should be clear that the best practice for web design today is:

    1. Design semantically structured HTML.
    2. Validate.
    3. Write CSS to make the result look as desired.
    4. Test in all browsers.

    The focus is point 1. Why is it best practice to design your HTML as a semantically structured document which you then style, rather than simply writing presentational markup that looks the way you want?

    • In general, semantic markup is more accessible (although this is not always the case).
    • Semantic markup tends to gracefully degrade.
    • The semantics in the document can be harvested.

    Et cetera. My point is that semantic markup is best practice because of its fruits, not because it is semantic. There is a notion that if it’s more semantic it must be better, and I call BS on that. If there is no benefit to the semantics, they are wasted and pointless. So for a new feature it’s not enough to show that it’s better semantically; it also needs to be shown which use cases it serves.

    Switching gears. There’s a lot of misunderstanding about semantics out there and a lot of well-meaning people get it all wrong. For instance, many people view semantic markup as:

    1. No tables.
    2. XHTML + CSS.
    3. Using em and strong in place of i and b.

    Which is complete rubbish. Tables have perfectly good semantic meaning. I saw someone the other day who tried marking up calendars as nested ordered lists. What a horrid abuse! Calendars are naturally semantically marked up as tables (just make sure you use caption, summary, th, etc.). XHTML is not more semantic than HTML and, while CSS enables semantic markup to look nice, it does not add any semantic value to a document.
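
    As an illustration of a calendar marked up as a table, here is a small hypothetical JavaScript generator (calendarTable is an invented name) that emits a caption, a summary, a header row of th cells with scope="col", and one tr per week. The summary text is a placeholder, and weeks are assumed to start on Sunday.

```javascript
// Hypothetical generator: one month as a semantic HTML table.
const MONTHS = ['January', 'February', 'March', 'April', 'May', 'June',
  'July', 'August', 'September', 'October', 'November', 'December'];
const WEEKDAYS = ['Sun', 'Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat'];

// `month` is 1-12. Dates are computed in UTC to avoid time-zone surprises.
function calendarTable(year, month) {
  const firstWeekday = new Date(Date.UTC(year, month - 1, 1)).getUTCDay();
  // Day 0 of the following month is the last day of this one.
  const daysInMonth = new Date(Date.UTC(year, month, 0)).getUTCDate();

  // Leading blanks, the days themselves, then trailing blanks to fill the last week.
  const cells = [];
  for (let i = 0; i < firstWeekday; i++) cells.push('<td></td>');
  for (let d = 1; d <= daysInMonth; d++) cells.push('<td>' + d + '</td>');
  while (cells.length % 7 !== 0) cells.push('<td></td>');

  // One table row per week.
  const rows = [];
  for (let i = 0; i < cells.length; i += 7) {
    rows.push('<tr>' + cells.slice(i, i + 7).join('') + '</tr>');
  }

  return [
    '<table summary="Days of the month, one row per week">',
    '<caption>' + MONTHS[month - 1] + ' ' + year + '</caption>',
    '<thead><tr>' + WEEKDAYS.map((w) => '<th scope="col">' + w + '</th>').join('') + '</tr></thead>',
    '<tbody>' + rows.join('') + '</tbody>',
    '</table>',
  ].join('\n');
}
```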

    The last one is probably the worst. The web standards movement has brought us a lot of good stuff, but it killed em and strong. That’s right, it killed them. It is absolutely, completely untrue that em and strong are drop-in replacements for i and b, and to use them as such is to do violence to their meanings. To understand this, consider that it is not the name of an element that gives it its meaning. The name of an element is only a useful mnemonic; an element’s semantics are granted by its usage. If ul were magically renamed bacon but used in all the same situations, it would not lose any semantic value at all. I would know that if I grabbed an element called “bacon”, it would be an unordered list. What has happened to em and strong is that people have been using them in all the same places they would have used i and b. What does that tell you? It tells you that if you grab an element called em, you have gotten something that someone wanted italicized! So the upshot is that web standards advocacy, meant to add semantic value to the web, has actually robbed it of the meaning of two important elements!

    This is one reason why it is important to have presentational elements alongside semantic ones. The knowledgeable author can use semantic elements to the exclusion of presentational markup with great effect. But the vast majority of users of HTML don’t even know what semantic or presentational markup is, much less how to choose the element with appropriate semantics in a given situation. If we design a language with no presentational elements, they will be forced into making these decisions in ignorance. The result will be a net loss of meaning on the web.

  4. Error Handling

    Well. The W3C HTML Mailing List is going nuts. I’m somewhat worried that I have contributed to this, but I think this was bound to happen because of the disconnect between a lot of web developers and the WHATWG. It’s not that the WHATWG has been insular, closed and secret, but that a lot of web developers haven’t been interested enough to keep track of a lot of the decisions that were hammered out carefully there. And so we have a lot of issues being brought up about these decisions.

    The one I care about most is error recovery. The WHATWG spec defines exactly how a document claiming to be HTML is to be parsed, even if that document contains gross errors. This is the polar opposite of the XML approach, which mandates that a conforming parser stop normal processing at the first well-formedness error. The XML approach (“draconian error handling”) is a terrible idea and is the absolute worst part of the whole XML spec.
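
    To make the contrast concrete, here is a toy in JavaScript. It is emphatically not the HTML5 parsing algorithm, which covers far more cases (implied end tags, misnested formatting elements, and so on); it only puts the two policies side by side. In draconian mode the first mismatched or unclosed tag is fatal, much as a well-formedness error is for an XML processor. In recovering mode the same input always yields a balanced result by fixed rules, so any two consumers following those rules produce the same output. The function name and the short list of void elements are assumptions for the sketch.

```javascript
// Toy tag balancer contrasting draconian and recovering error handling.
// NOT the HTML5 parsing algorithm; it only handles nesting of tags.
const VOID_ELEMENTS = new Set(['br', 'hr', 'img', 'input', 'link', 'meta']);
// A tag, a lone "<" that starts no tag, or a run of text.
const TOKEN = /<(\/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>|<|[^<]+/g;

function balance(markup, draconian) {
  const out = [];
  const open = []; // stack of currently open element names
  for (const [token, slash, rawName] of markup.matchAll(TOKEN)) {
    if (rawName === undefined) { // text (or a stray "<")
      out.push(token);
      continue;
    }
    const name = rawName.toLowerCase();
    if (!slash) { // start tag: keep it as written
      out.push(token);
      if (!VOID_ELEMENTS.has(name)) open.push(name);
      continue;
    }
    if (open[open.length - 1] === name) { // properly nested end tag
      open.pop();
      out.push('</' + name + '>');
      continue;
    }
    if (draconian) {
      throw new SyntaxError('Mismatched end tag </' + name + '>');
    }
    // Recovery rule 1: if the element is open further down the stack,
    // close everything above it, then the element itself.
    if (open.includes(name)) {
      let top;
      do {
        top = open.pop();
        out.push('</' + top + '>');
      } while (top !== name);
    }
    // Recovery rule 2: an end tag for an element that isn't open is dropped.
  }
  if (open.length > 0) {
    if (draconian) throw new SyntaxError('Unclosed <' + open[open.length - 1] + '>');
    // Recovery rule 3: close whatever is still open at the end of input.
    while (open.length > 0) out.push('</' + open.pop() + '>');
  }
  return out.join('');
}
```

    For example, balance('<b><i>x</b>', false) yields '<b><i>x</i></b>', while balance('<b><i>x</b>', true) throws a SyntaxError and the reader gets nothing at all.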

    Proponents of draconian error handling seem to think that we can create a “clean, lean” language and rid the web of all its ills. They are worried that new browser vendors will have to live with the legacy of bad HTML that is so predominant on the web. They view the new HTML spec as a throwback to the days of tag soup. Unfortunately, these arguments do not pan out.

    1. There already exists a “clean, lean” syntax: XML. It hasn’t worked on the web, for a number of reasons (not the least of which is its draconian error handling). The last attempt at creating a “clean, lean” semantic language was XHTML 2, and we’ve seen how well that’s worked.
    2. There’s a large body of documents on the web that are crappy, horrible HTML. This is not going to change no matter what some standards body does.
    3. Because of the previous point, browser vendors are going to have to live with the legacy of bad HTML one way or the other. Adding another language to deal with will simply make their job harder.
    4. While the WHATWG spec does give exact handling for erroneous markup, it does not suggest or recommend that authors generate erroneous markup. Conformance checkers will reject bad markup.

    Furthermore, there are a number of problems with the draconian stance.

    1. It’s bad practice. We’ve known since before the web that the best policy in these matters is Postel’s law: be conservative in what you produce and liberal in what you consume. Error handling happens on the client (or consumer) side. The draconian approach is conservative in what it consumes, so it’s the wrong thing to do. Now, it’s important to recognize that this doesn’t mean we should be lax in what we produce. We should produce very strict HTML. That’s best practice.
    2. It’s a bad division of labor. It puts the burden of conformance checking where it doesn’t belong. It doesn’t make sense to have a browser running all the overhead of a conformance checker when its user doesn’t even know what conformance checking (or HTML) is.
    3. It ignores the possibility of bugs. If a producer puts its best effort into outputting strict HTML and fails, denying service to the consumer is not an appropriate response. Firstly, this is bad for business. I can’t imagine eBay adopting HTML5 if it means that a bug in their frontend causes lost business. Secondly, the draconian approach doesn’t necessarily eliminate bugs before they are deployed, because it is just a form of testing. Testing is remarkably good at showing the presence of bugs and remarkably bad at showing their absence.
    4. In the case of HTML, it is problematic from an implementation standpoint. For one, it obviously requires the addition of another “rendering mode” (or actually a parsing mode) alongside existing modes. This in and of itself is difficult for user agents to maintain. Remember that the old modes can’t be eliminated, because doing so would break compatibility with the rest of the web, now and always. So this is a burden placed on user agents. However, the most problematic issue is that of triggering the new mode. Some have suggested switching on the doctype, but the big browser makers have made it clear that this is not a workable solution. The alternative is to switch on the MIME type, but this completely breaks backwards compatibility, since current user agents won’t know how to handle it. So it’s impracticable.
    5. In the end, it doesn’t prevent tag soup because market pressure compels consumers to follow Postel’s law. This has already been demonstrated with XML. A lot of the XML producers on the web screw up, and XML consumers have followed suit by disobeying the standard. And of course, since there is no standard governing this disobedience, they all do it differently (as is done with HTML today), which means interoperability is lost.

    The WHATWG approach:

    1. Allows the use of XML or the more liberal HTML syntax,
    2. Specifies a method of consuming the tag soup that exists on the web,
    3. Is one language that is well specified and therefore easy to implement,
    4. Encourages best practices for producers and consumers,
    5. Follows Postel’s law,
    6. Does not require User Agents to perform the unnecessary labor of conformance checking, nor to pester their users with error messages they don’t and shouldn’t care about,
    7. Acknowledges that authors are not and never will be perfect, and defines how best to deal with the inevitable problems that will come up,
    8. Does not require the User Agent to implement yet another mode, and
    9. Does all this in an interoperable way.

    Finally, it’s important to realize that all this debate is moot. If the spec adopts draconian error handling, it will be DOA, as none of the major browser manufacturers will implement it.

  5. XKCD googlebombing

    Explanation. Not that I have much Google Juice.

  6. The joy of being a web developer

    The joy of being me. In which Chris Wilson ponders why people are so rude about the versioning issue. I think maybe it has to do with the fact that what he’s proposing is the worst possible thing for the web. But don’t worry. According to him, we’ll get a way to ‘get into “really standards” mode prior to HTML5.’ Great. So we’ll have quirks mode, almost standards mode, standards mode, and “really standards” mode. I’m just waiting for “No really I promise this is actually standards mode”, “cross my heart hope to die stick a needle in my eye standards mode”, and “bet you ten bucks this is actually standards mode”.

    I’ve got an idea. Let’s just stop adding new rendering modes. Then we’ll have two modes: the “I don’t have a clue and I just copied this HTML codes from my friends page and what is a doctype? mode” and standards mode. And that is one too many.
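
    For readers who haven’t run into these modes, here is roughly how a browser picks one from the doctype, as a heavily simplified JavaScript sketch loosely modeled on the doctype rules in the HTML5 parsing spec. It deliberately omits the long list of legacy public identifiers that also trigger quirks mode (so it would wrongly call an HTML 3.2 document “standards”), and the function name and the shape of the doctype object are assumptions.

```javascript
// Heavily simplified doctype sniffing; `doctype` is null (no doctype) or
// { name, publicId, systemId } with missing identifiers left undefined.
function renderingMode(doctype) {
  if (!doctype) return 'quirks'; // no doctype at all
  if ((doctype.name || '').toLowerCase() !== 'html') return 'quirks';

  const pub = (doctype.publicId || '').toLowerCase();
  const hasSystemId = doctype.systemId !== undefined;

  // HTML 4.01 Transitional/Frameset: quirks without a system identifier,
  // "almost standards" (limited quirks) with one.
  if (pub.startsWith('-//w3c//dtd html 4.01 transitional//') ||
      pub.startsWith('-//w3c//dtd html 4.01 frameset//')) {
    return hasSystemId ? 'almost standards' : 'quirks';
  }

  // XHTML 1.0 Transitional/Frameset: "almost standards".
  if (pub.startsWith('-//w3c//dtd xhtml 1.0 transitional//') ||
      pub.startsWith('-//w3c//dtd xhtml 1.0 frameset//')) {
    return 'almost standards';
  }

  // Everything else here, including <!doctype html>, gets standards mode.
  // (The real rules send many more legacy doctypes to quirks mode.)
  return 'standards';
}
```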

  7. Twitter

    WHATWG has a website, a wiki, forums, three public mailing lists, and a blog. It's clear that they want to have an open, collaborative process. But this is just going too far.

    It's old news that Twitter is all the rage on the web. For me, though, it's not an option. I value control over my content too highly to put it in the hands of a third-party corporation, no matter how "altruistic" they claim or seem to be. With corporations, the bottom line is always the same. I pay Textdrive decent money (for the services I get), so I know they'll be interested in protecting me and mine. So until I see a TwitPress that allows me to have http://jeff.cutsinger.org/status/, no Twitter for me.

  8. Awesome Things

    • This may be old news to some, but I just heard about it yesterday. Roy Fielding is working on a replacement for HTTP called Waka. I have no doubt it's going to be sweet.
    • Thomas Broyer has added HTML5 support to Genshi. This would be really cool if it were limited to output. But it isn't; it has support for using HTML5 for templating, too. Amazing.
    • Subtext? Subtext subtext subtext. Subtext.
  9. HTML5 feed autodiscovery

    I received an e-mail through my contact form stating that this page doesn't have a feed autodiscovery link. Here is the relevant markup:

    
    <link href="http://jeff.cutsinger.org/news/feed/"
          type="application/atom+xml"
          rel="feed alternate"
          title="Jeff Cutsinger">
    

    This works in Firefox. Sadly, I haven't really tested it in any other browsers. It seems to conform to the current HTML5 spec's definitions of feed and alternate. In addition, it seems like the Right Thing to do from a semantic standpoint, because it is exactly what the link is: an alternate representation of the current document (it has everything this one does, or a close approximation) that is also a feed. The individual entries don't have the link because the feed is neither a feed for them nor an alternate representation of them, whereas the comments page has a feed for the comments.

    Am I doing something wrong?
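
    One guess, offered as a hypothesis rather than a diagnosis: rel holds a set of space-separated keywords, so a consumer that compares the whole attribute to the string "alternate" will miss rel="feed alternate". The hypothetical JavaScript below contrasts such a naive check with a token-based one; the function names and the set of feed MIME types are assumptions.

```javascript
// Hypothetical autodiscovery checks for a <link> element's attributes.
const FEED_TYPES = new Set(['application/atom+xml', 'application/rss+xml']);

// Naive check: treats rel as a single string, so "feed alternate" fails.
function naiveIsFeedLink(attrs) {
  return attrs.rel === 'alternate' && FEED_TYPES.has(attrs.type);
}

// Token-based check: rel is a set of space-separated, case-insensitive keywords.
function isFeedLink(attrs) {
  const rels = (attrs.rel || '').toLowerCase().split(/\s+/).filter(Boolean);
  if (rels.includes('feed')) return true;
  const type = (attrs.type || '').toLowerCase().split(';')[0].trim();
  return rels.includes('alternate') && FEED_TYPES.has(type);
}

// The <link> from this page, as an attribute map:
const link = {
  href: 'http://jeff.cutsinger.org/news/feed/',
  type: 'application/atom+xml',
  rel: 'feed alternate',
  title: 'Jeff Cutsinger',
};
console.log(naiveIsFeedLink(link)); // false
console.log(isFeedLink(link));      // true
```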

  10. So much more comfortable

    Roger Johansson recommends putting your mouse in front of your keyboard. I tried it, and I now echo this. Just one day and I can already tell the difference.

  11. Awesome

    Fixing the usemap attribute. In which Ian Hickson removes all doubt as to where the future of the web lies.
  12. 1812 Install Instructions

    As promised, step-by-step instructions on how to install 1812.

  13. New Site

    I finally have the site in a somewhat workable state. I'm looking to add a lot of meaningful content in the coming days. Criticism welcome.

CC *