Migrating from XHTML to HTML5

When I first started this website, I chose to write the web pages in XHTML (specifically -//W3C//DTD XHTML 1.0 Transitional//EN). It seemed like the right thing to do at the time. I wrote a Python program, using libxml2, to validate each page before it was published to the site. This worked well for many years.
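The original libxml2-based program isn't reproduced here. As a rough stand-in, here is a minimal sketch of a pre-publish check that uses only Python's standard library. Note the swap: `xml.etree.ElementTree` checks well-formedness only, not validity against the XHTML DTD, so it catches mismatched or unclosed tags but not, say, a disallowed attribute. The function names `check_well_formed` and `main` are hypothetical.

```python
import sys
import xml.etree.ElementTree as ET
from pathlib import Path


def check_well_formed(text: str) -> list[str]:
    """Return parse problems; an empty list means the page is well-formed XML.

    This is a well-formedness check only; no DTD is loaded or validated.
    """
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        # The message already includes "line N, column M".
        return [str(err)]
    return []


def main(paths: list[str]) -> int:
    """Check each file and return a non-zero status if any page fails."""
    failed = False
    for name in paths:
        for problem in check_well_formed(Path(name).read_text(encoding="utf-8")):
            print(f"{name}: {problem}", file=sys.stderr)
            failed = True
    return 1 if failed else 0


if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))
```

Because the DTD is never loaded, named entities such as `&nbsp;` will likely be reported as undefined, while numeric references such as `&#160;` parse normally. A real DTD validator (libxml2, or its `xmllint` front-end) does not have that limitation.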

Much later, when I wanted to use the new HTML5 canvas element, I had to introduce relaxed validation for those pages, which were flagged by a .htmx suffix. Then I added a mandoc-generated man page for remind, which was also HTML5. I began to think that the cool kids had left me behind; in fact, it looked as if the whole world had moved on. Perhaps I should as well.
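The .htmx convention amounts to choosing a validation mode from the file suffix. A minimal sketch of that routing, assuming ordinary pages use a `.html` suffix; the mode names and the `validation_mode` function are hypothetical, and what "relaxed" actually checks is not described here.

```python
from pathlib import Path

# Hypothetical mapping from file suffix to validation mode.
VALIDATION_MODES = {
    ".html": "strict-xhtml",   # assumed suffix for ordinary XHTML 1.0 pages
    ".htmx": "relaxed-html5",  # HTML5 pages, e.g. those using <canvas>
}


def validation_mode(path: str) -> str:
    """Pick the validation mode for a page from its (case-insensitive) suffix."""
    suffix = Path(path).suffix.lower()
    try:
        return VALIDATION_MODES[suffix]
    except KeyError:
        # Refuse to publish anything we don't know how to check.
        raise ValueError(f"no validation rule for {path!r}") from None
```

For example, `validation_mode("index.html")` returns `"strict-xhtml"`, and an unknown suffix raises `ValueError` rather than silently skipping validation.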

To convert to HTML5, I first needed something to validate pages. I began with html5lib, a Python module. While it worked, it was very slow: validating a page took over two seconds on my server, whereas the existing XHTML check took under a second. I then found tidy, which is more of a markup corrector than a validator. However, it also warned of markup errors and was about as fast as the XHTML validation. Even better, it came with a command-line program, so I didn't need a Python front-end.
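Since tidy is a command-line program, a publishing step only has to run it and inspect its exit status (0 for a clean page, 1 for warnings only, 2 for errors). If that publishing step happens to be a Python script, a minimal sketch might look like this; `tidy_check` and `interpret_tidy_exit` are hypothetical names, and it assumes `tidy` is on the `PATH`.

```python
import subprocess
import sys


def interpret_tidy_exit(code: int) -> str:
    """Map tidy's exit status to a label: 0 clean, 1 warnings, 2 errors."""
    return {0: "ok", 1: "warnings", 2: "errors"}.get(
        code, f"unexpected exit status {code}"
    )


def tidy_check(path: str) -> tuple[str, str]:
    """Run tidy in report-only mode; return (label, report text)."""
    result = subprocess.run(
        ["tidy", "-quiet", "-errors", path],  # -errors: report only, no cleaned output
        capture_output=True,
        text=True,
        check=False,  # a non-zero exit just means tidy found something
    )
    # tidy writes its warnings and errors to stderr by default.
    return interpret_tidy_exit(result.returncode), result.stderr


if __name__ == "__main__":
    failed = False
    for page in sys.argv[1:]:
        label, report = tidy_check(page)
        print(f"{page}: {label}\n{report}", end="")
        failed = failed or label != "ok"
    sys.exit(1 if failed else 0)
```

Treating warnings as failures is a design choice; a gentler policy would block publication only when the label is `"errors"`.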

That left the process of converting the site template to HTML5 and modifying page contents to remove deprecated or illegal constructs. These were mainly:

These corrections were driven mainly by the W3C Validator, which reported warnings and errors that tidy ignored.
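The W3C's current HTML checker (the Nu Html Checker) can also be used from a script: POST the page to it with `out=json` and read back a list of messages. A minimal sketch, assuming the public endpoint below and its JSON message format (`type` of `error`, or `info` with `subType` `warning`); the helper names are hypothetical. For bulk checking, running a local copy of the checker is kinder to the public service.

```python
import json
import urllib.request

# Public Nu Html Checker endpoint, asking for JSON output.
NU_URL = "https://validator.w3.org/nu/?out=json"


def w3c_check(html: str) -> dict:
    """POST a page to the checker and return its decoded JSON report."""
    request = urllib.request.Request(
        NU_URL,
        data=html.encode("utf-8"),
        headers={"Content-Type": "text/html; charset=utf-8"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def summarize(report: dict) -> list[str]:
    """Keep only errors and warnings, as 'line N: kind: message' strings."""
    problems = []
    for msg in report.get("messages", []):
        if msg.get("type") in ("error", "non-document-error"):
            kind = "error"
        elif msg.get("type") == "info" and msg.get("subType") == "warning":
            kind = "warning"
        else:
            continue  # purely informational messages are skipped
        line = msg.get("lastLine", "?")
        problems.append(f"line {line}: {kind}: {msg.get('message', '')}")
    return problems
```

Typical use would be `summarize(w3c_check(page_text))`, printing each string; splitting the network call from the summary keeps the filtering logic testable offline.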

The increased importance of CSS also led me to clean up the existing style sheets.

I did consider running html5lib as a validator separately from the publishing process, but it seems to catch no more of the errors flagged by the W3C validator than tidy does.