A Ruby HTML5 tokenizer, written, fittingly, by Sam Ruby, as a port of the tokenizer from Python's html5lib.
As a result of rigorous specification and careful thought, the HTML5 tokenizer is pretty easy to write AND works on almost all HTML and XHTML documents. I implemented it in Objective-C once, and that implementation powers an unadvertised feature that has been in Monocle all the way back to 1.0: parsing of Mycroft search engines. Any Mycroft link that works from within Firefox can be clicked in Monocle's ‘add engine’ sheet instead.
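
To give a feel for why the tokenizer is easy to write, here is a minimal sketch in Python of the state-machine style the spec describes. It is not Sam Ruby's code, html5lib's, or my Objective-C version; the function name `tokenize`, the state names and the token tuples are all made up for illustration, and it only handles text and bare start and end tags.

```python
def tokenize(html):
    """Yield (kind, value) tokens from a tiny subset of HTML.

    Illustrative only: handles text, <name> and </name>. Attributes,
    comments, entities, doctypes and error recovery are all left out,
    although the real specification defines states for each of them.
    """
    state = "data"   # hypothetical state names, loosely modelled on the spec
    buf = []         # characters collected for the token being built

    for ch in html:
        if state == "data":
            if ch == "<":
                # Flush any pending text before starting a tag.
                if buf:
                    yield ("Characters", "".join(buf))
                    buf = []
                state = "tag_open"
            else:
                buf.append(ch)
        elif state == "tag_open":
            if ch == "/":
                state = "end_tag_name"
            else:
                # Tag names are lowercased as they are collected.
                buf.append(ch.lower())
                state = "start_tag_name"
        elif state in ("start_tag_name", "end_tag_name"):
            if ch == ">":
                kind = "StartTag" if state == "start_tag_name" else "EndTag"
                yield (kind, "".join(buf))
                buf = []
                state = "data"
            else:
                buf.append(ch.lower())

    # Emit trailing text; an unfinished tag at end of input is dropped here.
    if state == "data" and buf:
        yield ("Characters", "".join(buf))


if __name__ == "__main__":
    for token in tokenize("<p>Hi <B>there</B></p>"):
        print(token)
```

The whole thing is one loop over characters with a `state` variable deciding what each character means, which is the reason a port from one language to another is mostly mechanical.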