A Ruby HTML5 tokenizer, written, fittingly, by Sam Ruby as a port of the tokenizer from the Python html5lib library.
As a result of a rigorous specification and careful thought, the HTML5 tokenizer is pretty easy to write AND works on almost all HTML and XHTML documents. I implemented it in Objective-C once, and that implementation currently powers an unadvertised feature in Monocle (present all the way back to 1.0): parsing Mycroft search engines, so any Mycroft link you would click in Firefox can be clicked in Monocle's ‘add engine’ sheet instead.
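To make "pretty easy to write" a bit more concrete: the spec describes tokenization as a state machine in which each state looks at one character, maybe emits a token, and picks the next state. What follows is a hypothetical, heavily reduced Python sketch of that idea. It is not html5lib's code or Sam Ruby's port. It covers only rough equivalents of the data, tag open, end tag open, tag name and self-closing start tag states, and the state names only loosely follow the spec's. `tokenize`, `StartTag`, `EndTag` and `Characters` are names invented here. Attributes are skipped rather than parsed, and comments, DOCTYPEs and character references are left out entirely.

```python
from dataclasses import dataclass


@dataclass
class StartTag:
    name: str
    self_closing: bool = False


@dataclass
class EndTag:
    name: str


@dataclass
class Characters:
    data: str


def tokenize(html: str) -> list:
    """Toy subset of the HTML5 tokenizer state machine (hypothetical sketch)."""
    # Attributes are skipped, not parsed; comments, DOCTYPEs and character
    # references are not handled.
    tokens = []
    text = []      # pending character data, emitted as one Characters token
    tag = None     # tag token currently being built
    state = "data"
    i = 0

    def flush_text():
        if text:
            tokens.append(Characters("".join(text)))
            text.clear()

    def emit_tag():
        flush_text()
        tokens.append(tag)

    while i < len(html):
        c = html[i]
        if state == "data":
            if c == "<":
                state = "tag_open"
            else:
                text.append(c)
        elif state == "tag_open":
            if c == "/":
                state = "end_tag_open"
            elif c.isascii() and c.isalpha():
                tag = StartTag("")
                state = "tag_name"
                continue  # reconsume this character in the tag name state
            else:
                # Parse error: the "<" was just text; reconsume in data state.
                text.append("<")
                state = "data"
                continue
        elif state == "end_tag_open":
            if c.isascii() and c.isalpha():
                tag = EndTag("")
                state = "tag_name"
                continue
            # Simplification: the spec ignores "</>" and turns other input
            # into a bogus comment; here it is kept as literal text.
            text.append("</")
            state = "data"
            continue
        elif state == "tag_name":
            if c in "\t\n\f ":
                state = "skip_attributes"
            elif c == "/":
                state = "self_closing"
            elif c == ">":
                emit_tag()
                state = "data"
            else:
                # Only ASCII uppercase letters are lowercased, as in the spec.
                tag.name += c.lower() if "A" <= c <= "Z" else c
        elif state == "skip_attributes":
            # Stand-in for the spec's attribute states (ignores quoting).
            if c == "/":
                state = "self_closing"
            elif c == ">":
                emit_tag()
                state = "data"
        elif state == "self_closing":
            if c != ">":
                state = "skip_attributes"
                continue
            if isinstance(tag, StartTag):
                tag.self_closing = True
            emit_tag()
            state = "data"
        i += 1

    # At EOF a dangling "<" or "</" is text; an unfinished tag is dropped.
    if state == "tag_open":
        text.append("<")
    elif state == "end_tag_open":
        text.append("</")
    flush_text()
    return tokens
```

Calling `tokenize("<p>Hi<BR/></p>")` returns `[StartTag(name='p', self_closing=False), Characters(data='Hi'), StartTag(name='br', self_closing=True), EndTag(name='p')]`, and a stray `<` as in `tokenize("a < b")` simply falls back to text. Keeping one branch per state mirrors how the spec itself is laid out, which is arguably part of what makes porting it between languages fairly mechanical.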