waffle

Slash and Burned

Wepricot is now checked into the Subversion repository of its Google Code project. (Commit bits are available by email, or you can just check it out.) It has some new features as well. DOMNodeList now includes Enumerable and gets a lot of neat methods, and I’m using more idiomatic Ruby in some places (would you believe I actually instantiated an array, looped through an enumerable and added results to that array instead of just collecting them?).
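For the curious, here is roughly what that difference looks like. This is a minimal sketch and not the Wepricot source: FakeNodeList is a made-up stand-in that mimics the DOM's length and item(index) accessors with a plain array, so it runs in any Ruby without WebKit.

```ruby
# Hypothetical stand-in for a DOM node list; the real class wraps WebKit's DOMNodeList.
class FakeNodeList
  include Enumerable

  def initialize(items)
    @items = items
  end

  # Mirrors the DOM's length and item(index) accessors.
  def length
    @items.length
  end

  def item(index)
    @items[index]
  end

  # Enumerable only needs each; collect, select, find and friends come for free.
  def each
    return enum_for(:each) unless block_given?
    length.times { |i| yield item(i) }
    self
  end
end

list = FakeNodeList.new(%w[img a img])

# The long way: instantiate an array, loop, append.
manual = []
list.each { |name| manual << name.upcase }

# The idiomatic way: just collect.
collected = list.collect { |name| name.upcase }
```

Both arrays end up identical; the second version simply leans on what Enumerable already provides once each is defined.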

The big news, though, is that DOMNodeList and DOMNode now implement / and % (querySelectorAll and querySelector, respectively, with names that hark back to Hpricot). This wasn’t easy. The contract is for those to return their values in document order. That comes for free when you query a single node, but when you query every node in a node list, you have to sort the combined results yourself. Fortunately, I could wrap the relatively new compareDocumentPosition DOM method to do this.
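To illustrate the sorting step, here is a small sketch under stated assumptions. It is not the code in the repository: FakeNode and document_order are hypothetical names, and the position field fakes where a node sits in the document so the example runs without WebKit. The two bitmask constants are the values the DOM specification assigns to compareDocumentPosition results.

```ruby
# Bitmask values from the DOM specification.
DOCUMENT_POSITION_PRECEDING = 0x02
DOCUMENT_POSITION_FOLLOWING = 0x04

# Hypothetical stand-in for a DOM node; position fakes its place in the document.
FakeNode = Struct.new(:name, :position) do
  # Like the DOM method, describes where other sits relative to self.
  def compareDocumentPosition(other)
    return 0 if other.equal?(self)
    other.position < position ? DOCUMENT_POSITION_PRECEDING : DOCUMENT_POSITION_FOLLOWING
  end
end

# Merge per-node query results into one list in document order, without duplicates.
def document_order(nodes)
  nodes.uniq.sort do |a, b|
    mask = a.compareDocumentPosition(b)
    if (mask & DOCUMENT_POSITION_FOLLOWING) != 0
      -1 # b follows a, so a sorts first
    elsif (mask & DOCUMENT_POSITION_PRECEDING) != 0
      1  # b precedes a, so b sorts first
    else
      0
    end
  end
end

logo   = FakeNode.new('img#logo', 1)
banner = FakeNode.new('img#banner', 4)
footer = FakeNode.new('img#footer', 9)

# Results gathered from several nodes arrive out of order and may repeat.
ordered = document_order([footer, logo, banner, logo])
```

The comparison only checks the preceding and following bits, which the DOM also sets for ancestors and descendants, so nested nodes sort correctly too.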

Unfortunately, another part of the contract is that you return DOMNodeLists and nothing else. (I could probably bend this rule since this code would most likely not leave Ruby, and in Ruby it’s what you offer that matters, not how, but I’ve chosen not to right now since the type soup is already muddled.)

The way I’ve done this right now is to straight-up subclass DOMNodeList, but it’s not supposed to be subclassed. It holds one private field, which is some kind of opaque struct holding a bunch of private objects. I don’t fault it for that, but I have to override finalize with an empty body or it tries to release this struct, which I will remind you I have neither allocated nor populated.

Or, to quote the code:

def finalize  
    # EMPTY  
    # UGLY UGLY UGLY  
    # HACK HACK HACK
    [..]

This is the part where your feedback, or your interest in writing some code, comes in handy.

Wepricot

So, about yesterday’s crazy MacRuby+WebKit Hpricot-lookalike hack, Wepricot.

I ran a comb through its hair and uploaded the entire thing to a new Google Code project after some slight reader prodding (I was hoping someone would) from someone working on MacRuby (now that I did not expect). BSD license, although it’s not really much of an original invention. This isn’t anywhere near polished, full-featured, thought-out, finished or final, and I’m not even sure what ‘final’ would entail. I just thought of something and am now posting what I came up with because it seems like it’s useful and it could become more useful in the hands of other people.

It’s not even checked into Subversion yet, it’s just a download; nevertheless, have at it, and by all means help me make it do something more than what it currently does, because it doesn’t do a whole lot yet (the sample code pretty well maxes it out). I hope to have something in a more patchable form later this weekend, but mail me your improvements, post them to the wiki and/or file issues with fixes and we can all benefit. And if one of you wants to host it somewhere git-ish or hg-y for instant collaboration, that’s fine too; just mention the link here or on the project page.

Rocket Fuel

Wepricot.fetch('http://google.com') do |result|
    puts "URL: #{result.url.absoluteString}, source: #{result.html}"
    i = 0
    (result / 'img').each do |node|
        src = node['src']
        resource = result[src]
        puts "DOM traversed img (##{i}) with src: #{src}"
        puts "resource: #{resource.MIMEType}, #{resource.data.length} bytes"
        i += 1
    end
end

You wonder why I like MacRuby. I wrote the beginnings of a layer in one and a half hours and 190 lines of code to essentially replace what I use Hpricot for (parsing HTML into a tree), but by using WebKit it gets “native” access to the DOM, selector queries and the loaded resources themselves.

Output:

URL: http://www.google.com/, source: [..snip..]
DOM traversed img (#0) with src: /intl/en_ALL/images/logo.gif
resource: image/gif, 8558 bytes

That is exactly why I love MacRuby. Mad ideas implemented in a jiffy by crossing the streams.

Update: this code is now available.

Today in New Developments
