I've been spending a lot of time with Amazon's cloud database SimpleDB recently and was invited to give a presentation in July on how Memcached and SimpleDB work really well together. We've made a copy of the presentation available online and you can view it below. If you want a PDF copy of the presentation, you can order it for free on the same page.
Archives for: August 2009
We've been working on a little research tool that helps keep you in the loop of what's going on in cyberspace and are pleased to launch www.webtrendsnow.com
The site works like this.
Every hour, Google Trends publishes data about the hot search terms, as determined by analysis of its US-based users.
We then run a search against each of the top 100 search terms to find the top sites that web users will be visiting as a result.
We show you this information, together with a thumbnail of the website (made by another mindsizzling service we will soon be launching to the public) which makes it very quick and easy to see if the site is something you are interested in.
If you find it useful, do let us know ...
We're working on a project at the moment that has a lot of XML flying about. For example, we wrap data coming out of Amazon SimpleDB in XML and then consume that data in the rest of the program.
I've been using XML::XPath to extract the data from the XML, so I can write this sort of thing:
use XML::XPath;

# Parse the XML string, then visit every <walk> element under <walks>
my $xp = XML::XPath->new( xml => $xml );
foreach my $walk ($xp->findnodes('/walks/walk'))
{
    my $walkid = $walk->findvalue('./@itemname');
    # etc ...
}
It's easy to write, easy to read and works well. Recently, however, I've noticed that the project has become a bit, well, sluggish. I had been hoping that XML::XPath would use the C-based (and hence very fast) LibXML under the hood, since I had recently installed that parser on the system, but the lack of speed led me to think this was not the case.
Reading around, I discovered that XML::LibXML already has XPath support built in, so I was able to rewrite my code as follows:
use XML::LibXML;

# Parse with LibXML, then wrap the root element in an XPath context
my $parser = XML::LibXML->new();
my $doc = $parser->parse_string($xml);
my $xp = XML::LibXML::XPathContext->new($doc->documentElement());
foreach my $walk ($xp->findnodes('/walks/walk'))
{
    my $walkid = $walk->findvalue('./@itemname');
    # etc ...
}
Note that only the setup has changed; the actual data processing stays the same (in most cases).
This makes things *MUCH* speedier as you would expect. My perception is perhaps as much as 10 times faster for large XML files, but I haven't done any quantitative analysis.
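For anyone who wants to put a number on that impression, a rough comparison could look like the sketch below. It is only a sketch under assumptions: the document is synthetic, its size (1000 items) is arbitrary, and the helper names count_with_xpath and count_with_libxml are made up for illustration. It uses the core Benchmark module, and the ratio will depend on the machine and the shape of the real data.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use XML::XPath;
use XML::LibXML;
use XML::LibXML::XPathContext;

# Synthetic input that mirrors the /walks/walk layout used above.
# The item count is an arbitrary assumption, not a real data size.
my $xml = '<walks>'
        . join( '', map { qq{<walk itemname="walk$_"/>} } 1 .. 1000 )
        . '</walks>';

# Hypothetical helper: parse and query with the pure-Perl XML::XPath.
sub count_with_xpath {
    my $xp  = XML::XPath->new( xml => $xml );
    my @ids = map { $_->findvalue('./@itemname') } $xp->findnodes('/walks/walk');
    return scalar @ids;
}

# Hypothetical helper: the same work done through XML::LibXML.
sub count_with_libxml {
    my $doc = XML::LibXML->new->parse_string($xml);
    my $xp  = XML::LibXML::XPathContext->new( $doc->documentElement );
    my @ids = map { $_->findvalue('./@itemname') } $xp->findnodes('/walks/walk');
    return scalar @ids;
}

# Run each variant for at least one CPU second and print a comparison table.
cmpthese( -1, {
    'XML::XPath'  => \&count_with_xpath,
    'XML::LibXML' => \&count_with_libxml,
} );
```

Each helper includes the parse step on purpose, since that is part of what the program pays for on every request.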
BEWARE though, it's not a completely transparent drop-in, as LibXML is strict about namespaces. For example, if a namespace is declared in the XML file, you will get no data returned unless you register that namespace with the context.
For example, when writing an Atom parser, note the registerNs line:
$PARSER = XML::LibXML->new();
$DOC = $PARSER->parse_string($xml);
$XP = XML::LibXML::XPathContext->new($DOC->documentElement());
$XP->registerNs( xatom => "http://www.w3.org/2005/Atom" );
foreach my $data ($XP->findnodes('//xatom:entry/xatom:content[@type="text/xml"]'))
{
    # etc ...
}
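This is needed despite the fact that, inside the Atom feed, NO namespace prefix is explicitly used on the elements. The Atom file contains <entry> and NOT <xatom:entry>, but the feed declares a default namespace (xmlns="http://www.w3.org/2005/Atom"), so every element belongs to it, and an unprefixed name in an XPath expression only matches elements that are in no namespace at all. You MUST therefore register a prefix for that namespace URI to be able to read the data. You could choose any prefix, as long as the URI matches; I picked xatom but it could just as well have been fred. Go figure ...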
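The behaviour is easy to reproduce with a small self-contained script. This is a minimal sketch, assuming a made-up two-entry feed; only the namespace URI is taken from the example above, and the prefix fred is deliberately arbitrary.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::XPathContext;

# A tiny made-up Atom feed. The root declares a default namespace, so
# <entry> has no prefix in the markup but still belongs to that namespace.
my $xml = <<'XML';
<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><content type="text/xml">first</content></entry>
  <entry><content type="text/xml">second</content></entry>
</feed>
XML

my $doc = XML::LibXML->new->parse_string($xml);
my $xp  = XML::LibXML::XPathContext->new( $doc->documentElement );

# Unprefixed names in an XPath expression match only no-namespace elements,
# so this finds nothing in the feed above.
my @plain = $xp->findnodes('//entry');

# Any prefix works as long as it is bound to the same namespace URI.
$xp->registerNs( fred => 'http://www.w3.org/2005/Atom' );
my @prefixed = $xp->findnodes('//fred:entry/fred:content[@type="text/xml"]');

printf "without prefix: %d, with prefix: %d\n", scalar @plain, scalar @prefixed;
```

With this feed the script reports 0 matches without a prefix and 2 with one.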