I've been spending a lot of time with Amazon's cloud database SimpleDB recently and was invited to give a presentation in July on how Memcached and SimpleDB work really well together. We've made a copy of the presentation available online and you can view it below. If you want a PDF copy of the presentation, you can order it for free on the same page.
We've been working on a little research tool that helps keep you in the loop on what's going on in cyberspace, and we are pleased to launch www.webtrendsnow.com.
The site works like this.
Every hour, Google Trends publishes data about the hot search terms, as determined by analysis of their US-based users.
We then run a search against each of the top 100 search terms to find the top sites that web users will be visiting as a result.
We show you this information, together with a thumbnail of the website (made by another mindsizzling service we will soon be launching to the public) which makes it very quick and easy to see if the site is something you are interested in.
If you find it useful, do let us know ...
We're working on a project at the moment that has a lot of XML flying about; for example, we wrap data coming out of Amazon SimpleDB in XML and then consume that data in the rest of the program.
I've been using XML::XPath to extract the data from the XML, so I can write this sort of thing:
use XML::XPath;

my $xp = XML::XPath->new( xml => $xml );
foreach my $walk ($xp->findnodes('/walks/walk'))
{
    my $walkid = $walk->findvalue('./@itemname');
    # etc ...
}
It's easy to write, easy to read and works well. However, recently I've begun noticing that the project has become a bit, well, sluggish. I was kind of hoping that XML::XPath would be using the C-based (and hence very fast) LibXML under the hood, since I had recently installed that parser on the system; however, the lack of speed led me to think this might not be the case.
Reading around, I discovered that XPath support is already built in to XML::LibXML, and so I was able to rewrite my code as follows:
use XML::LibXML;

my $parser = XML::LibXML->new();
my $doc = $parser->parse_string($xml);
my $xp = XML::LibXML::XPathContext->new($doc->documentElement());
foreach my $walk ($xp->findnodes('/walks/walk'))
{
    my $walkid = $walk->findvalue('./@itemname');
    # etc ...
}
Note how it is just the setup that has changed; the actual data processing stays the same (in most cases).
This makes things *MUCH* speedier as you would expect. My perception is perhaps as much as 10 times faster for large XML files, but I haven't done any quantitative analysis.
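For anyone who wants numbers rather than perception, here is a rough sketch of how the two approaches could be timed with Perl's core Benchmark module. The synthetic document and the helper names ids_xpath and ids_libxml are made up for illustration, and the result will depend heavily on the machine and on the shape of the XML, so treat it as a starting point rather than a measurement.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use XML::XPath;
use XML::LibXML;

# Synthetic input (an assumption): 500 <walk> elements shaped like the example above.
my $xml = '<walks>'
        . join( '', map { qq{<walk itemname="walk$_"/>} } 1 .. 500 )
        . '</walks>';

# Pure-Perl XML::XPath: parse the string, then read every itemname attribute.
sub ids_xpath {
    my $xp = XML::XPath->new( xml => $xml );
    return [ map { my $v = $_->findvalue('./@itemname'); "$v" }
             $xp->findnodes('/walks/walk') ];
}

# XML::LibXML: same XPath expressions, different setup.
sub ids_libxml {
    my $doc = XML::LibXML->new->parse_string($xml);
    my $xp  = XML::LibXML::XPathContext->new( $doc->documentElement );
    return [ map { $_->findvalue('./@itemname') }
             $xp->findnodes('/walks/walk') ];
}

# Run each candidate for at least one CPU second and print a comparison table.
cmpthese( -1, {
    'XML::XPath'  => \&ids_xpath,
    'XML::LibXML' => \&ids_libxml,
} );
```

Parsing is included in each timed sub on purpose, since that is what the program does for every chunk of XML it receives.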
BEWARE though, it's not a completely transparent drop-in, as XPath handling in LibXML has some quirks. For example, if there is a namespace declared in the XML file, then you will get no data returned unless you correctly register it with the context.
For example, when writing an Atom parser, note the registerNs line:
$PARSER = XML::LibXML->new();
$DOC = $PARSER->parse_string($xml);
$XP = XML::LibXML::XPathContext->new($DOC->documentElement());
$XP->registerNs( xatom => "http://www.w3.org/2005/Atom" );
foreach my $data ($XP->findnodes('//xatom:entry/xatom:content[@type="text/xml"]'))
{
    # etc ...
}
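This is despite the fact that, inside the Atom feed, NO namespace prefix is explicitly used on elements. The Atom file contains <entry> and NOT <xatom:entry>, but its elements still belong to the default namespace declared with xmlns on the root element, and in XPath an unprefixed name only matches elements in no namespace at all. So you MUST register a prefix for that namespace to be able to read the data. The namespace URI has to match the one in the feed, but you could choose any prefix; I picked xatom but it could just as well have been fred. Go figure ...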
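To see the effect in isolation, here is a small self-contained sketch. The feed below is a made-up, stripped-down Atom document used only for illustration; the only thing it shares with a real feed is the namespace URI.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical minimal feed: the default namespace is declared once on <feed>,
# and no element carries a prefix.
my $xml = <<'XML';
<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <content type="text/xml">hello</content>
  </entry>
</feed>
XML

my $doc = XML::LibXML->new->parse_string($xml);
my $xp  = XML::LibXML::XPathContext->new( $doc->documentElement );

# Without a registered prefix, the unprefixed name matches nothing.
my @unprefixed = $xp->findnodes('//entry');

# Bind an arbitrary prefix to the Atom namespace URI and use it in the query.
$xp->registerNs( xatom => 'http://www.w3.org/2005/Atom' );
my @prefixed = $xp->findnodes('//xatom:entry/xatom:content[@type="text/xml"]');

printf "unprefixed: %d, prefixed: %d\n", scalar @unprefixed, scalar @prefixed;
# prints "unprefixed: 0, prefixed: 1"
```

Swapping xatom for fred in both the registerNs call and the XPath expression gives the same result, since only the URI has to match the feed.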
We're working on a consumer project at the moment that makes heavy use of mapping technologies. We built the project with Google Maps, but ran into a deal breaker: placemark links in parsed KML always open in a new browser window, rather than in the same window, which is sometimes preferable.
Google seem to have acknowledged that this is an issue, but they acknowledged it 2 years ago and have done nothing since, presumably for their own good reasons. For someone developing tools against the API, however, it can be a real headache.
Enter the open source community and a great product by Lance Dyas called GeoXML, which is a new parser you can use for KML files (and much more) within the Google Maps API. And the BEST feature ... well, for us, it DOES allow you to open links in placemark info windows in the SAME browser window. Great!
Beware that sadly the project lacks any real documentation and getting started can be rather hard. The documentation for another project EGeoXML is a useful starting point.
We're fans of Amazon cloud computing and have been using SimpleDB within some projects that we will be launching soon. The problem, though, is that there is no backup solution for the service. Whilst Amazon's redundancy means that we should never lose any data anyway, that doesn't protect you against the accidental deletion of data (been there, done that), nor does it satisfy the desire to have data in files that you can import into other databases (e.g. MySQL).
I was a bit surprised that this hadn't been addressed by the Amazon community at large so I took a few days (well, a week) out of my schedule to pull together www.backupsdb.com - the world's first online backup solution for Amazon SimpleDB.
We launched it last week and you can read more over at the backupsdb blog.