On performance: Sometimes the wheel just ain't up to scratch

Posted by David Felstead Thu, 23 Feb 2006 04:01:00 GMT

It’s one of the cornerstone concepts of programming – Don’t Re-Invent the Wheel. These days there are so many third-party libraries, utilities and frameworks available that more often than not you would be crazy to write the difficult stuff yourself. Occasionally, however, you find yourself outside the “more often” and in one of those “not” situations, where just throwing more hardware at the problem won’t make it go away. Just recently, the Site5 Engineering Team (of which I am a member) ran into one of them. The product? Flashback.

The problem

Flashback is a really nice piece of software. It’s a file explorer for your webspace with a difference – it allows you to see your website not only as it is now, but also as it was a day ago. Or a week ago. Or a month. You get the idea. Any changes you make in your webspace are picked up and versioned by the Flashback engine and are recorded for posterity. Want to revert to your old layout? No problem. Want to retrieve those images you accidentally deleted? They’re there.

The core of the original Flashback was the source control management software Subversion (or SVN), which is a great tool to add to any developer’s repertoire – and joy of joys, it even comes with an external API and, more importantly for us, bindings for Ruby. At first glance, one would assume that SVN would be pretty fast – after all, it’s written in C and has a thriving open-source community contributing to its development. Unfortunately, that assumption (the word should have raised alarm bells) came back to bite us. While SVN is a great source control system, it turns out that when it comes to performance and efficiency, it is a real dog. And you know what? That’s fine. More often than not, you don’t care much about performance when managing your source code, and for what it’s designed for, SVN ain’t so bad. Anywhere outside its comfort zone though… BZZZZZT! – no good.

YOU (yes you) can always do it better

Off on a tangent for a second – back when I was at university, probably in the second year of my computer science degree, we were assigned the typical task of implementing the Quicksort algorithm in C and benchmarking it against various other sorting algorithms. Of course, the cynics and the realists in the group wondered what the point of this was. Any programmer worth their salt knows that the C standard library’s qsort function is usually a Quicksort already – Why Re-invent the Wheel? The surprise came when the class implemented the algorithm themselves and benchmarked it against the library’s qsort function. The result? Around 80% of the class had implemented a faster version of the algorithm, and these were second year uni students! A similar revelation came when a friend of mine, studying for his PhD, re-implemented some of the functions in string.h (rather than relying on the standard library) in a very CPU-intensive experimental search engine. The result? It ran about 40% faster.

The moral? When it comes to performance, you can always do it better. Why? Because you know the problem you’re trying to solve.

FlashbackPRIME – a faster, more efficient wheel

So it turns out that SVN wasn’t up to scratch, and not viable for long-term deployment – it’s just too slow and too much of a resource hog. So what to do? The first step was taking a few benchmarks. As a test, I implemented a few algorithms (change detection and repository updating) in pure Ruby and measured them against the same functions in SVN. The performance results were amazing – the pure Ruby solution outperformed the C-based SVN (with Ruby bindings) by several orders of magnitude – it was literally hundreds (sometimes thousands) of times faster. With this data in hand, the Site5 Management Team gave me the go-ahead to re-implement the guts of Flashback, and thanks to the lovely modular design of the first system, slotting it in was a breeze.
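The post doesn’t include any of FlashbackPRIME’s code, so the following is only a rough sketch of what a pure-Ruby change-detection sweep could look like. It assumes changes are spotted by comparing each file’s size and modification time against a stored snapshot; the method names snapshot and detect_changes are made up for illustration and are not taken from the real system.

```ruby
require "find"

# Hypothetical helper: map every regular file under root to [size, mtime],
# keyed by its path relative to root.
def snapshot(root)
  root = File.expand_path(root)
  entries = {}
  Find.find(root) do |path|
    next unless File.file?(path)
    stat = File.stat(path)
    entries[path.delete_prefix("#{root}/")] = [stat.size, stat.mtime]
  end
  entries
end

# Hypothetical helper: compare two snapshots and classify the differences.
# This is an assumed strategy, not necessarily how FlashbackPRIME detects changes.
def detect_changes(old_snap, new_snap)
  common = old_snap.keys & new_snap.keys
  {
    added:    (new_snap.keys - old_snap.keys).sort,
    removed:  (old_snap.keys - new_snap.keys).sort,
    modified: common.select { |path| old_snap[path] != new_snap[path] }.sort
  }
end
```

A sweep would then amount to taking a fresh snapshot of the webspace and passing it, together with the previously stored one, to detect_changes. Wrapping the call in Ruby’s Benchmark module is one way to get timings like the ones discussed here.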

The final feature set of FlashbackPRIME is comparable to SVN’s:
  • Both systems use a filesystem based repository
  • They both have atomic, transactional commits with rollback capabilities
  • Both have storage engines based on delta compression
  • Both can store arbitrary metadata on items
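To make the “atomic, transactional commits with rollback” point a little more concrete, here is a minimal sketch of one common way to get that behaviour on a filesystem-based repository: write the new revision into a staging directory, then publish it with a single rename. The layout (one directory per revision, named r000001 and so on) and the method name commit_revision are assumptions for the sake of the example, not a description of how FlashbackPRIME or SVN actually store data.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical sketch of an all-or-nothing commit. `files` maps relative
# paths to file contents; the revision directory naming is an assumption.
def commit_revision(repo_dir, revision, files)
  final_dir = File.join(repo_dir, format("r%06d", revision))
  raise ArgumentError, "revision #{revision} already exists" if File.exist?(final_dir)

  FileUtils.mkdir_p(repo_dir)
  # Stage inside the repository so the final rename stays on one volume.
  staging_dir = Dir.mktmpdir("staging-", repo_dir)
  begin
    files.each do |relative_path, content|
      target = File.join(staging_dir, relative_path)
      FileUtils.mkdir_p(File.dirname(target))
      File.binwrite(target, content)
    end
    # On POSIX filesystems a rename within one volume is atomic, so readers
    # see either the whole revision or none of it.
    File.rename(staging_dir, final_dir)
    final_dir
  rescue StandardError
    # Rollback: discard the half-written staging area and re-raise.
    FileUtils.rm_rf(staging_dir)
    raise
  end
end
```

If anything fails while the files are being written, the staging directory is removed and the repository is left exactly as it was before the commit started.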

There are a lot of things that SVN does that FlashbackPRIME does not, but the core functionality is the same… and the results? Incredible. Here are some rough timings:

Task                                                                | Flashback/SVN    | FlashbackPRIME
Populating large repository (several gigabytes, thousands of files) | about 2 hours    | about 126 seconds
Sweeping same repository for changes                                | about 38 minutes | about 81 seconds
Sweeping smaller repositories                                       | about 15 seconds | less than 1 second!

Very unscientific figures, but you get the gist.
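For a rough sense of scale, the timings above can be turned into speed-up factors. The snippet below is just back-of-the-envelope arithmetic on the two rows that have concrete numbers on both sides; the helper name speedup is made up for this example.

```ruby
# Hypothetical helper: how many times faster the new timing is than the old one.
def speedup(old_seconds, new_seconds)
  (old_seconds.to_f / new_seconds).round(1)
end

# Rough timings from the figures above, converted to seconds.
populate = speedup(2 * 60 * 60, 126)  # about 2 hours vs about 126 seconds => 57.1
sweep    = speedup(38 * 60, 81)       # about 38 minutes vs about 81 seconds => 28.1

puts "Populating large repository: about #{populate}x faster"
puts "Sweeping same repository: about #{sweep}x faster"
```

The third row only gives “less than 1 second” for FlashbackPRIME, so all that can be said there is that the speed-up is more than 15x.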

Sometimes it comes to a point where the wheel just won’t cut it any more, and these are the times that YOU as a developer need to take control and say “You know what? I can do better than that.”

Comments

  1. David Harris said 1 day later:

    You know, it was this post, out of all the things I’m reading about Site5, that will cause me to make a webhost switch to you guys. Knowing that S5 is willing to rewrite an entire engine to gain performance (and knowing that it was designed well enough to be a painless process) is just comforting to me. And, the fact you use Ruby. :)

  2. Jomdom said 2 days later:

    Amazing stuff on this blog. I just got done reading the entire first page of the Site5 main blog, and am in the process of doing the same here.

    I am currently a Dreamhost customer of 5 days, and am having serious trouble with their Ruby on Rails implementation. Knowing that Site5 has – and correct me if I’m wrong – two programmers on staff that were hired specifically because of Ruby support is simply amazing.

    You can expect to hear from me within a week on setting up some hosting.

    Keep up the awesome communication and forward thinking goals; they’re working wonders.

  3. Matt Lightner said 2 days later:

    Hey Jomdom,

    We actually have three people on the team who were hired for their Ruby/Rails development expertise (although as you can see, they do much more than that). In addition, Adam Greenfield, Site5’s CTO (who also serves as the Engineering Team leader), and I are both accomplished Ruby programmers and active contributors to all of our projects.

    I’m glad to hear you’ll be joining Site5 soon—we have lots of great things in store for the future!

    Matt

  4. David Felstead said 2 days later:

    Hi David H. and Jomdom – thanks for the feedback!

    One of the coolest things about Site5 (and working for Site5) is the fact that it’s about never standing still – we’re always working to improve, innovate and make the webhosting experience better for everyone.
