Does it scale?
Posted by David Felstead Thu, 22 Jun 2006 01:24:00 GMT
Anyone who has been around Ruby and Rails (or any new software technology) for a while has seen or heard the question asked countless times: Does it scale?
Inevitably, the person asking is referring to the scalability of the software framework: can it be easily expanded to accommodate a large volume of requests and traffic? The question can’t be answered simply, and no doubt the Rails aficionados will be rolling their eyes at it being raised yet again, since the vast majority of applications written with Rails will never grow large enough to need to scale.
The Site5 Engineering team’s recent work on Squire, our new server monitoring and task management system, has forced us to look more closely at the scaling issue, and not just on a technological basis. With the massive growth that Site5 has been experiencing in recent months, some of the ways we used to handle server management simply weren’t going to carry us into the future.
The crux of my argument is this: when there is enough growth to warrant a re-evaluation of the scalability of an application, chances are that you’re going to have to re-evaluate your business processes in a similar way. Luckily, the Site5 Management team recognizes this and has planned accordingly. In fact, most of the engineering team’s effort is being poured into future-proofing our fleet.
Site5 now has hundreds of new customers joining every month and new servers being added all the time, so tasks that were once very simple for our support staff and system-admin gurus are becoming more difficult. The off-the-shelf systems we use to monitor our server fleet are becoming inadequate: they don’t provide enough detail on the source of an issue, and they often require manual intervention from support staff to resolve it. Though most issues take only a few minutes to resolve, a few minutes multiplied by a few hundred servers becomes a big drain on resources. It might have worked before, it might still work today, but it won’t work in the future.
Enter Squire, our new server monitoring system. This neat little piece of software (written completely from scratch by us, the Site5 Engineering Team) is a purpose-built web-hosting monitoring system. It’s closely integrated with our hand-built Synco CRM system, and even with Site5 Backstage, to proactively monitor and gather detailed statistics on each machine in our fleet. It automatically detects and resolves common issues on our machines and instantly notifies the support team when it encounters a problem that can’t be resolved automatically. In addition, it gives customer service and support staff quick links to customer information should customers need to be contacted about a problem. In fact, we have a few little Backstage features in store to keep our customers that much more informed… stay tuned!
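For the curious, the detect, auto-resolve, then notify flow described above can be sketched in a few lines of Ruby. This is only an illustration and not Squire’s actual code: the `Check` struct, the `monitor` method, the toy `server` hash and the server name `web01` are all hypothetical names made up for this example.

```ruby
# Hypothetical sketch: none of these names come from Squire itself.
# A check pairs a health probe with an optional automatic fix.
Check = Struct.new(:name, :probe, :fix)

# Runs each check against a server, attempts the automatic fix once,
# and returns the names of checks that still fail afterwards.
def monitor(server, checks)
  checks.each_with_object([]) do |check, unresolved|
    next if check.probe.call(server)   # healthy, nothing to do
    check.fix&.call(server)            # attempt automatic resolution, if one exists
    unresolved << check.name unless check.probe.call(server)
  end
end

# A toy server represented as a Hash of service states (assumed for the demo).
server = { name: "web01", apache_up: false, disk_ok: false }

checks = [
  Check.new("apache", ->(s) { s[:apache_up] }, ->(s) { s[:apache_up] = true }),
  Check.new("disk",   ->(s) { s[:disk_ok] },   nil) # no automatic fix known
]

unresolved = monitor(server, checks)
unless unresolved.empty?
  puts "Notify support about #{server[:name]}: #{unresolved.join(', ')}"
end
```

The point of the design is that staff only hear about the problems the software could not fix by itself, which is where the few-minutes-per-server savings would come from.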
Our system administration and customer service teams work extremely hard to keep our server fleet healthy and our customers happy. With Squire we hope to not only make their lives easier, but also to keep our customers up to date, informed and, most importantly, happy.

We’ve been having untold numbers of problems with the neysa server (http/apache down, usually) over the past 2 months. We requested a transfer of our sites to another server, and the support guy coyly responded “I think it is very unlikely your sites will be transferred.”
That was all he said. Might want to review some of your support responses… I was really surprised that he responded that way.
Other than that, Site5 is great!