Saturday, June 23, 2012

All Servers Die So Take Care of Your Customers

Adjix, Epics3, et al. server farm, with two dead (vertical) servers.
Serving up nearly 100K unique visitors daily on 512 Kbps.

I wish that I could say Adjix died peacefully, in its sleep, but it was a sudden, unexpected death as it went down fighting.

RIP Adjix.

New Year's Eve 2011
On the afternoon of New Year's Eve, I received calls from a couple of key users telling me that my beloved Adjix was having problems. After the calls, I discovered that the Adjix database had stopped unexpectedly and needed to be restarted. We knew that the Adjix database didn't have much life left after the San Diego power outage of September 8, 2011, knocked out some servers in the cluster, but I wanted to keep Adjix running until the last possible moment.

After identifying the problem, I restarted the database server that same afternoon. I could tell that, because of the database's large size, it would take several hours to rebuild the indices and check the database integrity. After all, you do want your database to be ACID compliant.

Then, ominously, just before midnight CST, the physical database server (hardware) stopped responding to any and all commands.

Today, six months later, I gained physical access to that server, the last one in that dying cluster, and this is the final entry from its logs (-0800 is PST).

2011-12-31 21:52:43 -0800: Rolling forward master [80.1% complete] 17154000 SQL transactions processed

Sure enough, the logs confirmed that, with less than eight minutes left before the New Year, the hardware stopped responding.
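The timing can be checked directly against that log line. Here is a minimal Python sketch that converts the logged timestamp to other US time zones, assuming fixed standard-time offsets (reasonable for December 31). It shows that the "less than eight minutes" figure lines up with midnight in US Central time. The variable names are mine, for illustration only.

```python
from datetime import datetime, timedelta, timezone

# Final log entry, copied from the post; -0800 is PST.
last_entry = datetime.strptime("2011-12-31 21:52:43 -0800", "%Y-%m-%d %H:%M:%S %z")

# Fixed UTC offsets for standard time (an assumption: no DST on December 31).
CST = timezone(timedelta(hours=-6), "CST")
EST = timezone(timedelta(hours=-5), "EST")

central = last_entry.astimezone(CST)
eastern = last_entry.astimezone(EST)

# Midnight that starts January 1, 2012 in Central time.
new_year_central = datetime(2012, 1, 1, tzinfo=CST)
remaining = new_year_central - central

print(central.isoformat())  # 2011-12-31T23:52:43-06:00
print(eastern.isoformat())  # 2012-01-01T00:52:43-05:00
print(remaining)            # 0:07:17
```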

Future Proof
Although I never charged anyone to use the Adjix APIs, I still felt that everything I delivered should be future proof.

In theory, Adjix links work redundantly, as expected:

But, I have to admit that I'm amazed to see, six months after Adjix died, that others' Adjix links are still alive and kicking:

(Note: is the ultra-short version of

Not only did Adjix automatically store every single shortened URL in its own Amazon S3 bucket, but Adjix also gave every user the opportunity to store the Adjix shortened link in their own S3 bucket.
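To make the idea concrete, here is a rough Python sketch of writing one shortened link to two independently owned stores. This is not Adjix's actual code. The function names, the short code, and the choice of a static HTML redirect page are all assumptions for illustration, and plain dicts stand in for the two S3 buckets so the sketch runs without any cloud account.

```python
from html import escape
from typing import Dict, List


def redirect_page(target_url: str) -> str:
    """Return a tiny static HTML page that forwards the browser to target_url."""
    safe = escape(target_url, quote=True)
    return (
        "<!DOCTYPE html>\n"
        f'<html><head><meta http-equiv="refresh" content="0; url={safe}"></head>'
        f'<body><a href="{safe}">{safe}</a></body></html>\n'
    )


def publish(short_code: str, target_url: str, buckets: List[Dict[str, str]]) -> None:
    """Write the same redirect page into every bucket, keyed by short code."""
    page = redirect_page(target_url)
    for bucket in buckets:
        bucket[short_code] = page


# Dicts stand in for two separately owned buckets (hypothetical names).
service_bucket: Dict[str, str] = {}
user_bucket: Dict[str, str] = {}

publish("abc1", "https://example.com/a?x=1&y=2", [service_bucket, user_bucket])
```

Because each copy is a self-contained static page, either store can keep answering for the link after the service that created it is gone.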

Applying Lessons Learned
In early 2010, I launched a photo sharing service, Epics3, which allowed users to share photos on Twitter and Facebook as well as store any photo they uploaded to their own Amazon S3 bucket. Just like Adjix, this was a simple solution to implement, and it resulted in two separately owned and operated servers redundantly serving up the same content.

As an example, the first of the following links is dynamically generated from the Epics3 server and the second one is served statically, directly, from an Amazon S3 bucket.

That, my friend, is future proofing. You'd think, after four years, that it would be more popular.
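From the reader's side, having two independent copies means a client can fall back to the static one when the dynamic server is gone. Here is a small Python sketch of that fallback, assuming each source is just a function that returns bytes or raises an error. The names and the fake sources are hypothetical stand-ins, not the Epics3 implementation.

```python
from typing import Callable, Optional, Sequence

Fetcher = Callable[[str], bytes]


def fetch_with_fallback(key: str, sources: Sequence[Fetcher]) -> bytes:
    """Try each source in order and return the first successful result."""
    last_error: Optional[Exception] = None
    for fetch in sources:
        try:
            return fetch(key)
        except Exception as err:  # a dead server, a timeout, a missing object
            last_error = err
    raise LookupError(f"no source could serve {key!r}") from last_error


def dead_app_server(key: str) -> bytes:
    """Stand-in for a dynamic server that has stopped responding."""
    raise ConnectionError("server not responding")


# Stand-in for the static copy kept in a separately owned bucket.
static_copies = {"photo1.jpg": b"jpeg-bytes"}


def static_bucket(key: str) -> bytes:
    return static_copies[key]


photo = fetch_with_fallback("photo1.jpg", [dead_app_server, static_bucket])
```

The order of the list is the only policy here: the dynamic server is tried first and the static copy takes over when it fails.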

On New Year's Eve 2011, Adjix died. Long live Adjix (and all other customer data).
