Sunday, January 8, 2012

Poor Man's Performance Monitoring

Amazon has been great at rolling out high-end (enterprise-class) web services which don't require expensive or long-term contracts. Instead of signing a contract, you pay for Amazon Web Services "by the drink," which literally means a monthly bill can be less than a dime. You simply pay for the storage and bandwidth you use.

But I've noticed one key service that Amazon has yet to roll out: website performance monitoring.

CDN
Five years ago, if your website had to deliver large media files, such as audio or video, then your best option was (and still is) a content delivery network (CDN). A CDN distributes copies of files all over the world so that they're as close as possible to the users, which means faster download times. The problem with CDNs of the past, like Akamai, is that they were services for the big boys, requiring long-term, expensive contracts. Of course, when you spend thousands of dollars each month you do get enterprise-class support and dashboards, but many smart, lean companies can develop their own dashboards specific to their needs.

A couple of years ago Amazon began selling its own CDN service, called CloudFront, with no contract. CloudFront, like all Amazon Web Services, is "pay by the drink," so any startup can deliver video and music just as fast as Netflix streams movies or the Apple iTunes Store. Just as Amazon now offers a CDN web service, I can see it offering a performance monitoring web service in the future.

Performance Monitoring
Performance monitoring tracks the status and responsiveness of a website. Is the website up or down? How long does it take for your website to load for users in New York, L.A., and Paris? Are there any bottlenecks causing delays?

There are two basic ways to monitor performance: internally and externally.

External Performance Monitoring
External performance monitoring would be at home as an Amazon Web Service. There is no shortage of companies providing external website performance monitoring services and dashboards for hundreds or thousands of dollars per month. The services these companies provide are very complex, and they often offer more than any one customer requires. Also, a company paying for these services will need a team of employees to gather and process the performance reports.

External performance monitors usually run automated scripts which follow a critical path, such as logging into an e-commerce website and making a purchase. These scripts can run either on simulated web browsers in a data center, or on actual web browsers in homes and offices so that they can measure "last mile" performance. The latter can be very helpful when companies try to troubleshoot seemingly "strange" problems, such as why a website has trouble loading only for some customers and not others. Is the problem due to a misconfigured router, a misconfigured DNS server, etc.?
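
At its core, an external check is just a timed request made from somewhere outside your data center. Here's a minimal sketch in Java; the URL and class name are my own examples, and a real service would run probes like this from many locations and follow a full scripted path:

import java.net.HttpURLConnection;
import java.net.URL;

public class SimpleProbe
{
    public static void main(String[] args) throws Exception
    {
        // Time one request to the monitored page.
        URL url = new URL("http://www.example.com/");
        long start = System.currentTimeMillis();

        HttpURLConnection connection = (HttpURLConnection) url.openConnection();
        int status = connection.getResponseCode(); // forces the request

        long elapsed = System.currentTimeMillis() - start;
        connection.disconnect();

        System.out.println("HTTP " + status + " in " + elapsed + " ms");
    }
}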

Since Amazon already has points of presence all over the world, it would have little problem setting up an external performance monitoring service API.

Internal Performance Monitoring
Even if a small company can't afford an external performance monitoring service, it should still implement simple internal performance measures on its servers. Off the top of my head, I can think of two simple measures which should be implemented from the get-go: web server segregation and code timers.

Web Server Segregation
Static web server resources should be segregated from web servers that serve up dynamic web pages. If this can't be done physically, with separate hardware, then use different host names and separate web server logs for dynamic versus static resources, as sketched below. If you only have one web server, then simply move all of your static web resources to Amazon's S3 web service.
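
In Apache, for example, the split can be as simple as two virtual hosts, each with its own log; the host names and paths here are placeholders:

<VirtualHost *:80>
    ServerName static.example.com
    DocumentRoot /var/www/static
    # "combined" is the stock log format nickname; swap in your own
    CustomLog logs/static_access_log combined
</VirtualHost>

<VirtualHost *:80>
    ServerName www.example.com
    CustomLog logs/dynamic_access_log combined
    # ... dynamic application configuration here ...
</VirtualHost>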

Monitoring your dynamic web server logs is one way to determine what your customers' experience is like. I use Apache with the following log format:
%h %l %u %t "%r" %>s %b "%{Referer}i" %T
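
In the Apache configuration, that format is declared with a LogFormat directive and wired to a log file with CustomLog; the "timed" nickname below is just an example:

LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" %T" timed
CustomLog logs/access_log timed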

The most important part of this log format, for performance monitoring, is the last part, %T. It shows how long it took for the dynamic page to be requested, generated, and served to the client's web browser. Here's an example from an Apache log:
ool-182e94c5.dyn.optonline.net - - [08/Jan/2012:16:45:59 -0800] "GET / HTTP/1.1" 200 66321 "-" 2

The last number, 2, tells me that it took two seconds for this dynamic web page, which is 66,321 bytes long, to be generated and downloaded to the client's web browser. Historically, that number is less than one second in my logs, so seeing two seconds warrants further investigation, which is where code timers come into play.
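
Eyeballing logs doesn't scale, so a tiny program can flag the slow requests for you. Here's a rough Java sketch; the log file name and one-second threshold are my own assumptions:

import java.io.BufferedReader;
import java.io.FileReader;

public class SlowRequestScanner
{
    public static void main(String[] args) throws Exception
    {
        BufferedReader reader = new BufferedReader(new FileReader("access_log"));
        String line;
        while ((line = reader.readLine()) != null)
        {
            // With the log format above, %T is the final
            // space-separated field on each line. A production
            // version would guard against malformed lines.
            String[] fields = line.split(" ");
            int seconds = Integer.parseInt(fields[fields.length - 1]);
            if (seconds > 1)
            {
                System.out.println("Slow request: " + line);
            }
        }
        reader.close();
    }
}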

Code Timers
Code that could take longer than a second to execute, or that does "heavy lifting" such as generating reports, should be timed. All API calls to third parties, e.g. Twitter or Facebook, should also be timed and logged.

Here's the source code for a simple Timer.java utility class based on Apple's WebObjects app server.
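
In case that link ever rots, here's roughly what such a utility looks like. The method names match the usage below, but the implementation is my own minimal sketch, not the linked source:

public class Timer
{
    private long startTime;
    private long stopTime;

    // Create and start a timer in one call.
    public static Timer startNewTimer()
    {
        Timer timer = new Timer();
        timer.startTime = System.currentTimeMillis();
        return timer;
    }

    public void stop()
    {
        stopTime = System.currentTimeMillis();
    }

    // Elapsed time in milliseconds between start and stop.
    public long elapsedTime()
    {
        return stopTime - startTime;
    }
}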

The following Java method, which verifies a Twitter user's credentials, is used in actual production code. It demonstrates the timer's simplicity: the timer is started at the beginning of the method, and its results are logged just before returning. The next step would be to log the high, low, and average response times and report them; a sketch of that follows the method.

public static String twitterUsernameForUser(User user)
{
    Timer timer = Timer.startNewTimer();
    String twitterUsernameForUser = null;

    // Configure a Twitter4J client with the application's consumer
    // key and secret, plus this user's stored OAuth access token.
    Twitter twitter = new Twitter();
    twitter.setOAuthConsumer(System.getProperty("twitter4j.oauth.consumerKey"),
                             System.getProperty("twitter4j.oauth.consumerSecret"));

    AccessToken accessToken = new AccessToken(user.twitterOAuthToken(),
                                              user.twitterOAuthTokenSecret());
    twitter.setOAuthAccessToken(accessToken);

    try
    {
        // One round trip to Twitter; the screen name comes back
        // with the verified credentials.
        twitterUsernameForUser = twitter.verifyCredentials().getScreenName();
    }
    catch (TwitterException e)
    {
        e.printStackTrace();
        NSLog.debug.appendln("Caught exception. Invalid twitter credentials.");
    }

    timer.stop();
    NSLog.debug.appendln("OAuthTwitterUtilities.twitterUsernameForUser() elapsed time: "
        + timer.elapsedTime());

    return twitterUsernameForUser;
}
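
As for that next step, a simple accumulator is enough to start logging high, low, and average response times; the class name and shape here are my own sketch:

public class ResponseTimeStats
{
    private long count;
    private long totalMillis;
    private long highMillis = Long.MIN_VALUE;
    private long lowMillis = Long.MAX_VALUE;

    // Record one elapsed time, e.g. the value from timer.elapsedTime().
    public synchronized void record(long elapsedMillis)
    {
        count++;
        totalMillis += elapsedMillis;
        highMillis = Math.max(highMillis, elapsedMillis);
        lowMillis = Math.min(lowMillis, elapsedMillis);
    }

    // One-line summary suitable for a periodic log statement.
    // Only meaningful once at least one time has been recorded.
    public synchronized String report()
    {
        long average = (count == 0) ? 0 : totalMillis / count;
        return "high=" + highMillis + "ms low=" + lowMillis
            + "ms avg=" + average + "ms over " + count + " calls";
    }
}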



But even if Amazon offers a web performance monitoring service someday, you should still segregate your web servers and time your network API calls.

