Sunday, May 10, 2009

Amazon Web Services' Simple Storage Service: The 99% Solution

Amazon Web Services (AWS) Simple Storage Service (S3) was launched about three years ago as a simple way to store files on Amazon's servers. Each file, also known as an object, can be up to five gigabytes (GB) in size and the filename is referred to as the object's key. The beauty of S3 is that each object saved on Amazon's servers is backed up onto at least three different servers. Where exactly the object is stored is not important since any computer on the Internet can access it - this is commonly referred to as the cloud, since, just like a cloud, there are no hard boundaries.


Strengths
1. Uptime – S3 runs on the same infrastructure that Amazon uses to host its own amazon.com website. While downtime isn't unheard of, it is extremely rare.

2. Hypertext Transfer Protocol (http) access – Accessing objects stored on S3 can be as simple as typing the web address of the resource into your web browser. Storing and retrieving objects on S3 can be done programmatically or graphically using a third-party plug-in such as S3Fox: http://go.joemoreno.com/e5bf

3. Security – Objects stored on S3 can be made public, such as a web page, or private so that only a single user can access a specific object (great for backing up sensitive files). Additionally, items can be semi-private: the owner can hand out a signed web address that lets anyone holding it access an object, without an Amazon account of their own, for a limited amount of time. A good use of this feature is to prevent other users from "hot linking" or "deep linking" directly to a photo or sensitive document on your website.

4. Domain Name Service (DNS) Access – Objects stored on S3 are placed into buckets, which can be thought of as folders or as a storage area network (SAN). An object in a bucket can be accessed in one of three ways:

a. http://s3.amazonaws.com/bucket/key
Update: This method only works for non-EU buckets.

b. http://bucket.s3.amazonaws.com/key

c. http://bucket/key where bucket is a DNS CNAME record pointing to bucket.s3.amazonaws.com. For this reason, bucket names must be unique throughout all of S3.

5. Price – Amazon only charges for its service based on use with no flat fees required. They refer to this as "Pay by the drink". Fees for storage, bandwidth, and access begin at $0.15/GB of storage, $0.17/GB of bandwidth, $0.01/10,000 GET requests and $0.01/1,000 PUT, COPY, POST, and LIST requests. For the first several months that I used S3 I received monthly bills for S3 totaling less than a dime (US$0.10). How Amazon can bill me that little and not lose money is a mystery. Here are copies of my first three bills from Amazon.

6. Bandwidth Throughput – While not advertised, the bandwidth throughput (speed) of objects served up by S3 is consistently in the range of 3-4 megabytes per second (MBps) [note: that is not megabits per second (Mbps), but megabytes per second - in other words, the throughput is approximately 24-32 Mbps]. That is fast enough for nearly all users.
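
To make the semi-private access from the Security point more concrete, here is a minimal Python sketch of how a time-limited web address can be built. It assumes the query-string authentication scheme (an HMAC-SHA1 signature over the request details) that S3 documented around the time of this post; newer AWS regions expect a different signing scheme, so treat it as an illustration rather than production code. The function name, access key, secret key, bucket, and object key are all made-up placeholders.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def signed_url(access_key, secret_key, bucket, key, expires):
    """Build a time-limited GET URL for a private S3 object.

    expires is an absolute Unix timestamp (in seconds); the URL is meant
    to stop working once that time has passed.
    """
    # Path-style resource: /bucket/key (slashes inside the key are kept).
    resource = "/{}/{}".format(bucket, quote(key))
    # String to sign: verb, Content-MD5, Content-Type, Expires, resource.
    # Content-MD5 and Content-Type are empty for a plain GET.
    string_to_sign = "GET\n\n\n{}\n{}".format(expires, resource)
    digest = hmac.new(
        secret_key.encode("utf-8"),
        string_to_sign.encode("utf-8"),
        hashlib.sha1,
    ).digest()
    signature = base64.b64encode(digest).decode("ascii")
    return "http://s3.amazonaws.com{}?AWSAccessKeyId={}&Expires={}&Signature={}".format(
        resource,
        quote(access_key, safe=""),
        expires,
        quote(signature, safe=""),
    )


# Placeholder credentials and names, for illustration only.
print(signed_url("EXAMPLEKEY", "examplesecret",
                 "web.example.com", "photos/cat.jpg", 1242000000))
```

Anyone holding the resulting address can fetch the object until the expiry time, which is why a page can embed such a link without exposing the underlying object permanently.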


Weakness
Since objects stored on S3 are served up using http (and they can also be served up securely using https), it is possible to host an entire website on S3 for pennies/month with one small exception. Amazon has not implemented a way to serve up the default web page when a user initially visits a website.

The initial web page of a website is commonly named index.html. For example, when you visit cnn.com or apple.com, you are actually viewing http://www.cnn.com/index.html or http://www.apple.com/index.html. All web servers used for displaying websites automatically know how to serve up a default webpage - all but Amazon's S3. And users have been clamoring for this feature since 2006: http://go.joemoreno.com/esw2

Since a default object cannot be set to be automatically returned, an entire website cannot be hosted on Amazon's S3 unless your users are willing to type www.example.com/index.html every time they want to visit your site - which, obviously, is not practical.
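
The difference can be sketched in a few lines of Python. This is only a simplified model of the behavior described above, not how any particular web server or S3 itself is implemented; both function names are invented for the illustration.

```python
def resolve_conventional(path, default_document="index.html"):
    """Map a request path to a file the way a typical web server does."""
    if path == "" or path.endswith("/"):
        # A request for a directory falls back to the default document.
        return path.lstrip("/") + default_document
    return path.lstrip("/")


def resolve_s3_key(path):
    """Map a request path to an S3 key: the path is used as-is, with no default."""
    return path.lstrip("/")


print(repr(resolve_conventional("/")))  # a conventional server finds index.html
print(repr(resolve_s3_key("/")))        # S3 is left with an empty key, so no page
```

A visitor who types only the domain name sends the path "/", and that is exactly the request the second function cannot turn into a page.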


The 1% Solution
After grappling with this problem since 2007, I finally implemented a solution allowing me to host an entire website on Amazon's S3 without requiring another web server to serve up the index.html file. While I did not discover this technique, it is certainly worth mentioning in detail since it is not very obvious.


Perfect Match: Amazon S3 & GoDaddy
Hosting an entire static website on Amazon's S3 requires a little DNS gymnastics, but its results can be seen here: http://www.adjixsucks.com

When a person enters a domain name in their web browser, the lookup is answered by the name servers that the domain's registrar has on record for that domain. The registrar is the company where the domain name was registered, and its control panel is how the domain name owner tells the world where to route users to find their website, route e-mail, etc.

Since the registrar is the first stop, this is the perfect place to intercept the request and send it to a website's index.html page.


How to do it
In order to forward a request to a website's index.html with GoDaddy you'll need to do a few things. For starters, since most people enter www.example.com or example.com into their web browser, you'll have to store your entire website's files in an S3 bucket that is not named example.com or www.example.com. In my adjixsucks.com example, I chose to save the website in a bucket named web.adjixsucks.com - but you can call it anything, such as www1.example.com, static.example.com, etc.
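
As a quick illustration, this Python sketch prints the three addresses from the DNS Access list above for an index.html stored in the web.adjixsucks.com bucket. The helper name object_urls is made up for the example, and the third form only works once the CNAME record for the bucket's host name exists.

```python
def object_urls(bucket, key):
    """Return the three address forms for an object stored in an S3 bucket."""
    return {
        # a. Path style (non-EU buckets only, per the update above).
        "path_style": "http://s3.amazonaws.com/{}/{}".format(bucket, key),
        # b. Bucket name as a subdomain of s3.amazonaws.com.
        "subdomain": "http://{}.s3.amazonaws.com/{}".format(bucket, key),
        # c. Bucket name doubling as the host name, via a DNS CNAME.
        "cname": "http://{}/{}".format(bucket, key),
    }


for style, url in object_urls("web.adjixsucks.com", "index.html").items():
    print(style, url)
```

The third form is the reason the bucket is named after a host in your own domain: the address visitors end up on is simply the bucket name followed by the key.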


Parked Name Servers
Once you've saved your website's files in an S3 bucket, you'll need to configure your domain name's DNS to use GoDaddy's "Parked nameservers" (ns11.domaincontrol.com and ns12.domaincontrol.com).




Total DNS
Under Total DNS, you'll need to edit/add three entries for the "@", "www", and "web" hosts. Keep in mind that these updates can take hours to propagate throughout the Internet.

1. Under the A (Host) records, you'll need to set the @ host to point to GoDaddy's forwarding server (64.202.189.170).

2. Under the CNAME (Aliases) records, you'll need to add two CNAMEs for www and web.
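
Written out as data, the three entries might look like the Python sketch below, using example.com in place of a real domain. The target for web follows from the rule that the CNAME must point to bucket.s3.amazonaws.com. The target for www is not spelled out in the text above, so pointing it at "@" is purely an assumption made for this illustration.

```python
DOMAIN = "example.com"     # placeholder domain
BUCKET = "web." + DOMAIN   # bucket named after the host that serves the site

RECORDS = [
    # (host, record type, value)
    ("@", "A", "64.202.189.170"),                   # GoDaddy's forwarding server
    ("www", "CNAME", "@"),                          # ASSUMPTION: alias of the root
    ("web", "CNAME", BUCKET + ".s3.amazonaws.com"), # the S3 bucket
]


def format_records(records):
    """Render the records as aligned text, one record per line."""
    return "\n".join(
        "{:<4} {:<6} {}".format(host, rtype, value)
        for host, rtype, value in records
    )


print(format_records(RECORDS))
```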




Domain Name Forwarding (back to your own domain)
Finally, you'll need to update the Forwarding section to point to your index.html. Once this is updated, it can take 20-30 minutes until the redirect is live.
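
Putting the pieces together, the Python sketch below models what a visitor's request goes through once everything is live. It is a toy model, not GoDaddy's or Amazon's actual logic: the host names use example.com as a placeholder, the function name handle is invented, and it assumes that both the bare domain and www are forwarded and that forwarding ignores the requested path.

```python
FORWARD_TO = "http://web.example.com/index.html"
FORWARDED_HOSTS = {"example.com", "www.example.com"}  # ASSUMPTION: www forwards too
BUCKET_HOST = "web.example.com"


def handle(host, path="/"):
    """Describe what happens to a request for the given host and path."""
    if host in FORWARDED_HOSTS:
        # The registrar's forwarding server answers with a redirect.
        return ("redirect", FORWARD_TO)
    if host == BUCKET_HOST:
        # The CNAME sends the request to S3, which looks up the key.
        return ("s3-object", path.lstrip("/"))
    return ("unknown-host", None)


print(handle("example.com"))
print(handle("web.example.com", "/index.html"))
```

The first request never reaches S3 at all; the redirect hands the browser the full address, including index.html, so S3 is only ever asked for a key that exists.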




When you're done, it should look similar to this:




Ta-Dah
When people enter your domain name into their web browser most won't even notice that the website begins with web instead of www. While this isn't a perfect solution, it's very close and the price is right: $10 for the annual domain registration and as little as a nickel or dime per month paid to Amazon for hosting your website.
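
To see how a monthly bill of a nickel or a dime can come about, here is a small Python sketch that applies the starting rates quoted in the Price section. The usage figures in the example (storage, bandwidth, and request counts) are invented for a small, lightly visited static site, and the function name monthly_cost is made up.

```python
from decimal import Decimal

# Starting rates quoted in the Price section (US dollars).
RATES = {
    "storage_gb_month": Decimal("0.15"),
    "transfer_gb": Decimal("0.17"),
    "per_get": Decimal("0.01") / 10000,
    "per_put": Decimal("0.01") / 1000,   # PUT, COPY, POST, and LIST requests
}


def monthly_cost(storage_gb, transfer_gb, get_requests, put_requests):
    """Estimate one month's S3 bill from usage figures."""
    return (
        Decimal(str(storage_gb)) * RATES["storage_gb_month"]
        + Decimal(str(transfer_gb)) * RATES["transfer_gb"]
        + Decimal(str(get_requests)) * RATES["per_get"]
        + Decimal(str(put_requests)) * RATES["per_put"]
    )


# Hypothetical small site: 0.1 GB stored, 0.2 GB served,
# 5,000 GET requests and 100 PUT requests in the month.
cost = monthly_cost(0.1, 0.2, 5000, 100)
print("${:.3f}".format(cost))
```

With those made-up numbers the total comes to five and a half cents, with bandwidth as the largest share.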

Joe Moreno
President
Adjix
joes3@adjix.com
760.444.4721

6 comments:

andy said...

I always enjoy learning how other people employ Amazon S3 online storage. I am wondering if you can check out my very own tool CloudBerry Explorer that helps to manage S3 on Windows . It is a freeware.

ape2man said...

Dude your a life saver.

blog said...

Why not name your S3 bucket www.adjixsucks.com and make life even simpler... then you are only redirecting at "@" a record traffic...

Joe Moreno (@JoeMoreno) said...

ScrappyDog,

You can't set a domain's root name as a CNAME (only an A record) in most DNS systems. Even if you could, Amazon S3 buckets don't allow you to set a default object (i.e. index.html) to be returned.
Since most people either type www.example.com or example.com into their browsers, there's a conflict which is why the bucket is named web.example.com.

But, please let me know if I'm overlooking something.

Sophia Guevara said...

Joe,
Thank you so much for posting this!
