Sunday, June 13, 2010

Handling CNAME Web Requests

I was recently playing around with DNS CNAME aliasing and I began to wonder how different sites handle an incoming request which has a different host header than the one that was expected.

Virtual Hosting
In the early days of the web (before the mid-1990s), hosting multiple domains at the same IP address wasn't possible unless the web server supported virtual hosting. Name-based virtual hosting lets a server at a single IP address examine the Host header of each request and serve up the correct web page. This is a brilliant solution, and Amazon's S3 web service makes excellent use of it so you can serve content from S3 under your own domain name.
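As a rough illustration of the idea, here is a minimal sketch of how a server might pick a site based on the Host header. The host names, the page contents, and the function names are all made up for the example; real web servers do this through their configuration rather than hand-written code.

```javascript
// Hypothetical table mapping host names to site content.
const sites = new Map([
  ["web.example.com", "<h1>Static site</h1>"],
  ["blog.example.com", "<h1>Blog</h1>"],
]);

// Normalize a Host header value: lowercase it and drop any ":port" suffix.
function normalizeHost(hostHeader) {
  return (hostHeader || "").toLowerCase().replace(/:\d+$/, "");
}

// Pick the site for a request, or undefined when the host name is unknown.
function selectSite(hostHeader) {
  return sites.get(normalizeHost(hostHeader));
}

console.log(selectSite("web.example.com"));
console.log(selectSite("BLOG.example.com:8080"));
console.log(selectSite("unknown.example.net"));
```

What the server does with an unknown host name (serve a default site, redirect, or refuse) is a configuration choice, and that choice is what the rest of this post is about.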

For example, here is a static webpage, hosted on Amazon's S3, but you'd never know where it's hosted by looking at the URL. One way to find out that this is hosted on S3 is to use the dig or host command from the command line:

[jmoreno@ ~]$ host web.joemoreno.com
web.joemoreno.com is an alias for web.joemoreno.com.s3.amazonaws.com.
web.joemoreno.com.s3.amazonaws.com is an alias for s3-directional-w.amazonaws.com.
s3-directional-w.amazonaws.com is an alias for s3-2-w.amazonaws.com.
s3-2-w.amazonaws.com has address 207.171.185.131


CNAME to Another Website
This got me thinking, "What if I pointed my own host name at another website?" This would be less like framing another website (via an HTML frame or iframe) and more like hyperlinking to others' content.

So, I tried it out with three popular sites and each handles it differently.

CNN
http://news.joemoreno.com
CNN doesn't appear to look at the host name header of the incoming request and simply serves up its content. The only problem I noticed is that content served via Flash, such as ads and video, is broken when the host name isn't cnn.com. Since the links on the CNN website are relative, the host name in the web browser doesn't change when clicking on other cnn.com links.

NY Times
http://nytimes.joemoreno.com
The NY Times also doesn't look at the host name of the incoming request to see if it's nytimes.com or www.nytimes.com. However, the NY Times uses absolute URLs on its website so clicking on any link clears out the previous host name and replaces it with www.nytimes.com.
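The difference between the CNN and NY Times behavior comes down to how a browser resolves links against the current page address. The sketch below uses the standard URL class to show this; the paths are hypothetical and only the host names come from the tests above.

```javascript
// Resolve a link's href against the page address, as a browser would.
function resolveLink(href, pageAddress) {
  return new URL(href, pageAddress).href;
}

// Relative link (hypothetical path): the alias host name is kept.
const relativeResult = resolveLink(
  "/tech/story.html",
  "http://news.joemoreno.com/index.html"
);

// Absolute link (hypothetical path): the alias host name is replaced.
const absoluteResult = resolveLink(
  "http://www.nytimes.com/tech/story.html",
  "http://nytimes.joemoreno.com/index.html"
);

console.log(relativeResult); // http://news.joemoreno.com/tech/story.html
console.log(absoluteResult); // http://www.nytimes.com/tech/story.html
```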

Twitter
http://twitter.joemoreno.com
Twitter handles this issue perfectly. Their web server looks at the host name of the incoming request and, if it's not twitter.com, it returns a 301 redirect to twitter.com while keeping the rest of the request intact.
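I don't know how Twitter actually implements this, but the behavior can be sketched as a small request handler. The function names and the "http://" scheme are my assumptions for the example; only the host name and the 301 status come from what I observed.

```javascript
const CANONICAL_HOST = "twitter.com"; // the host name the site expects

// Return the redirect target for a request, or null if the host name is
// already the expected one. The path and query string are carried over.
function redirectLocation(hostHeader, requestUrl) {
  const host = (hostHeader || "").toLowerCase().replace(/:\d+$/, "");
  if (host === CANONICAL_HOST) return null;
  return "http://" + CANONICAL_HOST + requestUrl; // scheme is an assumption
}

// Request handler with the (req, res) shape used by Node's http module.
function handle(req, res) {
  const location = redirectLocation(req.headers.host, req.url);
  if (location !== null) {
    res.writeHead(301, { Location: location });
    res.end();
    return;
  }
  res.writeHead(200, { "Content-Type": "text/plain" });
  res.end("Hello from " + CANONICAL_HOST + "\n");
}

// To try it with Node's built-in server:
//   require("http").createServer(handle).listen(8080);
```

In practice this check usually lives in the web server or load balancer configuration rather than in application code.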

Legal Issues
I spoke with a couple of attorneys who specialize in Internet law to see if this has ever been an issue. They were not aware of any cases where CNAME aliasing was challenged in court. The most similar case was in 1997, when the Washington Post sued Total News, Inc. because the latter was framing the former's news content. However, a court decision was never reached since the case was settled out of court a few months later.

Regardless of the lack of legal challenges, it's possible that a company would be concerned about brand dilution. However, the issues with HTML framing, CNAMEs, etc. would most likely be solved by implementing a simple and inexpensive technical solution instead of suing.

Solution
Some companies might not like another website aliasing their website without explicit permission and others might not care. In practice, the deciding factor would be lost revenues or brand damage. Solving this problem is much like preventing someone from framing, deep-linking or hot-linking into your website. The solution is to inspect each incoming web request (the Host header for aliasing, the referrer for framing and hot-linking) and redirect or reject the request if it's not what it should be.

Conclusion
The benefits of aliasing another website via a CNAME, without them knowing, aren't clear. Although many sites will frame others' content without them knowing, the website that's the target of the framing can simply break the frame with a short snippet of JavaScript embedded in the page's HTML:

<script type="text/javascript">
// Break out only when this page is not the top-level window.
if (window.top !== window.self)
{ window.top.location.href = window.location.href; }
</script>


A very similar JavaScript could be written to simply look at the request's host name. If it's not the correct host name then reload the page with the correct host name (although I haven't tested this theory).
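A minimal sketch of what that script might look like follows. The expected host name and the function name are hypothetical, and like the idea itself this hasn't been tried against a live aliased site.

```javascript
const EXPECTED_HOST = "www.example.com"; // hypothetical canonical host name

// Return the address the page should be reloaded at, or null when the host
// name is already the expected one. Path, query and fragment are preserved.
function correctedAddress(href) {
  const url = new URL(href);
  if (url.hostname === EXPECTED_HOST) return null;
  url.hostname = EXPECTED_HOST;
  return url.href;
}

// In a browser, reload the page under the expected host name when needed.
if (typeof window !== "undefined") {
  const target = correctedAddress(window.location.href);
  if (target !== null) window.location.replace(target);
}
```

Unlike a server-side 301 redirect, this only takes effect after the page has already been delivered and only when JavaScript is enabled.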

This observation is simply offered as a proof of concept.

2 comments:

yelvington said...

This really has nothing to do with CNAME (canonical name) in DNS. You also can use A records or put any arbitrary string in your computer's hosts file and map it to any IP address and achieve the same effect.

The question is whether the Web server responds to a request that has an HTTP/1.1 Host string that doesn't match what the server expects.

This is a fairly minor server configuration matter.

All modern browsers and servers support HTTP/1.1, which introduced the Host field to enable name-based virtual hosting with multiple websites on a single IP address.

When you find a website that accepts such a request, you've merely discovered an application of Postel's Law.

Joe Moreno said...

yelvington,

Yup, I think we're in agreement that one (and probably the best) technique to solve this problem is to configure the web server.

You also raise a great point that it would, at least in theory, work with an A record too. But, in practice, it probably wouldn't work very well since many large web sites use load balancers with multiple IPs. S3, for example, uses a DNS CNAME TTL of about 30-60 seconds for each IP.

An A record that works now might not work in five minutes. This leads to a point of frustration that Dave and I have encountered with most DNS servers, which is that you can't configure a CNAME for the root of a domain (e.g. example.com 3600 IN CNAME example.com.s3.amazonaws.com).

- Joe