As part of my current project to implement XML-driven Web solutions, I am re-reading Tim Berners-Lee’s Style Guide for online hypertext for inspiration. One of the topics covered is called Cool URIs don’t change. Most of it relates to planning and implementing hackable URIs (more from me on that soon). But in a footnote called "How can I remove the file extensions…" the topic of content negotiation comes up. I was reminded of how nice it would be to support c-neg for my IIS-hosted projects.
What is content negotiation?
Content negotiation (c-neg for short) is the process by which servers and clients negotiate to decide exactly which file or file format the server will send to the client. Typically, c-neg focuses on selecting the right language for a browser, identifying a client’s form factor (a hand-held device, say), or determining the extent of its graphics capabilities (only supports black and white images, etc.).
However, c-neg has lots of other possible uses. For my purposes, I want to be able to know when a client is asking for stylesheets, images, or standard markup content.
Why use content negotiation anyway?
One of the big reasons for using c-neg is to hide some of the internal details of the web server tech from users. For example, if you could drop the tail (the file extension) from all file requests, users would not need to know whether your site is using HTML, ASP, ASPX, JSP, CFM, etc. in order to find a page at your site. Theoretically, they would just need to know the title of the document or the topic.
For example, typing http://cool.server.com/shopping_list might return the document shopping_list.html, or shopping_list.asp, etc., depending on what the server has available. Users need not worry about the tail at all.
Even more to the point, hiding the tails can protect users when the hosting server switches technologies. For example, when I switched my servers from ASP to ASPX, I basically nullified all my URIs from the past since all my links included the .ASP at the end of URIs. Had I been using c-neg for all documents, changing from ASP to ASPX would not have affected users at all and all my links would still work.
How does content negotiation really work?
The process of c-neg is pretty straightforward. When a client (a Web browser) makes a request to a server for some resource (document, image, etc.), the client sends some additional information in the form of headers that help detail the type of document requested and the preferred or supported formats for that client. For example, when a Web browser asks for an image, it can tell the server it prefers PNG over GIF. Or, if it is a hand-held cell phone, it might tell the server that it can only accept images in black and white BMP format.
On each request, the server inspects these headers and, if allowed, can return the best format for the client. Note that I used the phrase "if allowed." More on that below.
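To make the header inspection concrete, here is a minimal sketch of how a server might rank the media types in an Accept header and pick one it can actually serve. The class and method names (AcceptHeaderSketch, ParseAccept, ChooseBest) are made up for illustration, and the sketch simplifies the real rules: it ignores parameters other than q, drops entries with q=0 rather than treating them as exclusions, and breaks ties by header order.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical helper, not part of ASP.NET or IIS.
public static class AcceptHeaderSketch
{
    // Returns the media ranges from an Accept header, highest q-value first.
    // Entries with equal q keep the order they had in the header.
    public static List<string> ParseAccept(string header)
    {
        var entries = new List<KeyValuePair<string, double>>();
        if (string.IsNullOrEmpty(header))
            return new List<string>();

        foreach (string part in header.Split(','))
        {
            string[] pieces = part.Split(';');
            string range = pieces[0].Trim().ToLowerInvariant();
            if (range.Length == 0)
                continue;

            double q = 1.0; // default quality when no q parameter is given
            for (int i = 1; i < pieces.Length; i++)
            {
                string param = pieces[i].Trim();
                double parsed;
                if (param.StartsWith("q=", StringComparison.OrdinalIgnoreCase) &&
                    double.TryParse(param.Substring(2), NumberStyles.Float,
                                    CultureInfo.InvariantCulture, out parsed))
                {
                    q = parsed;
                }
            }

            // Simplification: q=0 entries are dropped, not treated as exclusions.
            if (q > 0)
                entries.Add(new KeyValuePair<string, double>(range, q));
        }

        // OrderByDescending is a stable sort, so header order breaks ties.
        return entries.OrderByDescending(e => e.Value).Select(e => e.Key).ToList();
    }

    // True when a media range such as "image/*" or "*/*" covers the given type.
    private static bool Matches(string range, string type)
    {
        if (range == "*/*")
            return true;
        if (range.EndsWith("/*", StringComparison.Ordinal))
            return type.StartsWith(range.Substring(0, range.Length - 1),
                                   StringComparison.OrdinalIgnoreCase);
        return string.Equals(range, type, StringComparison.OrdinalIgnoreCase);
    }

    // Returns the first available type covered by the client's best-ranked
    // range, or null when nothing the server has is acceptable.
    public static string ChooseBest(string header, IList<string> available)
    {
        foreach (string range in ParseAccept(header))
            foreach (string type in available)
                if (Matches(range, type))
                    return type;
        return null;
    }
}
```

For a header such as "image/gif;q=0.5, image/png" and a server that has both formats, ChooseBest picks image/png. When the client sends only "*/*", the order of the available list decides, so the server's own preference wins.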
Some ugly truths about content negotiation
When it comes to negotiating document types and formats, the Accept header is the string of information clients send to tell servers what they prefer. The Mozilla family of browsers (Netscape and Firefox) does an excellent job of sending detailed format information with each request. For example, text/css, image/png, and text/html are all values Firefox sends in the Accept header when negotiating with a Web server.
But MSIE is pretty awful at this. In fact, for as far back as I can document (at least MSIE 4), MSIE has sent the same inadequate Accept header for *every single request*, no matter what the resource type (CSS, JavaScript, HTML document, image, etc.). Without going into the really nasty details, MSIE makes it very difficult for servers to decide what to send to MSIE clients.
So folks who want to create hackable URIs that support c-neg *and* work with MSIE have to resort to some compromises and a few server hacks [sigh].
How do I support MSIE and still use server-side content negotiation?
Even though you cannot count on MSIE to give adequate information on content-types when requesting documents, you can modify your URIs slightly to give your Web server strong hints. The method I settled on is the same one used by the W3C.org site and many others. I decided to place certain document types in similar folders.
For example, all stylesheets (*.CSS) will go in a folder named /stylesheets/. All image files (*.PNG, *.GIF, *.BMP, etc.) will go in a folder named /images/. Client scripts (*.JS, *.VB) will go in /scripts/, etc. Now, when a client asks for a document, the server can use part of the name as a hint. For example, a request for http://cool.server.com/stylesheets/default will allow the server to return default.css (if it is available).
An even better example is in the case of images. If the server gets a request for http://cool.server.com/images/logo, the server might look for logo.png and, if it exists, return that. If logo.png does not exist, the server might look for logo.jpg or logo.gif instead. Finally, if the current site uses only JPG files, but next year converts to all PNG format, all the URIs will still work just fine.
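The folder-as-hint idea can be sketched in a few lines. The code below is only an illustration under my own assumptions: the class name TailResolverSketch, the folder table, and the order of tails within each folder are all invented here, and the file check is passed in as a delegate so the lookup can be tried without touching a real disk.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper that maps an extensionless path to a real file name.
public static class TailResolverSketch
{
    // Assumed table: folder name -> tails to try, in order of preference.
    private static readonly Dictionary<string, string[]> FolderTails =
        new Dictionary<string, string[]>(StringComparer.OrdinalIgnoreCase)
        {
            { "stylesheets", new[] { "css" } },
            { "images", new[] { "png", "jpg", "gif" } },
            { "scripts", new[] { "js" } }
        };

    // Returns the first "<path>.<tail>" for which exists() is true,
    // or null when the path has no folder hint or no candidate exists.
    public static string Resolve(string path, Func<string, bool> exists)
    {
        string[] segments = path.Split(new[] { '/' },
                                       StringSplitOptions.RemoveEmptyEntries);

        // Only folder segments count as hints, so skip the final (file) segment.
        for (int i = 0; i < segments.Length - 1; i++)
        {
            string[] tails;
            if (!FolderTails.TryGetValue(segments[i], out tails))
                continue;

            foreach (string tail in tails)
            {
                string candidate = path + "." + tail;
                if (exists(candidate))
                    return candidate;
            }
            return null; // a hint was found, but none of its tails exist
        }
        return null; // no recognized folder in the path
    }
}
```

On a real server the delegate would wrap something like File.Exists over the mapped physical path. With a site that holds only logo.jpg, Resolve("/images/logo", ...) returns "/images/logo.jpg"; once logo.png is added, the same URI resolves to the PNG instead, which is exactly the point of hiding the tail.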
OK, so how do you implement c-neg for IIS?
My example of c-neg implementation for IIS is (admittedly) basic, but you should get the idea. Specifically, the implementation outlined here focuses only on standard Web browsers and ignores the details of supporting hand-held devices, etc.
First, I whip out my trusty ISAPI Rewrite tool to establish some rules for supporting stylesheets and image files. If you don’t already use ISAPI Rewrite or some other utility, you can get a free version of ISAPI Rewrite from Helicon’s web site.
Below are two rules I added to my httpd.ini file:
# route any css requests
RewriteRule (.*)/stylesheets/(.*) $1/stylesheets/$2.css [I,CL,L]
# reroute any image requests
RewriteRule (.*)/images/(.*) /imagehandler.ashx?_file=$1/images/$2 [I,CL,L]
Note that the first rule simply adds .css to the end of any resource request that has /stylesheets/ in the URI. Examples are:
http://cool.server.com/stylesheets/default
http://cool.server.com/myapp/stylesheets/main
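To see what that substitution does to the example URIs, here is a rough stand-in written with the .NET Regex class. It is not ISAPI Rewrite and does not model the rule flags; the pattern and the names RewriteSketch and RewriteStylesheet are my own assumptions, chosen to mirror the "append .css to anything under /stylesheets/" behavior described above.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical stand-in for the stylesheet rewrite; illustration only.
public static class RewriteSketch
{
    // Appends ".css" to any path containing /stylesheets/ (case-insensitive).
    // Paths that do not match are returned unchanged.
    public static string RewriteStylesheet(string path)
    {
        return Regex.Replace(path,
                             "^(.*)/stylesheets/(.*)$",
                             "$1/stylesheets/$2.css",
                             RegexOptions.IgnoreCase);
    }
}
```

Here "/stylesheets/default" becomes "/stylesheets/default.css" and "/myapp/stylesheets/main" becomes "/myapp/stylesheets/main.css", while "/images/logo" passes through untouched. One thing the sketch makes obvious: a request that already ends in .css would come out as .css.css, so in this sketch the scheme only works if links consistently leave the tail off.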
The second rule is a bit trickier. Any request that has /images/ in the URI will be rerouted to a special handler that will look for the proper file and send that to the browser. I wrote the handler in C#.
Below is a snapshot of the main code loop for my imageHandler:
public void ProcessRequest(HttpContext context)
{
    string file = string.Empty;
    string mimefile = string.Empty;
    string rtnfile = string.Empty;
    string[] tails;
    string[] accepts;
    ContentTypeCollection ctcoll = new ContentTypeCollection();

    // get the file to find
    file = GetQueryItem(context, "_file");
    if (file.Length == 0)
        return;

    // get the list of tails and accept-types
    // (AcceptTypes is null when the client sends no Accept header)
    tails = GetConfigList("imagetails");
    accepts = context.Request.AcceptTypes ?? new string[0];

    // get assoc tails for accept-types
    mimefile = GetConfigItem("mimefile");
    if (mimefile.Length != 0)
    {
        mimefile = context.Server.MapPath(mimefile);
        if (File.Exists(mimefile) == false)
            CreateMimeTypesFile(mimefile);

        // using closes the stream even if deserialization throws
        using (FileStream fs = new FileStream(mimefile, FileMode.OpenOrCreate, FileAccess.Read))
        {
            ctcoll = (ContentTypeCollection)Serialization.Deserialize(ctcoll, fs);
        }
    }

    // go through accept-types first, skipping wildcards such as image/* and */*
    for (int i = 0; i < accepts.Length; i++)
    {
        if (accepts[i].IndexOf("image") != -1 && accepts[i].IndexOf("*") == -1)
        {
            string tail = ctcoll.GetExtension(accepts[i]);
            if (tail.Length != 0)
            {
                rtnfile = string.Format("{0}.{1}", file, tail);
                rtnfile = context.Server.MapPath(rtnfile);
                if (File.Exists(rtnfile))
                {
                    SendFile(context, rtnfile, accepts[i]);
                    return;
                }
            }
        }
    }

    // now go through pref tails
    // NOTE: the type sent here is the client's first accept-type, which may not
    // match the file actually found; a generic type is used if none was sent
    string fallbackType = accepts.Length > 0 ? accepts[0] : "application/octet-stream";
    for (int i = 0; i < tails.Length; i++)
    {
        rtnfile = string.Format("{0}.{1}", file, tails[i]);
        rtnfile = context.Server.MapPath(rtnfile);
        if (File.Exists(rtnfile))
        {
            SendFile(context, rtnfile, fallbackType);
            return;
        }
    }

    // we failed! tell the client nothing was found
    context.Response.StatusCode = 404;
}
There are a lot of loose ends in this code snippet, but you probably get the idea. By installing this handler at the root of my IIS Webs, I now have basic c-neg support for most standard browsers. And there is quite a bit more that can be done to improve the flexibility and power of this routine; it just takes a bit more coding [grin].
Summary
So, to build cool, long-lived URIs, you should use links that hide the technologies on the server. In the case of text documents, this can easily be done using a utility like ISAPI Rewrite. For images and other format-driven URIs, you will need to implement a server-side HTTP handler to work out the details of which format to send to the browser.
Implementing basic content negotiation is the first step toward creating solid URIs. My next step is to create a rational URI scheme that can live over a long period of time. More on that later.