Wanted: URI Designer

Some regular readers know that, as part of my ongoing personal project to
improve my ‘web-tech’ knowledge, I have been re-reading Tim Berners-Lee’s "Style
Guide for online hypertext" and other related materials. One of the common
messages from documents on this topic is the importance of well-composed and
maintained web addresses or URIs. This got me thinking about (and paying more
attention to) the common web addresses that I see in my browser. I must say, I
don’t care much for what I see.

What’s bad about common URIs today?

Too often the URI I see in my browser address line is gibberish. Just try
visiting any of the top news sites (news.yahoo.com, www.msn.com, news.google.com,
etc.) and click on any of the links on the page. Usually the URI contains
additional state information (?x=13&y=29&_docid=DUsor93FH). Almost always, the
URI contains company or technology-specific information (page.aspx, document.jsp,
article.cfm, product.php). And almost never could I share the web address with a
friend by merely speaking it. This is all bad.

So what is the definition of a good URI?

The W3C has an excellent document called "Common HTTP Implementation
Problems". In it there is a section devoted to "Understanding URIs." This
section reads like a ‘best practices’ list for creating solid URIs. I urge
everyone to take a few minutes to read through it and to bookmark it for future
reference. I’ll lift two quotes from that document to clarify the need for good
URI design when deploying web applications.

Here’s the first quote: "A URI is a reference to a resource, with fixed and
independent semantics." This sentence has quite a bit packed into it. For
example:

  • "A URI is a reference…" In other words, while a URI points to something,
    that URI is not a serial number.
  • "…with fixed [semantics]…" This means that the URI does not change
    over time. Changing a URI breaks other people’s links to that resource – bad
    stuff.
  • "…with independent semantics." This means that the URI stands alone. It
    does not depend on state information such as cookies or session state.

So, a URI is a pointer to something. That pointer never goes bad, and that
pointer stands alone (or, put another way, is easily shared).

Here’s the second quote: "A common mistake … is to think [that a URI] is
equivalent to a filename within a computer system. This is wrong. URIs have,
conceptually, nothing to do with a file system." This might come as a shock
to some web programmers. It is so easy to expose physical folders and files via
a web server that, by default, most web sites simply reflect the file structure
behind a web domain root. This, too, is bad stuff. Move a file, and the URI
breaks.

Ok, so URIs are non-changing, stand-alone pointers and *not* reflections of
folder and file structures on disk. Maybe we do need a URI design!

URIs are web queries

Once you accept the idea that URIs are not physical files in folders, you are
free to start thinking about what URIs really represent. In my mind, a URI is a
‘web query.’ By typing a URI, users are ‘looking for something’ out on the
Internet. By now, most web users understand that there are up to three parts to
a URI query:

  • the server name or domain (www.someserver.com)
  • the folder name or location (/articles/2006/)
  • the document name (learning_to_program.html)

I suspect that most users do not think very much about the above details, but
most intuit them as they surf. I am often especially surprised by the
sophistication of young web surfers. I have observed children who are quite
happy to ‘hack’ away at a URI in order to find a document. They truly use the
browser address line as a search tool!
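
To make those three parts concrete, here is a minimal sketch in Python (rather than ASP.NET, purely to keep it short) that splits a web address using the standard urllib.parse module. The function name three_parts is my own invention, and the sketch assumes the address includes a scheme such as http://.

```python
from urllib.parse import urlsplit


def three_parts(uri):
    """Split a URI into (server, location, document)."""
    parts = urlsplit(uri)
    # Everything up to and including the last "/" is the location;
    # whatever follows it (possibly nothing) is the document name.
    location, _, document = parts.path.rpartition("/")
    return parts.netloc, location + "/", document


server, location, document = three_parts(
    "http://www.someserver.com/articles/2006/learning_to_program.html"
)
print(server)    # www.someserver.com
print(location)  # /articles/2006/
print(document)  # learning_to_program.html
```

A user who ‘hacks’ the address is really just editing one of these three values and asking again.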

Anyway, if you accept the idea that a URI is (in some fashion) a web query, then
you are free to actively *design* the URIs for your web application to support
this kind of use. To paraphrase the words of Tim Berners-Lee, you can make your
URIs ‘hack-able.’

Creating hackable URIs

What’s a hackable URI? In its simplest form, it’s a URI that can be easily
modified by a user in order to get a valid result from the same server. The most
common way to think about hackable URIs is to make sure that all sub-parts of
the URI return a valid document. As an example, the URI "www.myserver.com/content/programming/tutorials/hackable_uris.html"
has several sub-parts. Users who ‘land’ at this location should be able to ‘lop
off’ parts of the URI and get helpful results. That means that the URI "www.myserver.com/content/programming/tutorials/"
should return something – maybe a page that lists all tutorials. And "www.myserver.com/content/programming/"
might return a list of programming article classes such as "tutorials," "reference,"
"bookreviews," etc. And so on.

But creating hackable URIs doesn’t mean just supporting sub-parts. It could also
mean using a user-friendly URI scheme that actually *invites* URI hacking. For
example, what can you assume if you land at a URI that looks like this?

www.contentserver.com/archives/2005/11/03/dailyupdate.html

Not only can you assume that you can get valid documents at each sub-part; you
can also assume that you can change the value of some sub-parts to discover new
documents, right?
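
Both kinds of hacking can be sketched in a few lines of Python. This is only an illustration, not a finished design: the function names lopped_uris and neighbouring_days are hypothetical, the first assumes the address includes a scheme such as http://, and the second assumes a /section/YYYY/MM/DD/name layout like the archive example above.

```python
from datetime import date, timedelta
from urllib.parse import urlsplit, urlunsplit


def lopped_uris(uri):
    """Return the URIs a user reaches by lopping trailing parts off the path."""
    parts = urlsplit(uri)
    segments = [s for s in parts.path.split("/") if s]
    result = []
    # Drop one trailing segment at a time, finishing at the site root.
    for keep in range(len(segments) - 1, -1, -1):
        path = "/" + "".join(s + "/" for s in segments[:keep])
        result.append(urlunsplit((parts.scheme, parts.netloc, path, "", "")))
    return result


def neighbouring_days(path):
    """Given /section/YYYY/MM/DD/name, return the paths for the day before and after."""
    _, section, yyyy, mm, dd, name = path.split("/")
    day = date(int(yyyy), int(mm), int(dd))

    def build(d):
        return "/{0}/{1}/{2}".format(section, d.strftime("%Y/%m/%d"), name)

    one_day = timedelta(days=1)
    return build(day - one_day), build(day + one_day)


for uri in lopped_uris(
    "http://www.myserver.com/content/programming/tutorials/hackable_uris.html"
):
    print(uri)

print(neighbouring_days("/archives/2005/11/03/dailyupdate.html"))
```

If every address these two functions produce returns something useful from your server, the design is hackable in the sense I mean here.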

So how do you implement a URI design?

Once you start thinking about a URI design pattern that works for your site, you
need to come up with a way to implement it. In the past, web programmers would
start creating folders and files to match the stated design. This is not the way
to go about it. Instead, web programmers should design server-side scripts
that can scan the incoming URI, treat it as a request query and assemble a
response accordingly.

For example, given the following query:

www.authorserver.com/fiction/poetry/

A server might return a list of poets. Users might also assume that they can get
lists of other authors by changing the URI like this:

www.authorserver.com/fiction/shortstories/
www.authorserver.com/fiction/novels/

The point is that web servers should be able to do more than just serve up
documents from a physical folder tree.
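
Here is a rough Python sketch of that idea, with no folders or files involved. Everything in it is an assumption made for illustration: the CATALOGUE dictionary stands in for a real data store, its entries are placeholders, and respond is a hypothetical function name.

```python
# Hypothetical catalogue standing in for a real data store.
# Keys are URI path segments; values are placeholder listings.
CATALOGUE = {
    ("fiction", "poetry"): ["poet A", "poet B"],
    ("fiction", "shortstories"): ["author C"],
    ("fiction", "novels"): ["author D", "author E"],
}


def respond(path):
    """Treat the URI path as a query and return (status, listing)."""
    segments = tuple(s for s in path.split("/") if s)

    # Exact match: return the listing for that category.
    if segments in CATALOGUE:
        return 200, CATALOGUE[segments]

    # Lopped-off URI: list the categories one level below it.
    depth = len(segments)
    children = sorted(
        {key[depth] for key in CATALOGUE
         if len(key) > depth and key[:depth] == segments}
    )
    if children:
        return 200, children

    return 404, []


print(respond("/fiction/poetry/"))
print(respond("/fiction/"))
print(respond("/fiction/drama/"))
```

The point of the sketch is that /fiction/ answers with a list of categories even though no ‘fiction’ folder exists anywhere.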

One way to do this (using ASP.NET, for example) is to use the Uri.Segments
collection to inspect and parse the URI. Here’s a trivial example.

Given the URI www.server.com/archives/2005/12/ a server could create
a query against a database table called "archives" for a list of documents added
to the system in December of 2005.

Here’s some code to parse the URI:

<%@ page %>
<script runat="server" language="c#">

    void Page_Load(object sender, EventArgs args)
    {
        // hard-coded for this demo; a real page would inspect Request.Url instead
        string webaddress = "http://www.server.com/archives/2005/12/";
        Uri thisUri = new Uri(webaddress);
        int segcount = thisUri.Segments.Length;

        string output = string.Format("<p>webaddress:<br />{0}</p>", webaddress);

        // list each segment; for this address Segments[0] is the root "/"
        output += "<p>segments:<br />";
        for (int i = 0; i < segcount; i++)
            output += string.Format("{0}: {1}<br />", i, thisUri.Segments[i]);
        output += "</p>";

        // strip the trailing slashes and format the data query
        // (for display only; a live query should use parameters, not string formatting)
        string table = thisUri.Segments[1].Replace("/", "");
        string yr = thisUri.Segments[2].Replace("/", "");
        string mo = thisUri.Segments[3].Replace("/", "");
        string query = string.Format("select * from {0} where yr={1} and mo={2}", table, yr, mo);

        output += string.Format("<p>query:<br />{0}</p>", query);

        // show results
        Response.Write(output);
    }

</script>

And here’s the output created by the above code:

webaddress:
http://www.server.com/archives/2005/12/

segments:
0: /
1: archives/
2: 2005/
3: 12/

query:
select * from archives where yr=2005 and mo=12

Submitting the above query might return a data set that could be formatted into
an HTML page containing a series of links for the user to explore.
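
One caution before sending a URI-derived query to a real database: the URI is user input, so pasting its parts straight into SQL invites injection. A hedged sketch of a safer approach, in Python with the standard sqlite3 module, follows. The yr and mo columns come from the query above; the title column, the sample row, the ALLOWED_TABLES set and the build_query name are all assumptions made for the demo.

```python
import sqlite3

# Table names cannot be bound as query parameters, so they are
# checked against a fixed list instead (hypothetical list).
ALLOWED_TABLES = {"archives"}


def build_query(path):
    """Turn /table/year/month/ into a parameterized query and its values."""
    segments = [s for s in path.split("/") if s]
    if len(segments) != 3:
        raise ValueError("expected /table/year/month/")
    table, yr, mo = segments
    if table not in ALLOWED_TABLES:
        raise ValueError("unknown table: " + table)
    # int() rejects anything that is not a plain number.
    sql = "select * from {0} where yr=? and mo=?".format(table)
    return sql, (int(yr), int(mo))


# Demo against a throwaway in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("create table archives (title text, yr integer, mo integer)")
conn.execute("insert into archives values ('sample document', 2005, 12)")

sql, params = build_query("/archives/2005/12/")
rows = conn.execute(sql, params).fetchall()
print(sql)
print(rows)
```

A hacked URI that names an unknown table or a non-numeric date raises ValueError here, which a responder could turn into a ‘not found’ page.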

OK, I get the idea, but there’s more to it, right?

Well, yes. Knowing that URIs are static, independent resource pointers that
should be ‘hackable’ by users, and that ASP.NET can parse URIs into parts that
can drive data queries, is just the beginning. But you can use this information
to create a more flexible and long-lived URI design for web apps. And with a URI
design in place, you are no longer dependent on the existence (or lack thereof)
of physical documents within your web site.

There are also a number of other operations needed to support a good URI design.
While good URIs don’t change, content does. Well-implemented URI responders will
need to handle moved documents (HTTP 301 and 302 redirects) through a lookup table
or some other means. Also, once you start to train users to ‘hack’ URIs at your
server, you’ll need to add improved support for 404 (not found) and other 4xx
responses, and possibly 5xx (server error) responses, to tell users when their
creative URIs fail.
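
As a rough illustration of the lookup-table idea, here is a small Python sketch. The MOVED and KNOWN collections, the paths inside them and the resolve function are hypothetical stand-ins for whatever store a real responder would use. It answers a moved address with a 301 (a permanent move; a temporary one would use 302), and on a miss it suggests the nearest part of the address that does exist.

```python
# Hypothetical lookup table: old path -> new path.
MOVED = {
    "/articles/2006/learning_to_program.html":
        "/content/programming/tutorials/learning_to_program.html",
}

# Hypothetical set of paths the server can currently answer.
KNOWN = {
    "/",
    "/content/",
    "/content/programming/",
    "/content/programming/tutorials/",
    "/content/programming/tutorials/learning_to_program.html",
}


def resolve(path):
    """Return (status, path): where to send the user for this request."""
    if path in MOVED:
        return 301, MOVED[path]
    if path in KNOWN:
        return 200, path
    # Not found: walk up the path and suggest the closest known parent.
    parent = path.rstrip("/")
    while parent:
        parent = parent.rsplit("/", 1)[0]
        if parent + "/" in KNOWN:
            return 404, parent + "/"
    return 404, "/"


print(resolve("/articles/2006/learning_to_program.html"))
print(resolve("/content/programming/tutorals/"))
```

A 404 page built from the second value can at least point a user with a mistyped address toward a listing that works.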

In a future article, I’ll outline a URI design that I’ve been contemplating for
some time. I also plan to share my implementation for this new URI design
sometime soon. But don’t wait for me. Start designing and implementing your own
hackable URIs!




One Response to “Wanted: URI Designer”

  1. Phil Says:

    Not to mention that many unscrupulous people buy up domain names that are used in books as examples and then turn them into porn sites.  That’s why using the ones specifically set aside for examples is a good idea.
