Archive for February, 2006

ISAPI Rewrite to the Rescue

February 27, 2006

I’ve been using ISAPI Rewrite for the last three years to improve the quality of web site URLs. It’s become such an important part of my web work that it’s one of the first things I tell my clients to add to their server toolkit. In fact, I continue to be amazed (and frustrated) that Microsoft has not included a powerful URL rewriting tool with every copy of Internet Information Server.

Now, I’ve seen lots of examples of using URL rewriters to hide query string arguments from search engines. I’ve also seen a few examples of how to use URL rewriters to route users to special sites based on language or browser type. But there are several other important reasons to be using URL rewriters in your web deployments.

Here are a couple of rewriter rules that I add to almost every public web site I am involved in.

Putting the WWW back in the World Wide Web

Most public web sites respond to both http://www.mydomain.com and mydomain.com. I’ve gotten into the habit of *not* typing the www. since I know the plain address will work just fine. It means the same to me. However, it turns out that most search engines (Google, Yahoo!, MSN, etc.) do not treat these two addresses the same. Most engines will track links and indexes for both locations. If you’re focused on your web site’s ranking with these search indexes, having both addresses can be a problem. To fix this, I use a simple rewriter rule to automatically re-route browsers that present mydomain.com to http://www.mydomain.com. This allows lazy users (like me) to keep typing the short name, but still get the full name in reply.

Here’s the rule I use with ISAPI Rewrite:


# force proper subdomain on all requests
RewriteCond Host: ^mydomain\.com
RewriteRule ^/(.*) http://www.mydomain.com/$1 [RP,L]

Now, users will always see the full name. And keep in mind that some of the most important users of your web site are search engine spider bots!

The ZIPmouse Internet Directory is one of the companies I’ve worked with that has this rule in their rewriter set. Try typing http://zipmouse.com in your browser and see what happens.

Fixing the Missing Slash

One annoying problem with web sites is how they handle missing slashes in URLs. For example, look at this address:


http://www.mydomain.com/members

Usually, what users want is the default page in the members folder of the site. They just forgot about the trailing slash. I use the following rewrite rule to automatically add the trailing slash:


# fix missing slash on folders
RewriteCond Host: (.*)
RewriteRule ([^.?]+[^.?/]) http://$1$2/ [I,R]

Here’s another example from the ZIPmouse site: http://www.zipmouse.com/city/seattle

Dropping the Default

Finally, here’s a rule I really like to use for public sites. I’ll present it first to give you a chance to think about it.


RewriteRule (.*)/default.html $1/ [I,RP,L]

We all know that typing just a folder will force the web server to return the default document. This rule does the opposite. It checks the URL for the registered default page for the site and strips the URL down to only the folder. A good rewriter file would probably do this for all the typical defaults registered on the server:


RewriteRule (.*)/default.htm $1/ [I,RP,L]
RewriteRule (.*)/default.asp $1/ [I,RP,L]
RewriteRule (.*)/default.aspx $1/ [I,RP,L]
RewriteRule (.*)/index.htm $1/ [I,RP,L]

The ZIPmouse directory uses home.html as the default page for their site. You can use the following URL to test the above rule.

http://www.zipmouse.com/shop/computers-and-internet/home.html

Summary

So there are three handy URL rewrite rules that can improve the look and feel of your web site’s URLs. If you are not using a URL rewriter yet, I encourage you to start. You can download a free-for-non-commercial-use version of ISAPI Rewrite for Microsoft servers from their web site. There are other rewriters out there, too.


Hiking the Golden Gate Bridge

February 25, 2006


This past fall, I got a chance to spend several days relaxing in one of my favorite US cities – San Francisco. One of the fun things was to hike across the Golden Gate Bridge. I took public transportation from my hotel to the foot of the bridge and was able to hike across to a visitor center, enjoy the view, and return. A couple of hours, lots of fun.

Next time you’re in San Fran, take an afternoon to enjoy the view from the bridge.


Military Misfortunes Author Interview

February 24, 2006
I’m in the final prep for my INETA talk for WKDNUG in Murray, KY next week. While trolling the Web for references, I found an NPR interview with Eliot Cohen, one of the authors of Military Misfortunes. This book is the ‘jumping off point’ for my talk titled ‘Implementation Misfortunes, or Why Some Well-Designed IT Projects Fail.’
 
Even though the book was published in 1991, the material is timeless. Also, like so much that comes from historical research at military colleges, the key points are quite applicable to business.
 

My New Formula for Web 2.0

February 20, 2006

I’ve been working on several fronts to get a better handle on Web 2.0 and related items. As a result, I’ve developed a ‘formula’ – a kind of shorthand mission statement – that describes what I think Web 2.0 means for the ‘geeks’ among us who need to implement Web 2.0 solutions. And that formula is:

(XHTML+CSS2) * JS
------------------ = Web 2.0
XML+XSLT+RDBMS

Now for the explanation…

Like any web solution approach, there are two perspectives: server and client. In my formula, the client perspective focuses on markup, layout, and scripting. The server perspective focuses on XML (marking up data), XSLT (transforming that data into a usable form), and RDBMS (storing the data that feeds the XML documents for XSLT transformation). More about my thinking follows.

XHTML

Any serious attempt to build Web 2.0 solutions should start with fully validated XHTML. No slacking on this one. We need to start from a clean slate. Along with XML-validated HTML markup, we need to drop the habit of using tables to control layout. We also need to stop adding font/color and other style information directly to the markup. That leads to the next item in the formula.

CSS2

CSS2 achieved Recommendation level at W3C in 1998. Yep, 1998. Yet some browsers still don’t fully support CSS2 features. In addition, many high traffic web sites still haven’t adopted CSS2 as the default standard for controlling layout and style for (x)HTML documents. The really depressing news is that the W3C is already working on CSS3! It’s time to bite the bullet and commit to using CSS2 as the default layout and styling service for online documents on the web.

JS

JS means JavaScript. Ecma International (the group formerly known as ECMA) is responsible for maintaining and advancing the JavaScript language. The current version (1.5), which corresponds to the third edition of the ECMAScript standard, was approved in 1999. I must admit, I thought JavaScript was a fading dinosaur. But with the rise of Firefox and the XUL engine that drives it, JavaScript has continued to flourish. Now that Ajax is becoming a key component of leading-edge web solutions, there seems little reason to consider JavaScript to be on a downward slide.

Lots can be said on the subject of making good use of JavaScript, but for now, I will point out that only recently have I seen good examples of object-oriented approaches to building JavaScript solutions. And most of those have come from folks already drinking the Web 2.0 punch. More emphasis needs to be placed on building clean, powerful JS objects and using them to animate the user interface.

BTW – Ecma International started work on the next version of JavaScript (referred to as ECMAScript for XML) in 2004. No telling how long it will take before we see that as a common scripting option in browsers.

XML

Not much needs to be said here except that, to my mind, all data should be presented in XML form. Regardless of where it is used or how it is stored, data shipped around the web should be annotated – marked up. Most XML tutorials focus on XML data stored as physical files. This makes for easy tutorials, but not-so-good solution implementations. In fact, valuable data will almost always be stored in databases of some kind, usually relational. But that’s another element (see below).

Again, Ajax solutions are already taking advantage of this idea by using the XMLHttpRequest object to pull XML data from servers into client browsers for manipulation. More of this needs to be done – including on the servers themselves.
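
To show what I mean on the server side, here is a minimal sketch of an ASP.NET handler that returns XML for an XMLHttpRequest call to consume. The handler name and the hard-coded data are purely illustrative, not part of any real project; a real handler would pull the data from a database.

using System.Web;

// A bare-bones endpoint that hands XML to the browser so client-side
// script (via XMLHttpRequest) can consume it. The data is hard-coded
// purely for illustration.
public class MembersXmlHandler : IHttpHandler
{
    public void ProcessRequest(HttpContext ctx)
    {
        ctx.Response.ContentType = "text/xml";
        ctx.Response.Write("<?xml version=\"1.0\" encoding=\"utf-8\"?>");
        ctx.Response.Write("<members><member id=\"1\">jdoe</member></members>");
    }

    public bool IsReusable
    {
        get { return true; }
    }
}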

XSLT

I’ve used XSLT as part of the formula, but that’s a bit misleading. In fact, XSLT is just one of the related technologies I consider crucial for dealing with XML data. XSLT is needed to transform XML documents into usable forms. XPath is needed as a way to filter and select the XML data. Finally, XSL-FO can be used to help format the output. For now, I want to concentrate on standard XSLT to produce XHTML. However, XSL-FO was originally conceived as a way to produce XHTML; up to now, though, it has become synonymous with creating PDF documents from XML.

The point here is that XML data requires transformation, and XSLT should be the key to solving that problem. Even though XSL 1.0 reached Recommendation status in 2001, I still see way too many examples of XML DOM-grepping to pull out needed data and present it to users. Some of this is due to limitations (i.e. client browsers with poor or nonexistent XPath support), but much of it is also due to just plain not getting on board with the technology. It is important to commit to using declarative tools like XSLT and XPath to transform data effectively and efficiently – both on the server and the client.
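
As a concrete (if simplified) sketch, here is what a server-side transform can look like in C# with .NET 2.0’s XslCompiledTransform. The file paths are placeholders for whatever your project actually uses.

using System.IO;
using System.Xml.Xsl;

public static class XmlToXhtml
{
    // Transform an XML document into XHTML using a compiled stylesheet.
    // Paths are placeholders; load them from configuration in real code.
    public static string Transform(string xmlPath, string xsltPath)
    {
        XslCompiledTransform xslt = new XslCompiledTransform();
        xslt.Load(xsltPath);                        // compile the stylesheet

        using (StringWriter output = new StringWriter())
        {
            xslt.Transform(xmlPath, null, output);  // apply it to the XML data
            return output.ToString();               // XHTML ready to send to the client
        }
    }
}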

RDBMS

Maybe some are surprised to see an ‘ancient’ term like RDBMS in a document about building Web 2.0 solutions. But the truth is, most important data is, and will be for the near future, stored in relational database systems. Sure, there are some object-oriented, even hierarchically-oriented, data storage systems in use today. The disk file system is probably the best-known hierarchical model. However, we can’t deny that businesses and even individuals understand and use relational models to store information. And this is a good thing.

At the same time, we need to start requiring the RDBMS model to ‘step it up a notch’ and start supporting the XML+XSLT approach to shipping and presenting data. Most of the big RDBMS tools today support presenting queries as XML output. And some have decent tools for accepting XML data as input for inserts, updates, and other RDBMS tasks. It’s time we all started taking advantage of these features and began demanding more of our RDBMS vendors. For now, we need to commit to always getting our data requests in XML form and always sending XML documents as part of our data update tasks.
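
For example, SQL Server’s FOR XML feature will hand back query results already marked up. Here is a minimal sketch; the connection string, table, and column names are made up for illustration, and the ROOT directive assumes SQL Server 2005.

using System.Data.SqlClient;
using System.Xml;

public static class ProductXml
{
    // Ask the database to return query results as an XML document,
    // ready to be transformed with XSLT. Names here are placeholders.
    public static string GetProductsAsXml(string connectionString)
    {
        using (SqlConnection conn = new SqlConnection(connectionString))
        using (SqlCommand cmd = new SqlCommand(
            "SELECT ProductId, Name, Price FROM Products FOR XML AUTO, ROOT('products')", conn))
        {
            conn.Open();
            using (XmlReader reader = cmd.ExecuteXmlReader())
            {
                XmlDocument doc = new XmlDocument();
                doc.Load(reader);      // rows arrive as XML elements under <products>
                return doc.OuterXml;
            }
        }
    }
}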

So What?

So, what happens when you start using XSLT to transform XML data stored in an RDBMS, then use XHTML and CSS2 to build solid user interfaces to access that data, *and* use JavaScript 1.5 to animate those interfaces? You have Web 2.0! This ‘formula’ works no matter what technology or platform you are working with. All these standards are open. None of them assumes an OS or proprietary service layer. Of course, none of this is new, right? The technologies and standards have been around for many years. There are already lots of folks doing some parts of this – a few doing it all.

But – to be blunt – *I’m* not doing all this yet. And I should be. I would suspect there are many more out there not yet committed to this kind of formula on a fundamental level. And I would guess some, if not most, of them would like to be doing it, too. That’s what this article is all about.

Over the next several weeks and months, I’ll be working to build a basic infrastructure to support this formula. This will include a server-side runtime built to present XML data from an RDBMS, transformed via XSLT. It will also include XHTML markup documents styled via CSS2 and animated by JavaScript. In the process, I hope to show how small, but meaningful, changes in the way we think about and implement solutions can have a big impact on the final results.


WKDNUG Selects Implementation Misfortunes Talk

February 20, 2006

I received notice this past week that the folks at WKDNUG in Murray, KY have selected my new "Implementation Misfortunes" talk for my visit on February 28th, 2006. This talk is loosely based on the book "Military Misfortunes" by Cohen and Gooch. It should be a lively talk.

I am looking forward to visiting Murray, KY. While I’ve been in the area a few times for vacations, this will be the first time I will spend time ‘working’ in Murray. My travel schedule will be interesting, too. It turns out Murray is about two hours from any sizeable airport. Kentucky’s own geographical oddity, I guess. Anyway, I’ll fly into Nashville, TN, then rent a car and drive two hours northwest to Murray. Should be a real ‘trip.’


Yahoo! doing some cool stuff!

February 15, 2006

I know Google has been getting lots of buzz, but I have also noticed a good deal of interesting activity in Yahoo! land recently.

Opening up their UI library

I saw a post on O’Reilly Radar titled Yahoo! Open Sources UIs and Design Patterns. Turns out Yahoo! has put a good deal of work into building some standard widgets (calendar & treeview) along with a set of common JavaScript APIs for handling Ajax, events, and other DOM details. And they want to give it away! Good stuff! They are even publishing their own Yahoo! User Interface Blog to help web-heads dig into the API. I encourage everyone to download the API, docs, and examples to see how Yahoo! is working to standardize the Ajax experience.

Updating their My Yahoo! functionality

Not long ago, Yahoo! made it really easy for users to include RSS feeds from blogs and other sites into the content on users’ "My Yahoo!" pages. Now, I noticed this evening that some headline groups sport a tiny icon. No doubt, this is the icon associated with the RSS feed that is used to build the headlines. In addition, users can hover over the headline and see a ‘preview’ of the text behind the link. Pretty sweet!

Powerful Ajax email client

I have also been beta-testing the new Yahoo! mail client and like what I see. While Google was first out of the gate with an Ajax-enabled web mail client, Yahoo! has, imho, done a much better job of building a powerful, friendly tool for their users. While the Yahoo! Mail team has lots to do to integrate their other services (Calendar, Contacts, & Notepad) into the new interface, I really have few bad things to say about this major upgrade to their mail client. Anyone who has a Yahoo! mail account can sign up for the Yahoo! Mail beta.


Fixing URLs to Improve SEO

February 12, 2006

In my last entry, I talked about how I was able to improve my client’s ‘Google-friendliness’ by creating a proper 404 handler for our web content system. In this article, I’ll talk about how you can improve your content’s appearance in search engines (including their page rank) by forcing all your URLs to appear as lowercase.

Google is Case-Sensitive

When working to improve our page rankings, I noticed that the Google index is case-sensitive. If a site has links that point to the same page address using different case, the page will appear twice. Aside from being a bit annoying, this has the potential to leak valuable PR (page rank) from one entry for that page to the other – bad for clients who are rabid about tracking their PR.

First, I set up a process that ‘scrubbed’ pages within the site and forced all internal links to lowercase. But this is only half of the problem. Since clients can’t always control how external links are written when they point to their pages, I needed to come up with a way to ‘correct’ for any inconsistencies in the way URLs are composed by others outside our control. One of the best ways to do this is to create a routine that checks the URL when it is presented to your server, ‘fixes’ it, and then tells the browser to ask again with the proper URL. This process is called page redirecting.

Fixing URLs with ASP.NET

My approach is to force all URLs to lowercase. This presents a consistent look to all bots and is easy on the eyes for users. One sticking point is that you often cannot force query string arguments to lowercase. This can mess up user inputs and other important case-sensitive data being passed along with the URL. So, I needed a simple routine for checking the ‘case-ness’ of the URL and, if needed, redirecting the browser to the proper URL.

With ASP.NET, it’s pretty simple. You need to check three parts of the URL for case issues:

  • Scheme (“http”, “ftp”, “mailto”, etc.)
  • Authority (www.mysite.com, mysite.com:80, etc.)
  • LocalPath (/folder/page.aspx, /page.aspx, etc.)

Note that there is a Host element that you might think to use. This returns the domain portion of the URL (www.mysite.com), but it will not return any port information (:80). Authority returns both. MSDN has a nice summary of the Request.Url object in ASP.NET.

So, below is some simple C# code to pull out the items, check for uppercase characters (using a regular expression), and then return the new string, if needed.

public static string LCaseUrl(HttpContext ctx)
{
    string scheme, authority, path, query, ucase, rtn = string.Empty;

    try
    {
        scheme = ctx.Request.Url.Scheme;
        authority = ctx.Request.Url.Authority;
        path = ctx.Request.Url.LocalPath;
        query = ctx.Request.Url.Query;

        // match any uppercase character
        ucase = "[A-Z]+";
        System.Text.RegularExpressions.Regex re =
            new System.Text.RegularExpressions.Regex(ucase);

        if (re.IsMatch(scheme) ||
            re.IsMatch(authority) ||
            re.IsMatch(path))
        {
            // lowercase everything except the query string,
            // which may carry case-sensitive data
            rtn = string.Format(
                "{0}://{1}{2}",
                scheme,
                authority,
                path).ToLower() + query;
        }
        else
        {
            // empty string means no redirect is needed
            rtn = string.Empty;
        }
    }
    catch { }

    return rtn;
}

Now all that’s needed is some code to call this LCaseUrl routine. I placed mine in the Application_BeginRequest event of the Global.asax file:

protected void Application_BeginRequest(object sender, EventArgs e)
{
    // check the url and redirect, if needed
    string newUrl = Utility.LCaseUrl(HttpContext.Current);

    if (newUrl.Length != 0)
        Response.Redirect(newUrl, true);
}

Notice that I wrote the code so that, if no change was needed, nothing is returned. This cuts down on the number of redirects I need to issue to browsers.

Summary

Now, whenever anyone links to a site that uses this feature, the URL will always be echoed back to the browser using lowercase for all parts of the URL except the query string (if it exists). This does a better job of controlling the way pages appear in search engine indexes and, potentially, improves the page rank value of those same pages in the index.


When Smart 404s are neither

February 5, 2006

This past week, a major client asked me to help them make their web content ‘search engine friendly.’ And nowadays, that means ‘Google friendly.’ Along with the usual work to improve the meta-data on pages (title, keywords, description, etc.), one of my tasks was to make sure our 404 handling worked well. I thought it would be a simple task. I was wrong. In fact, our current content-delivery system messes things up rather badly.

In the process of fixing my problem, I learned a bit about HTTP status codes, about proper uses of the ASP.NET HttpResponse object, and about a cool Internet Explorer plug-in to monitor client-server interactions with my web server.

First, some background

Before getting into the details of how our smart 404 handler was messing things up, I’ll take a moment to talk about the basics of HTTP conversations between clients and servers. If you’re not much into this low-level stuff, don’t worry. I’ll be brief.

In every interaction between web clients (browsers) and web servers, a status code is sent by the server to the client. You’ll find a nice summary of HTTP status codes at the W3C site.

There are five basic types of status codes in the HTTP world:

  • 100 – Informational
  • 200 – Success
  • 300 – Redirection
  • 400 – Client Error
  • 500 – Server Error

The 400 series is the one that is used when a client browser asks for something the server can’t deliver. A 404 code is sent when the item was not found. A 403 code is sent when the client is not authorized to see the item, etc.

It’s good form for servers to use the HTTP status code values to keep the client browser properly informed. For example, when the server returns the requested page, an HTTP status code of 200 should be sent. When a browser asks for a page that cannot be found, the server should send a 404 message. Luckily, ASP.NET and IIS handle this for you. Unfortunately, when you add custom handlers into this mix, things don’t always work properly. For example, few ‘smart 404 handlers’ send a 404 status code. Instead, they almost always send an HTTP status code of 200. And that’s where the trouble starts.

My smart 404 handler was lying

The content-delivery model I use (built using C# and ASP.NET) has a great feature to handle bad browser requests. If the server gets a request for a URL that is no longer valid, our system will look to see if a new URL is available as a replacement (renamed or moved page, etc.) or, if that fails, will return a page that includes some suggested replacements and a search box to help the user. This is nice for human users, but has a fatal flaw for non-human users like search bots.

See, my handler was lying to clients. It always returned a status of 200 – no matter what content was requested. You could ask for a randomly-generated URL and it would still return a page with a status of 200. I know this because I learned that Google’s search bot actually does just that – requests a URL that it knows doesn’t exist. And if your server returns a page with status 200, the Googlebot brands your site a ‘liar’ and will tell you so, too! And there are a number of other bots that do the same thing.

So, to help my client make their site ‘search engine friendly,’ I had to train my 404 handler to start telling the truth.

Always return 404 if you can’t find the page

To fix this problem, I needed to alter the code used to return our friendly 404 page. Since I use ASP.NET, this is pretty simple. I just needed to set one or more values in the HttpResponse object. You’ll find a nice summary of the HttpResponse object at the MSDN web site.

The Response object has a number of properties you can set to make sure you send the right status code to the caller.

  • Response.Status – a string that holds both the code and a simple message (i.e. “404 Not Found”)
  • Response.StatusCode – an integer value (i.e. 404)
  • Response.StatusDescription – a string that holds the status message (i.e. “Not Found”)

Now, whenever I am sending my friendly 404 content, I use the Response object to set the HTTP status information to include the 404 message.
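
Here’s a minimal sketch of what that can look like in the code-behind of the friendly 404 page. The class name is hypothetical; the rest of the page renders the helpful content as before.

using System;
using System.Web.UI;

// Code-behind sketch for the friendly 404 page.
public partial class FriendlyNotFound : Page
{
    protected void Page_Load(object sender, EventArgs e)
    {
        // Tell the truth to bots: this helpful page is still a 404.
        Response.StatusCode = 404;
        Response.StatusDescription = "Not Found";

        // ...then build the suggested links and search box as usual.
    }
}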

NOTE: It also turns out that using ASP.NET’s Response.Redirect in your 404 handler might be getting you into trouble with bots, too. But that’s fodder for another article.

Validate your server responses with this handy IE plug-in

One of the ways I was able to verify that my server was behaving properly was to inspect the HTTP headers sent back and forth between the client and server. There are a number of nice tools for this. The one I had the most fun with is actually a free Internet Explorer plug-in called ieHTTPHeaders, written by Jonas Blunck. You can download ieHTTPHeaders for Internet Explorer from his web site. I use this handy tool quite a bit now.

Summary

So, to summarize: If you are responsible for programming web content delivery, you need to make sure your pages always return a 404 from the server whenever clients ask for something you can’t find. Even if you return a content page that includes helpful, friendly content, you still need to tell the client that this is a 404 situation. This makes bots happy and that will make you and your clients happy, too.

