Fixing URLs to Improve SEO

In my last entry, I talked about how I was able to improve my client’s ‘Google-friendliness’ by creating a proper 404 handler for our web content system. In this article, I’ll talk about how you can improve your content’s appearance in search engines (including its page rank) by forcing all your URLs to appear as lowercase.

Google is Case-Sensitive

When working to improve our page rankings, I noticed that the Google index was case-sensitive. If a site has links that point to the same page address using different case, the page would appear twice in the index. Aside from being a bit annoying, this behavior has the potential to leak valuable PR (page rank) from one entry for that page to the other, which is bad for clients who are rabid about tracking their PR.

First, I set up a process that ‘scrubbed’ pages within the site and forced all internal links to lowercase. But this is only half of the problem. Since clients can’t always control how external links are written when they point to their pages, I needed to come up with a way to ‘correct’ for any inconsistencies in the way URLs were composed by others outside our control. One of the best ways to do this is to create a routine that checks the URL when it is presented to your server, ‘fixes’ it, and then tells the browser to ask again with the proper URL. This process is called page redirecting.

Fixing URLs with ASP.NET

My approach is to force all URLs to lowercase. This presents a consistent look to all bots and is easy on the eyes for users. One sticking point is that you often cannot force query string arguments to lowercase. Doing so can mess up user inputs and other important case-sensitive data being passed along with the URL. So, I needed a simple routine for checking the ‘case-ness’ of the URL and, if needed, redirecting the browser to the proper URL.

With ASP.NET, it’s pretty simple. You need to check three parts of the URL for case issues:

  • Scheme (“http”, “ftp”, “mailto”, etc.)
  • Authority (www.mysite.com, mysite.com:80, etc.)
  • LocalPath (/folder/page.aspx, /page.aspx, etc.)

Note that there is also a Host property that you might think to use. It returns the domain portion of the URL (www.mysite.com), but it will not return any port information (:80). Authority returns both, although it leaves the port off when it is the default for the scheme (such as 80 for http). MSDN has a nice summary of the Request.Url object in ASP.NET.
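To see how these properties differ, here is a small sketch that uses System.Uri directly, which is the same type Request.Url returns. The class name UrlParts, the method Describe, and the sample address are made up for illustration and are not part of the routine in this article.

```csharp
using System;

public static class UrlParts
{
    // Hypothetical helper: lists the parts of a URL discussed above, one per line.
    public static string Describe(string address)
    {
        Uri url = new Uri(address);

        return string.Join("\n", new string[]
        {
            "Scheme: " + url.Scheme,        // e.g. http
            "Host: " + url.Host,            // domain only, never the port
            "Authority: " + url.Authority,  // domain plus any non-default port
            "LocalPath: " + url.LocalPath,  // path, keeps its original case
            "Query: " + url.Query           // query string, including the '?'
        });
    }
}
```

For the address http://www.mysite.com:8080/Folder/Page.aspx?Name=Bob, Host comes back as www.mysite.com, Authority as www.mysite.com:8080, LocalPath as /Folder/Page.aspx and Query as ?Name=Bob. The mixed case in the path and query string survives parsing, which is why the path is the part most likely to need fixing.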

So, below is some simple C# code to pull out the items, check for uppercase characters (using a Regex) and then return the new string, if needed.

public static string LCaseUrl(HttpContext ctx)
{
    // Returns the lowercased URL, or an empty string if no redirect is needed.
    string rtn = string.Empty;

    try
    {
        Uri url = ctx.Request.Url;
        string scheme = url.Scheme;
        string authority = url.Authority;
        string path = url.LocalPath;
        string query = url.Query;

        // Matches any uppercase ASCII letter.
        System.Text.RegularExpressions.Regex re =
            new System.Text.RegularExpressions.Regex("[A-Z]");

        if (re.IsMatch(scheme) ||
            re.IsMatch(authority) ||
            re.IsMatch(path))
        {
            // Lowercase everything except the query string,
            // which may carry case-sensitive data.
            rtn = string.Format(
                "{0}://{1}{2}",
                scheme,
                authority,
                path).ToLowerInvariant() + query;
        }
    }
    catch
    {
        // On any failure, return an empty string so no redirect is issued.
    }

    return rtn;
}
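To try the same idea outside a web request, a minimal sketch that takes a System.Uri instead of an HttpContext might look like the following. UrlCase and Lowercase are hypothetical names, not part of the original routine, and the sketch assumes the query string should be passed through exactly as received.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UrlCase
{
    // Matches any uppercase ASCII letter.
    private static readonly Regex Upper = new Regex("[A-Z]");

    // Hypothetical helper: returns the lowercased URL, or an empty
    // string when the scheme, authority and path are already lowercase.
    public static string Lowercase(Uri url)
    {
        string front = url.Scheme + "://" + url.Authority + url.LocalPath;

        if (!Upper.IsMatch(front))
            return string.Empty;

        // Only the part before the query string is lowercased.
        return front.ToLowerInvariant() + url.Query;
    }
}
```

Passing in http://www.mysite.com/Folder/Page.aspx?Name=Bob gives back http://www.mysite.com/folder/page.aspx?Name=Bob, with the query string untouched. Passing in http://www.mysite.com/folder/page.aspx?Name=Bob gives back an empty string, because only the query string contains uppercase characters.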

Now all that’s needed is some code to call this LCaseUrl routine. I placed mine in the Application_BeginRequest event of the Global.asax file:

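protected void Application_BeginRequest(object sender, EventArgs e)
{
    // check url and redirect, if needed
    string newUrl = Utility.LCaseUrl(HttpContext.Current);

    if (newUrl.Length != 0)
    {
        // Response.Redirect sends a 302 (temporary) redirect; a 301 tells
        // search engines that the lowercase URL is the permanent address.
        Response.StatusCode = 301;
        Response.AddHeader("Location", newUrl);
        Response.End();
    }
}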

Notice that I wrote the code so that, if no change was needed, an empty string is returned. This cuts down on the number of redirects I need to issue to browsers.

Summary

Now, whenever anyone links to a site that uses this feature, the URL will always be echoed back to the browser using lowercase for all parts of the URL except the query string (if it exists). This will do a better job of controlling the way pages appear in search engine indexes and, potentially, improve the page rank value of those same pages in the index.

