Canonical URL Tags

Canonical URL Tags, the most important thing since the coming of sitemaps.

OK, got it, but what are they? What do canonical URL tags do, and how should I use them?

Let me explain:

Yahoo!, Live & Google recently announced that they will be supporting a new “canonical url tag”.

According to Wikipedia, canonical means: reduced to the simplest and most significant form possible without loss of generality.

Translate this to the life of a webmaster, and you get something like: reducing a URL to its simplest form that is still valid for its content. Now, does that make any sense? Yes, in fact it does.

Search engines cannot guess whether two URLs such as http://www.icantinternet.org/?p=177 and http://www.icantinternet.org/?p=155 point to totally different pages, or whether the parameters (everything after the question mark ‘?’) are only there for statistical purposes, to identify a session ID, or to store a user’s preference, and so point to (more or less) the same page. This is where the new canonical URL tag comes in very handy. It helps us webmasters remove our self-created duplicate content, and thus (very important part here, cue the drum roll!)… transfer our link juice from all these until-now separate URLs onto the one baseline URL to which it belongs!
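
To make the idea of a "baseline URL" concrete, here is a minimal Python sketch that strips parameters which do not change the content. The function name canonical_url and the list of parameter names (sessionid, utm_source, ref and so on) are assumptions chosen for illustration; which parameters are safe to drop depends entirely on your own site.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical parameters that only track visitors and do not change the page.
NON_CONTENT_PARAMS = {"sessionid", "utm_source", "utm_medium", "utm_campaign", "ref"}


def canonical_url(url: str) -> str:
    """Return url without non-content query parameters and without a fragment."""
    parts = urlsplit(url)
    # Keep only the parameters that actually select content (e.g. ?p=177).
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in NON_CONTENT_PARAMS
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


if __name__ == "__main__":
    print(canonical_url("http://www.icantinternet.org/?p=177&sessionid=abc"))
```

The result is the value you would put in the href of the canonical tag on every variant of that page.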

The problems search engines had with this type of duplicate content before the canonical URL tag are:

  1. They don’t know which version(s) to include/exclude from their indices
  2. They don’t know whether to direct the link metrics (trust, authority, anchor text, link juice, etc.) to one page, or keep it separated between multiple versions
  3. They don’t know which version(s) to rank for query results

You can easily see that this makes your website suffer in rankings and lose traffic, while the engines suffer from lowered relevancy.

All this can now easily be solved by using the canonical url tag.

So, you wonder, how do I use it then?

Easy as this:

<link rel="canonical" href="http://www.icantinternet.org/">

Each page that has this in its HTML head section (where you also find the title element and the meta description tag) will transfer its link juice to the page the canonical URL points to, provided that the content is similar.
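
If you want to check which canonical URL a page currently declares, a small Python sketch using only the standard library could look like this. The names CanonicalFinder and find_canonical are made up for this example, and for simplicity the sketch takes the first canonical link it finds anywhere in the markup rather than checking that it sits inside the head section.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Remembers the href of the first <link rel="canonical"> it sees."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attributes = dict(attrs)
        # rel may hold several space-separated tokens; compare case-insensitively.
        rel_tokens = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_tokens:
            self.canonical = attributes.get("href")


def find_canonical(html: str):
    """Return the declared canonical URL, or None if the page has none."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


if __name__ == "__main__":
    page = (
        "<html><head><title>Post</title>"
        '<link rel="canonical" href="http://www.icantinternet.org/">'
        "</head><body></body></html>"
    )
    print(find_canonical(page))
```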

The Canonical URL tag is similar to a 301 redirect, since you’re telling the engines that multiple pages should be considered as one, without actually redirecting your human visitors to the new URL, as a 301 would do. The main differences between a 301 redirect and the canonical url tag are:

  • Whereas a 301 redirect re-points all traffic (bots and human visitors), the Canonical URL tag is just for engines, meaning you can still separately track visitors to the unique URL versions.
  • A 301 is a much stronger signal that multiple pages have a single, canonical source. While the engines are certainly planning to support this new tag and trust the intent of site owners, there will be limitations. Content analysis and other algorithmic metrics will be applied to ensure that a site owner hasn’t mistakenly or manipulatively applied the tag, and we certainly expect to see mistaken use of the tag, resulting in the engines maintaining those separate URLs in their indices (meaning site owners would experience the same problems as before).
  • 301 redirects can work cross-domain, so you can redirect a page at domain1.com to domain2.com and carry over those search engine metrics. This does not work with the Canonical URL tag, which operates exclusively on a single root domain (it does work across subfolders and subdomains).
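
The single-root-domain rule from the last point can be sketched in Python as follows. This is only a rough illustration: the helper names root_domain and same_root_domain are invented here, and treating the last two labels of the host name as the root domain is a simplifying assumption that breaks for suffixes such as .co.uk. The engines have not published how they perform this check.

```python
from urllib.parse import urlsplit


def root_domain(url: str) -> str:
    """Naive root domain: the last two labels of the host name (assumption)."""
    host = (urlsplit(url).hostname or "").lower()
    return ".".join(host.split(".")[-2:])


def same_root_domain(page_url: str, target_url: str) -> bool:
    """True if the canonical target is on the same root domain as the page."""
    page_root = root_domain(page_url)
    return page_root != "" and page_root == root_domain(target_url)


if __name__ == "__main__":
    # Subdomain to subdomain on one root domain: allowed.
    print(same_root_domain("http://test.example.com/", "http://www.example.com/"))
    # Different root domains: not allowed for the canonical tag.
    print(same_root_domain("http://domain1.com/page", "http://domain2.com/page"))
```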

And to close this article, some interesting quotes from the three search engines that announced this new “canonical url tag”.

From Google:

Is rel="canonical" a hint or a directive?
It’s a hint that we honor strongly. We’ll take your preference into account, in conjunction with other signals, when calculating the most relevant page to display in search results.

Can I use a relative path to specify the canonical, such as <link rel="canonical" href="product.php?item=swedish-fish" />?
Yes, relative paths are recognized as expected with the <link> tag. Also, if you include a <base> link in your document, relative paths will resolve according to the base URL.

Is it okay if the canonical is not an exact duplicate of the content?
We allow slight differences, e.g., in the sort order of a table of products. We also recognize that we may crawl the canonical and the duplicate pages at different points in time, so we may occasionally see different versions of your content. All of that is okay with us.

What if the rel="canonical" returns a 404?
We’ll continue to index your content and use a heuristic to find a canonical, but we recommend that you specify existent URLs as canonicals.

What if the rel="canonical" hasn’t yet been indexed?
Like all public content on the web, we strive to discover and crawl a designated canonical URL quickly. As soon as we index it, we’ll immediately reconsider the rel=”canonical” hint.

Can rel="canonical" be a redirect?
Yes, you can specify a URL that redirects as a canonical URL. Google will then process the redirect as usual and try to index it.

What if I have contradictory rel="canonical" designations?
Our algorithm is lenient: We can follow canonical chains, but we strongly recommend that you update links to point to a single canonical page to ensure optimal canonicalization results.
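
As an aside to Google's answer on relative paths: the way a relative href resolves against the page URL, or against a <base> link if there is one, can be tried out with Python's standard urljoin. The function name resolve_canonical and the example.com URLs are assumptions for illustration, and this shows ordinary URL resolution, not Google's own code.

```python
from urllib.parse import urljoin


def resolve_canonical(page_url, href, base_href=None):
    """Resolve a canonical href the way a relative link would be resolved."""
    # A <base href> may itself be relative, so resolve it against the page first.
    base = urljoin(page_url, base_href) if base_href else page_url
    return urljoin(base, href)


if __name__ == "__main__":
    page = "http://www.example.com/shop/list.php?sort=price"
    # Without <base>: resolved relative to the page's own folder.
    print(resolve_canonical(page, "product.php?item=swedish-fish"))
    # With <base href="http://www.example.com/catalog/">: resolved against it.
    print(resolve_canonical(page, "product.php?item=swedish-fish",
                            "http://www.example.com/catalog/"))
```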

From Yahoo!:

• The URL paths in the <link> tag can be absolute or relative, though we recommend using absolute paths to avoid any chance of errors.

• A <link> tag can only point to a canonical URL form within the same domain and not across domains. For example, a tag on http://test.example.com can point to a URL on http://www.example.com but not on http://yahoo.com or any other domain.

• The <link> tag will be treated similarly to a 301 redirect, in terms of transferring link references and other effects to the canonical form of the page.

• We will use the tag information as provided, but we’ll also use algorithmic mechanisms to avoid situations where we think the tag was not used as intended. For example, if the canonical form is non-existent, returns an error or a 404, or if the content on the source and target was substantially distinct and unique, the canonical link may be considered erroneous and deferred.

• The tag is transitive. That is, if URL A marks B as canonical, and B marks C as canonical, we’ll treat C as canonical for both A and B, though we will break infinite chains and other issues.
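
The transitive behaviour described in the last point can be sketched in Python like this. The function name resolve_chain and the dictionary of hints are hypothetical, and Yahoo! does not say how exactly it breaks loops, so stopping at the last URL reached before the loop closes is purely an assumption of this sketch.

```python
def resolve_chain(url, canonical_of):
    """Follow canonical hints from url until no further hint exists or a loop appears.

    canonical_of maps a page URL to the URL named in its canonical tag.
    """
    seen = {url}
    current = url
    while current in canonical_of:
        target = canonical_of[current]
        if target in seen:
            # Loop detected (this also covers a page naming itself): stop here.
            break
        seen.add(target)
        current = target
    return current


if __name__ == "__main__":
    hints = {"A": "B", "B": "C"}
    # Both A and B end up with C as their canonical.
    print(resolve_chain("A", hints), resolve_chain("B", hints))
```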

And from Live/MSN:

  • This tag will be interpreted as a hint by Live Search, not as a command. We’ll evaluate this in the context of all the other information we know about the website and try and make the best determination of the canonical URL. This will help us handle any potential implementation errors or abuse of this tag.
  • You can use relative or absolute URLs in the “href” attribute of the link tag.
  • The page and the URL in the “href” attribute must be on the same domain. For example, if the page is found on “http://mysite.com/default.aspx”, and the “href” attribute in the link tag points to “http://mysite2.com”, the tag will be invalid and ignored.
    • However, the “href” attribute can point to a different subdomain. For example, if the page is found on “http://mysite.com/default.aspx” and the “href” attribute in the link tag points to “http://www.mysite.com”, the tag will be considered valid.
  • Live Search expects to implement support for this feature sometime in the near future.