The Definitive Guide to Canonicals and Robots – Updated

*Originally published 5/28/2013, Updated 10/29/2013*

If you read the article title above and your eyes started to glaze over, or if you thought “canonical, that’s not a real word”, then you should probably stop reading now. But if you immediately started to get inappropriately excited at finally having this all spelled out, then this guide is for you.

Robots.txt

Robots.txt is the earliest and still the most widely supported way of disallowing content. You should use robots.txt when you have an area of your website that you don’t want any robots looking at. It’s ideal for areas like shopping carts or content behind login barriers.
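As a rough illustration, here is what a robots.txt for that kind of site might look like, checked with Python's standard `urllib.robotparser`. The `/cart/` and `/account/` paths and the example.com URLs are assumptions made up for this sketch, not rules from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example store; the paths are illustrative only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cart/
Disallow: /account/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text and return a parser that can answer can_fetch()."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# A well-behaved crawler would skip the cart but still crawl product pages.
print(parser.can_fetch("*", "https://www.example.com/cart/checkout"))    # False
print(parser.can_fetch("*", "https://www.example.com/products/widget"))  # True
```

Keep in mind that this only models what a polite robot will do. Robots.txt is a request, not an access control.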

Pros:

Cons:

Because robots.txt is a public file, anyone can read it to see which areas you have disallowed. Generally this doesn’t cause problems, but it’s worth thinking about. For example, if you had a protected area on a website that contained financial information, it wouldn’t be a good idea to disallow a directory like /private-account-numbers in your robots.txt. Choose a higher, less descriptive directory or don’t mention it at all, and make sure it’s locked down behind a login and SSL.

Note that pages blocked in robots.txt CAN still accrue link value (PageRank). But if you block duplicate content instead of redirecting it or using canonicals, that accrued link value is not transferred to the preferred version of the page.

A SERP result with Robots.txt

A page blocked in robots.txt can still appear in search results as a bare URL listing if other pages link to it. This likely won’t be a big deal since these pages won’t show up in regular search results, but again you should be aware that if a competitor goes looking, they can see the page exists. You can use the Google page removal tool in Webmaster Tools if a page that you don’t want shown is showing up this way.

Rule number one of writing for the web… if you don’t want it found, don’t publish it.

Meta Robots

You can also use robots directives on individual pages. Simply add a meta tag to the page’s head that combines index or noindex with follow or nofollow (options from robotstxt.org):

Options from Robotstxt.org
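To make the tag concrete, here is a minimal sketch that reads the robots directives out of a page using Python's standard `html.parser`. The sample page and the helper names (`MetaRobotsParser`, `robots_directives`) are hypothetical and exist only for this example.

```python
from html.parser import HTMLParser


class MetaRobotsParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        content = attr.get("content") or ""
        # Directives are comma separated and case insensitive.
        for directive in content.split(","):
            if directive.strip():
                self.directives.add(directive.strip().lower())


def robots_directives(html: str) -> set:
    """Return the set of robots directives declared in the given HTML."""
    parser = MetaRobotsParser()
    parser.feed(html)
    return parser.directives


# Hypothetical page that should stay out of the index but pass its links along.
PAGE = '<html><head><meta name="robots" content="NOINDEX, FOLLOW"></head><body></body></html>'

directives = robots_directives(PAGE)
print(sorted(directives))  # ['follow', 'noindex']
```

A robot has to fetch the page to see this tag, so it only works on pages that are not also blocked in robots.txt.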

Pros:

Cons:

Again, you may see a previously indexed page show up in search results even though you have now marked it “noindex”, because the tag only takes effect once the page has been recrawled. To remove the page sooner, use the Google removal tool mentioned above.

Canonicals

Canonical tags are a relatively new option. They were introduced to the SEO world in 2009 and are now supported by all of the major search engines. They are designed to help webmasters control duplicate content that is created inadvertently, and they are most commonly implemented where pages are built on the fly from various database fields.
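For example, a product page reachable at several database-driven URLs can declare one preferred address in its head. The sketch below pulls that address out with Python's standard `html.parser`. The URLs and helper names (`CanonicalParser`, `canonical_url`) are assumptions for illustration only.

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalParser(HTMLParser):
    """Remembers the href of the first <link rel="canonical"> tag it sees."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        # rel can hold several space-separated values, so split before checking.
        rels = (attr.get("rel") or "").lower().split()
        if "canonical" in rels:
            self.canonical = attr.get("href")


def canonical_url(html: str) -> Optional[str]:
    """Return the canonical URL declared in the HTML, or None if there is none."""
    parser = CanonicalParser()
    parser.feed(html)
    return parser.canonical


# Hypothetical page served at /product?id=42&color=blue that points to its clean URL.
PAGE = (
    "<html><head>"
    '<link rel="canonical" href="https://www.example.com/widgets/blue-widget">'
    "</head><body></body></html>"
)

print(canonical_url(PAGE))
```

Every duplicate version of the page would carry the same tag, so the search engine knows which URL should get the credit.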

Pros:

Cons:

Which do you choose?

The answer is, “It depends”. Each situation is different, and while canonicals are imprecise and complicated, they are very powerful when implemented correctly. However, many webmasters implement canonicals when they should just be implementing a 301 redirect instead. Hopefully this guide will help you understand the pros and cons of each option so that you can make a more educated decision going forward.
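For comparison, here is a minimal sketch of what a 301 redirect does at the server level, using Python's standard `http.server`. In practice this usually lives in your web server or CMS configuration. The paths in `REDIRECTS` and the helper names are hypothetical.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Optional

# Hypothetical mapping of retired duplicate URLs to the page that replaces them.
REDIRECTS = {
    "/widgets-old": "/widgets",
    "/index.html": "/",
}


def resolve_redirect(path: str) -> Optional[str]:
    """Return the permanent redirect target for a path, or None if it has none."""
    return REDIRECTS.get(path)


class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # self.path includes any query string; this sketch matches it exactly.
        target = resolve_redirect(self.path)
        if target is not None:
            # 301 tells browsers and robots that the move is permanent.
            self.send_response(301)
            self.send_header("Location", target)
            self.end_headers()
            return
        body = b"OK"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Run the sketch locally; call serve() and request /widgets-old to see the 301."""
    HTTPServer(("127.0.0.1", port), RedirectHandler).serve_forever()
```

The difference from a canonical is that the visitor never sees the old URL at all. If nobody needs to land on the duplicate, the redirect is usually the simpler choice.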




  • http://www.websiteoptimizers.com/ Tom Bowen

    Thanks Jenny. Now I think you should do a Part 2! For example, you say that most webmasters implement canonical wrong. A post about typical mistakes and how it should be done would be great. Or when one method is better than another.

  • http://www.archology.com/ Jenny Halasz

    Hey, Tom! You hold the distinction of being our first comment on the new blog, so thanks! You read my mind on the next post. I already have it mapped out in my brain, just need to get it down on paper! Turns out I’m going to be helping out with All Things SEO on SEL now too, so watch for that. :)

  • http://squarejawmedia.com/ Brian McDonald

    Follow-up question: Let’s say you have 3 versions of a landing page for a single campaign, broken up by segments. Is the best route to use a canonical link rather than blocking with robots for organic search?

    • http://www.archology.com/ Jenny Halasz

      Good question, Brian! I generally recommend completely blocking campaign landing pages, if they are different URLs than organic – even if the content is identical. While the canonical will transfer link popularity over to the organic version, it will really mess up your tracking if you’re getting links to a campaign page.

      • http://squarejawmedia.com/ Brian McDonald

        Thanks Jenny!