canonical urls as a solution for duplicate content

rel=canonical: the ultimate guide

May 10th, 2016 – 38 Comments

History of rel=canonical

In February 2009 Google, Bing and Yahoo! introduced the canonical link element. Matt Cutts’ post is probably the easiest reading if you want to learn about its history. While the idea is simple, the specifics of how to use it turn out to be complex.

The rel=canonical element, often called the “canonical link”, is an HTML element that helps webmasters prevent duplicate content issues. It does this by specifying the “canonical”, or “preferred”, version of a web page. Using it well improves a site’s SEO.

The idea is simple: if you have several similar versions of the same content, you pick one “canonical” version and point the search engines at that. This solves the duplicate content problem where search engines don’t know which version of the content to show. This article takes you through the use cases and the anti-use cases.

The SEO benefit of rel=canonical

Choosing a proper canonical URL for every set of similar URLs improves the SEO of your site. Because the search engine knows which version is canonical, it can count the links pointing at all the different versions as links to that single version. Basically, setting a canonical is similar to doing a 301 redirect, but without actually redirecting.

The process of canonicalization

When you have several choices for a product’s URL, canonicalization is the process of picking one. In many cases, it’ll be obvious: one URL will be better than others. In some cases, it might not be as obvious, but then it’s still rather easy: pick one! Not canonicalizing your URLs is always worse than canonicalizing your URLs.

How to set canonical URLs

Correct example of using rel=canonical

Let’s assume you have two versions of the same page, with exactly, 100% the same content. They differ only in that they’re in separate sections of your site, so the background color and the active menu item differ. That’s it. Both versions have been linked from other sites, and the content itself is clearly valuable. Which version should a search engine show? Nobody knows.

For example’s sake, these are their URLs:

  • http://example.com/wordpress/seo-plugin/
  • http://example.com/wordpress/plugins/seo/

This is what rel=canonical was invented for. Especially in a lot of e-commerce systems, this (unfortunately) happens fairly often. A product has several different URLs depending on how you got there. You would apply rel=canonical as follows:

  1. You pick one of your two pages as the canonical version. It should be the version you think is the most important one. If you don’t care, pick the one with the most links or visitors. If all of that’s equal: flip a coin. You need to choose.
  2. Add a rel=canonical link from the non-canonical page to the canonical one. So if we picked the shorter URL as our canonical URL, the other URL would link to it like so in the <head> section of the page:
    <link rel="canonical" href="http://example.com/wordpress/seo-plugin/">

    That’s it. Nothing more, nothing less.

What this does is “merge” the two pages into one from a search engine’s perspective. It’s basically a “soft redirect”, without redirecting the user. Links to both URLs now count for the single canonical version of the URL.
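To check that a page really declares the canonical you intended, you can read the link element back out of its HTML. The following is a minimal sketch using only Python's standard library; the `CanonicalFinder` class and `find_canonicals` function are names made up for this illustration, not part of any plugin or search engine tooling.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collects the href of every <link rel="canonical"> element."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        # rel may hold several space-separated tokens; compare case-insensitively.
        rel_tokens = (attr.get("rel") or "").lower().split()
        if "canonical" in rel_tokens and attr.get("href"):
            self.canonicals.append(attr["href"])


def find_canonicals(html):
    """Return all canonical hrefs declared in the given HTML string."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonicals


# The non-canonical page from the example, linking to the canonical one.
page = """<html><head>
<title>SEO plugin</title>
<link rel="canonical" href="http://example.com/wordpress/seo-plugin/">
</head><body>...</body></html>"""

print(find_canonicals(page))
```

The function returns a list rather than a single value on purpose: a page should declare exactly one canonical, and a list makes it easy to notice when there are none or several.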

Setting the canonical in Yoast SEO

If you use Yoast SEO, you can change the canonical of several page types using the plugin. You only need to do this if you want to change the canonical to something different than the current page’s URL. Yoast SEO already renders the correct canonical URL for almost any page type in a WordPress install.

For posts, pages and custom post types, you can edit the canonical in the advanced tab of the Yoast SEO metabox:

[Screenshot: the canonical URL field in the advanced tab of the Yoast SEO metabox]

For categories, tags and other taxonomy terms, you can change them in the Yoast SEO metabox too, in the same spot. If you have other advanced use cases, you can always use the wpseo_canonical filter to change the Yoast SEO output.

When should you use canonical URLs?

301 redirect or canonical?

If you have the choice of doing a 301 redirect or setting a canonical, what should you do? The answer is simple: if there are no technical reasons not to do a redirect, you should always do a redirect. If you cannot redirect because that would break the user experience or be otherwise problematic: set a canonical URL.

Should a page have a self-referencing canonical URL?

In the example above, we make the non-canonical page link to the canonical version. But should a page set a rel=canonical for itself? This is a highly debated topic amongst SEOs. At Yoast we have a strong preference for having a canonical link element on every page and Google has confirmed that’s best. The reason is that most CMSes will allow URL parameters without changing the content. So all of these URLs would show the same content:

  • http://example.com/wordpress/seo-plugin/
  • http://example.com/wordpress/seo-plugin/?isnt=it-awesome
  • http://example.com/wordpress/seo-plugin/?cmpgn=twitter
  • http://example.com/wordpress/seo-plugin/?cmpgn=facebook

The issue: if you don’t have a self-referencing canonical on the page that points to the cleanest version of the URL, search engines can treat each of these parameterized URLs as a separate page with the same content. Even if you don’t create such URLs yourself, someone else could link to your pages this way and cause a duplicate content issue. So adding a self-referencing canonical to URLs across your site is a good “defensive” SEO move. Luckily for you, our Yoast SEO plugin does this for you.
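To get a feel for how many URL variants a self-referencing canonical folds together, you can strip the query string and fragment from a list of crawled or logged URLs and group what remains. This is a rough audit sketch, not a way to generate canonicals: the `strip_parameters` and `group_variants` helpers are hypothetical, and the sketch assumes that parameters never change the content, which is not true on every site.

```python
from collections import defaultdict
from urllib.parse import urlsplit, urlunsplit


def strip_parameters(url):
    """Return the URL without its query string and fragment."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def group_variants(urls):
    """Group URLs that differ only in their query string or fragment."""
    groups = defaultdict(list)
    for url in urls:
        groups[strip_parameters(url)].append(url)
    return dict(groups)


variants = [
    "http://example.com/wordpress/seo-plugin/",
    "http://example.com/wordpress/seo-plugin/?isnt=it-awesome",
    "http://example.com/wordpress/seo-plugin/?cmpgn=twitter",
    "http://example.com/wordpress/seo-plugin/?cmpgn=facebook",
]

for clean_url, members in group_variants(variants).items():
    print(clean_url, "<-", len(members), "variants")
```

Every group with more than one member is a set of URLs that should all declare the same canonical.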

Cross-domain canonical URLs

You might have the same piece of content on several domains. For instance, SearchEngineJournal regularly republishes articles from Yoast.com (with explicit permission). Look at every one of those articles and you’ll see a rel=canonical link pointing right back at our original article. This means all the links pointing at their version of the article count towards the ranking of our canonical version. They get to use our content to please their audience, and we get a clear benefit from it too. Everybody wins.

Faulty canonical URLs: common issues

There is a multitude of cases out there showing that a wrong rel=canonical implementation can lead to huge issues. I know of several sites that had the canonical on their homepage point to an article, and completely lost their home page from the search results. There are more things you shouldn’t do with rel=canonical. Let me list the most important ones:

  • Don’t canonicalize a paginated archive to page 1. The rel=canonical on page 2 should point to page 2. If you point it to page 1, search engines will actually not index the links on those deeper archive pages…
  • Make them 100% specific. For various reasons, many sites use protocol-relative links, meaning they leave the http / https bit off their URLs. Don’t do this for your canonicals. You have a preference. Show it.
  • Don’t base your canonical on the request URL. If you use variables like the domain or request URI used to access the current page while generating your canonical, you’re doing it wrong. Your content should be aware of its own URLs. Otherwise, you could still have the same piece of content on, for instance, example.com and www.example.com and have them both canonicalize to themselves.
  • Don’t put multiple rel=canonical links on a page; they cause havoc. Sometimes a developer of a plugin or extension thinks that he’s God’s greatest gift to mankind and he knows best how to add a canonical to the page. Sometimes, that developer is right. But since you can’t all be me, they’re inevitably wrong too sometimes. When we encounter this in WordPress plugins we try to reach out to the developer doing it and teach them not to, but it happens. And when it does, the results are wholly unpredictable.
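Some of the mistakes listed above can be caught mechanically once you have the canonical hrefs of a page. The sketch below is one possible set of checks, not an exhaustive linter: both function names are invented for this example, and the pagination check assumes WordPress-style `/page/N/` archive URLs.

```python
import re
from urllib.parse import urlsplit


def check_canonicals(hrefs):
    """Return a list of problems for the canonical hrefs found on one page."""
    problems = []
    if not hrefs:
        problems.append("no rel=canonical found")
    if len(hrefs) > 1:
        problems.append("multiple rel=canonical links found")
    for href in hrefs:
        parts = urlsplit(href)
        if href.startswith("//"):
            # The scheme was left off, so no http/https preference is shown.
            problems.append("protocol-relative canonical: " + href)
        elif parts.scheme not in ("http", "https") or not parts.netloc:
            problems.append("canonical is not an absolute URL: " + href)
    return problems


def canonicalizes_archive_elsewhere(page_url, canonical):
    """True if page 2 or deeper of an archive points its canonical elsewhere.

    Assumption: pagination uses a trailing /page/N/ path segment.
    """
    match = re.search(r"/page/(\d+)/?$", urlsplit(page_url).path)
    return bool(match) and int(match.group(1)) > 1 and canonical != page_url


print(check_canonicals(["//example.com/wordpress/seo-plugin/"]))
print(canonicalizes_archive_elsewhere(
    "http://example.com/blog/page/2/", "http://example.com/blog/"))
```

A check like this cannot tell whether a canonical was derived from the request URL; that mistake only shows up when you request the same content through different hostnames and compare the results.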

rel=canonical and social networks

Facebook and Twitter honor rel=canonical too. This might lead to weird situations. If you share a URL on Facebook that has a canonical pointing elsewhere, Facebook will share the details from the canonical URL. In fact, if you add a like button on a page that has a canonical pointing elsewhere, it will show the like count for the canonical URL, not for the current URL. Twitter works in the same way.

Advanced uses of rel=canonical

Canonical link HTTP header

Google also supports a canonical link HTTP header. The header looks like this:

Link: <http://www.example.com/white-paper.pdf>; rel="canonical"

Canonical link HTTP headers can be very useful when canonicalizing files like PDFs, so it’s good to know that the option exists.
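As an illustration of the header format, here is a small sketch that builds the header line and reads the canonical URL back out of a Link header value. Both helper names are hypothetical, and the parser is deliberately simplified: it splits on commas, so it would misread a URL that itself contains a comma.

```python
import re


def canonical_link_header(url):
    """Build a canonical Link header line for the given absolute URL."""
    return 'Link: <{}>; rel="canonical"'.format(url)


def canonical_from_link_header(value):
    """Return the canonical URL from a Link header value, or None.

    `value` is the part after "Link:"; it may list several links
    separated by commas.
    """
    for part in value.split(","):
        match = re.match(r"\s*<([^>]*)>\s*;(.*)", part)
        if not match:
            continue
        url, params = match.groups()
        # Accept rel=canonical with or without quotes, in any letter case.
        if re.search(r'\brel\s*=\s*"?[^";]*\bcanonical\b', params, re.IGNORECASE):
            return url
    return None


header = canonical_link_header("http://www.example.com/white-paper.pdf")
print(header)
print(canonical_from_link_header(header[len("Link:"):]))
```

How you actually send the header depends on your web server or application framework, which is outside the scope of this sketch.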

Using rel=canonical on not so similar pages

While I won’t recommend this, you can definitely use rel=canonical very aggressively. Google honors it to an almost ridiculous extent, where you can canonicalize a very different piece of content to another piece of content. If Google catches you doing this, it will stop trusting your site’s canonicals and thus cause you more harm…

Using rel=canonical in combination with hreflang

In our ultimate guide on hreflang, we talk about canonical. It’s very important that when you use hreflang, each language’s canonical points to itself. Make sure that you understand how to use canonical well when you’re implementing hreflang as otherwise you might kill your entire hreflang implementation.
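The rule that each language version canonicalizes to itself is easy to verify once you have collected the canonical and hreflang data per page. The sketch below assumes you already have that data in a plain dictionary; the structure, the function name, and the example URLs are all made up for illustration.

```python
def hreflang_canonical_problems(pages):
    """Find pages that use hreflang but do not canonicalize to themselves.

    `pages` maps each page URL to a dict with a "canonical" URL and an
    "hreflang" dict of language code -> alternate URL.
    """
    problems = []
    for url, info in pages.items():
        if info["hreflang"] and info["canonical"] != url:
            problems.append(
                "{} has hreflang alternates but canonicalizes to {}".format(
                    url, info["canonical"]
                )
            )
    return problems


# Hypothetical two-language site: the Dutch page wrongly points at the English one.
alternates = {"en": "http://example.com/en/", "nl": "http://example.com/nl/"}
pages = {
    "http://example.com/en/": {
        "canonical": "http://example.com/en/",
        "hreflang": alternates,
    },
    "http://example.com/nl/": {
        "canonical": "http://example.com/en/",
        "hreflang": alternates,
    },
}

for problem in hreflang_canonical_problems(pages):
    print(problem)
```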

Conclusion: rel=canonical is a power tool

Rel=canonical is a powerful tool in an SEO’s toolbox, but like any power tool, you should use it wisely as it’s easy to cut yourself. For larger sites, the process of canonicalization can be very important and lead to major SEO improvements.

Read more: ‘Duplicate content: causes and solutions’ »


38 Responses to rel=canonical: the ultimate guide

  1. Maciej
    By Maciej on 12 February, 2015

    Hi!

    Thank you for the great article.
    On our website we publish monthly reports and rankings, content isn’t always the same (it is not copy/paste) but generally has the same data in different order. There is also some comment attached to this (always different). We started to use rel=canonical and we can see great results from that+under the important phrases we always have the newest content in google. Isn’t it risky to do such things?

    Cheers!

    • Michiel Heijmans
      By Michiel Heijmans on 13 February, 2015

      Why would you think that, Maciej?

      • Te Zet
        By Te Zet on 13 February, 2015

        Because the content itself is different.

  2. Ruth
    By Ruth on 12 February, 2015

    Is there any plan to add rel=canonical to RSS/XML feeds so folks using your content will have that link back automatically? I suppose they’d still have to grab/use that field so it wouldn’t be foolproof. Can you speak to this?

    • Michiel Heijmans
      By Michiel Heijmans on 13 February, 2015

      It’s indeed an easy way to scrape content as well. Most of the times, this is done automatically – that is why we added the option to add a backlink to your content in RSS in our WordPress SEO plugin. That will make sure there is a link back to your site. Google will understand from the link that your website was the first to publish the content and will value your page much higher. It’s not the same, but basically the same indication to Google. Hope that helps!

  3. Anders
    By Anders on 12 February, 2015

    Hi,

    Ok I read this article but still not sure if to use rel=canonical for this scenario for wordpress:

    Site has a page for Areas Covered like this:
    http://sitename.com/areas-covered
    On that page there is a list of different areas covered, where each area covered area has a link to a page that is almost identical to the areas covered page above, apart from the link is unique for each area covered and title pages title and unique area named is use.
    Example of the other pages;
    http://sitename.com/areas-covered-locationA
    http://sitename.com/areas-covered-locationB
    http://sitename.com/areas-covered-locationC
    http://sitename.com/areas-covered-locationD
    http://sitename.com/areas-covered-locationE
    http://sitename.com/areas-covered-locationF
    etc….etc….

    These other pages all has the list of areas on it and all link to each other area page.

    What is the correct procedure here?

    As mentioned the pages are not identical because the is unique for each page i.e. LocationA and some words have been changed (but 99% is the same perhaps initially).

    The plan would be to later add more unique location images for each of all the location pages, perhaps write some more unique things about the area for each area page, but generally they are almost identical.

    Finally, if it’s a bad idea in the first place to have one similar page with a different URL for each area, and rel=canonical is not the solution, would it be better to leave it as-is, or to hide all the separate area pages with a hide-page plugin and stop robots from listing/indexing or caching these pages, keeping only the main page “Areas Covered”?

    I would appreciate advice that is simple to understand for a non-expert.

    Thanks
    Anders

    • Anders Sundstedt
      By Anders Sundstedt on 13 February, 2015

      Anyone that can reply to this one please?

    • Michiel Heijmans
      By Michiel Heijmans on 13 February, 2015

      That’s plain duplicate content. If the majority of the actual content on a page is the same, it’s duplicate.

      We recently published a very nice post on local SEO by Kris Jones that you might want to check. Please understand that the approach you mention has passed its expiration date. It might work for now, but usually only in the short term. In the long run, local rankings are for local companies.

  4. Christian
    By Christian on 12 February, 2015

    Hi,
    Great post, very clear explanation.
    I had read though that rel=canonical should not be used with hreflang tags (which I find is not so well taken into account by Google).
    https://sites.google.com/site/webmasterhelpforum/en/faq-internationalisation (last QA in Rel Alternate Hreflang section)
    Thanks!
    Christian

    • Michiel Heijmans
      By Michiel Heijmans on 13 February, 2015

      That’s not what it says, Christian :) You shouldn’t use canonical LIKE hreflang. In case of a multilingual website, the canonical goes to the page at hand (if appropriate) and the hreflang / alternates go to the other languages. So you can use these alongside each other, but don’t use canonical links for pointing to other languages to rank.

      The exact phrase on the linked page is: “We recommend not using rel=canonical across different language or country versions.”

      • Christian
        By Christian on 13 February, 2015

        You are right, Michiel. Thanks!

  5. Nigam
    By Nigam on 13 February, 2015

    @Joost:
    Thanks for keep sharing such useful and informative articles.
    Indeed, rel=canonical will prove to be a powerful tool if used wisely.
    Many of SEO experts uses this not only to avoid DUPLICATE content issue but also COPYRIGHT issue as well. What’s your opinion on this?
    Thanks,

    Nigam Parikh
    Mumbai,India.

    • Michiel Heijmans
      By Michiel Heijmans on 13 February, 2015

      Who says that accounts for copyright? I’d add a copyright statement in your footer / terms instead. Canonical isn’t anything that gives you rights..?

  6. kalyan
    By kalyan on 13 February, 2015

    Hi, i recently changed the category names in the site and removed them from parent category.this created duplicate meta tags and title tags which i noticed in webmaster tools, Can I use redirection plugin ?or as said in the above post can I use settings in your plugin.if yes please specify where ? as I have no technical experience

    The first is the new url and cat ; the second is the old one.
    /gs1/epw-disabled-citizens-disability-certificate/932/
    /p2/society/epw-disabled-citizens-disability-certificate/932/

  7. Patrice
    By Patrice on 13 February, 2015

    Until i read your article, i didn’t even know that they existed , it would be awesome to find a tool that checks for “compromised headers”, do you know if such a tool exist ?

    • Raoul de Boer
      By Raoul de Boer on 13 February, 2015

      I agree! Let me know when you find such a tool :-)

  8. David Sottimano
    By David Sottimano on 13 February, 2015

    Good article, solid advice, except one thing. I respectfully disagree with usage of cross domain rel canonicals for site migrations / rebrands. I know you added a line saying 301s are always the best, but that section of the article might confuse a lot of non SEOs as to what to do during migrations. Site migrations are stressful enough with 301s, so using a combination of rel canonicals, then 301s is sending 2 signals rather than 1, which significantly increases the chances of error (both on Google’s part and your own). Maybe you know something I don’t? ;)

  9. James
    By James on 13 February, 2015

    Awesome post Joost! Could have done with it last week whilst trying to justify a site restructure to a client.
    Bookmarked

  10. Siddharth T. Patel
    By Siddharth T. Patel on 16 February, 2015

    Very informative blog post. Everything makes sense though. Keep up the good work ahead too. Really appreciate the effort.

  11. Dennis
    By Dennis on 16 February, 2015

    Great stuff reading Yoast. I really think you came into the depth with canonical. I did not know, that Facebook is giving credit to the original source though, so thank you for new information!

  12. Paul Altieri
    By Paul Altieri on 17 February, 2015

    Hi Joost,

    Nice read…. site looks great. thanks for the tips.

  13. Arne van Elk
    By Arne van Elk on 17 February, 2015

    Hi Joost, have a question about canonicals and pagination. You say: “Don’t canonicalize a paginated archive to page 1. Don’t add a rel=canonical on page 2 and further, search engines will actually not index the links on those deeper archive pages anymore”.

    I wonder if this is true. Apart from indexation, does a canonical tell the search engine a page should not be followed as well? It seems to me these are two different things.

  14. viki sangre
    By viki sangre on 19 February, 2015

    I think if we can’t use canonical tag for either URL then it leads to cloaking. My websites many pages we detected as cloaked by several tools. After using canonical tag on archive everything goes right.
    Thank for sharing

  15. James Kockelbergh
    By James Kockelbergh on 20 February, 2015

    Hi Joost,

    As always great article.

    I am currently cleaning up a clients website where previously somebody did some real harm by “Using rel=canonical on not so similar pages” to boost link juice to a few pages on the website. Google penalized this website badly as they lost most of their traffic from search based terms.

    Would you recommend a manual review by Google after cleaning up the website?

    If you wish to add any further recommendations they will be appreciated.

    Best regards,

    James.

  16. jashon
    By jashon on 21 February, 2015

    So if we put self-pointing canonical in every-page we are safe from duplicate content issue from those rouge requests? that’s nice was always having problem with that.

  17. Shawn
    By Shawn on 24 February, 2015

    Good article. Curious about URL cloaking for FB and Twitter. We carefully canonical everything back to a single canonical URL across multiple sites where content might be viewed, but this creates problems with FB and Twitter sharing esp. when sites are branded or private labeled with their own FB campaigns. I saw on Stack that FB is okay with cloaking the canonical URL so it’s different (local) for FB and Twitter but constant for Google and other bots. Is this valid / white hat / the right way to handle this?

  18. JK
    By JK on 10 May, 2016

    I’ve read a lot about 301 vs canonical. IMO canonical is preferred in case of website preferences such as www vs non-www and https vs non-https.
    If the page has actually moved, then use a 301.
    If you decide to use a 301 redirect, really think it through and only use it if you’re absolutely sure. Why? Because browsers cache 301 redirects. Once cached, it cannot be uncached unless your visitor clears the cache of their browser (and we all know they won’t). This means that if you make a typo or a mistake, this can end up terrible, especially if you discover the mistake too late (many visitors might have cached the wrong URL).
    We all know that marketing managers change their minds constantly. Today they want a www website and tomorrow it should be non-www. Used 301 redirects? Too bad... a lot of visitors wind up in 301 loops and crashing websites.
    Is there a solution?
    For damage in the past? No. You can only hope the browser cache is flushed soon.
    For the future?
    Yes, be sure to control the caching, by redirecting with proper caching instructions:

    Using PHP:
    header("Cache-Control: no-cache, no-store, must-revalidate"); // HTTP 1.1.
    header("Pragma: no-cache"); // HTTP 1.0.
    header("Expires: 0"); // Proxies.

    Finally: I hear ‘specialists’ say they should 301 redirect removed pages. IMO this is unnatural and therefore not a correct use of 301. Make sure your 404 page is perfectly clear for your visitors. Set the visitor back on track by making the right suggestions for the content they’re looking for.

    Just wanted to share my experience.

  19. Jamie
    By Jamie on 10 May, 2016

    Hi Yoast,

    Many thanks for this post, I had a question that seems almost unanswered on the web. But what would you do for pages that have swapped content and how does Google work with that? e.g. if this link was also available the other way around: http://www.imaging-resource.com/cameras/canon/6d/vs/canon/5d-mark-iii and the content would reflect that.

    Would you recommend to use canonical for this?

  20. Pedro
    By Pedro on 11 May, 2016

    Thank you for such an informative article, Joost. Since I started following the blog I’ve learned quite a bit about SEO. It feels like a bottomless hole, there’s no ending to the amount of learning. I had heard about rel=canonical, but had no idea of what it was.

    Cheers!

  21. Abi
    By Abi on 12 May, 2016

    Thank you, this has solved so many problems for me!

  22. azukadm
    By azukadm on 13 May, 2016

    This has been a problem for me until now thanks a lot I will have to work on this…

  23. Rich Owings
    By Rich Owings on 18 May, 2016

    A client’s site has a blog that is canonicalizing page 2 to page 1. I can’t find a setting for that. Any suggestions?

    • Joost de Valk
      By Joost de Valk on 18 May, 2016

      Is that client using Genesis, by any chance? That theme has that, rather unfortunate, setting.

  24. Samuel
    By Samuel on 22 May, 2016

    I chose a preferred version of my blog in google webmaster tools. Would it be necessary to add canonical tag to my blog header? I keep getting a message on my OnPage dashboard about a canonical in my header pointing to a different direction. I have tried to locate this but have been unable to.

    • Joost de Valk
      By Joost de Valk on 22 May, 2016

      Choosing one in your code is always a better idea than just in google webmaster tools. Google webmaster tools only works for Google and there are a few other search engines out there :)

  25. Ady
    By Ady on 23 May, 2016

    My confusion set rel = ” canicacal ” after I create a new page on the menu level terraced my blog , by google new page I created contains a duplicate is detected , Is there any html code that must be set in the PHP blog theme ? If there are any existing page html code to be attached to the ?
    Webmaster Please check out my blog at http://www.bisnisonline9.com
    Thanks?

