The complete guide to duplicate content

In my work as an SEO, I constantly encounter people who do not “get” the concept of duplicate content. The problem is as old as the search engines themselves, and SEOs have been fixing it since day one, but we apparently still need to do more about it. That’s why I decided to take some time and write a guide to help you learn what it is, how to identify it, and how to solve it:

Duplicate content: causes and solutions

27 Responses

  1. Tommy Day on 9 December, 2010

    Great guide Joost, I feel like I’m taking a masterclass on SEO by reading your site!

    As far as duplicate page titles, I have that problem using pagination on my blogs. Page 1 and page 23 of a category archive or the home page have the same page title. I noticed your posts page does the same thing.

    Does Google not care about duplicate page titles as much when it’s related to pagination?

  2. JPop on 9 December, 2010

    Thanks for sharing all this useful information about duplicate content, your guide is very complete! I’ll bookmark it to read everything :)

  3. Jean-Paul Horn on 10 December, 2010

    I’m really hoping the new (Google) meta tags syndication-source and original-source are going to alleviate the scraping problem somehow, although these tags are probably going to be easily misused :-( I’m all ears for alternative solutions for fighting scrapers if anyone has them …

  4. Dave from The Longest Way Home on 10 December, 2010

    Excellent guide Joost. I was getting dupes from mobile traffic. Google sorted that one out after I used different titles.

    The only issue I see is those StumbleUpon links. If someone stumbles your content, StumbleUpon seems to exclude the trailing slash.

    The two URLs show up in analytics but not in Webmaster Tools.

    Is there a way to tackle this? Or, if it’s not showing up as an issue in Webmaster Tools, is it a non-event because Google sees the difference?

    • Joost de Valk on 10 December, 2010

      When the only difference is the trailing /, search engines should be able to figure it out; in most cases your webserver will even do the redirect for you.

  5. Dennis Sievers on 10 December, 2010

    Perfect guide Joost! As SEOs we encounter these problems every day. In my opinion Google makes it sound like a small problem. If you only consider the fact that Google tells us that they will choose ‘the best page’ for you, it does sound like it’s not a big problem at all. But what they don’t mention is that it not only has an effect on
    1. the popularity flow within your website
    2. the crawl capacity spiders can/will assign to your website

    it also has an effect on the statistics gathered by web analytics vendors, since the data is usually assigned at the URL level. For instance, if your homepage can be accessed via http://www.example.com, but also via http://www.example.com/default.aspx and http://www.example.com/default.aspx?ID=1, all that data is spread across those URLs. If you want to know how many visitors came directly to your homepage, you need to bend over backwards to gather all the data for all those URLs. So, to effectively measure and act on statistical data for your website, these duplicate problems need to be fixed too. Every piece of content should only be accessible via 1 unique URL. This means the canonical tag is not suitable to solve this analytics problem.
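To illustrate the analytics point above, here is a minimal sketch of how per-URL pageview counts could be merged after the fact. The URL variants are the ones from the comment; the `canonical_url` and `merge_pageviews` helpers, the list of default documents, and the decision to drop every query string are assumptions made purely for illustration, not a rule that fits every site.

```python
from collections import Counter
from typing import Mapping
from urllib.parse import urlsplit, urlunsplit

# Hypothetical default documents that serve the same content as the directory root.
DEFAULT_DOCUMENTS = {"default.aspx", "index.html"}


def canonical_url(url: str) -> str:
    """Collapse homepage variants onto a single URL (illustrative rules only)."""
    parts = urlsplit(url)
    path = parts.path or "/"
    directory, _, filename = path.rpartition("/")
    if filename.lower() in DEFAULT_DOCUMENTS:
        path = directory + "/"
    # Assumption: query strings and fragments never change the content here.
    return urlunsplit((parts.scheme, parts.netloc.lower(), path, "", ""))


def merge_pageviews(pageviews: Mapping[str, int]) -> Counter:
    """Sum pageviews per canonical URL."""
    merged: Counter = Counter()
    for url, views in pageviews.items():
        merged[canonical_url(url)] += views
    return merged
```

Merging like this is the "bending over backwards" the comment describes; serving each piece of content on one URL makes the step unnecessary.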

  6. Doug Francis on 10 December, 2010

    Thanks again for reminding us that Google Webmaster Tools gives us a ton of critical data about our blogs/sites and how they are performing (or poorly performing).

  7. Avergae Joe on 11 December, 2010

    I set up a quite new website that I am now going great guns on, loading it with lots of content, making it as SEO-friendly as possible, and getting it crawled etc. I was recommended your posts about duplicate content as well as your master blog item (now a page :)) on WordPress use for SEO. I have now implemented all the recommendations along with the various plugins. When you read a blog it makes such simple sense, but the irony is that until you read it you are too busy adding things to contemplate what you are doing.
    Thank you therefore for taking the time to put pen to paper (metaphorically) and doing this.
    Regards Mike

  8. Daniel on 11 December, 2010

    Joost, I noticed on my site, I am using your SEO plugin and the tag pages that relate to a main post are showing the /tag/ in the canonical meta. Should it be this way? And if not, what should I do about it? I don’t want to be hit with duplicate content because of this.

  9. Tom Defty on 12 December, 2010

    Very interesting article, plenty to work on. Google Webmaster Tools is a must for any SEO newbies.

  10. Nihar on 13 December, 2010

    Thanks for sharing this guide.

    I will check that guide right away

  11. Ben Huson on 15 December, 2010

    Very useful in-depth article, Joost.

    I tend to find that one type of site that commonly suffers from duplicate content is ecommerce sites. This is largely because you will often want to display the same product information under different categories and areas of the site, often styling the product information differently depending on the context of the product.

    Logically these duplicate product pages can be accessed at different URLs so that you can maintain the context of the product page. For example I might have a hat that I want to appear in the ‘Mens’ category and the ‘Womens’ category, both which are styled differently, and I will need 2 different URLs because I will want to send out links marketing the hat to both women and men.

    Also in this circumstance you would also probably want to track views of that product page separately for men and women so it makes sense to use different URLs.

    In such cases I guess the best solution would be to use the canonical link tag in the head to specify your preferred page to index?

    • Joost de Valk on 15 December, 2010

      Yeah that’s what canonical was meant for :)
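As a rough sketch of the canonical approach discussed in this exchange, both category-specific product pages would carry the same `<link rel="canonical">` element in their `<head>`, pointing at the one preferred URL. The `canonical_link_tag` helper and the product URL below are hypothetical and only show the shape of the tag.

```python
from html import escape


def canonical_link_tag(preferred_url: str) -> str:
    """Return a <link rel="canonical"> element for the page's <head>."""
    # Escape the URL so characters such as & are valid inside the attribute.
    return f'<link rel="canonical" href="{escape(preferred_url, quote=True)}" />'


# Hypothetical preferred URL; the 'Mens' and 'Womens' pages would both emit this tag.
print(canonical_link_tag("http://www.example.com/products/hat/"))
```

The separate URLs stay available for marketing links and per-audience tracking, while search engines are told which one to index.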

  12. Alain Sadon on 15 December, 2010

    Great guide, Joost!

    Concerning the WWW vs non-WWW duplicate content issues: what do you think of setting the “preferred domain” (in Google Webmaster Tools) to the WWW version of the site? Doing so affects the way the URL is displayed in the SERPs (as www). But more importantly, it makes Google translate non-www references to www references, which therefore seems to solve this specific duplicate content issue:
    “if you specify your preferred domain as http://www.example.com and we find a link to your site that is formatted as http://example.com, we follow that link as http://www.example.com instead.”
    Source: http://www.google.com/support/webmasters/bin/answer.py?answer=44231&hl=en

    What is your take on this solution? Thanks!

    • Joost de Valk on 15 December, 2010

      Hey Alain,

      that’ll work nicely for Google but not for other search engines :)

      • Alain Sadon on 15 December, 2010

        Other search engines? :)

        • Joost de Valk on 15 December, 2010

          Yeah, I know that’s hard to grasp for most Dutch SEOs :) In the US and the UK though, the market share for Bing is big enough that it’s not something to ignore.
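An approach that does not depend on any one search engine's settings is a server-side 301 redirect to the preferred host. The sketch below only computes the redirect target; the `PREFERRED_HOST` value and the `redirect_target` helper are assumptions for illustration, and a real site would normally configure this in the web server rather than in application code.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit

# Assumption: the www version is the preferred one, as in the quoted example.
PREFERRED_HOST = "www.example.com"


def redirect_target(url: str) -> Optional[str]:
    """Return the URL to 301-redirect to, or None if no redirect is needed."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host == PREFERRED_HOST:
        return None  # already on the preferred host
    if "www." + host == PREFERRED_HOST:
        # Same site without the www prefix: keep path, query, and fragment.
        return urlunsplit(
            (parts.scheme, PREFERRED_HOST, parts.path or "/", parts.query, parts.fragment)
        )
    return None  # unrelated host; leave it alone
```

Because the redirect happens on the server, every crawler and every visitor ends up on the same URL, not only the engines where a preferred domain has been set.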

  13. Pawel Urbanski on 15 December, 2010

    Hi Yoast,
    Thanks for a very informative article. Your notes can be expanded by another tip.
    You can tell Google, Yahoo or other search engines to ignore certain parameters from query strings.
    All the temporary parameters such as session ids or sorting keys can be marked as not important.

    Greetings,
    Pawel

    • Joost de Valk on 15 December, 2010

      Good point Pawel, that should actually have been in there, I’ll update it later on!
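Pawel's tip concerns settings on the search engines' side. The same idea can be sketched on the site's side, for example when working out which URL to treat as canonical: drop the parameters that do not change the content. The parameter names in `IGNORED_PARAMS` and the `strip_ignored_params` helper are hypothetical; each site would have to list its own temporary parameters.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical names for temporary parameters such as session ids or sorting keys.
IGNORED_PARAMS = {"sessionid", "sort"}


def strip_ignored_params(url: str) -> str:
    """Remove query parameters that are assumed not to change the page content."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )
```

Parameters that do change the content, such as a product filter, stay in the URL; only the ones marked as unimportant are removed.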

  14. Daniel on 15 December, 2010

    Hey Yoast,
    A question about duplication: what would happen with something like Magento, where you can have multiple stores using different URLs, but the products can be assigned to each store, and you do not change the descriptions etc. for each product? What would happen in a case like that?

    Great Article! – Dan

    • Joost de Valk on 15 December, 2010

      You’d end up in duplication hell. Seriously, that’s the sort of issue that is only solvable by creating unique content across the board. Do keep in mind that having the same content in English on shop A and in French on shop B is no issue.

  15. Theo on 15 December, 2010

    Excellent guide on duplicate content, thanks for sharing all this useful info!

  16. Dave from The Longest Way Home on 19 December, 2010

    Webmaster Tools often shows me duplicate content based on comments.

    e.g. http://www.example.com/blogpost/?postcomment=true

    This being a duplicate of http://www.example.com/blogpost/

    I know that after some time Webmaster Tools stops showing the error. But I would like to know if there is a way of preventing this from showing in the first place?

  17. bird on 28 December, 2010

    Hi, thanks for all the great info and plugins! I’m using your SEO plugin and it’s great. I’m using subcategories and sub-subcategories on my page. Should I nofollow/noindex subcategories? Thanks!
