What is duplicate content?

You’ve probably come across the term duplicate content quite a lot, but what is it? Duplicate content is content that lives in several locations, i.e., at several URLs. It can harm your rankings, and many people say that copious amounts of it can even lead to a penalty from Google. That’s not true, though. There is no penalty, but having loads of duplicate or copied content can still hurt your rankings.

What is duplicate content?

Duplicate content is all content that is available in multiple locations on or off your site. It often lives on a different URL and sometimes even on a different domain. It mostly happens accidentally or is the result of a sub-par technical implementation. For instance, your site could be available on both www and non-www or HTTP and HTTPS, or both at the same time, the horror! Or maybe your CMS uses excessive dynamic URL parameters that confuse search engines. Even your AMP pages could count as duplicate content if not linked properly. It is everywhere.
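
To make those technical causes a bit more concrete, here is a minimal Python sketch that groups URL variants pointing at the same page. It is an illustration only, not how Google does it. It assumes that the www and non-www hosts serve the same site, that HTTPS is the preferred protocol, and that the parameters in IGNORED_PARAMS (a hypothetical list) never change the content of a page.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical list of parameters that do not change what the page shows.
IGNORED_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "sessionid"}


def normalize(url: str) -> str:
    """Reduce a URL to one comparable form (assumptions listed above)."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()  # note: this drops any port number
    if host.startswith("www."):
        host = host[4:]
    # Drop ignored parameters and sort the rest so their order does not matter.
    query = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    )
    path = parts.path.rstrip("/") or "/"
    return urlunsplit(("https", host, path, urlencode(query), ""))


def group_duplicates(urls):
    """Return only the normalized URLs that more than one variant maps to."""
    groups = defaultdict(list)
    for url in urls:
        groups[normalize(url)].append(url)
    return {key: found for key, found in groups.items() if len(found) > 1}
```

Feed it a list of URLs from a crawl or your server logs and every group it returns is a set of addresses that probably show the same content.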

Google’s definition of duplicate content is as follows:

“Duplicate content generally refers to substantive blocks of content within or across domains that either completely match other content or are appreciably similar. Mostly, this is not deceptive in origin.”

That last part is important. If you scrape, copy and spin existing content (Google calls this copied content) with the intention of deceiving the search engine to get a higher ranking, you will be on dangerous ground.

Google says this type of malicious intent might trigger an action:

“Duplicate content on a site is not grounds for action on that site unless it appears that the intent of the duplicate content is to be deceptive and manipulate search engine results.”

Michiel has some great tips for discovering duplicate content on your site in his DIY Duplicate content check and for what to do if someone copies your content. Google’s documentation is also a goldmine if you’re working with this type of content.

Duplicate content, copied content and thin content: what’s the difference?

The topic of duplicate content confuses a lot of people. For Google, it mostly has a technical origin (“I have two URLs for the same article, which one should I choose?”), but it will also look at the content itself. Most regular people, meanwhile, will probably think of pieces of similar content that appear elsewhere on a site: “I have used this piece of text in several other places, is that bad?” This is all duplicate content, but for determining rankings, search engines make a distinction between duplicate content, copied content and thin content.

Your duplicate content might be classified as copied content if you use an existing text and rehash it quickly to reuse it on your site. It doesn’t matter if you give it a little spin or put in a few keywords; this behavior is not acceptable. Throw in a couple of thin content pages (pages that have little to no quality content) and you’re in dangerous territory. Site quality is an issue and these tactics can bring serious harm to your site. Remember Panda?

A quick side note for users of our Yoast Duplicate Post plugin. Don’t worry, the posts you cloned using our plugin don’t count as duplicate content, unless you publish both the clone and the original without making any changes. Read more about how to use the Yoast Duplicate Post plugin and why.

Read more: How to get similar pages ranking »

Don’t block duplicate content on your site

Google is pretty adept at discovering and handling duplicate content. The search engine is smart enough to figure out what to do with most of the content it finds. If it finds multiple versions of a page, it will fold these into the version it finds best. In most cases, this will be the original article or page. What it does need, though, is complete access to these URLs. If you block Googlebot in your robots.txt from crawling these URLs, it cannot figure these things out by itself and you run the risk of Google treating these pages as separate instances. Here are a couple of things you should do:

  • Allow robots to crawl these URLs
  • Mark the content as duplicate by using rel=canonical (read more about this below)
  • Use Google’s URL Parameter Handling tool to determine how parameters should be handled
  • Use 301 redirects to send users and crawlers to the canonical URL
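
The first item on that list is easy to check yourself. Below is a minimal Python sketch that uses the standard library's urllib.robotparser to list which URLs a robots.txt file blocks for a given crawler. The function name blocked_urls and the example rules are made up for illustration. Also keep in mind that this parser does plain prefix matching and may not evaluate every rule the way Googlebot does, so treat the result as a hint.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt: str, urls, user_agent: str = "Googlebot"):
    """Return the URLs that the given robots.txt text disallows for user_agent."""
    parser = RobotFileParser()
    # parse() takes the lines of the file, so no network request is needed.
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


# Hypothetical robots.txt that hides the print versions of posts.
EXAMPLE_ROBOTS = """\
User-agent: *
Disallow: /print/
"""

print(blocked_urls(EXAMPLE_ROBOTS, [
    "https://example.com/post",
    "https://example.com/print/post",
]))
```

Any duplicate URL that shows up in the result is one that the crawler cannot read, which means it also cannot see a canonical link on that page.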

There’s more you can do to fight duplicate content on your site as Joost describes in his article on the causes and solutions.

Use rel=canonical!

One of the essential tools in your duplicate content fighting toolkit is rel=”canonical”. You can use this piece of code to indicate which URL is the original for a piece of content, something we call the canonical URL. We have an excellent ultimate guide to rel=”canonical” that shows you everything there is to know about it.
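
If you want to see which canonical URL a page declares, a small script can read it from the HTML. The following is a minimal Python sketch using only the standard library. The class and function names are made up for this example, and it only looks at link elements in the markup, not at canonicals sent through HTTP headers.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Remembers the href of the first <link rel="canonical"> it sees."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attributes = dict(attrs)
        # rel can hold several space-separated values, so split before checking.
        rel_values = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attributes.get("href")


def find_canonical(html: str):
    """Return the declared canonical URL, or None if the page has none."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical
```

Run find_canonical on the HTML of two duplicate pages: if both return the same URL, they point search engines to the same original.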

Focus on original, fresh and authoritative content

Another tool in your arsenal to fight duplicate, copied and unoriginal content is your writing skills. Google is focused on quality. It is always on the lookout for the piece of content that best fits the user’s intent. Your goal should not be to make a quick buck but to leave a lasting impression. Watch out for thin content and make sure your content is original and of high quality.

The same goes for similar content on your site. We’ve talked about keyword cannibalization before and this is an extension of that. Folding several comparable posts into one can achieve much better results, both in terms of rankings and in fighting duplicate content.

Here’s Google’s take on similar content:

“Minimize similar content: If you have many pages that are similar, consider expanding each page or consolidating the pages into one. For instance, if you have a travel site with separate pages for two cities, but the same information on both pages, you could either merge the pages into one page about both cities or you could expand each page to contain unique content about each city.”

If you want to know how to go about this, this step-by-step guide written by Joost clearly explains how to find and fix keyword cannibalization on your website.
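
For a rough first pass at spotting pages that are “appreciably similar”, you can compare their text yourself. The sketch below is a simple Python illustration that compares overlapping three-word sequences (often called shingles) with Jaccard similarity. This is a generic text comparison technique chosen for this example, not Google’s method, and any threshold you pick for “too similar” is your own assumption.

```python
import re


def shingles(text: str, size: int = 3) -> set:
    """Split text into a set of overlapping word sequences of the given size."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def similarity(first_text: str, second_text: str, size: int = 3) -> float:
    """Jaccard similarity of two texts: 0.0 means no overlap, 1.0 means identical shingles."""
    first, second = shingles(first_text, size), shingles(second_text, size)
    if not first or not second:
        return 0.0
    return len(first & second) / len(first | second)
```

Two city pages that share almost all of their text will score close to 1.0, which is a good sign that you should either merge them or expand each one with unique content.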

Duplicate content is everywhere — know what to do about it

Ex-Googler Matt Cutts once famously said that 20% to 30% of the web consists of duplicate content. While I’m not sure these numbers are still accurate, duplicate content continues to pop up on every site. This doesn’t have to be bad news. Fix what you can and don’t try to turn duplicate content and its siblings, copied content and thin content, into a viable SEO strategy.

Keep reading: Content maintenance for SEO »

26 Responses to What is duplicate content?

  1. Paul Watchorn • 6 years ago

    Hi,
    Regarding telling Google what ‘versions’ of the site there are, then informing Google which is the one to take notice of.

    This is really difficult to understand for people like me.
    Two questions:
    1. How do I find what versions I have out there?
    2. How do I inform Google which is the one to watch?
    Thank you

  2. RAGHVENDRA NARAYAN BHARDWAJ • 6 years ago

    Well ! This post is very nice about duplicate content. I have learned about duplicate contents.
    Duplicate content is not good for SEO.

    • Willemien Hallebeek • 6 years ago

      Glad you’ve learned that, Raghvendra!

  3. Aditya Harlalka • 6 years ago

    Hey Edwin!! Duplicate Content is something, all should know. It is very important, especially in Content Writing.
    Thanks a lot for this information. This is very helpful.

  4. Petar • 6 years ago

    various tools show that I have pages with duplicated titles and it’s actually archive pages that all have the same Title.
    eg. site.com/page/2 have the same title like the homepage (I have a blog format for the site).
    I noticed that you sorted that out on your blog so for archive pages you have for example:
    SEO blog • Page 3 of 84 • Yoast

    how did you achieve that? how can we add that pagination within the tags ?
    thank you!

    • Willemien Hallebeek • 6 years ago

      Hi Petar, Good question! We solved this with rel=”next” and rel=”prev”. You can find the details on that here: https://yoast.com/rel-next-prev-paginated-archives/ Good luck!

      • Petar • 6 years ago

        Hi Willemien,
        thanks for replying.
        I know about rel meta tags and I have them, but that doesn’t affect the title.
        on your blog I noticed that you have custom Title tag for each archive page, eg for page 2 is:
        SEO blog • Page 2 of 84 • Yoast
        so what I’m asking is, if it is possible to adjust that with SEO yoast plugin and how?
        thank you!

  5. Eric Roberts • 6 years ago

    Hi we have thousands of products and it would be impossible to add new content for every product? my site is now doing well with the help of Yoast, but it does sometimes concern me about what could be classed as duplicate content on my products ?thanks Eric Roberts

  6. Nancy E. Head • 6 years ago

    Duplicate content is something I’d heard about–and imagined to be something like plagiarism. Thanks for a great post. Very helpful information.

    • Willemien Hallebeek • 6 years ago

      You’re welcome, Nancy!

  7. Stevie Howard • 6 years ago

    Curious about how this relates to republished content. Does Google also see it as negative if someone republishes an article on a different site or if a site pulls from an RSS feed to feature the article?

    • Willemien Hallebeek • 6 years ago

      Hi Stevie, If sites use your article they should mention that and add a link and a canonical to the article on your site. If they don’t, you should ask them to do so or remove it. Good luck!

      • Vic • 6 years ago

        add a canonical? I must have read umpteen articles on canonicalization. And you know what? I still don’t understand WTF it means. Yes, I’ve read the ones here at Yoast. Can anyone please point me to a first grade level explanation of what it is and how to use it? Thx.

        • Willemien Hallebeek • 6 years ago

          Hi Vic, Thanks for your question. It’s a great idea for an SEO Basics post. We’ll try to write that one soon. For now I can just give you this: Imagine you have two posts that are very similar. Post x is the original, post y is the copy. If you add a canonical link to Post y, you add a (non visible) link to your post that tells search engines that the original version of the article is Post x. Search engines will then give “credits” for the content to Post x, instead of Post y. Implementing that is easy when you use Yoast SEO. Joost describes that here: https://yoast.com/rel-canonical#setting Hope this helps!

  8. Caroline • 6 years ago

    I always have a line or two at the bottom of my blog posts saying ‘If you enjoyed this, why not pin the below image to your Pinterest board… And you may also like my posts about other cities in x… and you can subscribe for updates at…’ generic-ish stuff. Would that be enough to count as duplicate? I’m wary of deleting it because it works and readers do subscribe, pin and read on according to my analytics! Thanks

    • Iris Guelen • 6 years ago

      Hey Caroline! That’s totally fine, no worries! :)

  9. Andrea Wentorp • 6 years ago

    Hi Edwin, I am working my way through all that technical SEO stuff with Yoast’s great help and have one question concerning duplicate content: I have two identical pages (one “.com” and one “.de”). Am I knocking myself out with that concerning duplicate content?

  10. Joanna • 6 years ago

    Hello,
    we sell linen duvet covers and on every product page, we will have a small text about linen (so this will be the same on every page). The rest of the text on every page will be different. Do I have a problem with the duplicated small text? Thank you!

    • Iris Guelen • 6 years ago

      Hi Joanna! When you have an online shop it makes sense that you have multiple pages for products that might be quite similar, so no worries! It’s important that you link back from every product page to your category page, and that you optimize that category page. You can see your product pages as the more “long-tail keyword” focused pages, like “linen duvet cover + dark blue” and your category pages as the more general keyword focused pages, like “linen duvet covers”. Hope this helps!

  11. Bill Seng • 6 years ago

    What about how WordPress “creates” pages based upon categories? Does that count as duplicate content, or does Google understand that’s how WP works, and thus not a problem?

    If it IS a problem, how do you suggest resolving for categories and tags?

  12. wasim akram • 6 years ago

    Sir keyword cannibalization is also a major issue. But how to remove duplicate content so that problem cannot occur again and again. Is there any proper method of it?

    • Iris Guelen • 6 years ago

      Hi Wasim! In this article, Joost explains the causes and solutions of duplicate content: https://yoast.com/duplicate-content/. Hopefully it can help you out!

  13. ken • 6 years ago

    please do help me to review my site and find out if there is any duplicate content. thank u