DIY: Duplicate content check

Duplicate content is much-dreaded in the world of SEO. If the same content lives on multiple pages of your site, or on other websites, Google might not know which version to rank first. You’ll want to prevent duplicate content as much as possible. So, what can you do yourself? Here, I’ll explain how to perform a duplicate content check, which you should do from time to time to find copied content. I’ll also share some tips to avoid duplicate content in the first place. Let’s get started!

Adding a preventive snippet

In the ‘Search Appearance’ > ‘RSS’ section of our Yoast SEO plugin, we have predefined a snippet to add to your feed entry saying “This article first appeared on yourwebsite.com”. Because this snippet contains a link, a scraper that republishes your feed also republishes the link to the original article. This already helps to prevent duplicate content issues, as Google will find that backlink to your website.

Nevertheless, if you write awesome content, your content will be duplicated. And that copy won’t always include a link to your website. All the more reason to do a duplicate content check on a regular basis.

CopyScape duplicate content checker

There are a lot of tools to find duplicate content. One of the best known duplicate content checkers is probably CopyScape.com. This tool works pretty easily: insert a link in the box on the homepage, and CopyScape will return a number of results, presented a bit like Google’s search result pages.

The results page of a CopyScape scan

You can click the results for more details and to see which parts of your text are duplicate. Let’s look at an example from our popular post on 6 common SEO mistakes, which was first published on 3 October 2017. CopyScape found that 170 words, or 9% of this post, were copied:

CopyScape highlights passages that are duplicate

In this case, the first paragraph from our article, discussing low site speed as a common SEO mistake, was copied and turned into a short blog post. CopyScape clearly highlights the text it found to be duplicate, which gives an idea of how severe the copying is. If it’s just a small percentage of the page, I wouldn’t worry. If it’s over 40%, and the copied text makes up quite a large part of the other page, I would simply email the site owner and ask them to change the copied text.
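If you’d like a feel for how such a percentage can come about, here is a minimal Python sketch. It is not how CopyScape works internally (I don’t know that); it simply assumes that a word counts as “copied” when it sits inside a run of at least five consecutive words that also appears in your original. The function names and the run length of five are my own choices for illustration.

```python
import re


def words(text):
    """Lowercase a text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def shingles(tokens, size):
    """Return the set of all runs of `size` consecutive words."""
    return {tuple(tokens[i:i + size]) for i in range(len(tokens) - size + 1)}


def copied_share(original, suspect, size=5):
    """Percentage of words in `suspect` that are part of a run of at
    least `size` consecutive words also found in `original`.

    Assumption: a shared run of `size` words is treated as copying.
    """
    original_tokens = words(original)
    suspect_tokens = words(suspect)
    if len(suspect_tokens) < size:
        return 0.0

    known_runs = shingles(original_tokens, size)
    covered = [False] * len(suspect_tokens)

    # Mark every word that belongs to a run we also see in the original.
    for i in range(len(suspect_tokens) - size + 1):
        if tuple(suspect_tokens[i:i + size]) in known_runs:
            for j in range(i, i + size):
                covered[j] = True

    return 100.0 * sum(covered) / len(suspect_tokens)
```

Feed it the text of your post and the text of the suspicious page, and compare the result with the rule of thumb above: a small percentage is nothing to worry about, a large one might justify an email.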

Use the CopyScape duplicate content checker to find copied content from your website on other websites. Again, it’s one of many tools, but this one’s free and easy to use. Keep in mind, though, you won’t get unlimited scans for one website. If you want to dive a bit deeper into your duplicate content, CopyScape also offers a premium version for more insights.

Tip: Duplicate content on product pages

Using CopyScape, we frequently find manufacturer descriptions used in online shops to be duplicate. Usually, these are automatically imported into the shop’s content management system, and usually not just into yours: other shops import the same descriptions. Be aware of this. I understand it’s quite the hassle to write unique product descriptions for every product. But don’t your best-selling products, at the least, deserve as much? So start now and take it from there!

Siteliner internal duplicate content check

Siteliner is CopyScape’s brother that searches for internal duplicate content. So, this duplicate content checker will find duplicate content on your own site.

Internal duplicate content

Internal duplicate content, how does that happen, you ask? Well, a very common example of this is when a WordPress blog doesn’t use excerpts but shows the entire blog post on the blog’s homepage. That means that the blog post is available on at least two pages: the homepage and the post itself. And it’s probably on the category and tag overview pages as well. That’s four versions of the same article on your own website already.

Using excerpts (rather than showing the entire post) has the advantage that the excerpt always has a proper link to the post. This link will tell Google that the original content is not on that blog/category/tag page but in the post itself. We often recommend the use of excerpts.

Using Siteliner

The Siteliner duplicate content check will show you a lot of things, but the free version is limited to 250 pages and one scan every 30 days. Again, there is a premium version, but the free one will already give you a good impression. Just do a search and you’ll end up on the overview page. You’ll see the percentage of internal duplicate content at the top left. Don’t panic when you see high numbers, as this duplicate content check also considers excerpts duplicate content:

The Siteliner overview page

Simply click one of the links and check if it’s indeed the excerpt. The excerpt obviously links to the post, so if that’s the case, you’re covered.

Siteliner highlights the content it considers internal duplicate content and tells you where to find it
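If you have many overview pages to go through, the “does the excerpt link to the post?” check can be scripted. Below is a rough sketch using only Python’s standard library. It assumes you already have the HTML of the blog, category or tag page as a string; `links_to_post` and `LinkCollector` are hypothetical names I made up for this example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag in a piece of HTML."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def links_to_post(overview_html, post_url):
    """Return True if the overview page contains a link to `post_url`.

    Trailing slashes are ignored, so /my-post and /my-post/ match.
    """
    collector = LinkCollector()
    collector.feed(overview_html)
    target = post_url.rstrip("/")
    return any(href.rstrip("/") == target for href in collector.hrefs)
```

If this returns `True` for the page Siteliner flagged, the flagged text is most likely an excerpt that points to the full post, which is the situation described above.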

Sidenote on using duplicate content checkers

While Google understands what a sidebar is, CopyScape and Siteliner appear to include all text on a page in their percentage calculations. This means that the actual percentage of the duplicate content, when just looking at the main content of a page, might be higher. Please keep this in mind when you use one of these duplicate content checkers. Just a heads-up!
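A quick calculation shows the effect. The numbers below are made up for illustration, and the sketch assumes that all flagged text sits in the main content while the sidebar text is not flagged:

```python
def duplicate_percentage(duplicate_words, total_words):
    """Share of duplicate words, as a percentage of the words counted."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    return 100.0 * duplicate_words / total_words


# Hypothetical page: 800 words of main content plus a 200-word sidebar.
main_words = 800
sidebar_words = 200
duplicate_words = 200  # assumed to be entirely inside the main content

# What a checker reports when it counts every word on the page.
whole_page = duplicate_percentage(duplicate_words, main_words + sidebar_words)

# What you get when you only look at the main content.
main_only = duplicate_percentage(duplicate_words, main_words)

print(f"Whole page: {whole_page:.0f}%")   # Whole page: 20%
print(f"Main content: {main_only:.0f}%")  # Main content: 25%
```

The same 200 duplicate words are 20% of the whole page but 25% of the main content, so the reported figure understates the problem in this scenario.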

Manual duplicate content check

CopyScape and Siteliner are nice, easy-to-use duplicate content checkers. However, if you want to see what’s duplicate according to Google, you could also just use Google itself.

If you have a certain page that you’d like to check, simply go to that page. Copy a text snippet, preferably from a section that you think might be attractive for others to copy. Let’s take a passage from our common SEO mistakes article: “If your page title is too long (currently 400 to 600 pixels), it will get cut off in Google. You don’t want potential visitors to be unable to read the full title in the SERPs.” Note that Google only takes the first 32 words of a query into account. Insert the exact snippet in Google between double quotation marks like this:

Duplicate content check in Google

This search query returns ‘about 208 results’ according to Google, which is well over the 10 results CopyScape returned.
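If you check snippets regularly, you could build the search link with a few lines of Python. This is only a sketch: it trims the snippet to the 32 words mentioned above, wraps it in double quotation marks and URL-encodes it. The function names are my own, and the `https://www.google.com/search?q=` address is simply the regular Google search URL you see in your browser.

```python
from urllib.parse import urlencode

MAX_QUERY_WORDS = 32  # Google only takes the first 32 words into account


def exact_match_query(snippet, max_words=MAX_QUERY_WORDS):
    """Trim a snippet to `max_words` words and wrap it in double quotes."""
    # Straight double quotes inside the snippet would end the phrase early.
    cleaned = snippet.replace('"', " ")
    trimmed = " ".join(cleaned.split()[:max_words])
    return f'"{trimmed}"'


def google_search_url(snippet):
    """Build a Google search URL for an exact-match search on the snippet."""
    return "https://www.google.com/search?" + urlencode(
        {"q": exact_match_query(snippet)}
    )


print(google_search_url("If your page title is too long, it will get cut off in Google."))
```

Open the printed link in your browser to see which pages contain that exact passage.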

Check your own duplicate content

Use a duplicate content checker like CopyScape to find what has been copied from your site, and use Google to see where else on the internet this content ended up. These are simple tools that serve a higher goal: to prevent duplicate content. If you want to read more on duplicate content, start with our Duplicate content: causes and solutions article.

Read more: rel=canonical: the ultimate guide »

3 Responses to DIY: Duplicate content check

  1. Anshul Panchal  • 5 years ago

    Nice post. I’m a blogger. I use Copyscape to find duplicate content of my website, and I don’t think the manual duplicate content check is useful enough.

  2. jaredmanninen  • 5 years ago

    This is probably a bit off-topic for this specific article, but I couldn’t find anything addressing my duplicate content-related question. I write a hiking blog and would like to publish a curated list post of my own articles/hikes. Originally I planned to use the introduction paragraphs of each hike with a “read more” option to link to the specific article (not unlike when you click on a tag or category). But now I’m thinking that’s going to just create duplicate content problems. Any feedback or suggestions regarding this idea, or could you point me in the right direction? Thanks for your help.

  3. Felicia  • 5 years ago

    I understand it can be beneficial to rewrite product descriptions if they came from the vendor so that they are unique. Is this still a priority if the page returns only 9% duplicate content, most of which is because of the product description? Is it still worth writing new descriptions with a low percentage overall?