Find out who’s scraping you!

Sometimes you just need to know who is linking to a blog (or, even better, your competitor’s blog), or who’s scraping it, and you’d like that data in a format you can actually use. So I thought I’d make a list of the sources you can gather link data from, beyond the “obvious” Yahoo SiteExplorer and Google Webmaster Tools.

  1. First of all, and most seasoned SEOs know this: Google Blog Search. Its link data is just phenomenal because, unlike the results of the normal Google link: command, it isn’t filtered. If you want a complete overview, make sure you disable the dupe content filter; that will show you all those pesky scrapers. If you need the data for use in a tool, it offers RSS feeds too.
  2. Second: Technorati. If you click on a blog’s authority, you get to see its reactions, and you can also pull these out via the API.
  3. Blogpulse. You don’t even need to add link: here; you can just enter a URL. Like the others, Blogpulse offers RSS feeds.
  4. IceRocket. It offers RSS feeds for link: queries as well.
  5. WASAlive. Not much data, but it sometimes turns up unique results. It offers RSS feeds for the results as well.
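Since most of these sources hand results back as RSS, a few lines of code can turn a results feed into a per-site tally. The sketch below assumes a plain RSS 2.0 document with `<item><link>` elements and does no fetching: you pass in the feed text yourself, because these services and their feed formats may have changed since this was written. The function name `linking_hosts` and the sample hosts are hypothetical. Many items from a single host is often a hint of a scraper, though not proof.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from urllib.parse import urlparse


def linking_hosts(rss_xml: str) -> Counter:
    """Count how many result items point at each linking host.

    Hypothetical helper; assumes a plain RSS 2.0 document
    (<rss><channel><item><link>...</link></item>...).
    """
    root = ET.fromstring(rss_xml)
    hosts = Counter()
    for item in root.iter("item"):
        link = item.findtext("link")
        if not link:
            continue  # skip items without a usable <link>
        host = urlparse(link.strip()).netloc.lower()
        if host:
            hosts[host] += 1
    return hosts


if __name__ == "__main__":
    # Made-up sample feed; replace with the RSS returned by a link search.
    sample = """<rss version="2.0"><channel>
      <item><title>Post A</title><link>http://scraper.example/copy-1</link></item>
      <item><title>Post B</title><link>http://scraper.example/copy-2</link></item>
      <item><title>Post C</title><link>http://friend.example/review</link></item>
    </channel></rss>"""
    for host, count in linking_hosts(sample).most_common():
        print(host, count)
```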

When you query those engines, query for two things: the blog URL and its feed URL. In my blog’s case: and . Scrapers in particular will usually link only to your feed URL.
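Querying for both URLs gives you two result lists, and comparing them is where scrapers tend to stand out. The sketch below is a minimal illustration, assuming you have already extracted the linking URLs from each query’s results (for instance with a feed parser). It merges them by host and lists the hosts that appeared only in the feed-URL query. The function names and example URLs are hypothetical, and a feed-only host is a candidate to check by hand, not a confirmed scraper.

```python
from urllib.parse import urlparse


def merge_link_results(blog_links, feed_links):
    """Merge linking URLs from two queries into {host: set of query labels}.

    Hypothetical helper.
    blog_links: URLs found when querying for the blog URL
    feed_links: URLs found when querying for the feed URL
    """
    seen = {}
    for label, links in (("blog", blog_links), ("feed", feed_links)):
        for url in links:
            host = urlparse(url).netloc.lower()
            if host:
                seen.setdefault(host, set()).add(label)
    return seen


def feed_only_hosts(merged):
    """Hosts that showed up only in the feed-URL query (scraper candidates)."""
    return sorted(host for host, labels in merged.items() if labels == {"feed"})


if __name__ == "__main__":
    # Made-up results for the two queries.
    blog = ["http://friend.example/review", "http://news.example/roundup"]
    feed = ["http://friend.example/review",
            "http://scraper.example/copy-1",
            "http://scraper.example/copy-2"]
    merged = merge_link_results(blog, feed)
    print(feed_only_hosts(merged))  # ['scraper.example']
```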

Now if you’ve read my post on the Technorati Authority booster, you know why I needed this myself…

18 Responses

  1. Willy
    By Willy on 22 January, 2008

    Joost, I’ve found that the link: results in Google Blog search are extremely inconsistent for me. One day, some links show up, the next day it will be others, occasionally none will show up. Have you noticed this as well? Now, I don’t have too many links in there at this point, so maybe it’s more noticeable for me, but it’s quite strange.

  2. Joost de Valk
    By Joost de Valk on 22 January, 2008

    Hmm Willy, I’ll have a look :)

  3. gadget
    By gadget on 23 January, 2008

    Joost – Great post, thank you.

    Ever since I saw you at the a4u Expo I’ve been reading your blog and slowly building my knowledge, so thank you. I’ve also started downloading your WP plugins to combat scraping, spam, etc.

    One thing that I just don’t get at the minute is what to apply the ‘nofollow’ tag to (I use your plugin)? Having read comments on Google’s webmaster pages, they seem to be saying ‘don’t worry too much about duplicate content on your site’. However, I accept your argument that one doesn’t want ‘everything’ indexed, especially the same post again and again when listed in several locations – home page, post page, category page, archive page etc.

    If it helps, my latest site is at – appreciate any comments.

  4. Oliver Taco
    By Oliver Taco on 23 January, 2008


    It is a real problem – and I’d love to see a (not too expensive) paid service to help with this.


  5. Dennis Goedegebuure
    By Dennis Goedegebuure on 24 January, 2008

    Did you hear anything back from Nathan yet?



  6. Joost de Valk
    By Joost de Valk on 24 January, 2008

    @Dennis: yeah he’s solving it ;)

  7. webdesign den haag
    By webdesign den haag on 25 January, 2008

    Hey, very cool Joost, another great post!

  8. Jan-Willem Bobbink
    By Jan-Willem Bobbink on 27 January, 2008


    Thanks for the list! I’ve been using Google Blog Search already but I will try the others too!


  9. Drew Stauffer
    By Drew Stauffer on 31 January, 2008

    When I first started blogging I was actually happy that some scrapers came by. I felt honored.

    Now a year later I have completely changed my mind set. I hate scrapers.

    Now that you’ve given us the tools to find these scrapers, is there anything bloggers can do to protect themselves?

  10. Dennis Goedegebuure
    By Dennis Goedegebuure on 31 January, 2008

    @Drew Stauffer

    There are a couple of things you can do.
    – When on WordPress: install the feed plugin that puts an extra link in your feed back to your blog, with your preferred anchor text: Plugin
    – Just ask the scraper to stop. Do a Whois lookup and send an email. This sometimes works.
    – Exclude the scraper from accessing your pages through robots.txt
    – Always put a copyright notice on your blog, so that if you want to take legal action, you have some grounds to stand on.

    These are just my thoughts.

    You can always report the scraper to the search engines as well.


  11. Joost de Valk
    By Joost de Valk on 31 January, 2008

    @Drew of course you should use my rss footer plugin ;)

  12. Dennis Goedegebuure
    By Dennis Goedegebuure on 31 January, 2008


    Wow, you’ve been busy. You have one of those plugins as well?

    @Drew, I highly recommend Joost’s plugin! I will switch over tonight… :)

  13. Drew
    By Drew on 1 February, 2008


    Thanks so much. I will definitely add that footer plugin tonight.

  14. dating mike
    By dating mike on 12 February, 2008

    Hi Joost,

    It’s a great post! Previously I didn’t know how to find a blog’s RSS feed in this list. This explained a lot; I will try it myself.

  15. Dennison Uy - Graphic Designer
    By Dennison Uy - Graphic Designer on 12 February, 2008

    Hi Joost! Can you explain how to use Google blog search a little more? I went to and then clicked on Advanced Blog Search then I entered my blog URL under find posts > with all of the words. Is this correct? Also, I cannot seem to find the “dupe content filter”?

  16. Laura Godfrey
    By Laura Godfrey on 17 June, 2009

    Keeping an eye on your competitors is essential. If they have linked to certain blogs or social media websites, ask yourself why they have done this. Answer: to reinforce the keywords they are optimising for. SEO is basically a guessing game: trying to achieve the best PageRank and generate as much traffic as possible. Before implementing any keywords on your website, it is essential to research your chosen market. The sources mentioned above are great at providing data about backlinks. Great post.