Find out who’s scraping you!

Sometimes you just need to know who is linking to a blog (yours or, even better, a competitor’s) or who’s scraping it, and you’d like that data in a format you can actually use. I thought I’d make a list of the sources you can gather link data from, beyond the “obvious” Yahoo SiteExplorer and Google Webmaster Tools.

  1. First of all, and most hardened SEOs know this: Google Blog Search. The link data in there is just phenomenal, because unlike the normal Google link: command results, it isn’t filtered. When you want a complete overview, make sure you disable the dupe content filter; that will show you all those pesky scrapers. If you need this data for use in a tool, it offers RSS feeds too.
  2. Second: Technorati. If you click on a blog’s authority, you get to see its reactions, and you can also pull these out via the API.
  3. Blogpulse. You don’t even need to add link: here, you can just throw in a URL. Just like the others, Blogpulse offers RSS feeds.
  4. IceRocket. Offers RSS feeds for the link: command queries as well.
  5. WASAlive, not too much data, but sometimes it offers some unique stuff. RSS feeds for the results as well.
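
Since most of these engines offer RSS feeds for their results, here is a minimal Python sketch of what you could do with such a feed once you have downloaded it: count how many results each linking host contributes, so that a scraper copying many posts stands out. The feed content below is a made-up sample using example domains, and it assumes a plain RSS 2.0 layout with `item` and `link` elements; the feeds these engines return may well differ.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from urllib.parse import urlparse

# Hypothetical sample of an RSS 2.0 results feed. In practice you would
# download the feed from the engine you are querying.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>link results</title>
    <item><title>Post A</title><link>http://scraper.example/post-a</link></item>
    <item><title>Post B</title><link>http://scraper.example/post-b</link></item>
    <item><title>Post C</title><link>http://fan.example/post-c</link></item>
  </channel>
</rss>"""


def linking_hosts(rss_text):
    """Count how many feed items come from each linking host."""
    root = ET.fromstring(rss_text)
    hosts = Counter()
    for item in root.iter("item"):
        link = item.findtext("link")
        if link:
            hosts[urlparse(link.strip()).netloc.lower()] += 1
    return hosts


if __name__ == "__main__":
    # Hosts with the most linking posts are listed first.
    for host, count in linking_hosts(SAMPLE_RSS).most_common():
        print(host, count)
```

A host that shows up for nearly every post you publish is a likely scraper and worth a closer look.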

When you’re querying those engines, you should query for two things: the blog URL and its feed URL. In my blog’s case: yoast.com and feeds.joostdevalk.nl. Scrapers especially will usually just link to your feed URL.
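
As a rough sketch of that two-query idea in Python: build one link: query per URL, then merge whatever pages each query returned, so you can see which pages link only to the feed. The function names and the result pages are hypothetical, and how you actually run the queries depends on the engine (Blogpulse, for instance, takes the bare URL without link:).

```python
def link_queries(blog_url, feed_url):
    """Return one link: query per URL worth checking."""
    return ["link:" + url for url in (blog_url, feed_url)]


def merge_hits(hits_by_query):
    """Map each linking page to the set of queries that found it."""
    merged = {}
    for query, pages in hits_by_query.items():
        for page in pages:
            merged.setdefault(page, set()).add(query)
    return merged


if __name__ == "__main__":
    site_q, feed_q = link_queries("yoast.com", "feeds.joostdevalk.nl")
    # Made-up result pages, standing in for what the engines would return.
    hits = {
        site_q: ["http://fan.example/review"],
        feed_q: ["http://scraper.example/copy", "http://fan.example/review"],
    }
    for page, queries in sorted(merge_hits(hits).items()):
        feed_only = queries == {feed_q}
        print(page, "(feed only)" if feed_only else "")
```

Pages found only through the feed URL query are the first ones to check for scraped content.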

Now if you’ve read my post on the Technorati Authority booster, you know why I needed this myself…


18 Responses

  1. By Willy on 22 January, 2008

    Joost, I’ve found that the link: results in Google Blog search are extremely inconsistent for me. One day, some links show up, the next day it will be others, occasionally none will show up. Have you noticed this as well? Now, I don’t have too many links in there at this point, so maybe it’s more noticeable for me, but it’s quite strange.

  2. By Joost de Valk on 22 January, 2008

    Hmm Willy, I’ll have a look :)

  3. By gadget on 23 January, 2008

    Joost – Great post, thank you.

    Ever since I saw you at the a4u Expo I’ve been reading your blog and slowly building my knowledge, so thank you. I’ve also started to download your WP plugins to combat scraping, spam etc.

    One thing that I just don’t get at the minute is what to apply the ‘nofollow’ tag to (I use your plugin)? Having read comments on Google’s webmaster pages, they seem to be saying ‘don’t worry too much about duplicate content on your site’. However, I accept your argument that one doesn’t want ‘everything’ indexed, especially the same post again and again when listed in several locations – home page, post page, category page, archive page etc.

    If it helps, my latest site is at http://www.freetickets.org.uk – appreciate any comments.

  4. By Oliver Taco on 23 January, 2008

    Nice!

    It is a real problem – and I’d love to see a (not too expensive) paid service to help with this.

    -OT

  5. By Dennis Goedegebuure on 24 January, 2008

    Joost,
    Did you hear anything back from Nathan yet?

    Cheers

    DG

  6. By Joost de Valk on 24 January, 2008

    @Dennis: yeah he’s solving it ;)

  7. By webdesign den haag on 25 January, 2008

    Hey, very cool Joost, again a great post!

  8. By Jan-Willem Bobbink on 27 January, 2008

    Joost,

    Thanks for the list! I’ve been using Google Blog Search already but I will try the others too!

    JW

  9. By Drew Stauffer on 31 January, 2008

    When I first started blogging I was actually happy that some scrapers came by. I felt honored.

    Now, a year later, I have completely changed my mindset. I hate scrapers.

    Now that you’ve given us the tools to find these scrapers, is there anything bloggers can do to protect themselves?

  10. By Dennis Goedegebuure on 31 January, 2008

    @Drew Stauffer

    There are a couple of things you can do.
    - When on WordPress: install the Feed plugin that puts an extra link in your feed back to your blog with your preferred anchor text: Plugin
    - Just ask the scraper to stop. Do a Whois lookup and send an email. This sometimes works.
    - Exclude the scraper from accessing your pages through robots.txt
    - Always put a copyright notice on your blog, so that if you would like to take legal action, you have some arguments.

    These are just my thoughts.

    You can always report the scraper to the Search engines as well.

    Cheers
    DG

  11. By Joost de Valk on 31 January, 2008

    @Drew of course you should use my rss footer plugin ;)

  12. By Dennis Goedegebuure on 31 January, 2008

    @Joost,

    Wow, you’ve been busy. You have one of those plugins as well?

    @Drew, I highly recommend Joost’s plugin! I will switch over tonight… :)

  13. By Drew on 1 February, 2008

    @Joost,

    Thanks soo much. I will definitely add that footer plugin tonight.

  14. By dating mike on 12 February, 2008

    Hi Joost,

    It’s a great post! Previously I didn’t know how to find a blog’s RSS feed on the sites in this list. This explained a lot, I will try it myself.

  15. By Dennison Uy - Graphic Designer on 12 February, 2008

    Hi Joost! Can you explain how to use Google Blog Search a little more? I went to http://blogsearch.google.com/, clicked on Advanced Blog Search, and then entered my blog URL under find posts > with all of the words. Is this correct? Also, I cannot seem to find the “dupe content filter”.

  16. By Laura Godfrey on 17 June, 2009

    Keeping an eye on your competitor is essential. If they have linked to certain blogs, social media websites, ask yourself why have they done this. Answer: to reinforce the keywords they are optimising. SEO is basically a guessing game – trying to achieve the best page rank and generate the most amount of traffic possible. Before implementing any keywords into your website, it is essential to complete research into your chosen market. The links mentioned above are great at providing data about backward links. Great post.
