Google, WordPress and trackback URLs

I’ve been playing around with WP-Googlestats on a few blogs, and I noticed something I hadn’t really thought about until now. Googlebot and other spiders crawl the trackback URLs that are in the metadata for posts. Because a spider requesting that URL isn’t actually sending a trackback, it gets redirected back to the post.

This is where, in my opinion, WordPress goes wrong: that redirect is a 302 (temporary) redirect. Line 65 of wp-trackback.php reads:

wp_redirect(get_permalink($tb_id));

So it uses the wp_redirect function to send you back to the original post. This function lives in wp-includes/pluggable.php and sends a 302 redirect by default. You can make it send a 301 (permanent) redirect by passing the status code as a second argument:

wp_redirect(get_permalink($tb_id),301);
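To see which status code a trackback URL actually returns, you can inspect the response headers with a browser tool or script the check. Below is a minimal Python sketch using only the standard library. `check_redirect` requests a URL once and reports the status and `Location` header. It does not follow the redirect, because `http.client` never follows redirects. The demo can't assume a real blog, so `FakeTrackbackHandler` and `run_demo` are hypothetical helpers: a local stand-in that mimics the trackback redirect described above. The example post path is also made up. To test your own site, point `check_redirect` at your blog's host and a real trackback path.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


def check_redirect(host, port, path):
    """Send one GET for `path` without following redirects and return
    (status code, Location header or None)."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        resp.read()  # drain the body before closing the connection
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def describe(status):
    """Label the two redirect codes discussed in the post."""
    return {301: "permanent", 302: "temporary"}.get(status, "not a 301/302 redirect")


class FakeTrackbackHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for wp-trackback.php: a plain GET for
    /<post>/trackback/ is redirected to /<post>/ with the status in STATUS."""
    STATUS = 302

    def do_GET(self):
        if self.path.endswith("/trackback/"):
            self.send_response(self.STATUS)
            # Strip the trailing "trackback/" to get the post URL.
            self.send_header("Location", self.path[: -len("trackback/")])
        else:
            self.send_response(200)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the demo output quiet


def run_demo(status):
    """Start the fake blog on a free local port, check one (made-up)
    trackback URL, then shut the server down."""
    FakeTrackbackHandler.STATUS = status
    server = HTTPServer(("127.0.0.1", 0), FakeTrackbackHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        host, port = server.server_address
        return check_redirect(host, port, "/2007/04/google-wordpress/trackback/")
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    for code in (302, 301):
        status, location = run_demo(code)
        print(status, describe(status), "->", location)
```

The first run mimics the default behavior and the second mimics the patched line. In both cases the `Location` points at the post itself. Only the status code tells a search engine whether to treat the move as temporary or permanent.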

With the 301 in place, Googlebot’s crawl of the trackback URL isn’t in vain, because it simply passes Googlebot on to the original post URL. You could question whether that’s very useful. Another solution is to block Googlebot and other spiders from crawling trackback URLs altogether by adding this to your robots.txt:

User-agent: *
Disallow: */trackback/$

That way, Google won’t spider the URL at all. I’d rather have it spider the post again, so I just changed the wp-trackback.php file.
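The `*` and `$` in that rule are wildcard extensions, not full regular expressions. Google documents them, and RFC 9309 now describes them too. Under these rules, `*` matches any run of characters and a trailing `$` anchors the end of the URL. When several rules match, Google says the longest pattern wins, and Allow wins a tie. Below is a minimal Python sketch of that matching, assuming those documented rules. `pattern_to_regex`, `matches` and `is_allowed` are hypothetical helper names, not part of any library.

```python
import re


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex:
    '*' matches any run of characters, a trailing '$' anchors the end,
    and everything else is matched literally from the start of the path."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def matches(pattern, path):
    # re.match only matches at the beginning of the string.
    return pattern_to_regex(pattern).match(path) is not None


def is_allowed(rules, path):
    """rules: list of ("allow" | "disallow", pattern) pairs from one
    User-agent group. The longest matching pattern decides; on a tie,
    "allow" wins. If nothing matches, the path may be crawled."""
    best = None  # (pattern length, is_allow) of the strongest match so far
    for directive, pattern in rules:
        if pattern and matches(pattern, path):  # an empty Disallow blocks nothing
            candidate = (len(pattern), directive == "allow")
            if best is None or candidate > best:
                best = candidate
    return True if best is None else best[1]


if __name__ == "__main__":
    post = "/2007/04/google-wordpress/"  # made-up example permalink
    for path in (post, post + "trackback/"):
        print(path, is_allowed([("disallow", "*/trackback/$")], path))
```

Python’s own `urllib.robotparser` does plain prefix matching and doesn’t interpret these wildcards. Not every crawler honors them either, so test a pattern against real URLs from your blog before relying on it.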


20 Responses

  1. Devon Young on 1 April, 2007

    I never knew * and $ could be used in the location path in a robots.txt file. That’s a really cool tip.

  2. Joost de Valk on 1 April, 2007

    Wildcards can be used, yes, but be sure to test them well!

  3. Seo Handleiding on 1 April, 2007

    Good tip, Joost, but I wonder what the impact would be if you don’t make the change.

    I don’t think it matters much, because this has been in WP for quite a while, and otherwise they would have fixed it themselves. Nonetheless, a nice tip ;-)

  4. Joost de Valk on 1 April, 2007

    Could you please comment in English here? :) I know you know I’m Dutch, but most of my readers aren’t…

    WP has not been built with SEO in mind, although you’d sometimes think differently when you look at it. What happens now when Google spiders the trackback URL is that it gets a 302, and so it might spider it again and again (I haven’t tested that)…

  5. Apache Admin on 2 April, 2007

    Wow, this is such a great little tip! Could you explain how you found out it was sending 302s instead of 301s?

    The robots.txt tip is also good, though I just read a really good write-up about optimizing the WordPress robots.txt.

  6. Joost de Valk on 2 April, 2007

    I used the Live HTTP Headers extension for Firefox :)

  7. Joost de Valk on 2 April, 2007

    Yahoo! actually seems to index those trackback URLs.

    All the more reason to fix it.

  8. geniosity on 5 April, 2007

    Great advice. I recently tried adding certain “directories” to my robots.txt, and Google didn’t seem to like it (my referrals dropped by 50%). The problem is that when I ran a site: query on one of my blogs, the non-supplemental results were my “/feed” pages.

    Anyway, have you thought of logging your change as a defect/enhancement request for WordPress? I think it’s probably a good idea (so the rest of us don’t have to go messing around with our core files)… ;-)

  9. Apache Admin on 21 May, 2007

    Yeah, OK, so you used Live HTTP Headers. I much prefer Wireshark/Ethereal for that; I actually wrote up how I use Wireshark to sniff HTTP traffic when debugging sites.

    How did you happen to notice that is what WordPress was doing?

  10. Joost de Valk on 24 May, 2007

    Apache Admin: sorry for the late reply, you got caught in the spam filter ;) I saw Yahoo! indexing those pages, so I knew it couldn’t be a 301.

  11. Apache Admin on 25 May, 2007

    Oh, OK. I never use Yahoo!, can you point me in the right direction?

    I thought you were doing some crazy debugging. I was like, “How the heck did this guy manage to keep track of all the packets for such a specific part of WordPress?” :)

  12. Apache Admin on 25 May, 2007

    Oh, by the way, I prevent Googlebot from indexing my trackback URLs at all with this in my robots.txt:

    User-agent: Googlebot
    Disallow: */trackback*
    Disallow: /*?*
    Disallow: /z/
    Disallow: /wp-*
    Allow: /wp-content/uploads/

  13. Durk on 14 August, 2008

    What’s your opinion about the sidebar? It’s full of links and appears on every page. Do duplicate links make Google rank you lower?

  14. AskApache on 20 September, 2008

    Hey Joost, what’s going on here? Are you letting spam run rampant on your blog now or what?

  15. Joost de Valk on 20 September, 2008

    @AskApache: Hehe no, fixed, thx for the call!

  16. AskApache on 20 September, 2008

    Glad to hear it. I haven’t been here for a few months, and the site looks completely different. Nice!

  17. Helvetica on 16 May, 2009

    > Disallow: */trackback*
    Is this useful?

Trackbacks

  1. SEO: Inbound Links Without Asking Or Spamming…

    You have started your Search Engine Optimization education and have learned that inbound links count as *votes* for your site when Google ranks your website. Come on, everyone knows this, right?
    So, how do you get those coveted inbound links?
    You can s…