Google, WordPress and trackback URLs

I've been playing around with WP-Googlestats on some blogs, and I noticed something I hadn't really thought about until now. Googlebot, like other spiders, crawls the trackback URLs that appear in the metadata for posts. But since it's only crawling the URL and not actually sending a trackback, it gets redirected back to the post.

This is where, in my opinion, WordPress goes wrong, as that redirect is a 302 (temporary) redirect. On line 65 of wp-trackback.php, it says the following:

[sourcecode language="php"]
wp_redirect(get_permalink($tb_id));
[/sourcecode]

So it uses the function wp_redirect to redirect you back to the original post. This function lives in wp-includes/pluggable.php, and by default, sends a 302 redirect. You can make it send a 301 redirect by simply changing the code to:

[sourcecode language="php"]
wp_redirect(get_permalink($tb_id), 301);
[/sourcecode]
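
To check which status code a trackback URL returns before and after the change, something like the following should do. It's a minimal sketch using PHP's built-in get_headers(); the function names and the example.com URL are made up for illustration, and it relies on the first entry of the returned array being the status line of the first response.

[sourcecode language="php"]
<?php
// Extract the numeric status code from an HTTP status line such as
// "HTTP/1.1 302 Found". Returns null if the line is not a status line.
function status_code_from_line(string $line): ?int
{
    if (preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})\b#', $line, $m)) {
        return (int) $m[1];
    }
    return null;
}

// Fetch the headers for $url and return the status code of the first
// response, i.e. the redirect itself rather than the page it points to.
function first_status_code(string $url): ?int
{
    $headers = @get_headers($url);
    if ($headers === false || !isset($headers[0])) {
        return null; // request failed
    }
    return status_code_from_line($headers[0]);
}

// Hypothetical usage (replace with one of your own trackback URLs):
// var_dump(first_status_code('https://example.com/some-post/trackback/'));
[/sourcecode]

If the change worked, this should report 301 rather than 302 for a trackback URL.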

This way, the Googlebot crawl isn't in vain: the 301 tells it the trackback URL has moved permanently, and it just spiders the original URL again. You might wonder whether that's all that useful, so another solution might be to block Googlebot and other spiders from crawling it altogether, by adding this to your robots.txt:

[sourcecode language="text"]
User-agent: *
Disallow: */trackback/$
[/sourcecode]
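
The * and $ in that rule are wildcards as Google interprets them: * matches any run of characters and a trailing $ anchors the end of the URL. As a rough sketch of how such a pattern matches a path, assuming those semantics (the function names are my own, and other crawlers may treat the pattern differently):

[sourcecode language="php"]
<?php
// Convert a robots.txt path pattern with Google-style wildcards into a
// PCRE regex: "*" matches any characters, a trailing "$" anchors the end.
function robots_pattern_to_regex(string $pattern): string
{
    $anchored = substr($pattern, -1) === '$';
    if ($anchored) {
        $pattern = substr($pattern, 0, -1);
    }
    // Quote everything literally, then turn the escaped "*" back into ".*".
    $regex = str_replace('\*', '.*', preg_quote($pattern, '#'));
    return '#^' . $regex . ($anchored ? '$' : '') . '#';
}

// True if $path would be covered by the given Disallow pattern.
function robots_pattern_matches(string $pattern, string $path): bool
{
    return preg_match(robots_pattern_to_regex($pattern), $path) === 1;
}

var_dump(robots_pattern_matches('*/trackback/$', '/some-post/trackback/')); // bool(true)
var_dump(robots_pattern_matches('*/trackback/$', '/some-post/'));           // bool(false)
[/sourcecode]

So the rule only covers URLs that end in /trackback/, and leaves the posts themselves alone.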

That way, Google won't spider the URL at all. I'd rather have it spider the post again, so I just changed the wp-trackback.php file.
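
If you'd rather not edit a core file, the same result can probably be had from a small plugin. The sketch below is untested and rests on two assumptions: that your WordPress version passes redirects through a wp_redirect_status filter, and that is_trackback() is true at the moment wp-trackback.php redirects. The function name trackback_redirect_status is my own.

[sourcecode language="php"]
<?php
// Decide which status code a redirect should use. The trackback flag is
// passed in so the function can be tested without loading WordPress.
function trackback_redirect_status(int $status, bool $is_trackback_request): int
{
    // Only upgrade the default 302 on trackback URLs; leave the rest alone.
    return ($is_trackback_request && $status === 302) ? 301 : $status;
}

// Only hook in when WordPress is actually loaded.
// Assumption: the wp_redirect_status filter exists in your version.
if (function_exists('add_filter')) {
    add_filter('wp_redirect_status', function ($status, $location) {
        return trackback_redirect_status((int) $status, is_trackback());
    }, 10, 2);
}
[/sourcecode]

The advantage is that the change survives a WordPress upgrade, which an edit to wp-trackback.php does not.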

16 Responses to “Google, WordPress and trackback URLs”

  • Devon Young says:

    I never knew * and $ could be used in the location path in a robots.txt file. That's a really cool tip.

  • Yes, wildcards like these can be used (Google supports them, though they aren't full regular expressions), but be sure to test them well!

  • [Translated from Dutch] Good tip, Joost, but I wonder what the impact would be if you didn't make the change.

    I think it doesn't matter very much, because this has been in WP for quite a while, and otherwise they would have come up with it themselves. But a nice tip nonetheless ;-)

  • Could you please comment in English here? :) I know you know I'm Dutch, but most of my readers aren't...

    WP has not been built with SEO in mind, although you'd sometimes think differently when you look at it. What happens now when Google spiders the trackback URL is that it gets a 302, and thus might spider it again and again (I haven't tested that)...

  • Apache Admin says:

    Wow, this is such a great little tip! Could you explain more about how you found out it was sending 302s instead of 301s?

    And the robots.txt tip is also good, though I just read a really good writeup about optimizing the WordPress robots.txt.

  • I used the Live HTTP Headers extension for Firefox :)

  • Yahoo! seems to actually index those trackback URLs...

    All the more reason to fix it.

  • geniosity says:

    Great advice. I recently tried adding certain "directories" to my robots.txt and Google seemed to not like it (by dropping my referrals by 50%). Problem is, if I did a site: command on one of my blogs, the non-supplemental results were my "/feed" pages.

    Anyway, have you thought of logging your change as a defect/enhancement request for WordPress? I think it's probably a good idea (so the rest of us don't have to go messing around with our core files)... ;-)

  • Apache Admin says:

    OK, so you used Live HTTP Headers. I much prefer Wireshark/Ethereal for that; I actually detailed how I use Wireshark to sniff HTTP to debug sites.

    How did you happen to notice that is what WordPress was doing?

  • Apache Admin: sorry for the late reply, you got caught in the spam filter ;) I saw Yahoo! indexing those pages, so I knew it couldn't be a 301.

  • Apache Admin says:

    Oh OK. I never use Yahoo, can you point me in the right direction?

    I was thinking that you were doing some crazy debugging stuff. I was like, "How the heck did this guy manage to keep track of all the packets for such a specific part of WordPress?" :)

  • Apache Admin says:

    Oh, by the way, I prevent Googlebot from indexing my trackback URLs at all with this in my robots.txt:

    User-agent: Googlebot
    Disallow: */trackback*
    Disallow: /*?*
    Disallow: /z/
    Disallow: /wp-*
    Allow: /wp-content/uploads/

  • Durk says:

    What's your opinion about the sidebar? It's full of links and is on every page. Do duplicate links make Google rank you lower?

  • AskApache says:

    Hey Joost, what's going on here? Are you letting spam run rampant on your blog now or what?

  • @AskApache: Hehe no, fixed, thx for the call!

  • AskApache says:

    Glad to hear it.. I haven't been here for a few months and the site is looking completely different.. nice..
