I’ve been playing around with WP-Googlestats for some blogs, and I noticed something I hadn’t really thought about until now. Googlebot and other spiders all crawl the trackback URLs that are in the metadata for posts. But since a spider only requests the URL and isn’t actually sending a trackback, it gets redirected to the post again.
This is where, in my opinion, WordPress goes wrong, as that redirect is a 302 (temporary) redirect. On line 65 of wp-trackback.php, it says the following:
wp_redirect(get_permalink($tb_id));
So it uses the function wp_redirect to redirect you back to the original post. This function lives in wp-includes/pluggable.php, and by default, sends a 302 redirect. You can make it send a 301 redirect by simply changing the code to:
wp_redirect(get_permalink($tb_id),301);
This way, the Googlebot crawl isn’t in vain: a 301 tells it the move is permanent, so it simply spiders the original URL again. You can wonder whether that’s all too useful, so another solution might be to block Googlebot and other spiders from crawling the trackback URL altogether, by adding this to your robots.txt:
Disallow: */trackback/$
That way, Google won’t spider the URL at all. I’d rather have it spider the post again, so I just changed the wp-trackback.php file.
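To confirm which kind of redirect a trackback URL sends, you can request it without following the redirect and look at the status code. The following is a minimal sketch in Python using only the standard library. The function names `describe_redirect` and `fetch_status` and the example URL are made up for illustration, and it assumes the trackback URL answers a plain GET request with a redirect.

```python
import http.client
from urllib.parse import urlsplit


def describe_redirect(status):
    """Classify an HTTP status code as a permanent or temporary redirect."""
    if status in (301, 308):
        return "permanent"
    if status in (302, 303, 307):
        return "temporary"
    return "not a redirect"


def fetch_status(url):
    """Request a URL without following redirects.

    Returns a (status code, Location header) tuple. http.client never
    follows redirects on its own, so the first response is what we see.
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=10)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("GET", path)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


# Example usage (hypothetical URL, replace with a trackback URL from your blog):
# status, location = fetch_status("http://example.com/some-post/trackback/")
# print(status, describe_redirect(status), location)
```

With an unmodified wp-trackback.php you would expect this to report a temporary redirect, and a permanent one after the change described above.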

I never knew * and $ could be used in the location path in a robots.txt file. That’s a really cool tip.
Wildcards like * and $ can be used, yes (they are not full regular expressions, and not every spider supports them), but be sure to test them well!!
(Translated from Dutch) Good tip Joost, but I wonder what the impact will be if you don’t make the change.
I don’t think it matters very much, because this has been in WP for quite a long time and otherwise they would have come up with it themselves. But nonetheless a nice tip ;-)
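One way to test such patterns before putting them in robots.txt is to translate them into regular expressions and run a few paths through them. The sketch below assumes Google-style matching, where `*` matches any run of characters, a trailing `$` anchors the end of the URL, and everything else is matched as a prefix from the start of the path. The helper names `robots_pattern_to_regex` and `is_blocked` are made up for this example. It is not a full robots.txt parser: it ignores user-agent groups and the precedence between Allow and Disallow rules.

```python
import re


def robots_pattern_to_regex(pattern):
    """Translate one robots.txt path pattern into a compiled regex.

    Assumption: Google-style wildcards. '*' matches any characters and
    a trailing '$' means the URL must end there.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' for each '*'.
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile("^" + body + (r"\Z" if anchored else ""))


def is_blocked(pattern, path):
    """Return True if the path matches the Disallow pattern."""
    return robots_pattern_to_regex(pattern).match(path) is not None


# The pattern from the post blocks only URLs that end in /trackback/:
print(is_blocked("*/trackback/$", "/2007/05/my-post/trackback/"))  # True
print(is_blocked("*/trackback/$", "/2007/05/my-post/"))            # False
```

The paths above are hypothetical. Try the check with the real permalink structure of your own blog, since a pattern that is too broad can block pages you do want crawled.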
Could you please comment in English here? :) I know you know I’m Dutch, but most of my readers aren’t…
WP has not been built with SEO in mind, although you’d sometimes think differently when you look at it. What happens now when Google spiders the trackback is that it gets a 302, and thus might spider it again and again (haven’t tested that)…
Wow, this is such a great little tip! Could you explain more about how you found out it was sending 302s instead of 301s?
And the robots.txt tip is also good, though I just read a really good writeup about optimizing the WordPress robots.txt.
I used the live HTTP headers plugin in Firefox :)
Yahoo! seems to actually index those trackback URLs…
All the more reason to fix it.
Great advice. I recently tried adding certain “directories” to my robots.txt and Google seemed to not like it (by dropping my referrals by 50%). Problem is, if I did a site: command on one of my blogs, the non-supplemental results were my “/feed” pages.
Anyway, have you thought of logging your change as a defect/enhancement request for WordPress? I think it’s probably a good idea (so the rest of us don’t have to go messing around with our core files)… ;-)
Ya ok, so you used Live HTTP Headers. I much prefer Wireshark/Ethereal for doing that; I actually detailed how I use Wireshark to sniff HTTP to debug sites.
How did you happen to notice that is what WordPress was doing?
Apache Admin: sorry for the late reply, you got caught in the spam filter ;) I saw Yahoo! indexing those pages, so I knew it couldn’t be a 301.
Oh ok.. I never use Yahoo, can you point me in the right direction?
I was thinking that you were doing some crazy debugging stuff.. I was like, “How the heck did this guy manage to keep track of all the packets for such a specific part of WordPress” :)
Oh by the way, I prevent Googlebot from indexing my trackback URLs at all with this in my robots.txt:
User-agent: Googlebot
Disallow: */trackback*
Disallow: /*?*
Disallow: /z/
Disallow: /wp-*
Allow: /wp-content/uploads/
What’s your opinion about the sidebar? It’s full of links and is on every page. Do duplicate links make Google rank you lower?
Hey Joost, what’s going on here? You letting spam run rampant on your blog now or what?
@AskApache: Hehe no, fixed, thx for the call!
Glad to hear it.. I haven’t been here for a few months and the site is looking completely different.. nice..
>>Disallow: */trackback*
Is this useful?