Feeds in the search results?

There was some discussion on one of the biggest Dutch online marketing sites about whether RSS feeds are any good for your SEO. I wrote a fairly lengthy response to that in Dutch, which I want to share with all of you as well.

My opinion is that feeds showing up in the search results is a bug, and not user friendly at all. I think they lead to duplicate content problems too, so there’s really only one thing you can do, and that’s block ‘em. But I’d like to block those feeds without losing the nice side effect of the links in them, which point to my posts, being followed.

The problem with using a noindex tag in your feed’s head section is that the combination noindex,follow does not seem to be supported there at the moment. I’ve been trying to get confirmation of that but have had no luck so far. The other problem is that you’d have to adapt the platform that generates your RSS feeds, and that might be troublesome.

Introducing the X-Robots-Tag HTTP header

Google has said, though, that you can use “any supported META tag” as a value for the X-Robots-Tag HTTP header. With that HTTP header you should be able to control whether a file can be indexed, whether it may get a snippet, etc. So you should also be able to add a noindex,follow HTTP header to your feed, telling Google to just follow the links in the feed and not index the feed itself. You could arrange that from within your server config, which would look something like this in Apache:

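# Requires mod_headers. <Location> matches the URL path; <Directory> would match a filesystem path.
<Location "/feed/">
    Header append X-Robots-Tag "noindex,follow"
</Location>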

The Apache mod_headers documentation has somewhat more info on Header append.
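
To see what a value like “noindex,follow” amounts to, here is a minimal Python sketch that splits an X-Robots-Tag value into its directives. The function names (parse_robots_directives, may_index, may_follow) are made up for illustration, and the sketch assumes a plain comma-separated value without a user-agent prefix. It is not how Googlebot actually parses the header.

```python
def parse_robots_directives(header_value):
    """Split an X-Robots-Tag value such as "noindex,follow" into a set of directives."""
    return {part.strip().lower() for part in header_value.split(",") if part.strip()}


def may_index(directives):
    # "none" is shorthand for "noindex, nofollow"
    return "noindex" not in directives and "none" not in directives


def may_follow(directives):
    return "nofollow" not in directives and "none" not in directives


directives = parse_robots_directives("noindex,follow")
print(sorted(directives), may_index(directives), may_follow(directives))
```

For the header from the Apache example, this reports that the feed itself should stay out of the index while its links may still be followed.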

If you’re afraid that blocking indexation of your feed might cause you to lose traffic from Google Blogsearch and/or Technorati: it won’t. Google Blogsearch uses FeedFetcher, which doesn’t observe robots.txt, and neither does Technorati. Both seem to be under the impression that pinging a blog search engine is enough consent for a blog to be indexed, while others have suggested that pinging Technorati on behalf of others might be a nice way of improving your Technorati authority.
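
For comparison, this is roughly what a robots.txt block on the feed means to a crawler that does observe robots.txt. It is a sketch using Python’s standard urllib.robotparser; the example.com URLs and the /feed/ path are assumptions. A disallowed feed is never fetched, so the links in it cannot be followed either, which is the difference from noindex,follow.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks the feed for every crawler
ROBOTS_TXT = """\
User-agent: *
Disallow: /feed/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A compliant crawler asks before fetching each URL
feed_allowed = parser.can_fetch("Googlebot", "http://example.com/feed/")
post_allowed = parser.can_fetch("Googlebot", "http://example.com/some-post/")
print(feed_allowed, post_allowed)
```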

In the end, the X-Robots-Tag seems quite promising. There’s a catch though: FeedBurner does not support it yet, which makes it pretty hard for everyone serving their feed through FeedBurner, including me.

15 Responses

  1. By Ashish on 3 October, 2007

    Nice info.
    I am sad too, because my blog feeds are also indexed in Google.
    Since implementing the above technique would be hard for me at the moment, I use robots.txt to block the feed from getting indexed.
    I think that is much simpler than the above technique.

  2. By Joost de Valk on 3 October, 2007

    @Ashish: you’re missing the point… You want to noindex, follow it so you can use the linkjuice and fast indexation.

    @André: The fact that a lot of feeds are served with the wrong mime-type is caused by a lot of platforms. Users, and even most webmasters, don’t have a clue about mime-types, and shouldn’t need to have a clue either…

  3. By André Scholten on 3 October, 2007

    Most RSS feeds are served with the wrong mime-type. If they were served as “application/rss+xml”, Google should not index them. But right now most feeds are served as “text/html” or “text/plain”.

    I think setting that straight would be a more proper solution.

  4. By Ashish on 3 October, 2007

    @joost de valk: XML sitemaps of blogs are automatically created whenever a new post is made, and I have put a mention of the sitemap in robots.txt. Won’t that help Google index the post fast, because Google or any search engine will first hit the robots.txt file and then check the sitemap because it’s mentioned in robots.txt?

  5. By Rick Klau on 3 October, 2007

    Joost – Thanks for raising this issue. I believe that Google has gotten better on this issue recently, and will continue to improve… our goal is to ensure that feeds absolutely do *not* create duplicate content issues under any circumstances.

    If you notice a feed showing up in a Google search result, feel free to shoot me a note at rklau@google.com – it’s always helpful to have specific examples.

    As for the X-Robots-Tag, I’ll look into how we might implement that at FeedBurner. Sounds like a good idea as a backup to the overall goal here to ensure that feeds don’t show up independently in search results.

    Regards,

    Rick Klau
    Google (former VP/publisher services at FeedBurner)

  6. By Joost de Valk on 3 October, 2007

    Hi Rick, I appreciate your comment! If I encounter feeds in Search results, I certainly will drop you a line :)

    Could you confirm to me that an X-Robots-Tag with “noindex,follow” would work, as in, that Googlebot would follow the links in that RSS feed?

  7. By Joost de Valk on 3 October, 2007

    Ow and Rick, I’ve edited out your email address to prevent you from being spammed, if you’d like to have it in there anyway, let me know :)

  8. By Joost de Valk on 3 October, 2007

    Rick asked me to put his email back, so I did :)

  9. By David Hopkins on 4 October, 2007

    Thanks for the info. I have heard of these X-Robots-Tags a few times now but have not seen an example of one. They seem to be useful for giving spiders more information on documents other than HTML.

    In regards to your feed being user friendly, you could always use XSL to transform it, but browser support here is not that good, particularly when specifying a DTD. I don’t know why anyone would want their feed to be indexed though.

  10. By Rick Klau on 4 October, 2007

    @Andre – We serve feeds as the proper mime-type, and I’ve not seen many examples of servers serving up the wrong type… any examples we can look at?

    @Joost – Just to clarify, Blogsearch crawls as Googlebot, not Feedfetcher. If you disallow your feed in robots.txt, you won’t show up in Blogsearch. As for Feedfetcher, if Feedfetcher crawls a feed that is disallowed by robots.txt, we won’t make that feed publicly searchable or otherwise show it to people who don’t know the url.

  11. By Rick Klau on 4 October, 2007

    @Joost – One last clarification: Blogsearch does not currently support the X-Robots-tag and FeedBurner doesn’t have a mechanism to insert it (we do, however, support the addition of ‘noindex’). I’m looking into how/when we might change this (both for FeedBurner and Blogsearch).

  12. By Tad Chef on 7 October, 2007

    Damn Rick! You’re right. I’m not in Blogsearch anymore, at least not with my new posts, after I used robots.txt to stop Google from indexing the feed…
    Time to fix it. Thank you.
