<?xml version="1.0" encoding="utf-8"?> <rss
version="2.0"
xmlns:content="http://purl.org/rss/1.0/modules/content/"
xmlns:wfw="http://wellformedweb.org/CommentAPI/"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:atom="http://www.w3.org/2005/Atom"
xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
xmlns:media="http://search.yahoo.com/mrss/"><channel><title>Yoast &#187; robots.txt</title> <atom:link href="http://yoast.com/tag/robots-txt/feed/" rel="self" type="application/rss+xml" /><link>http://yoast.com</link> <description>Tweaking Websites</description> <lastBuildDate>Mon, 21 May 2012 18:33:54 +0000</lastBuildDate> <language>en-US</language> <sy:updatePeriod>hourly</sy:updatePeriod> <sy:updateFrequency>1</sy:updateFrequency> <generator>http://wordpress.org/?v=3.4-beta4-20825</generator> <image><title>Yoast</title> <url>http://yoast.com/wp-content/themes/yoast-v2/images/yoast-logo-rss.png</url><link>http://yoast.com</link> <width>144</width> <height>103</height> <description>Tweaking Websites</description> </image> <item><title>WordPress robots.txt Example</title><link>http://yoast.com/example-robots-txt-wordpress/#utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=example-robots-txt-wordpress</link> <comments>http://yoast.com/example-robots-txt-wordpress/#comments</comments> <pubDate>Fri, 10 Feb 2012 12:47:51 +0000</pubDate> <dc:creator>Joost de Valk</dc:creator> <category><![CDATA[WordPress]]></category> <category><![CDATA[robots.txt]]></category> <category><![CDATA[WordPress SEO]]></category><guid
isPermaLink="false">http://yoast.com/?p=45343</guid> <description><![CDATA[<p>Robots.txt is a way to tell a search engine which pages it's allowed to spider, to "see", and which pages it cannot "see". Because of that, robots.txt differs from meta name="robots" tags, which tell search engines on those individual pages, whether they can include them in their index or not. The difference is subtle, but [...]</p><p><a
href="http://yoast.com/example-robots-txt-wordpress/">WordPress robots.txt Example</a> is a post by <a
rel="author" href="http://yoast.com/author/joost/">Joost de Valk</a> on <a
href="http://yoast.com">Yoast - Tweaking Websites</a>. A good WordPress blog needs good hosting; you don't want your blog to be slow, or, even worse, down, do you? Check out my thoughts on <a
href="http://yoast.com/wordpress-hosting/">WordPress hosting</a>!</p>]]></description> <content:encoded><![CDATA[<p><img
class="alignright size-full wp-image-45347" title="WordPress Robots.txt advice from Yoast" src="http://cdn.yoast.com/wp-content/uploads/2012/02/Yoast_02_Robot1.jpg" alt="WordPress Robots.txt advice from Yoast" width="177" height="220" />Robots.txt is a way to tell a search engine which pages it's allowed to spider, to "see", and which pages it cannot "see". Because of that, robots.txt differs from <code>meta name="robots"</code> tags, which tell search engines on those individual pages, whether they can include them in their index or not. The difference is subtle, but important. Because of that, the <a
href="http://codex.wordpress.org/Search_Engine_Optimization_for_WordPress#Robots.txt_Optimization">suggested robots.txt in the codex</a> is wrong. Let me explain:</p><p>Google sometimes lists URLs that it's not allowed to spider because they're blocked by robots.txt, simply because a lot of links point to those URLs. A good example of this is a search for [<a
href="https://www.google.com/search?q=rtl+nieuws&amp;pws=0">RTL Nieuws</a>] (disclosure: RTL is a client of mine). rtlnieuws.nl 301 redirects to the <a
href="http://www.rtl.nl/actueel/rtlnieuws/home/">news section of rtl.nl</a>. But... rtlnieuws.nl/robots.txt exists... And has the following content:</p><pre class="brush: plain; title: ; notranslate">User-agent: *
Disallow: /</pre><p>Because of that, Google can't crawl rtlnieuws.nl and never sees the 301 redirect, so the links towards rtlnieuws.nl don't count toward the news section on rtl.nl, and Google displays rtlnieuws.nl in the search results. This is unwanted behavior that we're trying to fix, but for now it's a good example of what I wanted to explain. By <em>blocking</em> /wp-admin/ and /trackback/ in your robots.txt, you're not preventing them from showing up in the search results.</p><p>Unfortunately, the /wp-admin/ block was recently added to WordPress core, because of <a
href="http://core.trac.wordpress.org/ticket/18465">this Trac ticket</a>. In the discussion on that ticket, I've proposed another solution in <a
href="http://core.trac.wordpress.org/attachment/ticket/18465/noindex.patch">this patch</a>. This solution involves sending an X-Robots-Tag header, which is the HTTP header equivalent of a <code>meta name="robots"</code> tag. This <em>would</em> in fact remove all wp-admin directories from Google search results.</p><h2>WordPress Robots.txt blocking Search results and Feeds</h2><p>There are two other patterns that are blocked in the suggested robots.txt: /*?, which blocks everything with a question mark and as such all search results, and */feed/, which blocks all feeds. The first is not a good idea because if someone were to link to your search results, you wouldn't benefit from those links.</p><p>A better solution would be to add a <code>&lt;meta name="robots" content="noindex, follow"&gt;</code> tag to those search results pages, as it would prevent the search results pages from ranking but would allow the link "juice" to flow through to the returned posts and pages. This is what my <a
href="http://yoast.com/wordpress/seo/">WordPress SEO plugin</a> does as soon as you enable it. It also does this for wp-admin and login and registration pages.</p><p>I'm aware that that is different from <a
href="http://support.google.com/webmasters/bin/answer.py?hl=en&amp;answer=35769">Google's guidelines</a> on this topic at the moment, which state:</p><blockquote><p>Use robots.txt to prevent crawling of search results pages or other auto-generated pages that don't add much value for users coming from search engines.</p></blockquote><p>I've reached out to Google to get clarification on whether they would say my solution is acceptable as well, or perhaps even better :) .</p><p>Blocking /feed/ is a bad idea because an RSS feed is actually a valid sitemap for Google. Blocking it would prevent Google from using that to find new content on your site. So, my suggested robots.txt for WordPress is actually a <em>lot</em> smaller than the Codex one. I only have this:</p><pre class="brush: plain; title: ; notranslate">User-Agent: *
Disallow: /wp-content/plugins/</pre><p>I block the plugins directory because some plugin developers have the annoying habit of adding index.php files to their plugin directories that link back to their websites. For <em>all </em>other parts of WordPress, there are better solutions for blocking.</p><h2>The other WordPress Robots.txt suggestions</h2><p>The other sections of the robots.txt as suggested are a bit old and no longer needed. Digg mirror is something for us old guys who remember when Digg used to send loads of traffic, Googlebot Image and Media Partner are still there but if you only have the above in your robots.txt you don't need specific lines for them in your WordPress robots.txt file.</p><p>&nbsp;</p><p><a
href="http://yoast.com/example-robots-txt-wordpress/">WordPress robots.txt Example</a> is a post by <a
rel="author" href="http://yoast.com/author/joost/">Joost de Valk</a> on <a
href="http://yoast.com">Yoast - Tweaking Websites</a>. A good WordPress blog needs good hosting; you don't want your blog to be slow, or, even worse, down, do you? Check out my thoughts on <a
href="http://yoast.com/wordpress-hosting/">WordPress hosting</a>!</p>]]></content:encoded> <wfw:commentRss>http://yoast.com/example-robots-txt-wordpress/feed/</wfw:commentRss> <slash:comments>36</slash:comments> <media:thumbnail url="http://cdn.yoast.com/wp-content/uploads/2012/02/Yoast_02_Robot1-125x125.jpg" /> <media:content url="http://cdn.yoast.com/wp-content/uploads/2012/02/Yoast_02_Robot1.jpg" medium="image"> <media:title type="html">WordPress Robots.txt advice from Yoast</media:title> <media:thumbnail url="http://cdn.yoast.com/wp-content/uploads/2012/02/Yoast_02_Robot1-125x125.jpg" /> </media:content> </item> <item><title>Web Designer Mag should fix its SEO</title><link>http://yoast.com/web-designer-mag-bad-seo/#utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=web-designer-mag-bad-seo</link> <comments>http://yoast.com/web-designer-mag-bad-seo/#comments</comments> <pubDate>Wed, 30 Dec 2009 20:06:26 +0000</pubDate> <dc:creator>Joost de Valk</dc:creator> <category><![CDATA[WordPress]]></category> <category><![CDATA[robots.txt]]></category> <category><![CDATA[XML Sitemap]]></category><guid
isPermaLink="false">http://yoast.com/?p=1923</guid> <description><![CDATA[<p>Ok I just had to post this, as it's too funny. I got a Google Alert this afternoon for this post, which mentioned one of my plugins as being listed by Web Designer Magazine. So, I Googled them, as the post didn't link to them, and got this result: Notice something? I know I did: [...]</p><p><a
href="http://yoast.com/web-designer-mag-bad-seo/">Web Designer Mag should fix its SEO</a> is a post by <a
rel="author" href="http://yoast.com/author/joost/">Joost de Valk</a> on <a
href="http://yoast.com">Yoast - Tweaking Websites</a>. A good WordPress blog needs good hosting; you don't want your blog to be slow, or, even worse, down, do you? Check out my thoughts on <a
href="http://yoast.com/wordpress-hosting/">WordPress hosting</a>!</p>]]></description> <content:encoded><![CDATA[<p>Ok I just had to post this, as it's too funny. I got a Google Alert this afternoon for <a
href="http://kerrywebster.com/news/comment-from-matt-mullenweg-in-web-designer-magazine/">this post</a>, which mentioned one of my plugins as being listed by <a
href="http://www.webdesignermag.co.uk/">Web Designer Magazine</a>. So, I Googled them, as the post didn't link to them, and got this result:</p><p><a
class="thickbox" title="Google Result for Web Designer Mag" href="http://yoast.com/cdn-edge/uploads/2009/12/web-designer-magazine.jpg"><img
src="http://yoast.com/cdn-edge/uploads/2009/12/web-designer-magazine-300x103.jpg" alt="Web Designer Magazine results in Google" title="Web Designer Magazine" width="300" height="103" class="aligncenter size-medium wp-image-1924" /></a></p><p>Notice something? I know I did: there's no description there. So I clicked through and checked the page source to see what could cause that. Finding it was easy:</p><pre class="brush: xml; title: ; notranslate">&lt;meta name='robots' content='noindex,nofollow' /&gt;</pre><p>It was listed right above the <code>EditURI</code> line that WordPress puts in by default, telling me they forgot to uncheck a box (that gets checked by default on some auto installers, I've been told on Twitter):</p><p><a
class="thickbox" href="http://yoast.com/cdn-edge/uploads/2009/12/privacy-settings.jpg"><img
src="http://yoast.com/cdn-edge/uploads/2009/12/privacy-settings-300x114.jpg" alt="" title="privacy-settings" width="300" height="114" class="aligncenter size-medium wp-image-1925" /></a></p><p>One issue though... If Google had actually <em>seen</em> that tag, it wouldn't have listed the site <em>at all</em>. So there had to be something else. And of course, there is: meet <a
href="http://www.webdesignermag.co.uk/robots.txt">Webdesigner Magazine's robots.txt</a>:</p><pre class="brush: plain; title: ; notranslate">User-agent: *
Disallow: /

Sitemap: http://www.webdesignermag.co.uk/sitemap.xml.gz</pre><p>The first two lines there prevent Google from indexing the site entirely. But it just became funnier... The fourth line was added by a plugin they're running, pointing search engine bots at their XML sitemap. Yes, the same bots they just forbade entry to their entire site... If you check out their <a
href="http://www.webdesignermag.co.uk/sitemap.xml">XML sitemap</a>, you'd notice they're running Arne Brachhold's <a
href="http://www.arnebrachhold.de/projects/wordpress-plugins/google-xml-sitemaps-generator/">Google XML Sitemap Generator</a>...</p><p>So: they installed an XML sitemap plugin, let's see what else they did: ah... Cool! They're running All In One SEO too! Now AIOSEO is "ok", except: it doesn't warn you about stupidities like these... Might be a darn good feature for an SEO plugin to check whether your site is actually allowing crawlers to come in, don't you think? Ah well, nothing can be perfect.</p><p>Anyway, this tells me one thing: Web Designer Mag needs help with their WordPress install. And they need it bad. If you work for them, and read this, feel free to contact me <a
href="http://twitter.com/yoast">on Twitter</a> or through the <a
href="http://yoast.com/contact/">contact form</a>. I'd be happy to help!</p><p><a
href="http://yoast.com/web-designer-mag-bad-seo/">Web Designer Mag should fix its SEO</a> is a post by <a
rel="author" href="http://yoast.com/author/joost/">Joost de Valk</a> on <a
href="http://yoast.com">Yoast - Tweaking Websites</a>. A good WordPress blog needs good hosting; you don't want your blog to be slow, or, even worse, down, do you? Check out my thoughts on <a
href="http://yoast.com/wordpress-hosting/">WordPress hosting</a>!</p>]]></content:encoded> <wfw:commentRss>http://yoast.com/web-designer-mag-bad-seo/feed/</wfw:commentRss> <slash:comments>57</slash:comments> <media:thumbnail url="http://cdn.yoast.com/wp-content/uploads/2009/12/web-designer-magazine.jpg" /> <media:content url="http://cdn.yoast.com/wp-content/uploads/2009/12/web-designer-magazine.jpg" medium="image"> <media:title type="html">Web Designer Magazine</media:title> </media:content> <media:content url="http://cdn.yoast.com/wp-content/uploads/2009/12/privacy-settings.jpg" medium="image"> <media:title type="html">privacy-settings</media:title> <media:thumbnail url="http://cdn2.yoast.com/wp-content/uploads/2009/12/privacy-settings-125x125.jpg" /> </media:content> </item> <item><title>Preventing your site from being indexed, the right way</title><link>http://yoast.com/prevent-site-being-indexed/#utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=prevent-site-being-indexed</link> <comments>http://yoast.com/prevent-site-being-indexed/#comments</comments> <pubDate>Thu, 17 Dec 2009 14:41:26 +0000</pubDate> <dc:creator>Joost de Valk</dc:creator> <category><![CDATA[SEO]]></category> <category><![CDATA[Apache]]></category> <category><![CDATA[robots.txt]]></category><guid
isPermaLink="false">http://yoast.com/?p=1894</guid> <description><![CDATA[<p>It keeps amazing me that I keep seeing people use robots.txt files to prevent sites from being indexed and thus showing up in the search engines. You know why it keeps amazing me? Because robots.txt doesn't actually do the latter, even though it does prevent your site from being indexed. Let's go through some terms [...]</p><p><a
href="http://yoast.com/prevent-site-being-indexed/">Preventing your site from being indexed, the right way</a> is a post by <a
rel="author" href="http://yoast.com/author/joost/">Joost de Valk</a> on <a
href="http://yoast.com">Yoast - Tweaking Websites</a>. A good WordPress blog needs good hosting; you don't want your blog to be slow, or, even worse, down, do you? Check out my thoughts on <a
href="http://yoast.com/wordpress-hosting/">WordPress hosting</a>!</p>]]></description> <content:encoded><![CDATA[<p>It keeps amazing me that I keep seeing people use <em>robots.txt</em> files to prevent sites from being indexed and thus showing up in the search engines. You know why it keeps amazing me? Because <em>robots.txt</em> doesn't actually do the latter, even though it does prevent your site from being indexed.</p><p>Let's go through some terms here:</p><p><strong>Indexed / Indexing</strong><br
/> The process of downloading a site or a page's content to the server of the search engine, thereby adding it to its "index".</p><p><strong>Ranking / Listing / Showing</strong><br
/> Showing a site in the search result pages (aka <abbr
title="Search Engine Result Pages">SERPs</abbr>).</p><p>So, while the most common process goes from Indexing to Listing, a site <em>doesn't have to be indexed</em> to be listed. If a link points at a page or a domain, the search engine will follow that link. If the <em>robots.txt</em> on that domain prevents the search engine from indexing that page, it'll still show the URL in the results if it can gather from other signals, such as the links pointing at it, that it might be worth looking at.</p><p>If my explanation above doesn't make sense, have a look at Matt Cutts's video explanation:<br
/><p><a
href="http://yoast.com/prevent-site-being-indexed/"><em>Click here to view the embedded video.</em></a></p></p><p>So, if you want to effectively hide pages from the search engines, and this might seem contradictory, you <em>need</em> them to index those pages. Why? Because when they index those pages, you can tell them not to list them. There are two ways of doing that: by using robots meta tags, like this (and I've got an article on <a
href="http://yoast.com/articles/robots-meta-tags/">robots meta tags</a> that's more extensive):</p><pre class="brush: xml; light: true; title: ; notranslate">&lt;meta name=&quot;robots&quot; content=&quot;noindex,nofollow&quot;/&gt;</pre><p>The issue with a tag like that is that you have to add it to each and every page. That's why the search engines came up with the <a
href="http://yoast.com/x-robots-tag-play/">X-Robots-Tag HTTP header</a>. This allows you to specify an HTTP header called <code>X-Robots-Tag</code>, and set the value as you would the meta robots tags value. The cool thing about this is that you can do it for an entire site. So, if your site is running on Apache, and mod_headers is enabled (it usually is), you could add the following single line to your <em>.htaccess</em> file:</p><pre class="brush: plain; light: true; title: ; notranslate">Header set X-Robots-Tag &quot;noindex, nofollow&quot;</pre><p>And it'd have the effect that that entire site <em>can</em> be indexed, but will never be shown in the search results. So, get rid of that robots.txt file with <code>Disallow: /</code> in it, and use the X-Robots-Tag instead!</p><p><a
href="http://yoast.com/prevent-site-being-indexed/">Preventing your site from being indexed, the right way</a> is a post by <a
rel="author" href="http://yoast.com/author/joost/">Joost de Valk</a> on <a
href="http://yoast.com">Yoast - Tweaking Websites</a>. A good WordPress blog needs good hosting; you don't want your blog to be slow, or, even worse, down, do you? Check out my thoughts on <a
href="http://yoast.com/wordpress-hosting/">WordPress hosting</a>!</p>]]></content:encoded> <wfw:commentRss>http://yoast.com/prevent-site-being-indexed/feed/</wfw:commentRss> <slash:comments>36</slash:comments> <media:content url="http://www.youtube-nocookie.com/v/KBdEwpRQRD0" duration="274"> <media:player url="http://www.youtube-nocookie.com/v/KBdEwpRQRD0" /> <media:title type="html">Preventing your site from being indexed, the right way &#8226; Yoast</media:title> <media:description type="html">It keeps amazing me that I keep seeing people use robots.txt files to prevent sites from being indexed and thus showing up in the search engines. You know why it keeps amazing me? Because robots.txt doesn&#039;t actually do the latter, even though it does prevent your site from being indexed. Let&#039;s go th</media:description> <media:thumbnail url="http://cdn.yoast.com/wp-content/uploads/2012/01/preventing-your-site-from-being-indexed-the-right-way-8226-yoast-300x225.jpg" /> <media:keywords>Apache,robots.txt</media:keywords> </media:content> </item> <item><title>PageRank sculpting &#8211; Siloing and more</title><link>http://yoast.com/pagerank-sculpting-siloing/#utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=pagerank-sculpting-siloing</link> <comments>http://yoast.com/pagerank-sculpting-siloing/#comments</comments> <pubDate>Sat, 22 Mar 2008 20:54:56 +0000</pubDate> <dc:creator>Joost de Valk</dc:creator> <category><![CDATA[SEO]]></category> <category><![CDATA[Paid Links]]></category> <category><![CDATA[robots.txt]]></category> <category><![CDATA[SEO tools]]></category><guid
isPermaLink="false">http://www.joostdevalk.nl/pagerank-sculpting-siloing/</guid> <description><![CDATA[<p>PR sculpting seems to be "all the rage" at the moment. Tracing the conversation back leads me to an article by Dan Thies from September 4th last year, which points back to an interview with Matt on SEOmoz. There's been a whole load of buzz lately, coming up at SES again a few times [...]</p><p><a
href="http://yoast.com/pagerank-sculpting-siloing/">PageRank sculpting &#8211; Siloing and more</a> is a post by <a
rel="author" href="http://yoast.com/author/joost/">Joost de Valk</a> on <a
href="http://yoast.com">Yoast - Tweaking Websites</a>. A good WordPress blog needs good hosting; you don't want your blog to be slow, or, even worse, down, do you? Check out my thoughts on <a
href="http://yoast.com/wordpress-hosting/">WordPress hosting</a>!</p>]]></description> <content:encoded><![CDATA[<p>PR sculpting seems to be "all the rage" at the moment. Tracing the conversation back leads me to an article by Dan Thies from September 4th last year, which points back to <a
href="http://www.seomoz.org/blog/questions-answers-with-googles-spam-guru">an interview with Matt</a> on SEOmoz. There's been a whole load of buzz lately, and it came up at SES a few times too often again, so I think it's about time we do a recap of what PageRank sculpting actually is, which principle it's based on, and how you should use it.</p><p><span
id="more-588"></span>In the interview that started the whole discussion again, Matt said:</p><blockquote><p>The nofollow attribute is just a mechanism that gives webmasters the ability to modify PageRank flow at link-level granularity. Plenty of other mechanisms would also work (e.g. a link through a page that is robot.txt'ed out), but nofollow on individual links is simpler for some folks to use. There's no stigma to using nofollow, even on your own internal links; for Google, nofollow'ed links are dropped out of our link graph; we don't even use such links for discovery. By the way, the nofollow meta tag does that same thing, but at a page level.</p></blockquote><p>This inspired Dan Thies to write <a
href="http://www.seofaststart.com/blog/internal-nofollow-help">his article</a>, which is, in my opinion, both very good and also a bit flawed. Dan says: <cite>"That's the key point. Getting more of your important pages indexed."</cite> And I simply do not agree, though most of the other stuff is very true and valuable. Let's get back to the basic theory of how you should create a site structure and theme it correctly: siloing.</p><h2>Siloing</h2><p
style="float: right;"><a
href="http://www.flickr.com/photos/36521976686@N01/8741933/"><img
src="http://farm1.static.flickr.com/6/8741933_7c7fd43cea_m.jpg" alt="" /></a><br
/> <small>photo credit: <a
title="twob" href="http://www.flickr.com/people/twob/">twob</a></small></p><p>One of the oldest articles I found on siloing is <a
href="http://www.bruceclay.com/newsletter/0505/silo.html">on Bruce Clay's site</a>, in an article from 2005.</p><p>The idea (and yes I'm oversimplifying a bit now) is that you only link to pages on your site with the same theme, to make it easier to rank for keywords and keyword groups. A quote:</p><blockquote><p>Siloing resolves this problem by allowing you to achieve high search engine placement both for general and targeted keyword phrases through themed vertical page linking and/or construction.</p></blockquote><p>This article talks about only linking to pages that you really <em>should</em> be linking to. Of course, this is still the best practice, and if you totally lived by that, you wouldn't need to nofollow any links. However if you, for any reason whatsoever (like management that doesn't get it, weird laws / lawyers, or conversion / up-selling reasons), have to link to another, unrelated, page, nofollow is the tool you could use to still abide by those siloing laws.</p><h2>Nofollow != untrusted</h2><p>In the beginning of this nofollow discussion, some people I really like and respect, like <a
href="http://www.gregboser.com/">Greg Boser</a>, <a
href="http://www.oilman.ca/">Todd Friesen</a> and <a
href="http://www.davidnaylor.co.uk/">Dave Naylor</a>, were saying that you should really not use nofollow internally, using the argument: "why would you want to tell search engines that you don't trust certain pages on your site?"</p><p>Well, it's not about trust (anymore), as Greg admitted in <a
href="http://videos.webpronews.com/2008/03/18/ses-new-york-2008-greg-boser/">an interview with Mike McDonald at SES</a> recently, now that Matt has come out and said that. Others are <a
rel="nofollow" href="http://seo-theory.com/wordpress/2007/11/26/seo-nonsense-sculpting-pagerank-builds-muscle/">still</a> <a
rel="nofollow" href="http://searchengineland.com/080306-083414.php">whining</a> though, in part I think because either they don't get it, or they don't think other people will get it and want people to focus on different things that are more important, like site structure. In part, I agree. PageRank sculpting like that is not something for the faint of heart, or the SEO rookie. It IS however a valuable tool when you actually know what you're doing and have a lot of juice to play around with.</p><h2>So what is it about then?</h2><p>As my buddy <a
href="http://www.wolf-howl.com/google/why-theres-nothing-wrong-with-sculpting-your-pagerank/">Michael Gray said</a>: why would Apple want to rank for [<a
rel="nofollow" href="http://www.google.com/search?q=contact+us">contact us</a>]? Doesn't that single ranking imply a wasted opportunity to rank for a few more products they actually <em>sell</em>?</p><p>A <a
href="http://www.davidnaylor.co.uk/nofollow-sculpting-my-take.html">quote from Matt</a> at DaveN's blog is very important here:</p><blockquote><p>Nofollowing your internals (PageRank sculpting - JdV) can affect your ranking in Google, but it's a 2nd order effect.</p></blockquote><p>Anyway, at Onetomarket we've been doing this for at least 4 years. We used JavaScript links before nofollow was around, and we <em><strong>know</strong></em> that it works because we <em>tested</em> it. It's <em>not specifically</em> about getting more pages indexed; it's about getting those pages indexed that matter to you, and about, as Greg also pointed out in the video interview above, getting as many pages indexed as your overall PageRank can handle. You don't want to "spread yourself too thin".</p><h2>PageRank sculpting is more than Nofollow</h2><p>You really need to know that at some point, you'll need more than nofollow. You'll need page-by-page control of the robots meta tag, and you will probably be using it to set category / tag / archive pages to noindex, follow, in favor of single pages. That, combined with not giving too much linklove to these pages, is the essence of <a
href="http://www.seo4fun.com/blog/2007/08/22/third-level-push-modified-siloing-for-deeper-index-penetration.html">third level push</a>.</p><p>It's also very important to be honest with yourself: "Do I really need this page / this set of pages in the index right now?" It might be wiser to make sure you have a smaller number of pages in the index, throwing the deeper pages out and allowing yourself to actually rank with the pages higher up in your site's structure.</p><p>Those two uses of nofollow, together with the use in siloing of nofollowing links to unrelated pages, make it a very powerful tool. You can be a good SEO without using it, and a lot of times you can probably make more money by focusing on other things, but anyone saying that it's bad advice or nonsense doesn't know what he or she is talking about, and should think twice before writing openly that people should not follow that advice.</p><p><a
href="http://yoast.com/pagerank-sculpting-siloing/">PageRank sculpting &#8211; Siloing and more</a> is a post by <a
rel="author" href="http://yoast.com/author/joost/">Joost de Valk</a> on <a
href="http://yoast.com">Yoast - Tweaking Websites</a>. A good WordPress blog needs good hosting; you don't want your blog to be slow, or, even worse, down, do you? Check out my thoughts on <a
href="http://yoast.com/wordpress-hosting/">WordPress hosting</a>!</p>]]></content:encoded> <wfw:commentRss>http://yoast.com/pagerank-sculpting-siloing/feed/</wfw:commentRss> <slash:comments>81</slash:comments> <media:thumbnail url="http://farm1.static.flickr.com/6/8741933_7c7fd43cea_m.jpg" /> <media:content url="http://farm1.static.flickr.com/6/8741933_7c7fd43cea_m.jpg" medium="image" /> </item> <item><title>Playing with the X-Robots-Tag HTTP header</title><link>http://yoast.com/x-robots-tag-play/#utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=x-robots-tag-play</link> <comments>http://yoast.com/x-robots-tag-play/#comments</comments> <pubDate>Sun, 20 Jan 2008 20:41:57 +0000</pubDate> <dc:creator>Joost de Valk</dc:creator> <category><![CDATA[SEO]]></category> <category><![CDATA[Apache]]></category> <category><![CDATA[Microformats]]></category> <category><![CDATA[robots.txt]]></category> <category><![CDATA[SEO tools]]></category><guid
isPermaLink="false">http://www.joostdevalk.nl/x-robots-tag-play/</guid> <description><![CDATA[<p>Ever since the announcement on the Google Blog and more recently Yahoo's announcement that they've enhanced their support for it, I've been meaning to play with the X-Robots-Tag header. This HTTP header allows you to do what you'd normally do in a robots meta tag, in an HTTP header, which has some pretty cool applications. [...]</p><p><a
href="http://yoast.com/x-robots-tag-play/">Playing with the X-Robots-Tag HTTP header</a> is a post by <a
rel="author" href="http://yoast.com/author/joost/">Joost de Valk</a> on <a
href="http://yoast.com">Yoast - Tweaking Websites</a>. A good WordPress blog needs good hosting; you don't want your blog to be slow, or, even worse, down, do you? Check out my thoughts on <a
href="http://yoast.com/wordpress-hosting/">WordPress hosting</a>!</p>]]></description> <content:encoded><![CDATA[<p>Ever since the <a
href="http://googleblog.blogspot.com/2007/07/robots-exclusion-protocol-now-with-even.html">announcement on the Google Blog</a> and more recently Yahoo's announcement that they've <a
href="http://www.seroundtable.com/archives/015636.html">enhanced their support for it</a>, I've been meaning to play with the X-Robots-Tag header. This HTTP header allows you to do what you'd normally do in a <a
href="http://yoast.com/wordpress/meta-robots-wordpress-plugin/robots-meta-tags/">robots meta tag</a>, in an HTTP header, which has some pretty cool applications. I'll show you a few cool things you can do with this, but first some theory. If you don't feel like that, skip to the <a
href="http://yoast.com/x-robots-tag-play/#examples">example uses of the X-Robots-Tag</a>.</p><p>As Sebastian explained in <a
href="http://www.seomoz.org/blog/robots-exclusion-protocol-101" title="Robots Exclusion Protocol 101">an excellent post on SEOmoz</a>, there are two different kinds of directives: crawler directives and indexer directives.</p><p><span
id="more-520"></span><br
/> <strong>Crawler directives</strong><br
/> The <code>robots.txt</code> file only contains so-called crawler directives, telling search engines, identified by their <code>User-agent:</code>, where they are not allowed to go by using <code>Disallow:</code> and where they <em>can</em> (and should) go by using <code>Allow:</code>, and by pointing them at a <code>Sitemap:</code>.</p><p>As Sebastian pointed out and explains thoroughly in <a
href="http://sebastians-pamphlets.com/standardization-of-rep-tags-as-robots-txt-directives/">another brilliant post</a>, pages that search engines aren't allowed to spider can still show up in the search results when they have enough links pointing at them. This basically means that if you want to <em>really</em> <a
href="http://www.seomoz.org/blog/12-ways-to-keep-your-content-hidden-from-the-search-engines">hide something from the search engines</a> and thus from people using search, <code>robots.txt</code> just isn't good enough.</p><p><strong>Indexer directives</strong><br
/> Indexer directives are directives that are, even with the birth of the X-Robots-Tag, set on a per page or even per element basis. Up until July 2007, there were two: the microformat <a
href="http://microformats.org/wiki/rel-nofollow">rel="nofollow"</a>, which means that that link should not pass authority / PageRank, and the <a
href="http://yoast.com/wordpress/meta-robots-wordpress-plugin/robots-meta-tags/">Meta Robots tag</a>.</p><p>With the Meta Robots tag, you can <em>really</em> prevent search engines from showing the pages you block in the search results. You can achieve the same with the relatively new X-Robots-Tag HTTP header. If you don't know what an HTTP header is, I'd suggest reading the <a
href="http://en.wikipedia.org/wiki/HTTP">Wikipedia page on it</a>, but in short: look at it as the envelope around your content. This HTTP header is better than the meta robots tag for a couple of reasons; one of them is that you can send it for documents other than HTML pages too. So, let's get into some examples.</p><h2><a
title="examples" name="examples"></a>Example uses of the X-Robots-Tag</h2><p>If you want to prevent search engines from showing files you've generated with PHP, add the following before any output is sent, for instance at the top of your header file:</p><pre class="brush: plain; title: ; notranslate">header(&quot;X-Robots-Tag: noindex&quot;, true);</pre><p>This would not prevent search engines from following the links on those pages; if you want to do that, do the following:</p><pre class="brush: plain; title: ; notranslate">header(&quot;X-Robots-Tag: noindex, nofollow&quot;, true);</pre><p>But doing it in PHP is probably not the easiest use for this kind of thing. I myself greatly prefer setting headers in Apache, when possible. Consider, for instance, preventing search engines from caching / showing a preview for all .doc files on your domain; you would only have to do the following:</p><pre class="brush: plain; title: ; notranslate">&lt;FilesMatch &quot;\.doc$&quot;&gt;
Header set X-Robots-Tag &quot;index, noarchive, nosnippet&quot;
&lt;/FilesMatch&gt;</pre><p>Or, if you'd want to do this for both .doc and .pdf files:</p><pre class="brush: plain; title: ; notranslate">&lt;FilesMatch &quot;\.(doc|pdf)$&quot;&gt;
Header set X-Robots-Tag &quot;index, noarchive, nosnippet&quot;
&lt;/FilesMatch&gt;</pre><p>Or, another case: your <code>robots.txt</code> file itself is <a
href="http://www.google.com/search?hl=en&amp;q=allinurl%3A%22%2Frobots.txt%22" rel="nofollow">showing up in the search results</a>. Adding this to your Apache config or your <code>.htaccess</code> file would solve that:</p><pre class="brush: plain; title: ; notranslate">&lt;FilesMatch &quot;robots\.txt&quot;&gt;
Header set X-Robots-Tag &quot;noindex&quot;
&lt;/FilesMatch&gt;</pre><p>I had a slightly uncomfortable feeling when writing this down, so I e-mailed <a
href="http://www.mattcutts.com/blog/">Matt Cutts</a> asking the following: "&lt;snip&gt; would that mean that you will still fetch it for robots.txt purposes, but won't show it in the index?". I'm waiting for him to answer, and will add his response here once I have it.</p><p><strong>Tools</strong><br
/> I've quickly created a <a
href="javascript:(function(){function%20read(url){var%20r=new%20XMLHttpRequest();r.open('HEAD',url,false);r.send(null);return%20r.getAllResponseHeaders();}alert(read(window.location))})();">bookmarklet</a> which shows all the headers for a page (works in Moz browsers only, I think), and a <a
href="http://cdn.yoast.com/wp-content/uploads/2008/01/headerdetector.user.js" target="_blank" title="X-Robots-Tag HeaderDetector">Greasemonkey script</a> which pops up when a page is using an X-Robots-Tag header.</p><p><strong>Conclusion</strong><br
/> As you can see, if you combine the examples above with what you can learn from, for instance, <a
href="http://www.askapache.com/htaccess/using-filesmatch-and-files-in-htaccess.html">AskApache's .htaccess tutorial</a>, the X-Robots-Tag HTTP header becomes a very powerful tool. Use it wisely and with caution, as you won't be the first to block your entire site by accident, but it's a great addition to your toolset if you know how to use it.</p><p><a
href="http://yoast.com/x-robots-tag-play/">Playing with the X-Robots-Tag HTTP header</a> is a post by <a
rel="author" href="http://yoast.com/author/joost/">Joost de Valk</a> on <a
href="http://yoast.com">Yoast - Tweaking Websites</a>.</p>]]></content:encoded> <wfw:commentRss>http://yoast.com/x-robots-tag-play/feed/</wfw:commentRss> <slash:comments>31</slash:comments> </item> <item><title>Extreme SEO Q&amp;A</title><link>http://yoast.com/extreme-seo-qa/#utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=extreme-seo-qa</link> <comments>http://yoast.com/extreme-seo-qa/#comments</comments> <pubDate>Fri, 26 Oct 2007 09:07:50 +0000</pubDate> <dc:creator>Joost de Valk</dc:creator> <category><![CDATA[SEO]]></category> <category><![CDATA[Conferences]]></category> <category><![CDATA[Paid Links]]></category> <category><![CDATA[robots.txt]]></category><guid
isPermaLink="false">http://www.joostdevalk.nl/extreme-seo-qa/</guid> <description><![CDATA[<p>So yesterday I was on the panel for the last session of the day: an Extreme SEO Q&#38;A, moderated by Ciaran Norris, and on that panel with me were Dixon Jones, Marcus Tandler and Jason Duke. We had a lot of fun, and loads of good questions came up. Some of the topics we touched [...]</p><p><a
href="http://yoast.com/extreme-seo-qa/">Extreme SEO Q&#038;A</a> is a post by <a
rel="author" href="http://yoast.com/author/joost/">Joost de Valk</a> on <a
href="http://yoast.com">Yoast - Tweaking Websites</a>.</p>]]></description> <content:encoded><![CDATA[<p>So yesterday I was on the panel for the last session of the day: an Extreme SEO Q&amp;A, moderated by <a
href="http://www.altogetherdigital.com/author/ciaran/">Ciaran Norris</a>, and on that panel with me were Dixon Jones, Marcus Tandler and Jason Duke. We had a lot of fun, and loads of good questions came up. Some of the topics we touched on:</p><p><strong>Buying links or traffic?</strong><br
/> We had an interesting discussion on paid links, and whether or not we'd buy links to start a site. Dixon said he looked at it as buying traffic, not PageRank, which is probably a smart way of looking at it.</p><p><strong>What to do with affiliate links</strong><br
/> Either <a
href="http://yoast.com/affiliate-links-cloak-them/">cloak them</a>, or use a redirect script on your site, which you're blocking for bots through robots.txt. To be absolutely certain, make sure that if an SE bot opens that script anyway, it redirects into your own site and <em>not</em> to the merchant.</p><p><strong>Hosting</strong><br
/> If you want to rank in the UK (or in Germany or Spain for that matter) host the pages you want to rank within the country you're aiming for. Just a .co.uk isn't good enough, you <em>have</em> to host it in the UK as well.</p><p><strong>"I got some bad links and my site's rankings went down"<br
/> </strong>This was an interesting question from someone in the audience, who said his rankings went down after getting links from some trade organizations. We all highly doubted that that was his problem, until I found a link directory hiding on his site, which had nothing to do with the subject of the rest of his site. That directory went up at about the same time as he got those links. We advised him to remove it immediately.</p><p><strong>Session IDs</strong><br
/> We had some questions about session IDs and of course we all agreed: get rid of those, and <a
href="http://yoast.com/how-to-get-rid-of-phpsessid-in-the-url-and-redirect/">301 redirect the ones that are already indexed</a>.</p><p><strong>Starting with a freshly registered domain<br
/> </strong>The three other guys were quite open about this: don't. Buy an old site, and use it, or 301 redirect it into your new domain.</p><p>I probably forgot quite a bit, but as you can see we touched on loads of stuff, and it was really loads of fun.</p><p><a
href="http://yoast.com/extreme-seo-qa/">Extreme SEO Q&#038;A</a> is a post by <a
rel="author" href="http://yoast.com/author/joost/">Joost de Valk</a> on <a
href="http://yoast.com">Yoast - Tweaking Websites</a>.</p>]]></content:encoded> <wfw:commentRss>http://yoast.com/extreme-seo-qa/feed/</wfw:commentRss> <slash:comments>4</slash:comments> </item> <item><title>Feeds in the search results?</title><link>http://yoast.com/feeds-in-the-search-results/#utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=feeds-in-the-search-results</link> <comments>http://yoast.com/feeds-in-the-search-results/#comments</comments> <pubDate>Tue, 02 Oct 2007 22:05:58 +0000</pubDate> <dc:creator>Joost de Valk</dc:creator> <category><![CDATA[SEO]]></category> <category><![CDATA[robots.txt]]></category> <category><![CDATA[RSS]]></category><guid
isPermaLink="false">http://www.joostdevalk.nl/feeds-in-the-search-results/</guid> <description><![CDATA[<p>There was some discussion on one of the biggest Dutch online marketing sites as to whether RSS feeds were any good for your SEO. I gave a quite lengthy reaction to that in Dutch which I wanted to share with all of you as well. My opinion is that the fact that feeds are showing [...]</p><p><a
href="http://yoast.com/feeds-in-the-search-results/">Feeds in the search results?</a> is a post by <a
rel="author" href="http://yoast.com/author/joost/">Joost de Valk</a> on <a
href="http://yoast.com">Yoast - Tweaking Websites</a>.</p>]]></description> <content:encoded><![CDATA[<p>There was <a
href="http://www.marketingfacts.nl/berichten/20071002_rss_goed_voor_zoekmachine_marketing_of_toch_niet/">some discussion</a> on one of the biggest Dutch online marketing sites as to whether RSS feeds were any good for your SEO. I gave a quite lengthy reaction to that in Dutch which I wanted to share with all of you as well.</p><p>My opinion is that the fact that feeds are showing up in the search results is a bug, and it's not user friendly at all. In my opinion they lead to duplicate content problems too, so there's really only one thing you can do, and that's block 'em. But I'd like to block those feeds without losing the nice side effect of their links going into my posts.</p><p>The problem with using a <code>noindex</code> tag in your feed's <code>head</code> section is that the combination <code>noindex,follow</code> does not seem to be supported there at the moment. I've been trying to get confirmation on that but have had no luck so far. The other problem is that you'll have to adapt the platform that creates your RSS feeds for that, and that might be troublesome.</p><h2>Introducing the <code>X-Robots-Tag</code> HTTP header</h2><p>Google has <a
href="http://googleblog.blogspot.com/2007/07/robots-exclusion-protocol-now-with-even.html">said</a> though, that you can use "any supported <code>META</code> tag" as a value for the <code>X-Robots-Tag</code> HTTP header. With that HTTP header you should be able to control whether a file can be indexed, has to have a snippet, etc. So you should also be able to add a <code>noindex,follow</code> HTTP header to your feed, indicating to Google that they should just follow the links <em>in</em> the feed, and not index the feed itself. You could arrange that from within your server config, which would look something like this in Apache:</p><pre class="brush: plain; title: ; notranslate">&lt;Directory /feed/&gt;
Header append X-Robots-Tag &quot;noindex,follow&quot;
&lt;/Directory&gt;</pre><p>Here's somewhat <a
href="http://httpd.apache.org/docs/2.0/mod/mod_headers.html#header">more info on <code>Header append</code></a>.</p><p>If you're afraid blocking indexation of your feed might cause you to lose traffic from Google Blogsearch and/or Technorati, it won't. Google Blogsearch uses FeedFetcher, which <a
href="http://www.google.com/support/webmasters/bin/answer.py?answer=33583&amp;topic=8460">doesn't observe robots.txt</a>, and <a
href="http://www.niallkennedy.com/blog/archives/2005/01/dear_technorati.html">neither does Technorati</a>. They both seem to be under the impression that pinging a blog search engine is enough consent to get it indexed, while others have suggested that <a
href="http://www.micropersuasion.com/2006/01/its_unethical_t.html">pinging Technorati on behalf of others</a> might be a nice way of improving your Technorati authority.</p><p>In the end, the <code>X-Robots-Tag</code> seems to be quite promising. There's a catch, though: <a
href="http://www.feedburner.com/">FeedBurner</a> does not support it yet, which makes it pretty hard for everyone, including me, who serves their feed through FeedBurner.</p><p><a
href="http://yoast.com/feeds-in-the-search-results/">Feeds in the search results?</a> is a post by <a
rel="author" href="http://yoast.com/author/joost/">Joost de Valk</a> on <a
href="http://yoast.com">Yoast - Tweaking Websites</a>.</p>]]></content:encoded> <wfw:commentRss>http://yoast.com/feeds-in-the-search-results/feed/</wfw:commentRss> <slash:comments>15</slash:comments> </item> </channel> </rss>
