How to prevent Google from indexing script output

If you have scripts on your site, like my domain-info script for instance, you often use GET parameters so people can link to or bookmark the output. This has a lot of benefits, but it creates a problem as well. Search engines will index this output, which makes your server spend cycles on something other than a user and can create duplicate content problems.

There are two ways to prevent this: block the output in robots.txt, or redirect the search engines. I’ve chosen the latter, since the former would mean throwing away precious link equity. It’s fairly safe to do this based on the user-agent, as I doubt any search engine will “punish” you for this.

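The following code checks whether the request carries the script’s GET parameter (here called var) and whether the visitor is Googlebot, Yahoo’s bot (which is called Slurp), or msnbot. When it finds any of these three bots, it 301 redirects to the main script URL, which is expected to be in $fullpath.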

// $fullpath must already hold the main script URL, without GET parameters.
// Some clients send no User-Agent header, so fall back to an empty string.
$ua = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
if (isset($_GET['var']) &&
     ( strpos($ua, 'Googlebot') !== false ||
     strpos($ua, 'Yahoo! Slurp') !== false ||
     strpos($ua, 'msnbot') !== false ) ) {
	header("HTTP/1.1 301 Moved Permanently");
	header("Location: $fullpath");
	exit;
}
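
The redirect needs the main script URL in $fullpath, and how you fill that variable depends on your setup. As a minimal sketch, assuming a standard $_SERVER array, it could be derived from the current request by dropping the query string. The function name canonical_script_url() is made up for this example. Keep in mind that HTTP_HOST comes from the client, so hard-coding your hostname may be the safer choice.

```php
<?php
// Hypothetical helper (not part of the original script): builds the URL of
// the current script without its query string.
// $server is assumed to have the same shape as PHP's $_SERVER superglobal.
function canonical_script_url(array $server)
{
    $https  = !empty($server['HTTPS']) && $server['HTTPS'] !== 'off';
    $scheme = $https ? 'https' : 'http';
    $host   = isset($server['HTTP_HOST']) ? $server['HTTP_HOST'] : 'localhost';
    $uri    = isset($server['REQUEST_URI']) ? $server['REQUEST_URI'] : '/';

    // Keep only the part before the first "?", i.e. drop all GET parameters.
    $parts = explode('?', $uri, 2);

    return $scheme . '://' . $host . $parts[0];
}
```

You would then set $fullpath = canonical_script_url($_SERVER); before the bot check runs.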

This way you prevent your server from using its precious cycles for the search engines, and you preserve the link equity for your script!
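
If you want to watch for more than three bots, the repeated strpos() calls could be folded into one small function that walks a list of user-agent tokens. This is only a sketch: is_search_bot() is a hypothetical name, the default list simply repeats the three bots from this post, and it matches case-insensitively with stripos(), which is slightly looser than the case-sensitive check used in the post.

```php
<?php
// Hypothetical helper (not from the original post): returns true when the
// user-agent string contains one of the given crawler tokens.
// stripos() does a case-insensitive substring search.
function is_search_bot($userAgent, array $tokens = array('Googlebot', 'Yahoo! Slurp', 'msnbot'))
{
    foreach ($tokens as $token) {
        if (stripos($userAgent, $token) !== false) {
            return true;
        }
    }
    return false;
}
```

With this in place, the condition of the redirect could read isset($_GET['var']) && is_search_bot($ua), where $ua is the user-agent string or an empty string if the header is missing.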


1 Response

  1. By milous on 17 April, 2008

    Thank you for this tutorial.
    This is the list of user agents I check for on my website:
    http://www.al-la3eb.com
    $googlebot=strpos($_SERVER["HTTP_USER_AGENT"],"Googlebot");
    $Yahoo_Slurp=strpos($_SERVER["HTTP_USER_AGENT"],"Yahoo! Slurp");
    $VoilaBot=strpos($_SERVER["HTTP_USER_AGENT"],"VoilaBot");
    $ZyBorg=strpos($_SERVER["HTTP_USER_AGENT"],"ZyBorg");
    $WebCrawler=strpos($_SERVER["HTTP_USER_AGENT"],"WebCrawler");
    $DeepIndex=strpos($_SERVER["HTTP_USER_AGENT"],"DeepIndex");
    $Teoma=strpos($_SERVER["HTTP_USER_AGENT"],"Teoma");
    $appie=strpos($_SERVER["HTTP_USER_AGENT"],"appie");
    $Gigabot=strpos($_SERVER["HTTP_USER_AGENT"],"Gigabot");
    $HenriLeRobotMirago=strpos($_SERVER["HTTP_USER_AGENT"],"HenriLeRobotMirago");
    $psbot=strpos($_SERVER["HTTP_USER_AGENT"],"psbot");
    $Szukacz=strpos($_SERVER["HTTP_USER_AGENT"],"Szukacz");
    $Openbot=strpos($_SERVER["HTTP_USER_AGENT"],"Openbot");
    $Naver=strpos($_SERVER["HTTP_USER_AGENT"],"Naver");
    $msnbot=strpos($_SERVER["HTTP_USER_AGENT"],"msnbot");

    see you ^!^