Recently, while working on the site for my father-in-law (in Dutch), I wanted to create an XML sitemap for the many publications on his site, which are downloadable PDFs. I regularly add PDFs to his site, and since I’m a tad lazy, I don’t want to keep updating the XML sitemap by hand. So I wrote a small XML Sitemap PHP script that finds all the files of a certain type in a directory, grabs their last-modified times, and throws them into an XML sitemap.
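To make the idea concrete, here is a minimal Python sketch of what such a script does: list one directory, keep the files with a matching extension, and turn each file’s last-modified time into a lastmod entry. It is not the PHP script itself; the function name build_sitemap and the directory and URL in the usage line are hypothetical, and this sketch neither recurses into subdirectories nor writes changefreq or priority.

```python
import os
from datetime import datetime, timezone
from urllib.parse import quote
from xml.sax.saxutils import escape


def build_sitemap(directory, base_url, filetypes):
    """Return sitemap XML for the files in `directory` with an extension in `filetypes`."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ]
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        ext = os.path.splitext(name)[1].lstrip(".").lower()
        if not os.path.isfile(path) or ext not in filetypes:
            continue
        # The file's last-modified time becomes <lastmod>, in W3C datetime format (UTC).
        mtime = datetime.fromtimestamp(os.path.getmtime(path), tz=timezone.utc)
        lines += [
            "  <url>",
            # quote() percent-encodes spaces etc.; escape() protects & and < in the XML.
            f"    <loc>{escape(base_url + quote(name))}</loc>",
            f"    <lastmod>{mtime.strftime('%Y-%m-%dT%H:%M:%S+00:00')}</lastmod>",
            "  </url>",
        ]
    lines.append("</urlset>")
    return "\n".join(lines)
```

For example, build_sitemap("publications", "https://example.com/publications/", {"pdf"}) would return a sitemap entry for every PDF in a hypothetical publications folder. Sorting the file names simply keeps the output stable between runs.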
Then yesterday, while working on another site, a Base64 encoding and decoding experiment, I needed an XML sitemap yet again. So I improved the XML Sitemap PHP script a bit further and decided it should be released.
Why make an XML Sitemap for Static Files?
Let’s first address the “why” of this script: in lots of cases you’ll have static files, whether they’re PDFs or the static PHP or HTML files that make up a site. I want all of those in an XML sitemap for two reasons:
- to tell Google that they’re there;
- to be able to see in Google Webmaster Tools whether they’re all indexed.
This script assumes that all those files are in one directory. I know that’s a bit “lazy”, but if your site spans a lot of directories, you should probably be using a CMS.
Configuring the XML Sitemap PHP script
Of course, this script needs a bit of configuration before it’ll work well. It has the following constants and variables:
- The directory to search for files in.
- The URL of the sitemap directory.
- Whether or not the script should search subdirectories recursively.
- An array of all the file types that you want to include in your XML sitemap.
- An array of files whose URLs should be replaced with other URLs; useful, for instance, to replace ‘index.php’ with an empty string, so the URL looks like just example.com/.
- An array of files to leave out of the XML sitemap; useful for your config.php, for instance.
- A path to the XSL file included with the script, relative to the SITEMAP_DIR_URL location.
- The change frequency for the files: one of the values the sitemaps protocol allows, ‘always’, ‘hourly’, ‘daily’, ‘weekly’, ‘monthly’, ‘yearly’ or ‘never’.
- The priority, a value between 0 and 1. Since you can’t differentiate between files, there’s no great harm in setting them all to 1.
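The options above can be illustrated with a Python sketch of the same idea. It is not the PHP script, and the configuration keys (dir, dir_url, recursive, filetypes, replace, ignore, xsl, changefreq, priority), the functions collect_urls and render, and the example URLs and file names are all hypothetical. It also assumes that the ignore and replace lists match on the bare file name; the real script may match differently.

```python
import os
from datetime import datetime, timezone
from urllib.parse import quote
from xml.sax.saxutils import escape

# Hypothetical settings mirroring the options listed above. The keys are
# illustrative; the real PHP script uses its own constant and variable names.
CONFIG = {
    "dir": "site",                        # directory to search for files
    "dir_url": "https://example.com/",    # URL that maps to that directory
    "recursive": True,                    # descend into subdirectories?
    "filetypes": {"php", "html", "pdf"},  # extensions to include
    "replace": {"index.php": ""},         # file name -> replacement in the URL
    "ignore": {"config.php"},             # file names to leave out
    "xsl": "xml-sitemap.xsl",             # stylesheet, relative to dir_url
    "changefreq": "monthly",
    "priority": 1.0,
}

# Values the sitemaps.org protocol allows for <changefreq>.
CHANGEFREQS = {"always", "hourly", "daily", "weekly", "monthly", "yearly", "never"}


def collect_urls(cfg):
    """Yield (url, path) for every file the settings say to include."""
    root = cfg["dir"]
    for dirpath, dirnames, filenames in os.walk(root):
        if cfg["recursive"]:
            dirnames.sort()   # visit subdirectories in a stable order
        else:
            dirnames.clear()  # pruning in place stops os.walk from descending
        rel_dir = os.path.relpath(dirpath, root)
        prefix = "" if rel_dir == "." else rel_dir.replace(os.sep, "/") + "/"
        for name in sorted(filenames):
            ext = os.path.splitext(name)[1].lstrip(".").lower()
            if ext not in cfg["filetypes"] or name in cfg["ignore"]:
                continue
            url_name = cfg["replace"].get(name, name)
            yield cfg["dir_url"] + quote(prefix + url_name), os.path.join(dirpath, name)


def render(cfg):
    """Build the sitemap XML, applying the XSL, changefreq and priority settings."""
    if cfg["changefreq"] not in CHANGEFREQS:
        raise ValueError(f"invalid changefreq: {cfg['changefreq']!r}")
    if not 0.0 <= cfg["priority"] <= 1.0:
        raise ValueError("priority must be between 0.0 and 1.0")
    out = ['<?xml version="1.0" encoding="UTF-8"?>']
    if cfg["xsl"]:
        href = escape(cfg["dir_url"] + cfg["xsl"], {'"': "&quot;"})
        out.append(f'<?xml-stylesheet type="text/xsl" href="{href}"?>')
    out.append('<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">')
    for url, path in collect_urls(cfg):
        mtime = datetime.fromtimestamp(os.path.getmtime(path), tz=timezone.utc)
        out += [
            "  <url>",
            f"    <loc>{escape(url)}</loc>",
            f"    <lastmod>{mtime.strftime('%Y-%m-%dT%H:%M:%S+00:00')}</lastmod>",
            f"    <changefreq>{cfg['changefreq']}</changefreq>",
            f"    <priority>{cfg['priority']:.1f}</priority>",
            "  </url>",
        ]
    out.append("</urlset>")
    return "\n".join(out)
```

Clearing dirnames in place is the usual way to stop os.walk from descending, which is all a non-recursive setting needs. Because replace matches on the file name here, an index.php in a subdirectory also collapses to its folder URL, for example https://example.com/blog/; whether the PHP script does the same is an assumption.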