XML Sitemap PHP script
Recently, while working on the site for my father-in-law (in Dutch), I wanted to create an XML sitemap for the many publications on his site, which are downloadable PDFs. I regularly add PDFs to his site, and since I’m a tad lazy, I don’t want to keep updating the XML sitemap by hand. So I wrote a small XML Sitemap PHP script that looks for all files of certain types in a directory, grabs their last modified times, and puts them in an XML sitemap.
Then, yesterday, while working on another site, a Base64 encoding and Base64 decoding experiment, I needed an XML sitemap yet again. So I improved the XML Sitemap PHP script a bit further and decided it should be released.
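The approach described above (scan a directory for files of certain types, read each file’s last modified time, write the sitemap) can be sketched in a few lines. This is not the script itself, which is PHP; it is a minimal Python sketch that assumes a flat directory, a hypothetical `build_sitemap` function, and file types given as lowercase extensions without the dot. It writes the standard `<urlset>` structure from the sitemaps.org protocol.

```python
import os
from datetime import datetime, timezone
from urllib.parse import quote
from xml.sax.saxutils import escape


def build_sitemap(directory, base_url, filetypes):
    """Return sitemap XML for the files in `directory` whose extension is in `filetypes`.

    Only the top level of `directory` is scanned (no recursion).
    """
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ]
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        ext = os.path.splitext(name)[1].lstrip(".").lower()
        if not os.path.isfile(path) or ext not in filetypes:
            continue  # skip subdirectories and unwanted file types
        # Last modified time as a W3C datetime in UTC, e.g. 2012-09-29T14:03:00+00:00
        mtime = datetime.fromtimestamp(os.path.getmtime(path), tz=timezone.utc)
        lastmod = mtime.strftime("%Y-%m-%dT%H:%M:%S+00:00")
        # Percent-encode the file name, then XML-escape the whole URL.
        url = escape(base_url.rstrip("/") + "/" + quote(name))
        lines.append(f"  <url><loc>{url}</loc><lastmod>{lastmod}</lastmod></url>")
    lines.append("</urlset>")
    return "\n".join(lines) + "\n"
```

Usage, with hypothetical paths: `print(build_sitemap("/var/www/example.com/publications", "https://example.com/publications/", {"pdf"}))`. Percent-encoding the file name keeps names with spaces valid inside `<loc>`.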
Why make an XML Sitemap for Static Files?
Let’s first address the “why” of this script: in lots of cases you’ll have static files, whether they’re PDFs or static PHP or HTML files that make up a site. I want all of those in an XML sitemap for two reasons:
- to tell Google that they’re there;
- to be able to see in Google Webmaster Tools whether they’re all indexed.
This script assumes that all those files are in one directory (or, with the RECURSIVE setting, in its subdirectories). I know that’s a bit “lazy”, but if your site spans a lot of directories, you should probably be using a CMS.
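The one-directory assumption and the recursive option come down to how far the file scan descends. Here is a minimal Python sketch using `os.walk`; the `collect_files` name and the extension-set format are assumptions for illustration, not the script’s actual code.

```python
import os


def collect_files(directory, filetypes, recursive=False):
    """Return matching file paths relative to `directory`, '/'-separated and sorted.

    recursive=False scans only the top level (the one-directory case);
    recursive=True also descends into every subdirectory.
    """
    found = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            ext = os.path.splitext(name)[1].lstrip(".").lower()
            if ext in filetypes:
                rel = os.path.relpath(os.path.join(root, name), directory)
                found.append(rel.replace(os.sep, "/"))
        if not recursive:
            break  # os.walk yields the top-level directory first, so stop here
    return sorted(found)
```

Returning paths relative to the scanned directory makes it straightforward to append them to a base URL later.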
Configuring the XML Sitemap PHP script
Of course, this script needs a bit of configuration before it’ll work well. It has the following constants and variables:
- SITEMAP_DIR: The directory to search for files in.
- SITEMAP_DIR_URL: The URL of the sitemap directory.
- RECURSIVE: Whether or not the script should search subdirectories recursively.
- $filetypes: An array of all the file types you want to include in your XML sitemap.
- $replace: An array of files whose URLs should be replaced with other URLs; useful, for instance, to replace ‘index.php’ with an empty string, so the URL looks like just example.com/.
- $ignore: An array of files to leave out of the XML sitemap; useful for your config.php, for instance.
- $xsl: The path to the included XSL file, relative to the SITEMAP_DIR_URL location.
- $chfreq: The change frequency for files, for example ‘hourly’, ‘daily’, ‘monthly’ or ‘never’ (the sitemap protocol also allows ‘always’, ‘weekly’ and ‘yearly’).
- $prio: The priority, a value between 0 and 1. Since you can’t differentiate between files, there’s no big harm in setting them all to 1.
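As a rough illustration of how `$replace`, `$ignore`, `$chfreq` and `$prio` could shape each sitemap entry, here is a Python sketch. The function names are hypothetical, and it assumes `$replace` maps a bare file name to its replacement and `$ignore` lists bare file names; the original PHP script may match them differently.

```python
from xml.sax.saxutils import escape

# Values allowed for <changefreq> by the sitemaps.org protocol.
VALID_CHANGEFREQ = {"always", "hourly", "daily", "weekly", "monthly", "yearly", "never"}


def file_to_url(relpath, base_url, replace, ignore):
    """Map a '/'-separated relative path to its sitemap URL, or None if ignored.

    `replace` maps a file name to a replacement (e.g. {'index.php': ''});
    `ignore` is a collection of file names to leave out entirely.
    """
    directory, _, name = relpath.rpartition("/")
    if name in ignore:
        return None
    name = replace.get(name, name)
    path = f"{directory}/{name}" if directory else name
    return base_url.rstrip("/") + "/" + path


def url_entry(loc, lastmod, changefreq="monthly", priority=1.0):
    """Build one <url> element, rejecting values the protocol does not allow."""
    if changefreq not in VALID_CHANGEFREQ:
        raise ValueError(f"invalid changefreq: {changefreq!r}")
    if not 0.0 <= priority <= 1.0:
        raise ValueError(f"priority must be between 0.0 and 1.0, got {priority}")
    return (
        f"<url><loc>{escape(loc)}</loc><lastmod>{lastmod}</lastmod>"
        f"<changefreq>{changefreq}</changefreq>"
        f"<priority>{priority:.1f}</priority></url>"
    )
```

Note that `changefreq` is only a hint to search engines about how often a page changes; it does not make the script itself run on a schedule.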
Styling the output of our XML Sitemap PHP Script
Of course, we’ll want our XML sitemap to look good as well as work well. For that, we use an XSL stylesheet, which is included in the download, so the sitemap renders as a readable page when you open it in a browser.
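For a browser to apply the stylesheet, the sitemap needs an `xml-stylesheet` processing instruction right after the XML declaration. A minimal Python sketch, assuming the `$xsl` path is resolved against `SITEMAP_DIR_URL` as described in the configuration; `sitemap_prolog` and the `xml-sitemap.xsl` file name are hypothetical.

```python
from urllib.parse import urljoin
from xml.sax.saxutils import quoteattr


def sitemap_prolog(sitemap_dir_url, xsl_path):
    """Return the XML declaration plus an xml-stylesheet processing instruction.

    `xsl_path` is resolved relative to `sitemap_dir_url`.
    """
    # Ensure a trailing slash so urljoin treats the URL as a directory.
    base = sitemap_dir_url if sitemap_dir_url.endswith("/") else sitemap_dir_url + "/"
    href = urljoin(base, xsl_path)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<?xml-stylesheet type="text/xsl" href={quoteattr(href)}?>\n'
    )
```

The `<urlset>` element then follows this prolog unchanged; the stylesheet only affects how the file is displayed, not what search engines read.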
Download XML Sitemap PHP Script
I’ve put the whole script on GitHub, so you can play with it, fork it, etc. Or you can just download the zip.
Update 2012-09-29: I’ve updated the script to work recursively and fixed a few minor issues.
Is there any way we can rewrite our URLs in the sitemap? We use the Amazon S3 and CloudFront plugin, which uploads our images to Amazon, so an image src changes from:
h**p://my-domain/blog/wp-content/uploads/2014/10/image.jpg
to:
h**ps://s3.amazonaws.com/mybucket/blog/wp-content/uploads/2014/10/image.jpg
This is such synchronicity!
My co-worker recently had to work on a very old website, and needed to redirect all of the 100+ page links to the new site’s equivalents through htaccess. He did not want to do this manually, which involved finding each page, inside folders, etc. He told me he was working on a script similar to yours. I will pass this along and maybe he will be virtually hugging you if it does help him!
Merci!
Nice little utility, Joost. I’m using Arne Brachhold’s Sitemaps generator plugin on learncomputer.com. Will try to poke around to see if I can wedge this script into it. Should be a fun little project! Thanks!
This is interesting. Aren’t there utilities on the web that will build XML sitemaps from static sites for you?
I remember using them prior to moving to WordPress.
Hi Joost:
I understand the purpose of evergreen blog posts, but I don’t quite understand the infrequency with which you post them! Almost a month since the last post now?
Jonesing for more…
S
Thanks for sharing Joost. I’ll use in my future projects.
That’s an excellent time saver. No more manually updating HTML files until the end of time.
Hi Joost.
What you’re doing is very good; you’re a very skilled person. I hope you develop this as a plugin to make it easier to use.
Your work will be rewarded.
I do not know much about managing HTML, CSS, PHP … only the basics.
Thank you.
Hello,
I am using your WordPress SEO plugin; it’s really great, but it already has such an option for sitemaps. Can I install this script too, or will the two interfere with each other?
Thanks,
Arnie.
Thanks for the script, Joost! I agree with Dale – any way to add PDFs to your sitemap plugin?
Hey Joost, great job with this – are you planning on doing HTML, URL list and RSS sitemaps with this?
Hi! First of all..your SEO plugin for WordPress is nothing short of uber awesome! I also have a handful of static pages and this script will do the trick nicely. One quick question (sorry for the newbie-ness but I’m really trying :)…which file do I point my Google Webmaster Tools to when reporting the site map? The xml-sitemap.php file?
Lastly…if I set the frequency to ‘hourly’ does that mean the script checks my files in that directory that meet the allowed extensions (set in the array) every hour automatically? So that way my sitemap is always up to date?
Thanks so much for all you do… we will be making a donation through the Yoast WordPress plugin very soon!
Jack
Excellent script….I would also like to see this incorporated into the plugin.
This looks like a really useful tool.
I would imagine it to be a simple task now to create more stylesheets for tasks, for example, internal link generation, or a menu generator module.
Very interesting. You should also add the option of working with subdirectories. I’m making the changes to the script for that; it can be useful for some people.
Your article was helpful for me, thanks Joost.
Best regards from Spain.
Hey YoastFans, very interesting topic. I’m not a programmer or website developer, but all this techie language sure is interesting. Anyway, I’m a blogger and I use WordPress, and I’m having issues and can’t get help from the support team at MSI.Hosting. The problem is that when I try to open my site, I get the message “502 Bad Gateway nginx/0.8.53”. Can any of you intelligent Yoast geeks help me? Also, where can I find my sitemap? I hear all about it and I don’t know where to look. So if any of you brainy geeks can help, I’d appreciate it very much. Ed :)
P.S.
I’m a subscriber to your newsletter and read your blog posts, as I have an attached link on my blog, and I’ve got to say: very interesting topics…
Hi,
I am using your great WordPress SEO plugin for sitemaps. Is there any way of making that plugin add PDF files to the sitemap?
Thanks,
Dale.
Many of the sites I work with have subdomains. Could you either provide or recommend a way to configure the PHP script to work with subdomains? Thanks.
Excellent, thanks JdV
Looking good Joost! I was wondering, is it possible to use some sort of cronjob to update the sitemap every week or month?
Would be nice if you released this as a plugin, Joost. Your SEO plugin is already top notch!
A useful script Joost.
Do you have any plans to integrate this into the wonderful WordPress SEO plugin or does it already use this script?
This looks like a great script. I might use it on future websites that I build. Thanks for sharing!
Nice script, and indeed useful for static websites. Before, I created the sitemap manually and was also too lazy to update it every time.