XML Sitemap PHP script

Recently, while working on the site for my father-in-law (in Dutch), I wanted to create an XML sitemap for the many publications on his site, which are downloadable PDFs. I regularly add PDFs to his site, and since I’m a tad lazy, I don’t want to keep updating the XML sitemap by hand. So I wrote a small XML Sitemap PHP script that looks for all the files of a certain type in a directory, grabs their last modified time, and throws them in an XML sitemap.
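
To give an idea of how little is involved, here’s a minimal sketch of that core loop in PHP. It is not the released script: build_sitemap(), its arguments and the example URL are assumptions made for illustration, and it only looks at a single directory. It lists the directory with scandir(), keeps the files whose extension is wanted, reads each one’s last modified time with filemtime(), and writes a <url> entry with <loc> and <lastmod>.

```php
<?php
// Minimal sketch of the idea, not the released script: list the files of the
// wanted types in ONE directory and print a sitemap with each file's
// last-modified date. build_sitemap() and its arguments are hypothetical;
// $types must contain lowercase extensions without the dot.
function build_sitemap(string $dir, string $baseUrl, array $types): string
{
    $xml  = '<?xml version="1.0" encoding="UTF-8"?>' . "\n";
    $xml .= '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n";

    $files = scandir($dir) ?: [];
    foreach ($files as $file) {
        $path = $dir . '/' . $file;
        // Skip ".", "..", subdirectories and anything that isn't a regular file.
        if (!is_file($path)) {
            continue;
        }
        $ext = strtolower(pathinfo($file, PATHINFO_EXTENSION));
        if (!in_array($ext, $types, true)) {
            continue;
        }
        // Percent-encode the file name for the URL, then escape it for XML.
        $loc = htmlspecialchars($baseUrl . rawurlencode($file), ENT_XML1 | ENT_QUOTES);
        // <lastmod> expects a W3C datetime, which date('c') produces.
        $lastmod = date('c', filemtime($path));
        $xml .= "  <url>\n    <loc>$loc</loc>\n    <lastmod>$lastmod</lastmod>\n  </url>\n";
    }

    return $xml . "</urlset>\n";
}

// Example: serve a sitemap of the PDFs in the same directory as this script.
header('Content-Type: application/xml; charset=utf-8');
echo build_sitemap(__DIR__, 'https://example.com/', ['pdf']);
```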

Then, yesterday, while working on another site, a Base64 encoding and Base64 decoding experiment, I needed an XML sitemap yet again. So I improved the XML Sitemap PHP script a bit further and decided it should be released.

Why make an XML Sitemap for Static Files?

Let’s first address the “why” of this script: in lots of cases, you’ll have static files, whether they’re PDFs or the static PHP or HTML files that make up a site. I want all of those in an XML sitemap for two reasons:

  • to tell Google that they’re there;
  • to be able to see in Google Webmaster Tools whether they’re all indexed.

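The script was originally written on the assumption that all those files are in one directory. I know that’s a bit “lazy”, but if your site spans a lot of directories, you probably should be using a CMS. (Since the update noted below, it can also scan subdirectories; see RECURSIVE.)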

Configuring the XML Sitemap PHP script

Of course, this script needs a bit of configuration before it’ll work well. It has the following constants and variables:

  • SITEMAP_DIR
    The directory to search for files in.
  • SITEMAP_DIR_URL
    The URL of the sitemap directory.
  • RECURSIVE
    Whether the script should also search the subdirectories of SITEMAP_DIR.
  • $filetypes
    An array of all the file types that you wish to include in your XML sitemap.
  • $replace
    An array of files that should be replaced with other URLs; useful, for instance, to replace ‘index.php’ with an empty string, so it shows up as just example.com/.
  • $ignore
    An array of files to leave out of the XML sitemap; useful for your config.php, for instance.
  • $xsl
    The path to the XSL file included with the script, relative to the SITEMAP_DIR_URL location.
  • $chfreq
    The change frequency for files, for example ‘hourly’, ‘daily’, ‘monthly’ or ‘never’ (the sitemap protocol also allows ‘always’, ‘weekly’ and ‘yearly’).
  • $prio
    The priority, a value between 0 and 1. Since you can’t differentiate between files, there’s no big harm in setting them all to 1.
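
To make the list above concrete, here’s what a filled-in configuration might look like, followed by one possible way to apply $ignore and $replace when turning a file name into a URL. All values are hypothetical (including the file names xml-sitemap.php and xml-sitemap.xsl), and sitemap_url() is an illustrative helper, not a function from the script, so check the download for the real defaults.

```php
<?php
// Example configuration with hypothetical values; the names follow the list
// above, but the downloaded script's own defaults may differ.
define('SITEMAP_DIR', __DIR__ . '/');
define('SITEMAP_DIR_URL', 'https://example.com/');
define('RECURSIVE', false);

$filetypes = ['php', 'html', 'pdf'];
$replace   = ['index.php' => ''];               // list index.php as just https://example.com/
$ignore    = ['config.php', 'xml-sitemap.php']; // hypothetical files to leave out
$xsl       = 'xml-sitemap.xsl';                 // hypothetical XSL file name
$chfreq    = 'monthly';
$prio      = 1;

// Hypothetical helper (not the script's own code): apply $ignore and $replace
// to a path relative to SITEMAP_DIR. Returns null for ignored files,
// otherwise the URL to put in <loc>.
function sitemap_url(string $file, array $ignore, array $replace): ?string
{
    if (in_array($file, $ignore, true)) {
        return null;
    }
    // Use the replacement when there is one; '' is kept because ?? only skips null.
    $path = $replace[$file] ?? $file;
    return SITEMAP_DIR_URL . $path;
}

foreach (['index.php', 'config.php', 'about.html'] as $file) {
    var_dump(sitemap_url($file, $ignore, $replace));
}
```

With these example settings, index.php maps to https://example.com/, config.php is left out, and about.html becomes https://example.com/about.html.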

Styling the output of our XML Sitemap PHP Script

Of course, we want our XML sitemap to look good as well as work well. For that, we use an XSL stylesheet, which is included in the download. It makes the XML sitemap look like this:

XML Sitemap PHP Script output, styled with XSL
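
The standard way to attach an XSL stylesheet to an XML document is an xml-stylesheet processing instruction right after the XML declaration. The sketch below assumes the script links its stylesheet that way, resolving $xsl against SITEMAP_DIR_URL; sitemap_prologue(), the base URL and the stylesheet name are hypothetical.

```php
<?php
// Sketch: the start of a sitemap that tells browsers to render it through an
// XSL stylesheet. sitemap_prologue() and the example values are hypothetical;
// the stylesheet URL is the base URL plus the relative XSL path.
function sitemap_prologue(string $baseUrl, string $xsl): string
{
    // Escape the href so characters like & stay valid inside the instruction.
    $href = htmlspecialchars($baseUrl . $xsl, ENT_XML1 | ENT_QUOTES);
    return '<?xml version="1.0" encoding="UTF-8"?>' . "\n"
        . '<?xml-stylesheet type="text/xsl" href="' . $href . '"?>' . "\n"
        . '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n";
}

header('Content-Type: application/xml; charset=utf-8');
echo sitemap_prologue('https://example.com/', 'xml-sitemap.xsl');
echo "</urlset>\n";
```

The stylesheet only changes what a person sees in the browser; search engines read the underlying XML.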

Download XML Sitemap PHP Script

I’ve put the whole script on GitHub, so you can play with it, fork it, etc. Or you can just download the zip.

Update 2012-09-29: I’ve updated the script to work recursively and fixed a few minor issues.
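
For the recursive option, PHP’s SPL iterators do most of the work. The sketch below is an illustration under assumptions, not the script’s actual code: find_files() is a hypothetical helper that walks a directory with RecursiveDirectoryIterator, wrapped in RecursiveIteratorIterator only when recursion is switched on, and returns each matching file’s path relative to the starting directory along with its last modified timestamp. The CATCH_GET_CHILD flag makes it skip subdirectories it can’t open instead of throwing an exception.

```php
<?php
// Sketch of a (optionally) recursive scan, not the script's actual code.
// find_files() is hypothetical: it returns paths relative to $dir (always with
// "/" separators) mapped to last-modified Unix timestamps, for every file whose
// lowercase extension is in $types.
function find_files(string $dir, array $types, bool $recursive): array
{
    $dir   = rtrim($dir, '/\\');
    $inner = new RecursiveDirectoryIterator($dir, FilesystemIterator::SKIP_DOTS);
    // Iterating $inner directly only visits the top level; wrapping it descends
    // into subdirectories, skipping any that can't be opened.
    $iter = $recursive
        ? new RecursiveIteratorIterator(
            $inner,
            RecursiveIteratorIterator::LEAVES_ONLY,
            RecursiveIteratorIterator::CATCH_GET_CHILD
        )
        : $inner;

    $found = [];
    foreach ($iter as $info) {
        /** @var SplFileInfo $info */
        if (!$info->isFile()) {
            continue;
        }
        if (!in_array(strtolower($info->getExtension()), $types, true)) {
            continue;
        }
        $relative = substr($info->getPathname(), strlen($dir) + 1);
        $found[str_replace(DIRECTORY_SEPARATOR, '/', $relative)] = $info->getMTime();
    }
    ksort($found); // stable, alphabetical order in the sitemap
    return $found;
}
```

Called as find_files('/path/to/site', ['pdf'], true), it returns an array keyed by relative paths such as sub/report.pdf, which can then be turned into <url> entries just like in a flat scan.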

25 Responses to XML Sitemap PHP script

  1. Todor  • 9 years ago

    Is there any way we can rewrite our URLs in the sitemap? We use an Amazon S3 and CloudFront plugin that uploads our images to Amazon, so an image src changes from:
    h**p://my-domain/blog/wp-content/uploads/2014/10/image.jpg
    to:
    h**ps://s3.amazonaws.com/mybucket/blog/wp-content/uploads/2014/10/image.jpg

  2. M-A  • 13 years ago

    This is such synchronicity!

    My co-worker recently had to work on a very old website, and needed to redirect all of the 100+ page links to the new site’s equivalents through htaccess. He did not want to do this manually, which involved finding each page, inside folders, etc. He told me he was working on a script similar to yours. I will pass this along and maybe he will be virtually hugging you if it does help him!

    Merci!

  3. Michael Dorf  • 13 years ago

    Nice little utility, Joost. I’m using Arne Brachhold’s Sitemaps generator plugin on learncomputer.com. Will try to poke around to see if I can wedge this script into it. Should be a fun little project! Thanks!

  4. Rob AtlantaHomes  • 13 years ago

    This is interesting. Aren’t there utilities on the web that will build XML sitemaps from static sites for you?

    I remember using them prior to moving to WordPress.

  5. Stephan  • 13 years ago

    Hi Joost:

    I understand the purpose of evergreen blog posts, but I don’t quite understand the infrequency with which you post them! Almost a month since the last post now?

    Jonesing for more…
    S

  6. Srinivas  • 13 years ago

    Thanks for sharing, Joost. I’ll use it in my future projects.

  7. Koozai Mike  • 13 years ago

    That’s an excellent time saver. No more manually updating HTML files until the end of time.

  8. Wilmer  • 13 years ago

    Hi Joost.

    What you’re doing is very good; you’re a very skilled person. I hope you develop it as a plugin to make it easier to use.
    Your work will be rewarded.
    I do not know much about working with HTML, CSS, PHP … only the basics.
    Thank you.

  9. Arnie  • 13 years ago

    Hello,

    I am using your WordPress-SEO plugin; it’s really great, but it already has such an option for sitemaps. Can I install this as well, or will they interfere with each other?

    Thanks,
    Arnie.

  10. Jay  • 13 years ago

    Thanks for the script, Joost! I agree with Dale: any way to add PDFs to your sitemap plugin?

  11. Dave Cain  • 13 years ago

    Hey Joost, great job with this! Are you planning on doing HTML, URL list and RSS sitemaps with this?

  12. Jack  • 13 years ago

    Hi! First of all..your SEO plugin for WordPress is nothing short of uber awesome! I also have a handful of static pages and this script will do the trick nicely. One quick question (sorry for the newbie-ness but I’m really trying :)…which file do I point my Google Webmaster Tools to when reporting the site map? The xml-sitemap.php file?

    Lastly…if I set the frequency to ‘hourly’ does that mean the script checks my files in that directory that meet the allowed extensions (set in the array) every hour automatically? So that way my sitemap is always up to date?

    Thanks so much for all you do… we will be making a donation through the Yoast WordPress plugin very soon!

    Jack

  13. Craig  • 13 years ago

    Excellent script….I would also like to see this incorporated into the plugin.

  14. Mark Fisher  • 13 years ago

    This looks like a really useful tool.
    I imagine it would now be a simple task to create more stylesheets for other tasks, for example, internal link generation or a menu generator module.

  15. César Couto  • 13 years ago

    Very interesting. You should also add the option of working with subdirectories. I’m making the changes to the script for that; it can be useful for some people.

  16. Oposiciones  • 13 years ago

    Your article was helpful for me, thanks Joost.

    Best regards from Spain.

  17. Ed  • 13 years ago

    Hey YoastFans, very interesting topic. I’m not a programmer or website developer, but all this techie language sure is interesting. Anyway, I’m a blogger and I use WordPress, and I’m having issues that I can’t get help with from the support team at MSI.Hosting. The problem is that when I try to enter my site, I get the message “502 Bad Gateway nginx/0.8.53”. Can any of you intelligent Yoast geeks help me? Also, where can I find my sitemap? I hear all about it and I don’t know where to look. So if any of you brainy geeks can help, I sure would appreciate it very much. Ed :)

    P.S.

    I’m a subscriber to your newsletter and read your blog posts, as I have an attached link to my blog, and I’ve got to say, very interesting topics…

  18. Dale Reardon  • 13 years ago

    Hi,

    I am using your great WordPress SEO plugin for sitemaps. Is there any way of making that plugin add PDF files to the sitemap?

    Thanks,
    Dale.

  19. Robert Visser  • 13 years ago

    Many of the sites I work with have subdomains. Could you either provide a way, or recommend how, to configure the PHP to work with subdomains? Thanks.

  20. neil  • 13 years ago

    Excellent, thanks JdV.

  21. Jeremy  • 13 years ago

    Looking good Joost! I was wondering, is it possible to use some sort of cronjob to update the sitemap every week or month?

  22. bart  • 13 years ago

    It would be nice if you released this as a plugin, Joost. Your SEO plugin is already top notch!

  23. Craig Cacchioli  • 13 years ago

    A useful script Joost.
    Do you have any plans to integrate this into the wonderful WordPress SEO plugin or does it already use this script?

  24. James Dowen  • 13 years ago

    This looks like a great script. I might use it on future websites that I build. Thanks for sharing!

  25. Herman dailybits  • 13 years ago

    Nice script, and indeed useful for static websites. Before, I created the sitemap manually and was also too lazy to update it every time.