Advanced crawl optimization settings: URL cleanup
The URL cleanup feature in the crawl optimization settings helps to reduce the impact of URL parameters on your site’s performance and security.
If you want to take advantage of this feature, you should make sure that you understand how URL parameters work, and if/how they’re used on your website.
What are query parameters, and how do they work?
When people link to your site, they may choose to add parameters to the URL. This is a way of passing additional information to the website. For example, if you’re running an email marketing campaign, you might add ?from=example.com to the campaign’s URLs so that you know where your site traffic came from. This kind of functionality is common with tracking systems like Google Analytics.
You might also use parameters to power parts of your site’s functionality. For example, a calendar plugin might let you browse to specific dates by appending parameters like ?year=2021&month=01&day=01 to the URL. This kind of approach is also common in e-commerce websites, for filtering categories and products.
Lastly, it’s important to understand that any user or website can link to your pages using query parameters. It’s common for crawlers, systems, social networks, software, and third-party tools to append query parameters to the URLs they crawl and request.
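To make the idea concrete, here is a minimal Python sketch of how software typically reads query parameters from a URL. It uses only the standard library; the /calendar/ path and the helper name query_parameters are illustrative assumptions, not part of the plugin.

```python
from urllib.parse import urlsplit, parse_qs


def query_parameters(url: str) -> dict:
    """Return the query parameters of a URL as a name -> list-of-values mapping."""
    # Everything after the "?" (and before any "#") is the query string.
    query_string = urlsplit(url).query
    # keep_blank_values=True keeps parameters such as "?debug=" that have no value.
    return parse_qs(query_string, keep_blank_values=True)


# Hypothetical calendar URL, based on the example parameters above.
print(query_parameters("https://example.com/calendar/?year=2021&month=01&day=01"))
# prints {'year': ['2021'], 'month': ['01'], 'day': ['01']}
```

Each parameter maps to a list because the same name can appear more than once in a URL.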
Query parameters can cause SEO problems
In some scenarios, parameters like these can cause SEO problems. That’s because example.com and example.com/?source=email are technically different URLs. So search engines and systems that crawl your URLs and links must now crawl many more URLs – and they might see those pages as duplicate content. Even if you’re using canonical URLs, those extra URLs still represent an additional ‘cost’ to the crawling and indexing processes. They might also impact your site’s performance, as it’s harder to cache and optimize lots of individual, unique URLs.
If those parameters are important for your tracking or site functionality, then that might be a worthwhile tradeoff. But when query parameters aren’t being used, it might be beneficial to remove them.
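As a rough illustration of the duplication problem, the Python sketch below groups a list of URLs by the page they resolve to once the query string is ignored. The function names and the sample URLs are assumptions made for this example.

```python
from collections import defaultdict
from urllib.parse import urlsplit, urlunsplit


def strip_query(url: str) -> str:
    """Return the URL without its query string or fragment."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def group_by_page(urls):
    """Map each parameter-free URL to every variant that points at it."""
    groups = defaultdict(list)
    for url in urls:
        groups[strip_query(url)].append(url)
    return dict(groups)


# Hypothetical crawl list: three distinct URLs, one underlying page.
crawled = [
    "https://example.com/",
    "https://example.com/?source=email",
    "https://example.com/?source=social",
]
for page, variants in group_by_page(crawled).items():
    print(page, "has", len(variants), "crawlable variants")
```

A crawler has to fetch each variant separately, even though all three return the same content.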
Removing query parameters
Our URL cleanup feature allows you to automatically remove all unknown query parameters. When you enable this feature, we’ll trigger a redirect to strip out parameters that you haven’t explicitly said that you want to keep. For example, requests to example.com/?nonsense=true will now automatically redirect to example.com (via a 301 redirect).
Note that we won’t automatically remove registered query parameters, and won’t remove anything when the user is logged in. We also won’t remove some commonly-used parameters, like Google Ads’ gclid tracking parameter. There’s a full list of these below.
If there are specific query parameters that you don’t want to remove, then you can define those in our interface.
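The behavior described in this section can be sketched in Python as follows. This is not the plugin’s actual code: the KEEP allowlist, the logged_in flag, and the function name cleanup_redirect are assumptions used to illustrate the decision logic.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical allowlist standing in for registered parameters, parameters
# that are never removed (such as gclid), and any you chose to keep.
KEEP = frozenset({"gclid", "year", "month", "day"})


def cleanup_redirect(url: str, keep=KEEP, logged_in: bool = False) -> Optional[Tuple[int, str]]:
    """Return (301, cleaned_url) if unknown parameters must be stripped, else None."""
    if logged_in:
        # Nothing is removed for logged-in users.
        return None
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    kept = [(name, value) for name, value in pairs if name in keep]
    if len(kept) == len(pairs):
        # Every parameter is known, so no redirect is needed.
        return None
    cleaned = urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )
    return 301, cleaned


print(cleanup_redirect("https://example.com/?nonsense=true"))
# prints (301, 'https://example.com/')
print(cleanup_redirect("https://example.com/?gclid=abc&nonsense=true"))
# prints (301, 'https://example.com/?gclid=abc')
```

Returning None when nothing changes avoids redirecting a URL to itself.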
What’s a registered query parameter?
When a developer builds a plugin or a theme which relies on query parameters (like our calendar example above), they should ‘register’ the query parameters that they use, so that WordPress knows about them and can process them in line with best practice. This isn’t always the case, though – in some cases, developers will add and use query parameters without registering them.
Risks and testing
If your site relies on query parameters for key functionality (or tracking), then automatically removing them can cause problems. If you enable this feature, we recommend testing extensively to be sure that no critical query parameters are accidentally removed.
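One possible way to prepare for that testing is to list which parameter names already appear in your traffic before you enable the feature. The Python sketch below tallies parameter names from a list of requested URLs. Where those URLs come from (for example, a server access log) and the function name count_parameter_names are assumptions; adapt them to your own setup.

```python
from collections import Counter
from urllib.parse import urlsplit, parse_qsl


def count_parameter_names(urls) -> Counter:
    """Count how often each query parameter name appears across the given URLs."""
    counts = Counter()
    for url in urls:
        query_string = urlsplit(url).query
        for name, _value in parse_qsl(query_string, keep_blank_values=True):
            counts[name] += 1
    return counts


# Hypothetical sample of requested URLs, e.g. extracted from an access log.
requested = [
    "/?from=example.com",
    "/calendar/?year=2021&month=01&day=01",
    "/calendar/?year=2021&month=02&day=14",
    "/?gclid=abc123",
]
for name, hits in count_parameter_names(requested).most_common():
    print(name, hits)
```

Any name in the output that your site depends on, and that isn’t registered or otherwise kept, is a candidate for the list of parameters you don’t want removed.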
Parameters that we don’t ever remove
As we mentioned above, there are various (non-registered) query parameters that we never remove, because they’re commonly used by websites, analytics systems, and other software. Those parameters are:
*This list is incomplete, but will be updated soon.
Rewriting Google Analytics query parameters
If you use Google Analytics campaign tagging, you might also benefit from using our setting for optimizing utm tracking parameters. This setting automatically redirects requests like ?utm_medium=email to the # equivalent; e.g., #utm_medium=email. Google Analytics supports this approach ‘out of the box’, and this can improve your site’s crawl efficiency without sacrificing functionality.
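The rewrite can be sketched in Python like this. It is an illustration under stated assumptions rather than the plugin’s implementation: the function name utm_to_fragment is made up, parameters are treated as utm parameters when their name starts with utm_, and appending to an existing fragment with & is a choice made for this sketch.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def utm_to_fragment(url: str) -> Optional[str]:
    """Return the URL with utm_* parameters moved into the fragment, or None if there are none."""
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    utm = [(name, value) for name, value in pairs if name.startswith("utm_")]
    if not utm:
        # No campaign parameters, so no redirect is needed.
        return None
    other = [(name, value) for name, value in pairs if not name.startswith("utm_")]
    # Assumption: keep any existing fragment and append the utm pairs after it.
    fragment = "&".join(piece for piece in (parts.fragment, urlencode(utm)) if piece)
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(other), fragment)
    )


print(utm_to_fragment("https://example.com/?utm_medium=email"))
# prints https://example.com/#utm_medium=email
```

Because browsers don’t send the part after # to the server, every campaign variant is served from the same URL, while the tracking values remain available to scripts running in the page.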