New: Yoast releases performance optimizations for larger websites

With our latest 27.8 release, we introduced performance optimizations that should reduce loading times throughout the plugin’s functionalities, especially noticeable in large sites with lots of posts and users. 

Note: This post contains technical content and implementation details.

Offering well-tuned software with minimal overhead in servers and fast loading times is always at the forefront of everything Yoast developers do. However, Yoast SEO is installed in millions of websites so the variance of setups that we must be well-tuned for is big. This means we should be continuously going back to search for windows in optimizing the performance of the plugin. We’ve been known to do that consistently in the past, like when we improved our database system

The 27.8 release is the outcome of one of those targeted reviews. We deliberately picked features whose behavior at scale offered the most headroom and reworked them to be leaner and faster. From modifying queries to make pages faster for sites with many users and shaving heavy operations in the admin for sites with many posts, to reducing rounds trips to the database for multiple features and generally applying performance best practices, this is a release meant to improve the user and developer experience in the Yoast SEO plugin. 

We would also like to offer a technical summary of the improvements in this release here, focusing on their nitty-gritty details because it’s always nice to raise awareness about performance best practice (not to mention that it’s always fun to talk about code). 

Significantly reduce loading times of the root sitemap on sites with many users 

For context, for Yoast SEO to calculate the Last Modified value of the author sitemap, when it outputs the root sitemap, it uses the usermeta of the all the users that are eligible to be included in the author sitemap.  

Calculating the eligible users was traditionally done by checking user capabilities. This was done by adding the ‘capability’ => [ ‘edit_posts’ ] argument in the get_users() call that was used. As a result, a very heavy query with multiple joins and no use of the indexes of the database was triggered.  

Specifically, the resulting query added a clause like this: 

AND ((((mt1.meta_key = 'wp_capabilities'
        AND mt1.meta_value LIKE '%\"edit\\_posts\"%')
       OR (mt1.meta_key = 'wp_capabilities'
           AND mt1.meta_value LIKE '%\"administrator\"%')
       OR (mt1.meta_key = 'wp_capabilities'
           AND mt1.meta_value LIKE '%\"editor\"%')
       OR (mt1.meta_key = 'wp_capabilities'
           AND mt1.meta_value LIKE '%\"author\"%')
       OR (mt1.meta_key = 'wp_capabilities'
           AND mt1.meta_value LIKE '%\"contributor\"%')
       OR (mt1.meta_key = 'wp_capabilities'
           AND mt1.meta_value LIKE '%\"wpseo\\_manager\"%')
       OR (mt1.meta_key = 'wp_capabilities'
           AND mt1.meta_value LIKE '%\"wpseo\\_editor\"%'))))

Since LIKE ‘%…%’ cannot use any B-tree index, MySQL must read each matching wp_capabilities row in full and do seven substring scans of the serialized PHP meta_value per row. 

By modifying that calculation from using the capability check to looking for users with published posts (via using the ‘ has_published_posts ‘ => true argument), we instantly turned the resulting query to be one that uses indexes and that performs way better in sites with many users.  

In fact, on one of our tests, on a site with around 2 million users, the time it took to complete each query (so approximately the time that took the root sitemap to render), went from over 300 seconds down to just 25 milliseconds! This means that the change has the potential for drastic improvements in loading times of root sitemaps in similar sites.  

Finally, considering that the ‘ has_published_posts ‘ => true argument was already used in a later stage of the sitemap generation, the change itself should have little to no negative impact on the actual functionality of the feature. 

Reduce loading times of the author sitemap on sites with many users 

For Yoast SEO to render author sitemaps, it needs to calculate the eligible users. On sites with many users, this can be a very heavy operation. Aside from the above optimization, we noticed that while Yoast SEO was calculating eligible users, it also added a meta query to check whether the user_level of each user was over 0.  

It turned out that this was a remnant from old times, because the user_level framework had been deprecated by WP core since version 3.0. While this didn’t break things in our sitemap feature, it unnecessarily added an INNER JOIN in the resulting query without much purpose and in sites with very big user and usermeta tables that was degrading performance. So we went and removed the unnecessary JOIN: 

INNER JOIN wp_usermeta AS mt1 ON wp_users.ID = mt1.user_id 

... 

AND ( mt1.meta_key = 'wp_user_level' AND mt1.meta_value != '0' )

Since the user_level framework was deprecated a long time ago, we made the deliberate call to drop support for it, especially since doing so would make our feature smoother. In fact, we are comfortable shipping this optimization and expect minimal disruption as a result, exactly because of how old that deprecation is. 

Prevent unnecessary expensive database queries in admin pages 

In order to timely notify admins that they need to perform the necessary actions for their site data to be indexed optimally in our internal storage, Yoast SEO used to run a database query daily while admins navigated throughout the backend. For big sites, that database query had the potential to run for several seconds, slowing the rendering of admin pages periodically. 

Specifically, the Limited_Indexing_Action_Interface::get_limited_unindexed_count() function that can run complex queries like below, was running periodically in admin pages slowing the speed of bigger sites’ rendering.  

SELECT Count(P.id) 

FROM   wp_posts AS P 

WHERE  P.post_type IN ( 'post', 'page' ) 

       AND P.post_status NOT IN ( 'auto-draft' ) 

       AND P.id NOT IN (SELECT I.object_id 

                        FROM   wp_yoast_indexable AS I 

                        WHERE  I.object_type = 'post' 

                               AND I.version = 2)

We managed to re-arrange the logic of the code responsible for the notification that told admins about pending actions in such a way that those heavy queries now run only once, at the moment it’s first detected that such a notification should be created.  

That way, we effectively cache the results of the Limited_Indexing_Action_Interface::get_limited_unindexed_count() and rely on cache invalidation that existed before our changes but weren’t properly utilized. As a result, a potentially very heavy database query went from being triggered daily (and in cases of very busy sites, with lots of concurrent users, once per 15 minutes) to being triggered only once, in most sites. 

Optimize expensive database queries in admin pages  

Related to the above query-preventing change, not only did we manage to avoid running that aforementioned heavy database query more than once per site, but we also managed to optimize the query itself. An added benefit from that is that we made the SEO optimization tool much faster in sites with lots of posts. 

Specifically, we went from: 

AND P.ID NOT IN ( 

    SELECT I.object_id FROM wp_yoast_indexable AS I 

    WHERE I.object_type = 'post' 

)

To:  

AND NOT EXISTS ( 

    SELECT 1 FROM wp_yoast_indexable AS I 

    WHERE I.object_id = P.ID 

      AND I.object_type = 'post' 

)

Since NOT IN (subquery) builds the entire list of object_ids, while the second query short-circuits the moment one row matches, the query runs considerable faster in sites with multiple thousands of posts. 

Reduce roundtrips to the database 

As a rule of thumb, roundtrips to the database are considered to be expensive operations that should be reduced to a minimum whenever possible. Our reviews discovered instances where we were retrieving data for multiple posts in sequential SELECT queries where we could have done a single batched SELECT query to gather data for all posts at once. 

For example, a piece of code that looked like this: 

$indexables = []; 

foreach ( $post_ids as $post_id ) { 

	$indexables[] = $this->repository->find_by_id_and_type( (int) $post_id, 'post' ); 

}

was refactored into something that looked like this: 

$ indexables = $this->repository->find_by_multiple_ids_and_type( 

	\array_map( 'intval', $post_ids ), 

	'post', 

);

That meant that for a chunk of 1000 posts, instead of performing 1000 SELECT queries that yielded a maximum of one row, we now perform a single SELECT query that yields a maximum of 1000 rows. Naturally, we made sure that the posts that will be requested each time do not exceed a certain threshold, to avoid reaching MySQL usage limits. 

As a result, sites with e.g. 1000 posts would save 960 roundtrips to the database for certain operations like part of their SEO optimization or part of the output of the schema aggregation feature

Improve post editor performance by preventing unnecessary re-renders 

The WordPress editor re-renders Yoast’s sidebar panels whenever the data they pull from the store appears to have changed. Unfortunately, “appears to have changed” is decided by reference equality (JavaScript’s ===) not by comparing values. A selector that returns { items: [‘foo’] } looks identical to a human, but if it’s a fresh object literal each time, React treats it as new and re-renders the panel. And if we multiply that by a busy editor that dispatches state updates on every keystroke, the result is panels that re-render constantly for no reason. 

With the 27.8 release, we identified multiple instances where data that weren’t actually changed triggered unnecessary re-renders in the post editor and patched them, making our editor integration much more robust and performant.