This month we are commemorating 5 years since the Yoast SEO readability check was first released! Since then, we’ve continued making improvements. We sat down with our Linguistics team lead, Manuel Augustin, and researcher-developer, Hanna Worku, to get a deeper view of how the readability check works and how they adapt the tool to add support for more languages.
The Yoast linguistics team
Could you first tell us a bit more about the team? For instance, what are your backgrounds? Are you all linguists?
Manuel: “Our team is one of the most international teams and all of us are linguists. While most of us are big language nerds, our other interests include hiking, dancing, playing music, and gardening. We also keep learning to code to grow as developers, which has been a bit challenging but very rewarding.”
How many languages do you speak together?
Hanna: “While knowing many languages isn’t crucial for linguistic research, it is definitely helpful! For example, it makes working with less familiar grammatical systems much easier. Our team speaks 14 languages overall.”
The ins-and-outs of readability checking
Can you briefly explain what Yoast SEO’s readability analysis does?
Hanna: “The Yoast SEO readability check analyzes the user’s post and tells them what can be done to make it more readable. The analysis shows the user where the text’s readability can be improved, for example, which sentences to rewrite. This feedback helps users create posts that are SEO-friendly, engaging, and accessible to a wider audience, and that rank higher.”
How do you know what makes a text readable? How do you measure this?
Manuel: “A text is considered readable when it is easy to read and understand. A readable text has several key qualities. For example, short sentences are easier to understand than long sentences. Using connecting words (e.g. ‘however’, ‘moreover’, ‘in conclusion’) and subheadings makes the content more engaging and easier to follow. Depending on the language, using active voice (‘They built the house’) is preferred over passive voice (‘The house was built by them’).”
Hanna: “Our initial research helped us estimate ratios that make the text analysis more consistent. For example, if at least 30% of the sentences in your text contain a transition word, the bullet for the transition words check will be green. If they are used in at least 20% but less than 30% of your sentences, you get an orange bullet. And the bullet will be red if less than 20% of the sentences in your text contain a transition word.”
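To make the thresholds concrete, here is a minimal Python sketch of the traffic-light logic Hanna describes. It is an illustration only, not the plugin's actual code: the function names, the tiny transition word list, and the simple whole-word matching are all assumptions made for this example. Only the 30% and 20% boundaries come from the interview.

```python
import re

# Hypothetical, tiny word list for illustration; a real list would be much longer.
TRANSITION_WORDS = ["however", "moreover", "in conclusion"]


def transition_word_ratio(sentences, transition_words=TRANSITION_WORDS):
    """Return the share of sentences containing at least one transition word."""
    if not sentences:
        return 0.0
    patterns = [
        re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
        for word in transition_words
    ]
    hits = sum(1 for s in sentences if any(p.search(s) for p in patterns))
    return hits / len(sentences)


def bullet_for_ratio(ratio):
    """Map a ratio to a bullet colour using the thresholds from the interview."""
    if ratio >= 0.30:
        return "green"
    if ratio >= 0.20:
        return "orange"
    return "red"


sentences = [
    "However, it rained all day.",
    "We stayed inside.",
    "Moreover, it was cold.",
    "We read books.",
]
ratio = transition_word_ratio(sentences)
print(ratio, bullet_for_ratio(ratio))
```

In this sample, two of the four sentences contain a transition word, so the ratio is 0.5 and the bullet is green.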
From researching readability to readability checks
Can you explain a bit more about the research process and how you translate that into readability checks?
Manuel: “We always start adding a new check by researching the language. The first goal of the research is to find out whether (and how) our existing readability checks apply to it. The second goal is to find ideas for readability checks to add specifically for that language. After the research, we add all the data needed to analyze a text in the language: e.g. what passive voice looks like in that language, or how many words to expect in an average sentence.”
Hanna: “To make sure we take into account as much relevant knowledge as possible, we collaborate with a native speaker consultant who takes part in the research and gives feedback on the first version of the feature. Some of the readability checks are used for each language (e.g. paragraph length, subheading presence), while others depend on the specific language. Because properties of the language play a big role in what makes a text readable, researching the particular language is an important step in adding the assessments.”
The overall readability score
Yoast SEO gives a score on individual checks, but also an overall score. How do you balance all these checks? When do you consider an entire text readable enough to get an overall green bullet?
Hanna: “There are indeed multiple checks that analyze the readability of your post, a few of which are language-dependent. Many of these checks are based on sentences. For example, we check whether too many sentences start with the same word, and whether enough connecting words were used across sentences. Other assessments pay attention to how the content is organized: e.g. the paragraph length check and the subheading presence check.”
Manuel: “The overall bullet turns green if the majority of readability checks are covered. It’s worth noting that an individual check should be used as a guideline rather than an instruction. In the end, it is for the user to decide which assessment is the most relevant and actionable based on their style and genre.”
The readability check has existed for 5 years now. How has it evolved over time? Did we change or adapt checks?
Manuel: “So far there have been no drastic changes in how we assess readability. However, we continue to work both on improving the existing readability checks and on exploring ideas for new ones. For example, we have an idea to add a check for double negatives. Importantly, we also plan to add readability checks that are more language-specific. Gaining deeper insight into a specific language helps us enhance the assessment by adding checks that are suitable specifically for the user’s language.”
Readability in different languages
The complete readability analysis is available in 15 languages now. Are there any striking differences in readability between languages?
Hanna: “There have been no striking differences so far, but there is quite some variety among languages with regard to sentence length. For example, an average sentence in Spanish is longer than an average sentence in English. We are also investigating word groups that affect readability in one language, while not affecting it in another.”
What about non-alphabetic languages? What would a readability check in, for instance, Japanese look like? Do we have plans to add non-alphabetic languages?
Manuel: “That’s something we want to work on in the future. The most unfamiliar writing systems we have worked with so far are Hebrew and Arabic. Our future plans involve Japanese and Farsi. There are definitely some differences. For example, languages like Japanese and Chinese don’t have spaces between words, so counting the number of words to detect long sentences, for example, doesn’t work in quite the same way.”
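A small Python sketch shows why the missing spaces matter. It assumes a deliberately naive word counter that splits on whitespace. The function name and sample sentences are made up for this illustration and say nothing about how the plugin counts words.

```python
def count_words_by_whitespace(sentence):
    """Naive word count: split the sentence on whitespace."""
    return len(sentence.split())


english = "The cat sat on the mat."
japanese = "猫がマットの上に座った。"  # roughly the same sentence, written without spaces

print(count_words_by_whitespace(english))   # 6
print(count_words_by_whitespace(japanese))  # 1
```

The whitespace approach treats the entire Japanese sentence as one "word", so a sentence length check based on it would never flag a long sentence. A language like this would need a different unit of measurement or a separate word segmentation step.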
Working on the readability checks
What does a day in the life of a Yoast linguist look like? Do you spend most of your time on the readability checks?
Hanna: “We spend a lot of time both on the readability and SEO (keyword) analysis, especially for Premium features. For example, the research for adding word form support is especially extensive because it involves implementing code that is able to recognize different forms of one word (e.g. planted gets recognized as plant). Another big part of our work is improving the existing code (for example, making it more understandable), addressing feedback from users, and sharing new ideas for more ways to analyze texts.”
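As a rough illustration of what recognizing different forms of one word involves, here is a deliberately naive Python sketch that strips a few English suffixes. The suffix list, the minimum stem length, and both function names are assumptions made for this example. Real word form support needs far more language-specific research than this, as Hanna points out.

```python
import re

# Illustrative only: three common English suffixes, checked in this order.
SUFFIXES = ("ing", "ed", "s")


def naive_stem(word):
    """Strip one known suffix, keeping at least three characters of the stem."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def keyphrase_word_found(keyphrase_word, text):
    """Check whether any word in the text shares a naive stem with the keyphrase word."""
    target = naive_stem(keyphrase_word)
    words = re.findall(r"[a-zA-Z]+", text)
    return any(naive_stem(w) == target for w in words)


print(naive_stem("planted"))                                               # plant
print(keyphrase_word_found("plant", "She planted tomatoes last spring."))  # True
print(keyphrase_word_found("plant", "The planet is warm."))                # False
```

Rules this simple over-strip easily (here "spring" becomes "spr") and miss irregular forms entirely, which is one reason the research for each language is so extensive.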
In your work, do you feel you’re more of a linguist or a developer?
Manuel: “It definitely depends on the task. Adding a new language to the plugin always involves both linguistic research and coding. Since we focus on making our features available in as many languages as possible, both coding and linguistic research are always in the picture. One of the most exciting research tasks is exploring ideas for new features, for example, a language level assessment that would help users determine the difficulty level of their text. Another big part of the research is devoted to improving the feedback we provide on users’ content, for example, by making it more actionable. Sometimes we work purely with code: we reorganize it, add documentation for it, and refactor it to make it more consistent and easier to grasp.”
What is your biggest annoyance when it comes to text readability? Do we have a check for that?
Manuel: “For English, I guess I’d choose the overuse of passive voice. It makes a text sound more impersonal and less dynamic. When we add a new language, we research whether the passive voice affects readability in that language. If that proves to be true, we research how to detect passive voice in that language and add it to the readability check.”
Check out the latest languages and features
Thank you for sharing these behind-the-scenes insights, Manuel and Hanna! You can read more about the 5-year anniversary of our readability analysis in this interview with Marieke van de Rakt and Irene Strikkers, the original creators of the analysis.
We’re continually developing new features for the Yoast SEO plugin, and adding new languages too. Do you want even more features in your SEO toolkit? Upgrade to Yoast SEO Premium to access the full collection and take your SEO to the next level:
Enjoy all the newest features with Yoast SEO Premium!
Get access to the Premium analysis, including word form recognition, synonyms, and related keyphrase and courses such as SEO copywriting, when you upgrade to the Premium plugin.