How does Google understand text?

On this blog, we talk a lot about writing and readability. We consider both a very important part of good SEO. Your text needs to satisfy your users’ needs. This, in turn, will help your rankings. However, we rarely talk about how Google and other search engines read and understand texts. In this post, we’ll explore what we know about how Google analyzes online text.

Are we sure Google understands text?

We know that Google understands text to some degree. Think about it: one of the most important things Google has to do is match what the user types into the search bar to a search result. User signals alone won’t help Google to do this. Moreover, we also know that it is possible to rank for a phrase that you don’t use in your text (although it’s still good practice to identify and use one or more specific keyphrases). So clearly, Google does something to actually read and assess your text in some way or another.

What is the current status?

I’m going to be honest. We don’t really know how Google understands texts. The information simply isn’t freely available. And we also know, judging from the search results, that a lot of work is still to be done. But there are some clues here and there that we can draw conclusions from. We know that Google has taken big steps when it comes to understanding context. We also know that it tries to determine how words and concepts are related to each other. How do we know this? On the one hand, by analyzing some of the patents Google has filed over the years. On the other hand, by considering how actual search results pages have changed.


Word embeddings

One interesting technique Google has filed patents for and worked on is called word embedding. I’ll save the details for another post, but the goal is basically to find out which words are closely related to other words. This is what happens: a computer program is fed a certain amount of text. It then analyzes the words in that text and determines which words tend to occur together. Then, it translates every word into a series of numbers. This allows each word to be represented as a point in space in a diagram, such as a scatter plot. This diagram shows which words are related and in what ways. More accurately, it shows the distance between words, sort of like a galaxy made up of words. So for example, a word like “keywords” would be much closer to “copywriting” in this space than it would be to “kitchen utensils”.
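To make this a bit more concrete, here’s a minimal Python sketch of the steps described above. It is not how Google does it. It simply counts which words show up in the same sentence, turns those counts into a series of numbers per word, and compares two words with cosine similarity. The tiny corpus and the names `CORPUS`, `build_vectors` and `cosine` are made up for this illustration. Real systems learn much denser vectors from far more text.

```python
import math
from itertools import combinations

# Toy corpus, invented for this example (an assumption, not real search data).
CORPUS = [
    "keywords help copywriting",
    "copywriting uses keywords",
    "good copywriting needs keywords",
    "kitchen utensils include spoons",
    "spoons are kitchen utensils",
]


def build_vectors(sentences):
    """Turn every word into a list of co-occurrence counts, one per vocabulary word."""
    vocab = sorted({word for s in sentences for word in s.split()})
    index = {word: i for i, word in enumerate(vocab)}
    vectors = {word: [0] * len(vocab) for word in vocab}
    for sentence in sentences:
        # Each pair of distinct words in a sentence counts as one co-occurrence.
        for a, b in combinations(sorted(set(sentence.split())), 2):
            vectors[a][index[b]] += 1
            vectors[b][index[a]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity: values near 1 mean similar contexts, 0 means none shared."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


vectors = build_vectors(CORPUS)
near = cosine(vectors["keywords"], vectors["copywriting"])
far = cosine(vectors["keywords"], vectors["utensils"])
print(near > far)  # True: "keywords" sits closer to "copywriting" than to "utensils"
```

In this toy space, “keywords” and “copywriting” end up close together because they share neighbors like “help” and “uses”, while “keywords” and “utensils” share none.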

Interestingly, you can do this not only for words, but for phrases, sentences and paragraphs as well. The bigger the data set you feed the program, the better it will be able to categorize and understand words and work out how they’re used and what they mean. And, what do you know, Google has an index that covers a huge part of the internet. How’s that for a dataset? With a dataset like that, it’s possible to create reliable models that predict and assess the value of text and context.
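How do you get from a single word to a phrase or a sentence? One simple approach, and this is an assumption for illustration rather than anything we know Google does, is to average the vectors of the words in the phrase. The sketch below uses hand-picked, hypothetical two-number vectors in `WORD_VECTORS`; the names `phrase_vector` and `cosine` are also invented for this example.

```python
import math

# Hypothetical word vectors, hand-picked for illustration only.
WORD_VECTORS = {
    "seo": (0.85, 0.15),
    "keywords": (0.9, 0.1),
    "copywriting": (0.8, 0.2),
    "kitchen": (0.1, 0.9),
    "utensils": (0.2, 0.8),
}


def phrase_vector(phrase):
    """Average the vectors of all known words in the phrase."""
    known = [WORD_VECTORS[w] for w in phrase.lower().split() if w in WORD_VECTORS]
    if not known:
        raise ValueError(f"no known words in phrase: {phrase!r}")
    dims = len(known[0])
    return tuple(sum(vec[i] for vec in known) / len(known) for i in range(dims))


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


query = phrase_vector("seo copywriting keywords")
on_topic = cosine(query, phrase_vector("copywriting keywords"))
off_topic = cosine(query, phrase_vector("kitchen utensils"))
print(on_topic > off_topic)  # True
```

Averaging throws away word order, so it’s a rough stand-in at best. It does show why a phrase can land near related phrases even when the exact wording differs.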

From word embeddings, it’s only a small step to the concept of related entities (see what I did there?). Let’s take a look at the search results to illustrate what related entities are. If you type in “types of pasta”, this is what you’ll see right at the top of the SERP: a heading called “pasta varieties”, with a number of rich cards that include a ton of different types of pasta. These pasta varieties are even subcategorized into “ribbon pasta”, “tubular pasta”, and several other subtypes of pasta. And there are lots and lots of similar SERPs that reflect the way words and concepts are related to each other.

After typing [types of pasta] Google now shows this entity-based rich result

The related entities patent that Google has filed actually mentions the related entities index database. This is a database that stores concepts or entities, like pasta. These entities also have characteristics. Lasagna, for example, is a pasta. It’s also made of dough. And it’s food. Now, by analyzing the characteristics of entities, they can be grouped and categorized in all kinds of different ways. This allows Google to better understand how words are related, and, therefore, to better understand context.
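Here’s a small Python sketch of what such a grouping could look like. The structure is an assumption for illustration and not the design described in the patent: each entity simply gets a set of characteristics, and we group entities that share one. `ENTITIES`, `group_by_characteristic` and `relatedness` are hypothetical names.

```python
from collections import defaultdict

# Hypothetical entity records, invented for this example (not Google's data).
ENTITIES = {
    "lasagna": {"pasta", "made of dough", "food"},
    "penne": {"pasta", "made of dough", "food", "tubular pasta"},
    "tagliatelle": {"pasta", "made of dough", "food", "ribbon pasta"},
    "bread": {"made of dough", "food"},
    "spoon": {"kitchen utensil"},
}


def group_by_characteristic(entities):
    """Map every characteristic to the set of entities that have it."""
    groups = defaultdict(set)
    for name, characteristics in entities.items():
        for characteristic in characteristics:
            groups[characteristic].add(name)
    return dict(groups)


def relatedness(a, b, entities=ENTITIES):
    """Share of characteristics two entities have in common (Jaccard overlap)."""
    shared = entities[a] & entities[b]
    combined = entities[a] | entities[b]
    return len(shared) / len(combined)


groups = group_by_characteristic(ENTITIES)
print(sorted(groups["pasta"]))  # ['lasagna', 'penne', 'tagliatelle']
print(relatedness("lasagna", "penne") > relatedness("lasagna", "spoon"))  # True
```

The same entity shows up in several groups at once: lasagna is a pasta, is made of dough, and is food. That overlap is what makes it possible to say that lasagna is more closely related to penne than to a spoon.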

Practical conclusions

Now, all of this leads us to two very important points:

  1. If Google understands context in some way or another, it’s likely to assess and judge context as well. The better your copy matches Google’s notion of the context, the better its chances. So thin copy with limited scope is going to be at a disadvantage. You’ll need to cover your topics exhaustively. And on a larger scale, covering related concepts and presenting a full body of work on your site will reinforce your authority on the topic you specialize in.
  2. Easier texts that clearly reflect relationships between concepts don’t just benefit your readers; they help Google as well. Difficult, inconsistent and poorly structured writing is harder to understand for both humans and machines. You can help the search engine understand your texts by focusing on:
    • Good readability (that is to say, making your text as easy-to-read as possible without compromising your message).
    • Good structure (that is to say, adding clear subheadings and transitions).
    • Good context (that is to say, adding clear explanations that show how what you’re saying relates to what is already known about a topic).

The better you do, the more easily your users and Google alike will understand your text and what it tries to achieve. This also helps you rank with the right pages when a user types in a certain search query, especially because Google seems to be trying to create a model that mimics the way we humans process language and information. And yes, adding your keyphrase to your text still helps Google to match your page to a query.

Google wants to be a reader

In the end, the message is this: Google is trying to be, and is becoming, more and more like an actual reader. By writing rich content that is well structured, easy to read and clearly embedded into the context of the topic at hand, you’ll improve your chances of doing well in the search results.


18 Responses to How does Google understand text?

  1. Raibin • 2 years ago

    I already knew about SEO and keyword research and keyword targeting, but this is something new. I never understood how google views the content and understand the context.
    Thanks for this amazing article, now I will fix the issues that had resulted in poor rankings.
    Thank again!

    • Jesse van de Hulsbeek • 2 years ago

      Hey Raibin!

      Cool to see that this article helps you improve your site. Good luck!

  2. sreeja • 2 years ago

    After reading the article, I noticed the mistakes I have done on my website content. And from now I will be very careful in my content.

    • Jesse van de Hulsbeek • 2 years ago

      Hope it helps, Sreeja!

  3. Nishant Maitre • 2 years ago

    OMG! This is very interesting information about google. Never ever heard about this before. Thanks for sharing.

    • Jesse van de Hulsbeek • 2 years ago

      Hi Nishant! You’re welcome, great to hear you found it interesting :)

  4. John • 2 years ago

    I’ve never heard of Word embeddings before!

    This is something that sounds very interesting and I will probably need to research it in detail a bit more.

    I love the yoast blog!

    • Jesse van de Hulsbeek • 2 years ago

      Hey John! Yup, word embeddings are mind-blowing! I remember when I first heard of them, I went into a big deep dive and got more and more excited with every new piece of information I found. Hope you have the same experience :)

  5. Girish • 2 years ago

    Hello Yoast, this sounds tricky. What if a user writes well but does not use enough suitable “related words” identified by Google? Will the content be deemed less valuable? Also, using this logic means that the Google algorithm will encourage content creators to rehash existing content on the Internet, i.e to remodel their content to contain all the related keywords they can find. Also I think the search results will all be very similar with similar content if the algorithm places high value on this as a ranking factor.

    • Jesse van de Hulsbeek • 2 years ago

      Hey Girish,

      I don’t think approaching this from the perspective of ranking factors is the way to go here. In my mind, this is not one static factor that outweighs other things you do on your website. From a broad perspective, rich content that is embedded into the context of a specific field is useful (for your users as well).

      But I’m not suggesting you’ll get ‘penalized’ for not using enough related words. If parts of these technologies factor into an algorithm, it’s probably more nuanced than that, as part of a whole web of interrelated variables.

      In the end, just focus on creating useful content that answers your users’ questions and adds something to the existing corpus, and you should be fine. These patents just suggest that effectively relating your content to that corpus could be increasingly measurable.

  6. V.J. Miller,Sr • 2 years ago

    I think SEO should ignore quotes of another person or famous individual. The post I just finished got a hit because I used the same word to start a sentence four times in a row. After reading over the post I discovered that it was a quote by Rod Serling. I can’t change the quote just to please the SEO score.

    • Jesse van de Hulsbeek • 2 years ago

      Hi! You definitely shouldn’t change the quote just to please the score. In cases where it makes sense (like this one), we actually encourage you to ignore the SEO feedback. For more tips on how to deal with the Yoast SEO feedback, check out this article:

      • V.J. Miller,Sr • 2 years ago

        Thanks. This eases my mind. I always try to write the best I know how and to make it interesting. I won’t ignore the SEO advice but only use it as a guide.

        • Jesse van de Hulsbeek • 2 years ago

          Sounds like exactly the right approach, good luck!

  7. John • 2 years ago

    Nice post! So it comes back to focusing on your reader… writing informative content that is well-structured and easy to read.

    • Jesse van de Hulsbeek • 2 years ago

      Hey John! That’s exactly it; try to stand out by being useful and accessible!

  8. Imran • 2 years ago

    I’m using yoast for 2 years and never face any kind of readability issue. But I don’t think that how Google treat text.

    • Jesse van de Hulsbeek • 2 years ago

      Hi Imran! Good to hear that you’ve never faced readability issues. I think there’s a clear development and focus in Google efforts towards trying to become more and more like a reader. Understanding text is spectacularly difficult. In some cases, it’s clear that Google doesn’t understand text well at all, which shows there’s a lot of work to be done. I’m not suggesting Google is capable of doing these things at a high level, just painting a picture of what they’re working on and how they’re positioning themselves.