Semantic content optimization with entities

The biggest question for most SEOs who optimize centralized sites these days is how to optimize content to

  1. Rank better for the main query
  2. Rank for more related and synonymous queries
  3. Show up in featured snippets and PAA boxes (“people also ask”)

When you have user-generated content or a product inventory to scale, creating custom content might not be your biggest concern (exceptions apply). But when you create all the content yourself, you usually swim in very competitive waters. You need an edge over your competitors and need to make the most out of each organic click you get.

The question is “how do you get an edge?” and for the answer, we need to take a quick step back.

How content optimization changed over the years

I remember when keyword density was the strongest ranking factor in SEO. Nowadays, hearing the term “keyword density” makes people shake like wizards in Harry Potter when they hear the name “Voldemort”. Google did well in demoting keyword-stuffed content, and not even the worst SEO charlatans would recommend keyword stuffing anymore.

Stuffing a text with keywords is detrimental to user experience and can be punished by Google. Source: https://support.google.com/webmasters/answer/66358?hl=en

Today, content optimization means writing “relevant” text and copy with optimized headings, titles, adding images and videos, and providing a great user experience. Ignoring how fuzzy some of these concepts are for a moment (and leaving that topic for a future article), we’ve come a long way from keyword stuffing. Successful content strategies have shifted from quantity to quality.

Now, there’s an additional quality factor of good content that the SEO community is slowly discovering: entities.

Entities are the new hotness

Thus, if there were a maturity model for content optimization, it would look like this:

  1. Keyword stuffed text for machines
  2. Text for humans
  3. High-quality content for humans with rich media
  4. High-quality content for humans with rich media satisfying user-intent
  5. High-quality content for humans with rich media satisfying user-intent and covering synonyms and related keywords
  6. High-quality content for humans with rich media satisfying user-intent, covering synonyms, related keywords, and important entities

But what exactly are entities, how do you use them to create and improve content, and how much sense does that make?

What are entities?

Entities are semantic, interconnected objects that help machines to understand explicit and implicit language. In simpler terms, they are words (nouns) that represent any type of object, concept, or subject.

Examples for entities

Entities are nouns like events, ideas, people, places, etc.

Check out my presentation about entity optimization at Optimisey:


Video: https://www.youtube.com/watch?v=cz7xnsvsQVY&feature=youtu.be

Summary + Slides: https://optimisey.com/seo-advice/entity-optimisation-in-2019-kevin-indig/

According to Cindy Krum and her fantastic entity series, Google seems to restructure its whole approach to indexing based on entities (while you’re at it, read AJ Kohn’s article about embeddings). Understanding entities and how Google uses them in search sharpens our standards for content creation, optimization, and the use of schema markup.

Entities provide Google lots of benefits:

  1. They are language-agnostic.
  2. An entity-based index makes it easier to shift away from backlinks, or at least provides a helpful layer on top of them.
  3. Strong foundation for voice search and grouping brand assets (apps, sites, etc.).
  4. A deeper understanding of language, especially concepts.
  5. An easier transition to becoming a discovery engine.

Entities provide the foundation for the Knowledge Graph, which in itself is a big piece of the puzzle in Google’s transition from search to discovery engine. The Knowledge Graph is the most visible touch point between users and entities and contains more and more search features that take away clicks from websites. Just think of the Knowledge Card tabs Google plans to show, as described in my article about Google’s transition linked above.

Each entity has attributes that describe it and provide more context. A person, for example, has an age, a name, a birthday, etc. You can check attributes on Wikidata.

Let’s go through an example: the moon. Wikidata mentions that it’s the “only natural satellite of Earth” and also known as “Luna”.
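
To make the idea of attributes more concrete, here is a minimal Python sketch that reads the label, description, and aliases out of a Wikidata entity record. It assumes the JSON layout that Wikidata serves for entity records. The sample record is hand-trimmed for illustration and is not a live response, and the helper name summarize_entity is made up for this example.

```python
import json

# Assumption: Wikidata serves entity records as JSON at this URL pattern.
ENTITY_URL = "https://www.wikidata.org/wiki/Special:EntityData/{qid}.json"

# Hand-trimmed sample in the shape of a Wikidata entity record (not a live response).
SAMPLE = json.loads("""
{
  "entities": {
    "Q405": {
      "labels": {"en": {"language": "en", "value": "Moon"}},
      "descriptions": {"en": {"language": "en", "value": "only natural satellite of Earth"}},
      "aliases": {"en": [{"language": "en", "value": "Luna"}]}
    }
  }
}
""")


def summarize_entity(record, qid, lang="en"):
    """Return label, description, and aliases for one entity in one language."""
    entity = record["entities"][qid]
    return {
        "label": entity.get("labels", {}).get(lang, {}).get("value"),
        "description": entity.get("descriptions", {}).get(lang, {}).get("value"),
        "aliases": [a["value"] for a in entity.get("aliases", {}).get(lang, [])],
    }


if __name__ == "__main__":
    # Where the full record for the moon would be fetched from.
    print(ENTITY_URL.format(qid="Q405"))
    print(summarize_entity(SAMPLE, "Q405"))
```

A full record also carries a "claims" section with the actual attributes (properties and their values), which is where most of the useful context lives.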

To my fascination, Google pulls a lot of information from Wikidata, for example, the image.

Google search results for the query “moon”
An image of the moon on Wikidata

We know that Google pulls information like the description from Wikipedia, but also from resources like Freebase, DMOZ, and others. You can find them all listed on the Wikidata page.

The entity Freebase ID on Wikidata
Clicking on the Freebase ID redirects to the Google search results

The indication that Google uses (or has used) Freebase is that Freebase URLs redirect to the Google search results for that entity.

In fact, when Google deprecated the Freebase API, it announced that Freebase data would be migrated to Wikidata. The strong connection between Google’s Knowledge Graph and Wikidata is something we as SEOs need to pay more attention to.

We don’t know how big Google’s database of entities is, but back in 2012, Amit Singhal proclaimed that the Knowledge Graph contained over 500 million objects (read: “entities”):

It currently contains more than 500 million objects, as well as more than 3.5 billion facts about and relationships between these different objects. And it’s tuned based on what people search for, and what we find out on the web.


Amit Singhal (Source: https://googleblog.blogspot.com/2012/05/introducing-knowledge-graph-things-not.html)

Another benefit of entities for Google is an easier shift away from the link graph to organize the index. Google relied and still relies on links between pages (actually URLs) to find and rank the most relevant content. The entity graph makes it easier to understand the quality of content better and thus, shift away from links. Sorting results based on references, i.e. backlinks, was a great step forward for the internet but Google can’t rest on that forever. I’m not claiming Google wants to get rid of links as a quality signal, just that entities provide an additional graph to lay over the link graph.

As far as we know, Google uses a technology called Word2Vec, which represents words as vectors, to map entities to a graph and give each a unique ID. It captures the relationships between words precisely enough to surface abstract concepts and allow for mathematical calculations with words.

Mathematics with words is weirdly logical
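
A small Python sketch can show what “mathematics with words” means in practice. The vectors below are hand-made toy values (real embeddings are learned from large amounts of text and have hundreds of dimensions), and the function names are made up for this example. It illustrates the general idea only and makes no claim about how Google implements it.

```python
import math

# Hand-made toy vectors (hypothetical): axis 0 ~ "royalty", axis 1 ~ "gender".
VECTORS = {
    "king": (0.9, 0.8),
    "queen": (0.9, -0.8),
    "man": (0.1, 0.8),
    "woman": (0.1, -0.8),
    "castle": (0.8, 0.0),
}


def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


def analogy(a, b, c):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding the inputs."""
    target = tuple(x - y + z for x, y, z in zip(VECTORS[a], VECTORS[b], VECTORS[c]))
    candidates = [w for w in VECTORS if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(VECTORS[w], target))


if __name__ == "__main__":
    # "king" minus "man" plus "woman" lands next to "queen" in this toy space.
    print(analogy("king", "man", "woman"))
```

Because the result is a position in vector space and not a string, the same arithmetic works regardless of which language the words came from, as long as the words are mapped into the same space.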

Speaking of graphs, language is another hurdle that can be overcome with the entity graph. From personal experience, I see Google still struggles to get content in different languages right. But a computational approach to words makes it possible to solve this issue and understand what the equivalent word in each language is and means. That’s what makes entities language-agnostic.

Google is able to map queries on a graph, which overcomes the hurdle of languages, amongst other things

We must assume that entities have a user intent assigned to them, at least temporarily. It’s clear when they come with an intent-indicating modifier like “buy” (“buy iPhone“) but since single entities are mostly nouns, the intent is often unclear or mixed. As such, entities have a demand (search volume) that can change over time or be seasonal. These are all important factors we need to take into account when looking at entities. The times when SERPs were static are over.

How does schema markup come into play? It’s a “dictionary” for Google to understand how entities are used, what they are, and how to show them in the SERPs. Schema markup is an annotation of source code that may or may not result in better-looking snippets in the search results, but it does help Google recognize entities and their relationships.
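
As an illustration, here is a minimal Python sketch that builds JSON-LD markup stating which entity an article is about. It uses the schema.org types Article and Thing with the properties about and sameAs. The headline and the helper name article_markup are hypothetical, and pointing sameAs at a Wikidata URL is one reasonable convention, not something Google has confirmed as a requirement.

```python
import json


def article_markup(headline, entity_name, entity_url):
    """Build JSON-LD that states which entity an article is about."""
    return {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "about": {
            "@type": "Thing",
            "name": entity_name,
            # Assumption: a Wikidata URL is used to disambiguate the entity.
            "sameAs": entity_url,
        },
    }


if __name__ == "__main__":
    markup = article_markup(
        "A beginner's guide to the Moon",  # hypothetical headline
        "Moon",
        "https://www.wikidata.org/wiki/Q405",
    )
    # Print the snippet as it would be embedded in a page's HTML.
    print('<script type="application/ld+json">')
    print(json.dumps(markup, indent=2))
    print("</script>")
```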

That being understood, here’s an attempt at applying entities to content creation and optimization.

How can we use entities for content optimization?

Let me be straight up front: I think we can use entities as a guide to create and optimize content, but not yet as a tool that delivers hard data. I do think that several SEO tool vendors will find a way to quantify the relevant coverage of one or several entities in content, but we’re not there yet.

Let me list a couple of caveats before we dive deeper.

First, be aware that we don’t know for sure how much weight Google gives entities as a quality signal for content. The algorithm updates we saw in 2018 and 2019 seem to be strongly connected to an improved understanding of language for Google, as described by AJ Kohn in the article I linked above.

Second, content is by far not the only quality signal (think: ranking factor) out there. Google uses hundreds of factors to rank results. So, it’s not like you “fix your entities and rank #1”.

Third, this approach helps discover topics you’re not covering or should write about. It’s easy to get lost in words when looking at entities but what you’re really trying to get at is what greater topics and sub-topics you should be covering in your content.

That being said, the thought process is that good-ranking content must cover more (relevant) entities than content that doesn’t rank well. Thus, if we can reverse engineer what entities the best ranking content covers, we can improve our own.

How to check content for entities

To assess the entities of a page, we can use Google’s Natural Language API, either programmatically or by simply copying and pasting content into the demo on the landing page.

Google’s Natural Language Processing API landing page

The report you get back contains a couple of metrics but the one you should be paying attention to the most is “Salience”, which measures the relevance of an entity in its context. In plain words, Salience says how important the entity is within the text you analyze. So, focus on entities with higher salience within content that ranks well to inform your content creation and optimization.
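
Here is a minimal Python sketch of that filtering step. It assumes you already have the JSON returned by the API’s entity analysis, with an "entities" list whose items carry "name", "salience", and optional "metadata". The sample payload is made up for illustration, and top_entities is a hypothetical helper name.

```python
import json

# Illustrative payload in the shape of an entity analysis response.
# The names and salience values are made up for this example.
SAMPLE_RESPONSE = json.loads("""
{
  "entities": [
    {"name": "backlog", "type": "OTHER", "salience": 0.01, "metadata": {}},
    {"name": "project management", "type": "OTHER", "salience": 0.3, "metadata": {}},
    {"name": "Scrum", "type": "OTHER", "salience": 0.05,
     "metadata": {"wikipedia_url": "https://en.wikipedia.org/wiki/Scrum_(software_development)"}}
  ]
}
""")


def top_entities(response, limit=20):
    """Return (name, salience, wikipedia_url) tuples, highest salience first."""
    ranked = sorted(
        response.get("entities", []),
        key=lambda e: e.get("salience", 0.0),
        reverse=True,
    )
    return [
        (e["name"], e.get("salience", 0.0), e.get("metadata", {}).get("wikipedia_url"))
        for e in ranked[:limit]
    ]


if __name__ == "__main__":
    for name, salience, url in top_entities(SAMPLE_RESPONSE):
        print(f"{salience:.2f}  {name}  {url or ''}")
```

Keeping the Wikipedia URL next to each entity is handy later on, because it tells you which entities Google could link to a known object.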

The impact of entities on rank today

I ran an example for the query “agile project management”, which has 57K global monthly searches, according to Ahrefs. I compared four URLs:

  • https://www.atlassian.com/agile/project-management – 331 entities (Rank #5)
  • https://searchcio.techtarget.com/definition/Agile-project-management – 230 entities (Rank #1)
  • https://www.pmi.org/learning/library/agile-project-management-scrum-6269 – 679 entities (Rank #2)
  • https://www.cio.com/article/3156998/agile-project-management-a-beginners-guide.html – 493 entities (Rank #3)
The top 10 rankings for the keyword “agile project management”, including backlink profile and number of ranking keywords from AHREFs

I then used Google’s Natural Language API to check the top 20 entities for each URL ranked by salience:

Atlassian                | Search CIO                    | PMI                  | CIO
0.3 project management   | 0.55 Agile Project Management | 0.72 Abstract Scrum  | 0.07 project management
0.05 releases            | 0.02 benefit                  | 0.02 meeting         | 0.06 methodology
0.03 ARTICLE Epics       | 0.02 project team             | 0.01 sprint          | 0.05 Microsoft project A
0.03 system requirements | 0.01 sections                 | 0.01 scrum           | 0.02 improvement
0.02 program             | 0.01 teams                    | 0 Jeff Sutherland    | 0.02 prospects
0.01 backlog             | 0.01 APM                      | 0 process            |
0.01 product backlog     | 0.01 issues                   | 0 methods            | 0.01 principles
0.01 software teams      | 0.01 project                  | 0 ScrumMaster        | 0.01 working product
0.01 team                | 0.01 Christina Torode         | 0 focus              | 0.01 Agile Manifesto
0.01 ARTICLE             | 0.01 belief                   | 0 team               | 0.01 success
0.01 Jira Software A     | 0.01 iterations               | 0 authors            | 0.01 William Royce
0.01 workflows           | 0.01 iteration                | 0 Teams              | 0.01 projects
0.01 teams               | 0 step                        | 0 metaphor           | 0.01 guide
0.01 coach               | 0 project processes           | 0 Team               | 0.01 development
0.01 Agile               | 0 products                    | 0 project management | 0.01 developers
0.01 workflow            | 0 insights                    | 0 rugby term         | 0.01 software development
0.01 software team       | 0 section                     | 0 Methodology        | 0.01 master
0.01 project             | 0 project managers            | 0 Scrum              | 0.01 methodologies
0.01 sprint backlog      | 0 projects                    | 0 features           | 0.01 customer satisfaction
0.01 work                | 0 Scrum                       | 0 XP                 | 0.01 It-business alignment

What I found is that no URL was superior in terms of the number or salience of entities in relation to its ranking. In plain words, I wasn’t able to detect entities making the difference here.

Checking for double mentions of important entities didn't result in any observable pattern

Duplicate entities found across the top 3 articles for “agile project management” vs. Atlassian’s article on the topic

Is that result surprising? No! As I explained in the caveats, SEO is multifactorial and there could be many factors at play. Also keep in mind that this was just one query I checked. In a real study, we’d have to look at thousands of queries and see if there’s a statistically significant correlation between entity coverage and rank.
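
To give a feel for what such a check could look like, here is a minimal Python sketch that computes a Spearman rank correlation between rank position and entity count for the four URLs above. Spearman is my choice for this illustration, the helper names are made up, and four data points from a single query prove nothing either way.

```python
# Data from the four URLs above: (site, rank position, entity count).
PAGES = [
    ("Atlassian", 5, 331),
    ("Search CIO", 1, 230),
    ("PMI", 2, 679),
    ("CIO", 3, 493),
]


def to_ranks(values):
    """Map each value to its 1-based rank (assumes there are no ties)."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]


def spearman(xs, ys):
    """Spearman rank correlation coefficient for tie-free data."""
    n = len(xs)
    d_squared = sum((a - b) ** 2 for a, b in zip(to_ranks(xs), to_ranks(ys)))
    return 1 - 6 * d_squared / (n * (n * n - 1))


if __name__ == "__main__":
    positions = [position for _, position, _ in PAGES]
    counts = [count for _, _, count in PAGES]
    print(round(spearman(positions, counts), 2))
```

For these four pages the coefficient comes out at 0.2, which is weak and, with a sample this small, nowhere near statistically meaningful. A real study would run the same calculation over thousands of queries and add a significance test.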

Using entities as a guide for content optimization

If you want to go really deep on entities, here is a process you can follow:

  1. Identify the topic you want to write about and the main query you want to rank for.
  2. Take the top 5 ranking URLs for the main query and plug their content into Google’s NLP API.
  3. Look for links to Wikipedia for entities with high salience and extract valuable information from the Wikipedia page.
  4. Collect the top 20 unique entities from each text with the highest salience in the NLP API.
  5. Make sure those entities are covered (well) in your content.
  6. Search for the main entity on Wikidata and look for attributes.
Links to Wikipedia in Google’s NLP API results
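
Steps 4 and 5 of the list above can be sketched in a few lines of Python. The competitor lists below use a handful of entities from the table earlier in this article; the entity list for “your” page, the min_pages threshold, and the function name entity_gaps are hypothetical choices for this example.

```python
# A few high-salience entities per competitor, taken from the table above.
TOP_ENTITIES = {
    "Atlassian": ["project management", "product backlog", "teams", "project"],
    "Search CIO": ["Agile Project Management", "teams", "project", "Scrum"],
    "PMI": ["Scrum", "sprint", "project management"],
    "CIO": ["project management", "methodology", "Agile Manifesto"],
}

# Hypothetical: entities already covered by your own article.
OWN_ENTITIES = ["project management", "sprint"]


def entity_gaps(competitors, own_entities, min_pages=2):
    """Return entities that at least `min_pages` competitors cover but you do not."""
    own = {e.lower() for e in own_entities}
    counts = {}
    for entities in competitors.values():
        # Use a set so each page counts an entity once, ignoring case.
        for name in {e.lower() for e in entities}:
            counts[name] = counts.get(name, 0) + 1
    gaps = [(name, n) for name, n in counts.items() if n >= min_pages and name not in own]
    # Most widely covered entities first, then alphabetically.
    return sorted(gaps, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    for name, pages in entity_gaps(TOP_ENTITIES, OWN_ENTITIES):
        print(f"{name}: covered by {pages} competing pages")
```

The output is a to-do list, not a score: each missing entity is a prompt to ask whether the topic behind it belongs in your article.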

When covering entities in your content, you want to make sure you cover them from the right angle depending on the user intent for that article. Ask yourself “what is the user trying to accomplish?” and, following up on that, “what is the angle on an entity that would help her/him accomplish that?”. Most often, you’ll come out with attributes that are more important than others in the context of the article you’re writing. Say you’re writing about Larry Page’s wealth. You would then want to cover attributes like his net worth, positions held, etc. Questions, for example from answerthepublic or Google Suggest, can help find the right angle as a complementary data source.

Tips for content creation

What else can you do?

  1. Develop a strong brand
  2. Create expert-level content
  3. Use Google’s NLP API
  4. Structure your content as much as possible
  5. Reverse engineer the SERPs and understand what Google wants to rank high
  6. Research what data is crucial for an article to be valuable and relevant
  7. Try to map out the search journey a user is on for a specific query
  8. Identify the user intent
  9. Make sure your content is better than everything else in the top 5 results
  10. Optimize your content’s CTR

Entities in the context of content optimization

For now, entities can inspire and help us understand what information is valuable but they don’t replace user intent research or good writing. They can guide us in the topics we should write about and what Google deems important but entity-driven content creation or optimization is not yet accessible for us. I expect that to change soon, but until then, we have to make do with what we have.

However, entity-optimization is the next level of content quality.

If you want another primer on what makes great content great, check out this article.