Semantic content optimization with entities
This article seeks to answer the question of how we can use entities to optimize content today and how we, as SEOs, should look at them.
The biggest question for most SEOs who optimize centralized sites these days is how to optimize content to:
- Rank better for the main query
- Rank for more related and synonymous queries
- Show up in featured snippets and PAA boxes (“people also ask”)
When you have user-generated content or a product inventory to scale, creating custom content might not be your biggest concern (exceptions apply). But when you create all the content yourself, you usually swim in very competitive waters. You need an edge over your competitors and have to make the most out of each organic click you get.
The question is “how do you get an edge?” To answer it, we need to take a quick step back.
How content optimization changed over the years
I remember when keyword density was the strongest ranking factor in SEO. Nowadays, hearing the words “keyword density” makes people shudder like wizards in Harry Potter when they hear the name “Voldemort”. Google did well in demoting keyword-stuffed content, and not even the worst SEO charlatans would recommend keyword stuffing anymore.
Today, content optimization means writing “relevant” text and copy with optimized headings, titles, adding images and videos, and providing a great user experience. Ignoring how fuzzy some of these concepts are for a moment (and leaving that topic for a future article), we’ve come a long way from keyword stuffing. Successful content strategies have shifted from quantity to quality.
Now, there’s an additional quality factor of good content that the SEO community is slowly discovering: entities.
Thus, if there were a maturity model for content optimization, it would look like this:
- Keyword stuffed text for machines
- Text for humans
- High-quality content for humans with rich media
- High-quality content for humans with rich media satisfying user-intent
- High-quality content for humans with rich media satisfying user-intent and covering synonyms and related keywords
- High-quality content for humans with rich media satisfying user-intent, covering synonyms, related keywords, and important entities
But what exactly are entities, how do you use them to create and improve content, and how much sense does that make?
What are entities?
Entities are semantic, interconnected objects that help machines to understand explicit and implicit language. In simpler terms, they are words (nouns) that represent any type of object, concept, or subject.
Entities are nouns like events, ideas, people, places, etc.
Check out my presentation about entity optimization at Optimisey:
Summary + Slides: https://optimisey.com/seo-advice/entity-optimisation-in-2019-kevin-indig/
According to Cindy Krum and her fantastic entity series, Google seems to be restructuring its whole approach to indexing around entities (while you’re at it, read AJ Kohn’s article about embeddings). Understanding entities and how Google uses them in search sharpens our standards for content creation, optimization, and the use of schema markup.
Entities provide Google with lots of benefits:
- They are language-agnostic.
- An entity-based index makes it easier to shift away from backlinks, or at least adds a helpful layer on top of them.
- Strong foundation for voice search and grouping brand assets (apps, sites, etc.).
- A deeper understanding of language, especially concepts.
- An easier transition to becoming a discovery engine.
Entities provide the foundation for Knowledge Graph, which in itself is a big piece of the puzzle in Google’s transition from search to discovery engine. Knowledge Graph is the most visible touchpoint between users and entities and contains more and more search features that take away clicks from websites. Just think of the Knowledge Card tabs Google plans to show, as described in my article about Google’s transition linked above.
Each entity has attributes that describe it and add context. A person, for example, has an age, a name, a birthday, etc. You can check attributes on Wikidata.
Let’s go through an example: the moon. Wikidata mentions that it’s the “only natural satellite of Earth” and also known as “Luna”.
To my fascination, Google pulls a lot of information from Wikidata, for example, the image.
We know that Google pulls information like the description from Wikipedia, but also from resources like Freebase, DMOZ, and others. You can find them all listed on the Wikidata page.
The indicator that Google uses (or used) Freebase is that Freebase URLs redirect to the Google search results for that entity.
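To see what those attributes look like as data, here is a minimal sketch that reads the label, description, aliases, and property IDs from Wikidata’s JSON export of an item. Treat the details as assumptions to verify: I’m assuming Q405 is the item ID for the Moon, the `User-Agent` string is a placeholder, and `sample` is a hand-written, heavily trimmed stand-in for the real export so that the example runs offline.

```python
import json
from urllib.request import Request, urlopen

# Assumption: Q405 is the Wikidata item for the Moon; check on wikidata.org.
MOON_ID = "Q405"


def entity_data_url(entity_id: str) -> str:
    """Build the URL of Wikidata's JSON export for one item."""
    return f"https://www.wikidata.org/wiki/Special:EntityData/{entity_id}.json"


def fetch_entity(entity_id: str) -> dict:
    """Download and parse the export (needs network access)."""
    # Placeholder User-Agent; replace it with something that identifies you.
    request = Request(entity_data_url(entity_id),
                      headers={"User-Agent": "entity-article-example/0.1"})
    with urlopen(request) as response:
        return json.load(response)["entities"][entity_id]


def summarize(entity: dict, lang: str = "en") -> dict:
    """Pull label, description, aliases, and property IDs out of an item."""
    return {
        "label": entity.get("labels", {}).get(lang, {}).get("value"),
        "description": entity.get("descriptions", {}).get(lang, {}).get("value"),
        "aliases": [a["value"] for a in entity.get("aliases", {}).get(lang, [])],
        "properties": sorted(entity.get("claims", {})),
    }


# Hand-written sample in the shape of the export (not real downloaded data).
sample = {
    "labels": {"en": {"language": "en", "value": "Moon"}},
    "descriptions": {"en": {"language": "en",
                            "value": "only natural satellite of Earth"}},
    "aliases": {"en": [{"language": "en", "value": "Luna"}]},
    "claims": {"P31": []},  # P31 is "instance of"; statements omitted here
}

print(summarize(sample))
```

With network access, `summarize(fetch_entity(MOON_ID))` should return the same kind of summary for the live item, with a much longer list of properties.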
In fact, when Google shut down Freebase (its API is now deprecated), the Freebase data was migrated to Wikidata. The strong connection between Google’s Knowledge Graph and Wikidata is something we as SEOs need to pay more attention to.
We don’t know how big Google’s database of entities is, but back in 2012, Amit Singhal proclaimed that the Knowledge Graph contained over 500 million objects (read: “entities”):
“It currently contains more than 500 million objects, as well as more than 3.5 billion facts about and relationships between these different objects. And it’s tuned based on what people search for, and what we find out on the web.”
Amit Singhal (Source: https://googleblog.blogspot.com/2012/05/introducing-knowledge-graph-things-not.html)
Another benefit of entities for Google is an easier shift away from the link graph to organize the index. Google relied and still relies on links between pages (actually URLs) to find and rank the most relevant content. The entity graph makes it easier to judge the quality of content and thus to shift away from links. Sorting results based on references, i.e. backlinks, was a great step forward for the internet, but Google can’t rest on that forever. I’m not claiming Google wants to get rid of links as a quality signal, just that entities provide an additional graph to lay over the link graph.
As far as we know, Google uses a technology called Word2Vec to map entities to a graph and give them a unique ID. Word2Vec represents words as vectors, which makes it possible to calculate with words and to capture the relationships between them, up to the point of highlighting abstract concepts.
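To illustrate what “mathematical calculations with words” means, here is a toy sketch of the classic word-vector analogy. The three-dimensional vectors are invented by hand for this example; real Word2Vec embeddings are learned from large amounts of text and have hundreds of dimensions, and nothing here reflects how Google actually stores entities.

```python
import math

# Toy vectors, invented for illustration only (not learned embeddings).
vectors = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "moon":  [0.2, 0.5, 0.5],
}


def cosine(a, b):
    """Cosine similarity: 1.0 means the vectors point in the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def analogy(a, b, c):
    """Return the word closest to vector(a) - vector(b) + vector(c)."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    # Exclude the input words themselves, as is common for this exercise.
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))


print(analogy("king", "man", "woman"))  # prints "queen" for these toy vectors
```

The point is only that once words are vectors, “king minus man plus woman” becomes ordinary arithmetic, and the nearest vector to the result is the answer.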
Speaking of graphs, language is another hurdle that can be overcome with the entity graph. From personal experience, I see Google still struggles to get content in different languages right. But a computational approach to words makes it possible to solve this issue and to understand what the equivalent word in each language is and means. That’s what makes entities language-agnostic.
We must assume that entities have a user intent assigned to them, at least temporarily. It’s clear when they come with an intent-indicating modifier like “buy” (“buy iPhone“), but since single entities are mostly nouns, the intent is often unclear or mixed. Entities also have a demand (search volume) that can change over time or be seasonal. These are all important factors we need to take into account when looking at entities. The times when SERPs were static are over.
How does schema markup come into play? It’s a “dictionary” for Google to understand how entities are used, what they are, and how to show them in the SERPs. Schema markup is an annotation of the source code. It might result in better-looking snippets in the search results and, either way, it helps Google recognize entities and their relationships.
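As a rough sketch of what such an annotation looks like, the snippet below builds a schema.org `Person` description as JSON-LD and wraps it in the script tag that carries it in a page’s HTML. The person, the job title, and the example.com URLs are hypothetical placeholders; `Person`, `name`, `jobTitle`, `url`, and `sameAs` are schema.org vocabulary.

```python
import json

# Hypothetical entity; every value below is a placeholder.
person = {
    "@context": "https://schema.org",
    "@type": "Person",
    "name": "Jane Doe",
    "jobTitle": "Project Manager",
    "url": "https://www.example.com/team/jane-doe",
    # sameAs points to other pages that describe the same entity.
    "sameAs": ["https://www.example.com/profiles/jane-doe"],
}


def to_json_ld_script(data: dict) -> str:
    """Wrap a schema.org description in a JSON-LD script tag."""
    payload = json.dumps(data, indent=2)
    return f'<script type="application/ld+json">\n{payload}\n</script>'


print(to_json_ld_script(person))
```

The `sameAs` property is the part that ties a page to an entity: it lets you point at the profiles or reference pages that already describe the same thing.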
That being understood, here’s an attempt at applying entities to content creation and optimization.
How can we use entities for content optimization?
Let me be straight up front: I think we can use entities as a guide to create and optimize content, but not yet as a tool that delivers hard data. I do think that several SEO tool vendors will find a way to quantify the relevant coverage of one or several entities in content, but we’re not there yet.
Let me list a couple of caveats before we dive deeper.
First, be aware that we don’t know for sure how much weight Google gives entities as a quality signal for content. The algorithm updates we saw in 2018 and 2019 seem to be strongly connected to an improved understanding of language on Google’s part, as described by AJ Kohn in the article I linked above.
Second, content is far from the only quality signal (think: ranking factor) out there. Google uses hundreds of factors to rank results. So, it’s not like you “fix your entities and rank #1”.
Third, this approach helps discover topics you’re not covering or should write about. It’s easy to get lost in words when looking at entities but what you’re really trying to get at is what greater topics and sub-topics you should be covering in your content.
That being said, the thought process is that good-ranking content must cover more (relevant) entities than content that doesn’t rank well. Thus, if we can reverse engineer what entities the best-ranking content covers, we can improve our own.
How to check content for entities
To assess the entities of a page, we can use Google’s Natural Language API, either programmatically or by simply copying and pasting content into the demo feature on the landing page.
The report you get back contains a couple of metrics but the one you should be paying attention to the most is “Salience”, which measures the relevance of an entity in its context. In plain words, Salience says how important the entity is within the text you analyze. So, focus on entities with higher salience within content that ranks well to inform your content creation and optimization.
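For the API route, here is a minimal sketch using only Python’s standard library. It assumes the REST endpoint `documents:analyzeEntities` of the Cloud Natural Language API and an API key of your own; `top_entities` is a helper I made up for this example, and the sample response is hand-written in the shape of the API’s answer, using three salience values from the comparison further down.

```python
import json
from urllib.request import Request, urlopen

ENDPOINT = "https://language.googleapis.com/v1/documents:analyzeEntities"


def analyze_entities(text: str, api_key: str) -> dict:
    """POST plain text to the entity analysis endpoint (needs a valid key)."""
    body = json.dumps({
        "document": {"type": "PLAIN_TEXT", "content": text},
        "encodingType": "UTF8",
    }).encode("utf-8")
    request = Request(f"{ENDPOINT}?key={api_key}", data=body,
                      headers={"Content-Type": "application/json"})
    with urlopen(request) as response:
        return json.load(response)


def top_entities(response: dict, n: int = 20) -> list:
    """Return (name, salience) pairs for the n most salient entity names.

    Names are lowercased so that "Teams" and "teams" count as one entity.
    """
    best = {}
    for entity in response.get("entities", []):
        name = entity["name"].lower()
        best[name] = max(best.get(name, 0.0), entity.get("salience", 0.0))
    return sorted(best.items(), key=lambda item: item[1], reverse=True)[:n]


# Hand-written sample in the shape of the API response (not a live call).
sample_response = {
    "entities": [
        {"name": "benefit", "type": "OTHER", "salience": 0.02},
        {"name": "Agile Project Management", "type": "OTHER", "salience": 0.55},
        {"name": "project team", "type": "ORGANIZATION", "salience": 0.02},
    ],
    "language": "en",
}

for name, salience in top_entities(sample_response, n=3):
    print(f"{salience:.2f}  {name}")
```

With a real key, `top_entities(analyze_entities(page_text, api_key))` gives you the same kind of list for any page text you paste in.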
The impact of entities on rank today
I ran an example for the query “agile project management”, which has 57K global monthly searches, according to Ahrefs. I compared four URLs:
- https://www.atlassian.com/agile/project-management - 331 entities (Rank #5)
- https://searchcio.techtarget.com/definition/Agile-project-management - 230 entities (Rank #1)
- https://www.pmi.org/learning/library/agile-project-management-scrum-6269 - 679 entities (Rank #2)
- https://www.cio.com/article/3156998/agile-project-management-a-beginners-guide.html - 493 entities (Rank #3)
I then used Google’s Natural Language API to check the top 20 entities for each URL ranked by salience:
| Atlassian (Rank #5) | TechTarget (Rank #1) | PMI (Rank #2) | CIO (Rank #3) |
|---|---|---|---|
| 0.3 project management | 0.55 Agile Project Management | 0.72 Abstract Scrum | 0.07 project management |
| 0.05 releases | 0.02 benefit | 0.02 meeting | 0.06 methodology |
| 0.03 ARTICLE Epics | 0.02 project team | 0.01 sprint | 0.05 Microsoft project A |
| 0.03 system requirements | 0.01 sections | 0.01 scrum | 0.02 improvement |
| 0.02 program | 0.01 teams | 0 Jeff Sutherland | 0.02 prospects |
| 0.01 backlog | 0.01 APM | 0 process | |
| 0.01 product backlog | 0.01 issues | 0 methods | 0.01 principles |
| 0.01 software teams | 0.01 project | 0 ScrumMaster | 0.01 working product |
| 0.01 team | 0.01 Christina Torode | 0 focus | 0.01 Agile Manifesto |
| 0.01 ARTICLE | 0.01 belief | 0 team | 0.01 success |
| 0.01 Jira Software A | 0.01 iterations | 0 authors | 0.01 William Royce |
| 0.01 workflows | 0.01 iteration | 0 Teams | 0.01 projects |
| 0.01 teams | 0 step | 0 metaphor | 0.01 guide |
| 0.01 coach | 0 project processes | 0 Team | 0.01 development |
| 0.01 Agile | 0 products | 0 project management | 0.01 developers |
| 0.01 workflow | 0 insights | 0 rugby term | 0.01 software development |
| 0.01 software team | 0 section | 0 Methodology | 0.01 master |
| 0.01 project | 0 project managers | 0 Scrum | 0.01 methodologies |
| 0.01 sprint backlog | 0 projects | 0 features | 0.01 customer satisfaction |
| 0.01 work | 0 Scrum | 0 XP | 0.01 It-business alignment |
What I found is that no URL was superior in terms of the number or salience of entities in relation to its ranking. In plain words, I wasn’t able to detect entities making the difference here.
Duplicate entities found across the top 3 articles for “agile project management” vs. Atlassian’s article on the topic
Is that result surprising? No! As I explained in the caveats, SEO is multifactorial and there could be many factors at play. Also keep in mind that this was just one query I checked. In a real study, we’d have to look at thousands of queries and see if there’s a statistically significant correlation between entity coverage and rank.
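To give an idea of what such a check involves, here is a small sketch that computes Spearman’s rank correlation between rank and entity count for the four URLs above. The helper functions are my own and assume there are no tied values. With only four data points the number carries no statistical weight; a real study would run the same calculation over thousands of queries and add a significance test.

```python
def ranks(values):
    """Rank values from 1 (smallest) upward; assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result


def spearman(xs, ys):
    """Spearman's rho via the rank-difference formula (no ties)."""
    n = len(xs)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Rank and entity count per URL, in the order listed above.
positions = [5, 1, 2, 3]
entity_counts = [331, 230, 679, 493]

rho = spearman(positions, entity_counts)
print(round(rho, 2))  # prints 0.2
```

A rho of 0.2 is a weak relationship, and if anything it points the wrong way: a positive value here means more entities went along with a worse position, not a better one.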
Using entities as guide for content optimization
If you want to go really deep on entities, follow these steps:
- Identify the topic you want to write about and the main query you want to rank for.
- Take the top 5 ranking URLs for the main query and plug their content into Google’s NLP API.
- Look for links to Wikipedia for entities with high salience and extract valuable information from the Wikipedia page.
- Collect the top 20 unique entities from each text with the highest salience in the NLP API.
- Make sure those entities are covered (well) in your content.
- Search for the main entity on Wikidata and look for attributes.
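The collecting and checking part of these steps can be sketched in a few lines. This is a simplified illustration, not a finished tool: the function names and the `min_pages` threshold are my own, the entity lists are short excerpts from the comparison above, and the “coverage” test is a naive whole-phrase text match that will miss synonyms and inflections.

```python
import re


def shared_entities(entity_lists, min_pages=1):
    """Unique entities that appear in the lists of at least min_pages pages."""
    counts = {}
    for entities in entity_lists:
        for name in {e.lower() for e in entities}:
            counts[name] = counts.get(name, 0) + 1
    return sorted(name for name, count in counts.items() if count >= min_pages)


def missing_entities(entities, draft):
    """Entities whose name does not occur as a whole phrase in the draft."""
    text = draft.lower()
    return [e for e in entities
            if not re.search(r"\b" + re.escape(e.lower()) + r"\b", text)]


# Excerpts from the salience table above, one list per URL.
competitors = [
    ["project management", "backlog", "sprint backlog", "teams", "Agile"],
    ["Agile Project Management", "project team", "iterations", "teams", "Scrum"],
    ["Scrum", "sprint", "ScrumMaster", "project management"],
    ["project management", "methodology", "Agile Manifesto", "software development"],
]

# Hypothetical draft text, made up for this example.
draft = "Agile project management helps teams ship in short iterations."

common = shared_entities(competitors, min_pages=2)
print(common)                           # entities used by two or more pages
print(missing_entities(common, draft))  # candidates to cover in the draft
```

Raising `min_pages` is a way to focus on entities that several top-ranking pages have in common instead of one page’s quirks, such as author names or navigation labels.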
When covering entities in your content, you want to make sure you cover them from the right angle depending on the user intent for that article. Ask yourself “what is the user trying to accomplish?” and, following up on that, “what is the angle on an entity that would help her/him accomplish that?”. Most often, you’ll come out with attributes that are more important than others in the context of the article you’re writing. Say you’re writing about Larry Page’s wealth. You would then want to cover attributes like his net worth, positions held, etc. Questions, for example from AnswerThePublic or Google Suggest, can help you find the right angle as a complementary data source.
What else can you do?
- Develop a strong brand
- Create expert-level content
- Use Google’s NLP API
- Structure your content as much as possible
- Reverse engineer the SERPs and understand what Google wants to rank high
- Research what data is crucial for an article to be valuable and relevant
- Try to map out the search journey a user is on for a specific query
- Identify the user intent
- Make sure your content is better than everything else in the top 5 results
- Optimize your content’s CTR
Entities in the context of content optimization
For now, entities can inspire and help us understand what information is valuable, but they don’t replace user intent research or good writing. They can guide us in the topics we should write about and what Google deems important, but entity-driven content creation or optimization is not yet accessible for us. I expect that to change soon, but until then, we have to make do with what we have.
However, entity optimization is the next level of content quality.
If you want another primer on what makes great content great, check out this article.