How to rock SEO in a machine learning world

Aug 7, 2017

The biggest threat to SEOs is missing substantial changes in the industry. If you are not aware of how machine learning already impacts SEO, you’re at a high risk of making the wrong decision for your company or client. The time to be ahead of the curve is now.

 

In hip-hop, they say “the game has changed”.

 

That’s why I spoke at Digital Summit in Denver about exactly that topic.

 

The research for the presentation left me with a ton of references and resources, which I didn’t want to throw away. Hence this article. If you want to check them out right away, scroll to the bottom of this article.

 

Before we dive deep into the topic, two caveats:

  1. I’m not going to provide a general introduction to machine learning. Others, like Eric Enge, Udacity, the New York Times, Jeff Dean or Mike King, can do that much better.
  2. Please be aware that we’re not going to wake up in the Terminator apocalypse tomorrow. Machine learning and artificial intelligence are still very much in their infancy. They’re only used sparingly in SEO. For now.

Artificial intelligence will substantially change the world – and Google

The world is spinning faster and faster. The iPhone just turned (only) 10. Only 13 years after it started, Facebook has reached 2 billion users. Google has reached a level of data size and processing capability at which it’s able to make almost daily updates to its algorithm(s). Machine learning will only accelerate that development.

In a recently published study [1] (May 2017), scientists from Oxford and Yale asked 1,600 AI experts when they thought artificial intelligence could reach human-like intelligence.

[Image: how many years it takes for machines to reach human-like intelligence]

The participants also estimated that machines will be better than humans by

  • 2024 in translation
  • 2026 in writing a high school essay
  • 2027 in driving a truck
  • 2031 in selling in retail
  • 2049 in writing a bestseller
  • 2053 in performing surgery on a human being

Of course, the usual suspects, from Facebook to Amazon, drive this trend. And of course, Google. It started with the acquisition of DeepMind [2], went on with Sundar Pichai stating that Google moved from being “a mobile first company to an AI first company” and reached a tipping point when John Giannandrea replaced Amit Singhal as head of search in 2016 [3]. Amit was a strong proponent of classic information retrieval. According to people who worked with him on the search algorithm, his intuition for search quality was so good they just had to build the algorithm around it. That has now changed with Giannandrea, a clear driver of AI, as the new head of search.

The acquisition of DeepMind accelerated the use of machine learning at Google. A video from Jeff Dean about deep learning [4] shows its exponential growth starting in 2014 (the year DeepMind was acquired).

[Image: deep learning at Google]

Be aware that this graph ends in Q2 2016, which is now about a year old. So you can imagine how much further Google integrated machine learning.

 

That leaves us in a weird kind of transition from classic information retrieval to machine learning. We’re at a tipping point at which it’s important to understand what’s coming at us so we can adjust and reposition ourselves.

 

So the question is: what does SEO look like in a world that’s on the edge of AI driven search?

Optimize for user intent


In 2017, you have no chance to rank if you don’t hit the user intent with the right format. Users have clear expectations of what type of content they want for a certain query. But it’s not always obvious what their expectation is. Luckily, the SERPs give us a good idea.

Look at the SERP for “hairstyle inspiration”.

Users are looking for inspiration behind this query. And they want it in the form of images. If your page for that keyword is not image-heavy, you won’t rank. Of course, Google shows an image search integration pretty high on the SERP, which is a hint for that.

 

By the way, the site www.lookli.st figured that out pretty well and built a whole app around this user intent. They don’t have a single line of text on the page / site and rank #4 for “hairstyle inspiration”. This is a nice anecdote for content not having to be text. Content can be images (Pinterest), videos (YouTube) or an app (Looklist).

Another example of Google clearly having figured out the user intent for a search query is “haircut”. Users want to actually get a haircut when they search for “haircut”. They don’t want a definition of the word or the history of haircuts, and they don’t want to know what the best haircuts are. It’s a location driven user intent.

Google even shows the map on top of the SERP to navigate people to the next location right away. Another “hint”. What fascinates me at this point is that 6 out of 9 organic results are from Yelp.

 

The query “straighten hair” has a learning driven user intent. People want to learn the best way to straighten hair. Therefore, Google shows tutorials for 9 out of 10 results. If your content is not structured around a tutorial, you won’t rank.

Google even tries to show a tutorial itself in the Featured Snippet box (hint).

One last example to make the point: people looking for “hair dryer” want to buy one.

Seven out of 10 results are pages you can buy hair dryers on. Only three results are comparing hair dryers. Of course, Google puts a Shopping integration on the top of the SERP (hint).

 

So you get the idea: wrong format = no ranking. The minimum viable thing you should do is perform a quick search for the keyword you want to rank for and see if you can satisfy user intent with your content. As we’ve also learned, Google is giving us hints in the SERPs for what kind of content they expect for a given query.
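The SERP "hints" from the four hair examples above can be captured in a tiny lookup. This is only a sketch: the feature labels (`image_pack`, `local_map`, and so on) and the intent names are hypothetical shorthand for this article, not identifiers from Google or any SEO tool.

```python
# Hypothetical labels for the SERP features mentioned in the examples above,
# mapped to the user intent they hint at.
SERP_HINTS = {
    "image_pack": "inspiration",        # "hairstyle inspiration"
    "local_map": "local",               # "haircut"
    "featured_snippet": "learning",     # "straighten hair"
    "shopping_results": "transactional" # "hair dryer"
}


def guess_intent(serp_features):
    """Return the intents hinted at by the observed SERP features, in order."""
    intents = []
    for feature in serp_features:
        intent = SERP_HINTS.get(feature)
        if intent and intent not in intents:
            intents.append(intent)
    return intents or ["unknown"]


# Features you noted down after a manual search (made-up observation).
print(guess_intent(["shopping_results", "image_pack"]))
```

A query can carry more than one hint, which is why the function returns a list rather than a single label.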

The next logical question should be “what should I do once I’ve identified user intent?”

Create holistic content


Holistic content is characterized by covering a topic in its entirety, with all related sub topics. That means when you write about the topic “car”, you need to cover attributes (speed, price, horsepower), parts (tires, windshield wipers, seats), brands (Audi, Porsche, Toyota) and features (air conditioning, leather seats, sunroof).

 

Ask yourself what the classifications [12] within the topic you’re writing about are. Say you’re writing about an animal. An animal is part of a specific species, has a size and attributes like a snout or a tail. Or you write about a city, which also has a size but also an inhabitant count, relationship to a state and country and landmarks.

 

In machine learning, those classifications are called entities. Google’s goal is to understand the relationship between those entities.

Jeff Dean, the machine learning uber guru at Google, explains this principle in the video (starting 32:19) I mentioned before [4].

[Image: Jeff Dean, content of nearby words]

Google is using the natural language processing frameworks Word2Vec [5] and SyntaxNet [7] in TensorFlow to understand the relationship between words. Word2Vec allows you to represent terms as vectors and calculate the distance between them (very simplified).

[Image: Word2Vec linear relationships]
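To make the "distance between terms" idea tangible, here is a minimal sketch using cosine similarity. The three-dimensional vectors are made up for illustration; real Word2Vec embeddings are learned from text and have hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up toy vectors, NOT real Word2Vec output.
vectors = {
    "car": [0.9, 0.1, 0.0],
    "tires": [0.8, 0.3, 0.1],
    "haircut": [0.0, 0.2, 0.9],
}

car_tires = cosine_similarity(vectors["car"], vectors["tires"])
car_haircut = cosine_similarity(vectors["car"], vectors["haircut"])

# Related terms point in a similar direction, unrelated terms do not.
print(f"car vs. tires:   {car_tires:.2f}")
print(f"car vs. haircut: {car_haircut:.2f}")
```

With these toy numbers, "car" ends up far closer to "tires" than to "haircut", which is the kind of relationship a search engine can exploit to judge whether a page covers a topic and its related entities.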

By the way, our friend Jeff Dean [9] co-authored the original patent ;-).

SyntaxNet helps Google to understand the syntax of sentences. This is crucial to understand the relationship of entities.

[Image: SyntaxNet]

Instagram recently used Word2Vec [6] to understand the context (meaning) of Emojis, which are becoming their own language.

 

That allowed them to understand how users interpret emojis. For example, the praying hands emoji was originally conceived as a high five, but people use it to express gratitude.

 

For Google, understanding the meaning behind a topic or keyword is a huge challenge and we can make it easier for them by creating holistic content.

 

Besides covering all relevant entities, holism is also characterized by depth and width. “Depth” means how thoroughly and extensively you cover a topic, “width” means how much you put it into context and cover related topics. One way to achieve that is to consolidate content, e.g. by merging several articles that cover related topics into one big monster article.

 

Holistic content is also curated, meaning it’s up to date and contains all relevant pieces and references. In concrete terms that means you have to regularly check your content for information that’s outdated and information that’s new but missing on your page.

 

You also want to make sure to use different types of content on the page (images, videos, social media integrations like tweets and IG posts, apps, etc.) to provide the best user experience. As we’ve learned, user intent leads the way here.

 

A good signal for how holistic your content is: check how many keywords it ranks for. To stick to the hair examples, look at how many keywords the first result for “hair dryer” ranks for. 1,766! With one page! The second result, from Sallybeauty, ranks for 3,068 keywords!

 

When looking at those keywords, it becomes apparent that Google ranks the page for all kinds of variations and related keywords to “hair dryer”.

We’ve been seeing that trend since Google overhauled its core algorithm and called it “Hummingbird” in 2013. It has become much better at understanding context and meaning ever since. The engine responsible for that understanding is RankBrain [10]. I’d put a lot of money on the theory that Google’s understanding will skyrocket over the next 12-24 months.

 

Google pushes for holistic content because it’s the best experience for users. Speaking of users…

Focus on user signals


User (behavior) signals tell us how satisfied people coming to our site (from organic search) are with what they see. Are we solving the problem the user came to us for? How much Google uses those signals is controversial [13]. Recent ranking factor studies from Searchmetrics [8] and SEMrush both found correlations of user signals with higher rankings.

Especially click-through rate (from the SERPs) seems to have a high weight in ranking. It scored as the strongest ranking factor in Searchmetrics’ last study.

[Image: CTR as a ranking factor]

Time on site, pages per session and bounce rate are also worth a look. It’s important to segment these metrics, meaning breaking them down further, to get a better understanding. It’s not enough to look at bounce rate, for example. Instead, look at the bounce rate per device or country.

 

Make sure to compare user signals for pages against the average of the whole site or, even better, against pages with the same template. Also, look at user signals over a longer time period than 7 days. The last 30 days are a bare minimum, in my opinion. Otherwise, it’s hard to spot seasonality and get enough data for a representative sample.
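If you export the data, the comparison against pages with the same template can be sketched in a few lines. The URLs, template names, bounce rates and the 10-point tolerance below are all made up for illustration; the input stands in for a 30-day analytics export.

```python
from statistics import mean

# Hypothetical 30-day export: (url, template, bounce_rate)
pages = [
    ("/product/a", "product", 0.42),
    ("/product/b", "product", 0.45),
    ("/product/c", "product", 0.78),
    ("/blog/x", "blog", 0.70),
    ("/blog/y", "blog", 0.72),
]


def flag_outliers(pages, tolerance=0.10):
    """Return URLs whose bounce rate exceeds the mean of their template
    by more than `tolerance` (an arbitrary threshold, tune it yourself)."""
    by_template = {}
    for _, template, rate in pages:
        by_template.setdefault(template, []).append(rate)
    averages = {t: mean(rates) for t, rates in by_template.items()}
    return [
        url
        for url, template, rate in pages
        if rate - averages[template] > tolerance
    ]


print(flag_outliers(pages))
```

Note that the blog pages have a higher bounce rate than most product pages but are not flagged, because they are only compared against their own template.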

 

In Google Analytics, it would look like this:

[Image: finding user signals in Google Analytics]

As you can see in the example, two pages (I blanked the URLs) are sticking out with a higher than average bounce rate.

For those pages, it’s worth investigating what the problem could be. Hypotheses could be:

  • The page ranks for keywords it’s not relevant for (happens less and less)
  • The page is not delivering what it promises
  • The product / main content is bad

This process may raise more questions at first, but it will lead you to the core problem. Improving bad user signals is key to SEO success – now and in the future.

 

When putting together my presentation, I also got curious about how backlinks would stack up against user signals and content. I started in an era in which it was enough to bomb a page with backlinks to rank for all sorts of keywords. That obviously changed. So I did a little mini study, looking at four high search volume keywords and comparing their backlink profile to their ranking. To get the best view on the data, I used Ahrefs, SEMrush and Moz data.

 

The first keyword I looked at was “agile” (~50K monthly searches), for which the first two results have the strongest backlink profiles. The ones on #3-10, however, don’t have strong backlink profiles at all!

The content on agilenutshell.com (third site on the list) is even super thin! It has only two sentences and two images – but the user experience is great and I’m sure lots of people click through the microsite. So this site ranks neither because of content nor backlinks.

User satisfaction: 1, backlinks: 0

 

Next, I looked at “iPhone 7”, a keyword with ~277K monthly searches. We see a similar pattern. The first result has a very strong backlink profile. It makes sense, since Apple is the producer of the iPhone (and the strongest brand in the world). But then it becomes interesting again: #2-5 have weak backlink profiles.

User satisfaction: 2, backlinks: 1

 

“Free credit score” has ~250K monthly searches but shows a different picture. The result with the strongest backlink profile is on #9!

In fact, most results ranking on top are weak in terms of backlinks, except for maybe #2. Given that the credit / finance industry is probably the most competitive (and lucrative) one, this result is astonishing!

 

User satisfaction: 3, backlinks: 1

 

Finally, let’s look at the most searched for keyword in this mini-study with ~450K monthly searches, “auto loan calculator”.

Once again, we find a very diversified picture for the top 10 ranking pages, and only the top result has the strongest profile.

 

Bottom line: it seems that you still need a strong backlink profile to rank #1 but not for #2-10.


If I had to hypothesize, I’d say that machine learning opens the way for Google to handle huge masses of data, which makes it more feasible to use user behavior data. Google representatives used to say that this data was “too noisy”, but I personally believe this is becoming less and less of an issue.

Test how ranking factors apply to you

Another huge change in the SEO industry is the move from general ranking factors to industry specific ranking factors. Say what? Yes! Studies like the recent ones from Searchmetrics are starting to show how ranking factors seem to be weighted differently from industry to industry.

[Image: keyword in title as ranking factor]

For example, the ranking factor “keyword in title” applies differently to the health industry than to the finance industry. That’s substantial! From a logical standpoint, it makes sense. I have no data to back this up, but I consider it possible that this trend will intensify, and that’s why it’s important to test how certain ranking factors apply to your website and industry*.

 

*You can’t measure the impact on the whole industry but at least on the keywords you rank for.

 

As SEOs, we want to know what moves the needle. We share our knowledge with the “scene” and colleagues in the hope that we all make each other smarter. Everyone wins. But when the same rules don’t apply to every industry, we have to find ways to experiment within the industry. That’s where testing comes into play.

Etsy did it.

[Image: Etsy A/B testing]

Thumbtack did it.

[Image: Thumbtack SEO testing]

Pinterest did it, too.

[Image: Pinterest SEO testing]

The question is: how did they do it? Classic A/B or multivariate testing is impossible in SEO because you’re not dealing with users. You’re dealing with one search engine and a sample size of 1 doesn’t allow for comparison. It’s impossible to create laboratory environments for SEO. We cannot determine the impact of one single factor on the ranking of a page with 100% certainty. Google simply has too many ranking factors that work at the same time and overlap.

 

To solve this problem, we use paired t-testing. In a simplified version, this means

  1. identifying pages with the same template,
  2. dividing them into groups (at least two),
  3. making changes to one or some of the groups and
  4. comparing them against a control group.

 

It’s important to compare the averages (means) of those groups against each other since they receive different amounts of traffic.
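A paired t-test on group means can be sketched with nothing but the standard library. The pairing used here is an assumption for illustration: for each day, the mean organic sessions per page of the test group is paired with that of the control group. All numbers are made up.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(test, control):
    """t-statistic of the paired differences (test - control).

    Both lists must have the same length; element i of each list
    belongs to the same day.
    """
    if len(test) != len(control) or len(test) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    diffs = [t - c for t, c in zip(test, control)]
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("all differences are identical, t is undefined")
    return mean(diffs) / (sd / sqrt(len(diffs)))


# Made-up daily mean sessions per page over one week.
control_means = [100, 102, 98, 101, 99, 103, 100]
test_means = [108, 109, 104, 110, 105, 111, 107]

t = paired_t_statistic(test_means, control_means)
print(f"t = {t:.2f} with {len(test_means) - 1} degrees of freedom")
```

To judge significance, compare the absolute value of `t` against the critical value for your degrees of freedom from a t-table (about 2.447 for 6 degrees of freedom at the two-sided 5% level). One week of data is only used here to keep the example short; in practice, run the test over the full experiment window.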

 

A couple more things to consider:

  • Taking a large enough set of pages and choosing reasonable group sizes is crucial for the experiment’s success. If you have 1,000 pages in the set, you can easily divide them into four groups or even 10. At the same time, 5 pages are not enough to test.
  • Leave enough time for the changes to take effect. Since pages have a different crawl frequency it will take Google longer to crawl some compared to others. Therefore, the changes you make to the test group(s) take some time to take effect. Wait for at least 14-21 days after having made the change(s) and after reverting the change. The experiment run-time also depends on the amount of traffic you get on these pages. With time, you get a better feeling for a reasonable run time.
  • Don’t forget to revert the change you made to the test group(s). Otherwise, you cannot measure the exact impact of the ranking factor.
  • Document experiments for posterity / SEO scene / personal indulgence. Too often, I see great learnings that are not documented within companies because people are too lazy to do it.
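Dividing a set of same-template pages into groups can be sketched as follows. Random assignment is an assumption on my part (it avoids accidentally putting all strong pages into one group); the URLs and the seed are hypothetical.

```python
import random


def split_into_groups(urls, n_groups=2, seed=42):
    """Shuffle the URLs reproducibly and deal them into n_groups
    groups of (almost) equal size."""
    shuffled = list(urls)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split documented
    return [shuffled[i::n_groups] for i in range(n_groups)]


# Hypothetical set of 1,000 pages sharing one template.
urls = [f"/product/{i}" for i in range(1000)]
groups = split_into_groups(urls, n_groups=4)

control, *test_groups = groups  # first group stays unchanged
print([len(g) for g in groups])
```

Storing the seed together with the experiment notes makes it possible to reconstruct later which page was in which group.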

Question your findings. Research (and testing) is a game of questions that never ends. Answers lead to more questions. Of course, you can act on findings from the test but you should never stop testing. It should become part of your DNA.


But testing is not the only thing we now have to do to thrive in this new environment. We can’t only look at ourselves. The growing trend of industry specific ranking factors also makes it crucial to keep an eye on others.

Monitor your industry

More often than not, we can learn what works from the best in class. Monitoring your competitors reveals not only opportunities and threats but also where the industry is going as a whole. In SEO, we look at market share in terms of keywords and rankings.

Take the fashion retailer industry, for example. We have Macys (blue), Nordstrom (green), Kohls (orange) and Sears (red). Let’s focus on these for now. By looking at sheer SEO Visibility, which in itself doesn’t tell you much, we see that Macys is about twice as big as Sears.

Market trends become obvious when we zoom in a bit and look at the shifts on a keyword level. All players had an uptick of SEO Visibility starting on 01/29 and a fall starting on 05/14.

[Image: Kohls, Macys, Nordstrom SEO]

Well, that’s interesting.

 

Let’s look at another example that showcases an even more interesting market shift. The social media industry has some of the largest sites in the world.

Facebook (red) is clearly leading the herd, followed by Twitter (orange). Instagram (blue) and Pinterest (green) are in a close race for third place.

[Image: social networks SEO]

It turns out that Pinterest passed Instagram in the first week of June.

[Image: social networks, Instagram vs. Pinterest]

Both sites have “explore” subdirectories that contain pages on which they show collections of user generated images for certain topics.

What’s important to notice is that the /explore/ subdirectory provides a huge chunk of rankings for Pinterest…

…but not for Instagram.

One of the topics both sites collect images for is “Christmas”. The Christmas page of Instagram looks similar to the one of Pinterest at first glance, but looking closer reveals some interesting details.

There are small, subtle differences that make a big difference. This could be the reason why Pinterest is passing Instagram.

While Instagram has only the images, Pinterest does a much better job of linking the explore pages together to create topic hubs. Remember when I wrote about holistic content earlier in this article? That’s exactly what Pinterest is doing here. They cover topics holistically.

 

You see the colored buttons under the headline of the page (“Christmas things”, “Diy xmas decorations”, “Christmas decor”, etc.). That’s the first layer: related topics / keywords. By covering all related sub-topics they increase their relevancy for the main topic (Christmas).

The breadcrumb navigation at the top of the page creates the second layer. You see it links to a higher level in the hierarchy, “Holidays and events”.

This shows how strategically Pinterest covers sub-, main and related topics. The strategy pays off big time. Instagram’s Christmas page doesn’t stand a chance against Pinterest’s, even though Instagram’s user base is 4x as big as Pinterest’s.

Imagine you’re a player in the social media industry. You need to know these things! They can give you a competitive advantage, which translates into dollars.

You get much more insightful data when you monitor your most important keywords against your competitors to understand your market share. If you were Pinterest, this would tell you that you’re not only competing with Instagram but also with Amazon, eBay, Wikipedia, etc.
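One simple way to express market share on a keyword level is the share of your tracked keywords for which a domain appears in the top results. The sketch below uses that definition as an assumption; the keywords and rankings are made up and do not reflect real SERPs.

```python
from collections import Counter


def keyword_share(rankings):
    """Map each domain to the share of tracked keywords it ranks for.

    rankings: dict of keyword -> list of domains in the top results.
    """
    total = len(rankings)
    counts = Counter(
        domain
        for domains in rankings.values()
        for domain in set(domains)  # count each domain once per keyword
    )
    return {domain: n / total for domain, n in counts.most_common()}


# Made-up tracking data for four keywords.
rankings = {
    "christmas decor": ["pinterest.com", "amazon.com", "ebay.com"],
    "diy xmas decorations": ["pinterest.com", "instagram.com"],
    "christmas gifts": ["amazon.com", "pinterest.com", "ebay.com"],
    "christmas history": ["wikipedia.org", "pinterest.com"],
}

for domain, share in keyword_share(rankings).items():
    print(f"{domain}: {share:.0%}")
```

Tracked over time, a table like this shows which competitors are gaining ground on the keywords that matter to you, including ones you didn’t think of as competitors.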

Pinterest competes with online retailers now because more transactional keywords also have an inspirational user intent (assumption). It used to be that people either wanted to learn or to buy online but nowadays people are skimming the internet to get inspired and find new things. It’s online window shopping.

You only come to these findings when you invest time monitoring your industry.

TL;DR: Focus on user intent, holistic content, SEO testing and monitoring your industry

Machine learning will change SEO more fundamentally than Penguin and Panda did – but we’re not there yet.


AI is only used in small, specific use cases for organic search but its usage seems to be growing quickly. My hypothesis is that it will open many doors for Google to use data that was “too noisy” before. It will transform search and our lives in general [11].

 

To stay on top of SEO and still be successful in 2020, make sure to get the user intent right, measure user behavior and create holistic content. Make sure you have a close eye on your industry by tracking specific keywords, monitoring your competitors and testing which ranking factors apply to you. Finally, make sure to follow me on Twitter @Kevin_Indig and follow this blog to get regular updates.

 

PS: the last point is optional ;-).