This Q&A with Gary Illyes took place at the Bay Area Search meetup in San Francisco on 10/14/2019 and was hosted by AJ Kohn.
Keep in mind that my notes are mostly paraphrased from what Gary said. I added clarifications and my own understanding in parentheses (here) or brackets [and here]. I only put in quotes what I was really sure of.
The Q&A was moderated by AJ Kohn and he did a fantastic job. All questions, except for one that came randomly out of the crowd, were asked by him.
I publish these notes in service of the SEO industry so we can all become smarter; not to disrespect anyone, but to kill some common misconceptions (including my own). I appreciate Gary stepping into the lion’s den and answering the questions as well as possible.
Q: How often does Google analyze the robots.txt? Can you send notifications to people screwing up?
A: We have a continuous monitoring system that looks for patterns in how people (mis)use robots.txt. Say a big company like Disney puts up a robots.txt that blocks everything: we’d get an alert.
There are a few other rules we don’t support anymore that most people don’t know about.
There was a sample [I assume a “template”?] robots.txt on GitHub that had a noindex for the whole site. So if you used it, your whole site would be noindexed. Presenting that issue got Gary approval to remove noindex support from robots.txt, basically to protect people who were not showing up in search.
“Some people, especially on Twitter, were very… ‘mental’ about us removing noindex. People thinking it’s about them is a bit arrogant. It was not about them but about the 99% of people who have no idea what they’re doing.” (Gary Illyes)
Q: Did you make the recent nofollow change because you did something in the dev environment that gave you an advantage? Were you more excited about the link graph value or about the anchor text related to those?
A: Let me step back. What did we actually announce? We said that from that day on, we’re able to use nofollow links for ranking purposes and from March 1st on for crawling.
If you read that announcement again, you’ll see that we didn’t announce a ranking change. What we announced was that we’re able to now use those links.
“It was one of my most exciting launches.” I spent a lot of time in countries that are not that advanced [in SEO]. Many people living in those countries set all outgoing links to nofollow [by default], which is “not the best” for a search engine that uses links to parse the web.
Half of the internet is outside the US and Europe and that means we’re blind to that part of the web. (Gary talked about trying to get the Indian Times to drop default nofollow, for example.)
How are we going to use that change in nofollow? I don’t know. There’s no ranking change coming with it.
One of the most important things coming with backlinks [in general] is anchor text, which tells us something about a page. “Click here” is a bad anchor text, for example.
So, now, from March 1st on, we can see what’s on the other end of the link, e.g. malware. That’s pretty useful.
You don’t have to help us label data by using rel=sponsored. You’ll still have to use nofollow, though.
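To make the anchor text and rel attributes from this answer concrete, here is a minimal Python sketch that reads them from a piece of HTML. It only extracts what a page declares; how Google weighs any of it was not disclosed. The class and function names (`LinkRelCollector`, `collect_links`) and the sample markup are made up for illustration.

```python
from html.parser import HTMLParser


class LinkRelCollector(HTMLParser):
    """Collects href, anchor text and rel tokens for every <a> element."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._current = None  # the link currently being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            attr_map = dict(attrs)
            # rel is a space-separated token list, e.g. "nofollow sponsored"
            rel_tokens = set((attr_map.get("rel") or "").lower().split())
            self._current = {
                "href": attr_map.get("href"),
                "text": "",
                "rel": rel_tokens,
            }

    def handle_data(self, data):
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            self._current["text"] = self._current["text"].strip()
            self.links.append(self._current)
            self._current = None


def collect_links(html):
    """Return one dict per link found in the given HTML string."""
    parser = LinkRelCollector()
    parser.feed(html)
    return parser.links


if __name__ == "__main__":
    sample = (
        '<p><a href="https://example.com/a" rel="nofollow sponsored">Buy now</a> '
        '<a href="/b">Click here</a></p>'
    )
    for link in collect_links(sample):
        print(link["href"], repr(link["text"]), sorted(link["rel"]))
```

A link without a rel attribute simply ends up with an empty token set, which matches the point that extra labeling is optional.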
Q: You did come up with a bunch of new snippet tags. How are snippets pipelined in the backend? How is snippet success measured?
A: When snippets are generated we get a response from the index. In “the thing” that comes back from the index we include all the token keywords we found in the page after rendering (and their location, information about whether they were bold or italic, and so on).
Then, the snippet algorithm reconstructs the snippet from there. There’s no magic behind it.
The actual magic comes in when filtering out bad passages. We have a list of “bad” tokens that the snippet algo is not allowed to draw from.
Featured Snippets come from an extra algo that bids for that location.
The SERP is an auction place. Everything you see works by bidding for places [ranking positions]. Whoever wins a bid for a certain position gets that place. But the first position is usually restricted.
Featured Snippets relate to the query more than a normal snippet. We try to pick passages that are more complete.
When results constantly appear and disappear, it means their bid is very close to someone else’s. The scores are extremely close to each other. [In this case, small things can make a difference.] For example, you’re on https. That could be an edge to beat your competitor.
(Random person from the crowd asks “would better tuning for the user intent help you?”
A: It can. But what would help you is more “traditional” than ML.
Schema is different. It helps us to better understand entities on a page. For that we use Hummingbird.)
Q: How often do you travel?
A: 310-320 days a year.
Q: Since you’re in a lot of places, let’s talk about local. How does Google determine which queries get localized?
A: Let me answer how it works for universal results, e.g. map packs that come from the map repository. At that level, it comes down to how people click on the search results page. Say, someone searches for “Deadpool trailer” and they’re not satisfied with the results. They then click on the video tab. That’s a hint for us. When this happens thousands of times, a video pack appears.
As for ranking locally or on a metro level (Gary worked on that team for quite some time), it comes down mostly to signals you’re in control of (especially on the metro level).
[Gary won’t say which signals because webmasters are in control of them and could manipulate it. But he gave a hint that John Mueller recently asked a relevant question about this on Twitter, which is related to one of the things webmasters are in control of and Google looks at.]
Q: Let’s talk about hreflang. The documentation says when using it you can translate the wrapper and the main content can stay the same. Thoughts about replacing the hreflang tag?
A: Gary actually rewrote the hreflang parsing a couple of years ago [together with Christoph]. They sat for weeks in a room to come up with something easier but couldn’t. Hreflang is already the most basic thing you can do. You indicate sibling pages from one page to another. It’s simple and readable.
“Bill Hunt is one of the best hreflang experts I know. I tell him ‘if you have a better idea, I’m absolutely open to it.’ But we couldn’t come up with something simpler.”
It’s a pain for us internally because the clusters use a lot of space. For example, Facebook uses up Petabytes for all their sites in various languages.
I handed hreflang over to a new team 4 years ago. I’m not sure what they’re doing with it. We had plans to look at the language of that page more closely, e.g. by looking at how people react to the results provided by the hreflang pipeline.
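The “sibling pages” idea from this answer can be sketched in a few lines of Python: every language version of a page carries the same set of alternate links, one per sibling (including itself). The `<link rel="alternate" hreflang="…">` syntax is the standard annotation; the function name `hreflang_tags` and the example.com URLs are assumptions for illustration only.

```python
from html import escape


def hreflang_tags(siblings):
    """Build one <link> element per language version of a page.

    `siblings` maps a language-region code to the URL of that version.
    The same list is meant to be placed on every sibling page.
    """
    return [
        f'<link rel="alternate" hreflang="{escape(lang)}" href="{escape(url)}" />'
        for lang, url in sorted(siblings.items())
    ]


if __name__ == "__main__":
    versions = {
        "en-us": "https://example.com/en/",
        "de-de": "https://example.com/de/",
    }
    print("\n".join(hreflang_tags(versions)))
```

Because every page repeats the full list, the annotations grow with the number of language versions, which hints at why the clusters take up so much space on large sites.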
Q: Favorite country for food?
Q: Most hated food?
A: Anything lamb.
Q: Since we’re talking about food, let’s talk about E-A-T (whole room laughs). Let’s talk about the way E-A-T is established. How might text signals impact E-A-T?
A: This is one of the spaces where I don’t want to give too much away. I don’t mind talking about snippeting because it doesn’t change much if you know [how it works]. In the YMYL space, it [Gary giving away too much] can be really harmful. It may even be dangerous for people.
I had a nasty conversation at Pubcon. After my keynote, someone who’s very “popular” in the public E-A-T space approached me and asked if I can help with one of their client sites because it’s not ranking for anything anymore after the core algo update. Gary said “that sounds like the YMYL space, right?”.
The person said “yes” and that the site is in alternative medicine.
Gary’s father died of cancer and it’s a touchy topic for him. He says “If someone tries to convince me that carrot juice heals cancer, I will probably leave.”
The person [at Pubcon] said there was no harm in trying alternative medicine for a flu. Gary notes that people actually die from the flu. That person didn’t like that answer and that’s pretty much where the conversation ended.
E-A-T is a dumbed-down version of what the algorithms [are trying to] do.
There’s, of course, not just one algorithm. There are probably millions of little algorithms that work together in unison. One algorithm might endorse what the scientific community thinks [Gary notes that he’s citing the search quality rater guidelines here].
The rater guidelines reflect what the algorithms are aiming for.
There’s no E-A-T score.
Thinking that cleaning up your link profile might help after a core update is silly.
We assess YMYL sites extra carefully because they can have such a strong impact on people’s lives.
Gary tells a story about a friend who’s a psychotherapist and who saw a huge uptick of site visitors and calls after a core algorithm update a while ago. A few quarters later, the friend said he was on the verge of hiring more people because he couldn’t handle the load of inquiries from search. Over the course of 5 years, he was simply writing about what he knew: psychology-related content. He was linking to articles that could verify his claims. He didn’t build a single link. He didn’t use structured data. He added a responsive template after Gary pinged him like 7 times. He didn’t do any hardcore SEO thing. His business grew to 15 people in total with 7 extra psychotherapists - just with content.
The importance of links
Q: How does Google think about trying to rank very relevant content with a thin link profile?
A: PageRank is just one of hundreds of signals we use to rank pages. We also use links in other ways but we don’t need links to rank pages at all.
It’s harder to see this in the US but there was someone presenting at a conference in Norway who brought up a sub-page on the Porsche Norway site. He showed rankings of that page and it had 0 links. None. It was not even linked from their own site. It was in a sitemap [I assume an XML sitemap]. For the queries he showed, the Porsche page was likely the best choice (Gary noted that Rankbrain might help with that).
“Especially in English (and some other, more “internet-mature” languages), where we have a good understanding of the web, we don’t always need links.”
Q: Are footer links worth as much as content links?
Q: Putting a link into a “span” instead of an “a-tag” - good idea/ bad idea?
A: Not because we can’t understand them, but because it sometimes takes a long time and requires rendering. Whatever requires rendering takes time.
Q: Let’s talk about crawling. A while ago, John Mueller linked to a doc that showed that Google was assigning different crawl rates to different folders. How does Google go about determining the crawl rate for different parts of the site?
A: Let me turn this around and ask “How would you design a system that does this?”
Generally, it’s enough if you look at links. If you just look at PageRank, for example, and average it out over the directory, that would be enough.
(AJ adds that internal links might play a big role and Gary nods)
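Gary’s thought experiment (“average it out over the directory”) can be sketched in Python as follows. This is only an illustration of the averaging idea, not a description of Google’s system: the per-page scores are invented stand-ins for PageRank, and the function name `average_score_by_directory` is hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlsplit
import posixpath


def average_score_by_directory(page_scores):
    """Average a per-page score over each URL directory.

    `page_scores` maps a URL to a link-based score (a stand-in for PageRank).
    Returns a dict mapping each directory path to the mean score of its pages.
    """
    totals = defaultdict(lambda: [0.0, 0])  # directory -> [sum, count]
    for url, score in page_scores.items():
        # "/blog/post-1" belongs to "/blog"; top-level pages belong to "/"
        directory = posixpath.dirname(urlsplit(url).path) or "/"
        totals[directory][0] += score
        totals[directory][1] += 1
    return {directory: total / count for directory, (total, count) in totals.items()}


if __name__ == "__main__":
    scores = {
        "https://example.com/blog/post-1": 0.2,
        "https://example.com/blog/post-2": 0.4,
        "https://example.com/about": 0.1,
    }
    for directory, avg in sorted(average_score_by_directory(scores).items()):
        print(directory, round(avg, 3))
```

A crawler built along these lines could then assign more crawl budget to directories with a higher average, which is consistent with AJ’s remark that internal links might play a big role.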
Q: Would fragments be a better solution than parameters?
A: Don’t use fragments. Fragments in a URL can deliver less than ideal results.
Ok, what were fragments invented for? When you add one, you jump to a section of a page.
What are developers using fragments nowadays for? Changing the whole content of a page!
Fragments were not invented for that. If a fragment in a URL is misused, it might not even show up in search. Fragments don’t get through when we crawl a page. If you add a new URL with a fragment, we might not be able to see it.
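A small Python sketch shows why fragments “don’t get through”: the part after `#` is handled by the browser and is not part of the HTTP request, so URLs that differ only in their fragment look identical to a server or crawler, while a parameter survives. The helper name `url_as_requested` and the example URLs are assumptions for illustration.

```python
from urllib.parse import urldefrag


def url_as_requested(url):
    """Return the URL without its fragment, i.e. what is actually requested."""
    return urldefrag(url).url


if __name__ == "__main__":
    examples = [
        "https://example.com/products#/shoes/red",
        "https://example.com/products#/shoes/blue",
        "https://example.com/products?color=red",
    ]
    for url in examples:
        print(url, "->", url_as_requested(url))
```

The first two URLs collapse to the same address, so content that only appears after client-side fragment handling has no distinct URL of its own to crawl.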
Q: What if I have parts of my site I don’t want Google to see? Say, from faceted search.
A: Robots.txt is still the best way to solve such problems. Robots.txt is expected to work for the next 25 years; with fragments, I’m not so sure.
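As a rough illustration of the robots.txt approach, the Python sketch below checks URLs against a rule that blocks a faceted-search directory. The `/filter/` path, the example.com URLs and the helper names are assumptions; Python’s standard `urllib.robotparser` does simple prefix matching and is not guaranteed to behave exactly like Googlebot’s parser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that keeps crawlers out of faceted-search URLs.
ROBOTS_TXT = """\
User-agent: *
Disallow: /filter/
"""


def build_parser(robots_txt):
    """Parse robots.txt content given as a string."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_crawl_allowed(url, user_agent="Googlebot", robots_txt=ROBOTS_TXT):
    """Return True if the rules allow `user_agent` to fetch `url`."""
    return build_parser(robots_txt).can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "https://example.com/shoes/",
        "https://example.com/filter/color-red",
    ):
        print(url, "allowed" if is_crawl_allowed(url) else "blocked")
```

Running a check like this before deploying a robots.txt change is one way to avoid the “blocks everything” mistakes mentioned earlier in these notes.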
Q: Let’s talk about intent. Say, you have a search result with fractured intent, meaning it contains commercial and informational results. How does Google determine the mix [of results]?
A: One of the most important things here is the ranking of tokens. What people forget is that we’ve been using entities in ranking for a very long time.
We look at what we understand about an entity based on how the query is structured. If you search for a query that equals an entity, e.g. “Gary the Snail”, you’d get a knowledge panel. If you google “Gary the Snail Amazon”, we know you want to navigate there.
We think about whether a knowledge panel is useful for a query or not. You can figure that out over time depending on how users interact with results.
When users refine their queries, we take that into account, especially for personalization [Gary notes Rankbrain plays a role here sometimes but it’s unlikely].
If you use an ambiguous query, you get ambiguous results, and then you have to refine.