On April 28th, Brian Dean from Backlinko published a ranking factor study that caused a lot of outrage in the SEO scene. SEJ fired some shots in collaboration with Bill Slawski, more people got appalled on Twitter, and then Gary Illyes validated that page speed is just a "teeny-tiny" ranking factor.

It's easy to dismiss a ranking factor study. Some of the most common arguments:

- Correlation doesn't equal causation
- Google doesn't use ranking factors
- Nobody can analyze at the scale of Google, and therefore no sample can be big enough

While all of these arguments have some truth to them, they're also an easy way out. First, every causation has a correlation somewhere. Second, nobody (but a few people in the world) knows what the "core algorithm" looks like, but there has to be a list of factors that influence rankings. So whether we call them signals, factors, or filters doesn't change the concept. Third, dismissing any data that doesn't hit a huge sample size is the wrong way to go about studies. In biology, medicine, or the social sciences, you always infer from smaller samples and don't take a single study as the be-all and end-all. Many studies pointing at the same findings increase your confidence that you're getting closer to the truth, and that's how we should handle every study. That's also an idea I tried to convey in the ranking factor study meta-analysis in my article "The 10 Ranking Factors We Know to Be True".

So I spent a couple of hours looking through the data of Backlinko's study and worked out a more nuanced critique. Let's separate facts from stories.
Material

First, the material:

- Study: https://backlinko.com/search-engine-ranking
- Methods: https://backlinko.com/wp-content/uploads/2020/03/search-engine-ranking-study-methods.pdf
- More detailed methodology: https://frontpagedata.com/projects/backlinko/rankings/4_final-report.html

What happened - just the facts

Now, let's focus only on what happened, without interpreting or judging. Brian (+ team) analyzed 11.8M keywords to see which ranking factors correlate with first-page search engine rankings. The study was conducted with AHREFS, which provided link data, and Clearscope, which provided content score data.

The findings, summarized:

- AHREFS DR strongly correlates with higher rankings
- #1 rankings have 3.8x more backlinks than #2-10
- Content with a high Clearscope grade outperformed lower-grade content
- Higher Alexa page speed does not correlate with better rankings
- Higher domain popularity leads to better rankings
- "The vast majority of title tags in Google exactly or partially match the keyword that they rank for. However, we found essentially zero correlation between using a keyword in your title tag and higher rankings on the first page."
- AHREFS Page Authority weakly correlates with rankings
- Word count was evenly distributed among the top 10 results
- HTML page size does not correlate with rankings
- Short URLs tend to slightly outrank longer ones
- Schema markup does not correlate with higher rankings
- Longer time on site correlates with higher rankings

Method:

- The team crawled 1+M pages for on-page factors.
- URLs like amazon.com or youtube.com were not crawled (so-called "large domains").
- 6% of URLs couldn't be crawled due to technical errors (7.6M URLs could be crawled and used).
- The Alexa API was used for the page speed and time-on-site variables.
- Clearscope provided content scores on 1,000 high-search-volume keywords.
- AHREFS provided 1.1M keywords with 11.8M rankings (10M unique URLs).

Critique

Now, let me add my interpretation of the results.
First off, Brian uses language like "it's impossible to determine the underlying reason behind this relationship from our data alone." He tries to be careful with absolute statements, but I personally think he could've done more. Obviously, it's not sexy to relativize everything and admit that you can't really provide absolute findings. But that's the scientific way, and we should treat it as such. One of the biggest critiques of the study is its blanket statements, and I do agree that the results are phrased in a much more attention-grabbing way than necessary.

General data

Speaking of the scientific way, one thing that bothers me is that the actual correlation/regression coefficients and statistical significance weren't shown. Just reading that some correlations are strong or weak is not as helpful as seeing the actual values.

Domain exclusion

Furthermore, I think the study took out too many large domains. URLs like amazon.com or youtube.com were not crawled. The full list:

- en.wikipedia.org
- youtube.com
- amazon.com
- facebook.com
- pinterest.com
- yelp.com
- tripadvisor.com
- ebay.com
- reddit.com
- linkedin.com
- twitter.com
- walmart.com
- imdb.com
- yellowpages.com
- mapquest.com
- quora.com
- etsy.com
- target.com
- instagram.com

Those large domains make up 2.1M of the 10M unique URLs. It's not uncommon for Wikipedia to be excluded from ranking factor studies because the domain skews results so much. However, we have to ask ourselves whether that's still an accurate representation of the web. I think taking out all large domains was a mistake: it skews the results too much and paints a faulty picture of how the ranking factors apply in the mental model we build for ourselves. The reality is that most of us are competing with at least one large domain, no matter what vertical we're in.

Backlinks

According to the analysis, over 95% of URLs had 0 backlinks.
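On the missing numbers: reporting them is cheap. Below is a minimal, dependency-free Python sketch of the kind of values I'd want to see in a study like this - Spearman's rank correlation plus a permutation-based p-value. The position/DR pairs are invented for illustration; nothing here is the study's actual data.

```python
import random

def rank(values):
    """Average 1-based ranks, handling ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average position of the tied group, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def spearman(x, y):
    # Spearman's rho = Pearson correlation of the ranks
    return pearson(rank(x), rank(y))

# Hypothetical sample: SERP positions 1..10 and made-up DR values.
positions = list(range(1, 11))
dr_scores = [78, 71, 74, 60, 55, 58, 40, 42, 35, 30]

rho = spearman(positions, dr_scores)

# Permutation test: how often does shuffled data beat the observed |rho|?
random.seed(0)
shuffled = dr_scores[:]
trials, extreme = 10_000, 0
for _ in range(trials):
    random.shuffle(shuffled)
    if abs(spearman(positions, shuffled)) >= abs(rho):
        extreme += 1
p_value = extreme / trials

print(f"rho = {rho:.3f}, p = {p_value:.4f}")
```

With numbers like these, "strongly correlates" becomes a verifiable claim instead of a headline.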
All concepts of ranking factors and deep learning algorithms aside, the fact that over 95% of URLs have no backlinks at all is very interesting (given that the data is accurate)! Only a logarithmic Y-axis shows the sites with more than 0 backlinks. This reminds me of my article on nonlinearity in marketing: power laws everywhere. You'll often find that a few pages get way more internal and external links than others. It pays off to distribute that link power more evenly throughout the site, which should be rewarded with a higher crawl rate and better organic rankings.

However, this is also one of the biggest issues I have with the study. It claims that AHREFS DR has a high correlation with rankings, but if >95% of URLs have 0 backlinks, how does that work? It seems that sites with 0 backlinks were ignored, which is another move that skews the results heavily. I do think, though, that pointing out the higher impact of the domain-related link profile over the page-related one is in line with my observations.

Data sources and sampling

I think the idea of pulling data from different sources like AHREFS, Clearscope, and Alexa is actually good! The danger lies in relying too much on proprietary metrics without fully understanding them. The number of backlinks could have been looked at in relation to referring domains and page authority, and the number of keywords could have been put in relation to the content score.

Page speed

When it comes to page speed, the way I interpret the results is less that it doesn't have an impact on rankings and more that some sites rank really well despite being slow. More likely, a few heavy and slow pages that often rank in the top 3 (for reasons other than page speed) skew the trend. This also becomes obvious when looking at the trend of the medians (dots), which seems to increase slightly with better positioning.
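A toy simulation makes this outlier story concrete. All numbers below are invented, not the study's data: I assume a mild "higher position = faster page" trend plus a handful of heavy pages parked in the top 3 for reasons other than speed. The mean load time then hides (or even reverses) the trend, while the median still shows it.

```python
import random

random.seed(7)

def load_times(position, n=200):
    """Simulated load times for n results at a given SERP position."""
    base = 1.5 + 0.05 * position                 # mild trend: position 1 is fastest
    times = [random.gauss(base, 0.3) for _ in range(n)]
    if position <= 3:
        for i in range(10):                      # 5% slow, heavy outliers up top
            times[i] = random.uniform(12, 20)
    return times

def mean(xs):
    return sum(xs) / len(xs)

def median(xs):
    return sorted(xs)[len(xs) // 2]

results = {}
for pos in (1, 5, 10):
    t = load_times(pos)
    results[pos] = (mean(t), median(t))
    print(f"pos {pos}: mean={results[pos][0]:.2f}s  median={results[pos][1]:.2f}s")
```

Under these assumptions, the position-1 mean is dragged above the position-10 mean by just ten outliers, while the medians still decrease toward position 1 - which is why the median dots in the study's chart are the more informative signal.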
(Source: https://frontpagedata.com/projects/backlinko/rankings/4_final-report.html)

Since so many optimized sites have a loading time of around 1.65 seconds and the study looked at the top 10, I assume that the slow sites are outliers.

Time on site

When it comes to time on site, the difference between positions 1 and 10 in the study seems minuscule. I would very much like to see the correlation coefficient and the averages here. My impression is that the difference isn't as big as it's portrayed. Higher time on site correlating with better rankings is something SEOs have suspected for a long time, but it's also very controversial. My humble opinion is that short vs. long clicks are much easier for Google to measure and that we need to be very careful with user behavior metrics.

Schema markup

It seems that the study tried to correlate ranking positions with a binary measurement of schema (added or not added to a page). Deriving from this that Schema markup has no impact on rankings is, to my mind, a stretch. Don't get me wrong: I think Schema can make a snippet more attractive, not rank a page higher, but the statement still seems to stand on wonky pillars.

Keyword in title

One statement that doesn't make a lot of sense to me is: "The vast majority of title tags in Google exactly or partially match the keyword that they rank for. However, we found essentially zero correlation between using a keyword in your title tag and higher rankings on the first page."

The way I interpret that data is that Google mostly ranks pages with the keyword in their title (exceptions apply) in the top 10. Thus, saying that there is no correlation between having the keyword in your title and higher rankings doesn't make sense. Instead, the conclusion should be that having the keyword in the meta title doesn't give you an advantage over other pages/sites that also have it.

Conclusion

All in all, I think we can't simply dismiss the study, but it does make some questionable statements.
Some of the things I'm taking away from it as lessons:

- Domain Authority might be more important than Page Authority.
- Tools like Clearscope seem to be worth looking at for content optimization.
- We need more ranking factor studies, and we need to look at their commonalities.

What are your thoughts on the study?