
The Top SEO Recommendations Gained from the Google Search Data Leak

Updated: Jun 11

A Google document leak has revealed inner workings of the search algorithm.

What Is the Google Search Data Leak?

You may have seen in the news last week that, following an accidental leak, thousands of documents were published on GitHub, giving an unprecedented view into some of the features and factors Google uses to rank content in its search engine. Many authors and SEO commentators have analyzed the leaked documents (see our sources at the end of this article), but how does that information translate into practical SEO best practices?


What Are the Key SEO Learnings and Recommendations from the Google Search Data Leak?

Here are the key learnings and recommendations, taken from the Google search data leak, that can improve brands’ SEO:


  1. Inbound Links Are Important. Links from relevant and diverse websites are indeed a factor in Google’s search algorithm. This might not seem surprising, but Google has downplayed the importance of links in recent years. Recommendation: Continue to gain relevant links from multiple websites.

  2. Indexing Tiers Impact Link Value. Google uses a metric called sourceType that records where a webpage is indexed and, by extension, how important it is. As background, Google’s index is broken into tiers: the most important, regularly updated and most visited content is stored in flash memory, less important webpages are stored on solid state drives, and irregularly updated pages are stored on standard hard drives. The sourceType feature suggests that the higher a website’s indexing tier, the more valuable that website’s links are. Fresher (newer) pages are also considered higher quality. Recommendation: The best links come from frequently visited websites whose content is regularly updated.

  3. Google Has a Feature Called siteAuthority. Google gives each website an authority score called siteAuthority. It is unclear whether siteAuthority is used only to rank webpages within that website, to score the value of links from that website, or both. Recommendation: The documents do not detail how siteAuthority is used in ranking, but suffice it to say, it will not harm brands to build a high siteAuthority for their own website (presumably through good content and user experience; see below) and to gain links from other websites with high siteAuthority scores (again, websites with good content and user experience).

  4. Homepage PageRank Is Associated with All Webpages. Every page (and document) on a website has the homepage’s PageRank associated with it. As background, PageRank is mainly determined by how many quality, relevant and diverse webpages link to a page (hey, did we say links are important?). How homepage PageRank affects webpage ranking is unclear: it could be used as a proxy for new webpages until those pages earn their own PageRank, or homepage PageRank may be considered alongside each page’s own PageRank, albeit at a lower weighting. Recommendation: It is important to gain quality, relevant and diverse links to a brand’s homepage, as that PageRank is associated with every other page on the site.

  5. Page Titles Are Reviewed Against Search Queries. The leaked documents show a Google feature called titlematchScore, which measures how well the page title matches the user’s search query. Relevant page titles are still very important, and Google actively gives value to them. Recommendation: Well-written and well-thought-out page titles really matter. Brands should do keyword research, match keywords to target pages and ensure the keywords are in the page titles (and page content).

  6. Metadata Character Count Does Not Matter (Directly). Longer page titles are not penalized in ranking: the documents show no limit, measure or score for the length of page titles, any other piece of metadata, or snippets. Recommendation: Longer metadata is fine for ranking purposes, so if brands need more characters to describe a product or service, that’s okay. However, bear in mind that longer titles and descriptions will be truncated in the search results, so they could be less appealing and drive fewer click-throughs as a result.

  7. Dates Are Very Important. As we already know, Google likes fresh content, so the dates associated with content are very important. Google has various methods for extracting dates: from the URL, from the page and from the content itself. Recommendation: Brands should state content creation dates explicitly and ensure they are accurate and consistent across the URL, page and content (otherwise the content may be penalized).

  8. Authorship of Content Is Tracked. Google has been very open about how it evaluates content writers within its E-E-A-T framework (experience, expertise, authoritativeness, trustworthiness). The leaked documents confirm that Google has various techniques to extract the author from content, and that authors are measured, stored and tracked. Recommendation: Over time, content authors should focus on demonstrating experience, expertise, authoritativeness and, therefore, trust to build up their writing authority with Google.

  9. Google Scores Click Quality. Google has metrics called goodClicks, badClicks, lastLongestClicks and unsquashedClicks. Successful clicks are important for a website to rank and to keep ranking. Google appears to use click-based measurements, similar in spirit to time on site and bounce rate, as proxies for user experience in the context of the search query. Outlier click measurements are “squashed” (dampened) to normalize the data (unsquashedClicks presumably records the raw figures), and content decay appears to be measured by lastLongestClicks (the last good click). Recommendation: Captivating content and user experience are very important, and Google uses click performance as a factor in judging them.

  10. Demotion Drivers. The leaked documents list various factors that can lead to algorithmic demotions: actions or technical missteps that can result in ranking drops.

    1. Anchor Mismatch. When a link’s anchor text does not match the target site it links to.

    2. SERP Demotion. A signal communicating that user experience has fallen, most likely as measured by clicks.

    3. Navigation Demotion. A demotion applied to pages with poor navigation practices or user experience problems.

    4. Exact Match Domains Demotion. As previously shared by Matt Cutts at Google, exact match domains are not as important as they once were. The documents confirm there is a specific feature for this demotion.

    5. Product Review Demotion. Although there is no definition for this, it likely relates to weak product reviews.

    6. Location Demotion. The documents suggest that global and super global pages can be demoted, due to Google’s preference for associating pages with a location and ranking them accordingly.

    7. Porn Demotion. Content that features porn.

  11. Recommendation: Build and maintain the online brand. Flipping the demotion drivers detailed above shows what good SEO looks like and, taking a step back, what a good online brand looks like. Strong user experience, local where possible, and great content equals a good brand and leads to good SEO.

  12. Font Size of Text and Links Is Important. Google tracks the average font size of text and anchor text on a page, as well as content that deviates from that average. Recommendation: Although it is unclear if or how Google factors increased font size into its algorithm, it remains best practice to use heading tags, at an increased font size, to identify important content.

  13. Your Money or Your Life (YMYL) Content. This might not be surprising, but Google has mechanisms to recognize and predict content relating to users’ well-being. These features can rapidly prioritize (time-sensitive) content and rapidly deprioritize (or deindex) misinformation that can affect users’ well-being. Recommendation: If a brand is publishing Your Money or Your Life content, it should be aware that Google applies a different, faster-acting and amplified process when indexing it.

  14. Google Considers the Last 20 Versions of a Page for Ranking. Google keeps a copy of every version of every page it has ever indexed. However, it only uses the last 20 changes to a URL when analyzing pages. Recommendation: Older, weaker page content and architecture could be affecting how the latest version of a page ranks. To gain a fresh start on Google, a webpage would need to be updated and indexed 20 times.

  15. “Twiddlers” Exist. Twiddlers are defined as re-ranking functions that “can adjust the information retrieval score of a document or change the ranking of a document”. We interpret these as weightings that can be adjusted overall (think of an algorithm update or content freshness) or per user (location, device, previous user behavior). Recommendation: Brands should use structured data markup (such as schema.org) to tell Google explicitly what their content is about, so that Google can prioritize (twiddle) it for certain users or locations.

  16. Whitelists Exist. A couple of modules indicate that Google whitelists certain domains related to elections and COVID (features called isElectionAuthority and isCovidLocalAuthority). These appear to be exception lists that ensure algorithm changes do not inadvertently affect these websites’ rankings. Recommendation: No recommendation, more of an FYI.
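
The recommendations on dates (7), authorship (8) and mark-up (15) all come down to stating key facts about a page explicitly. One common way to do that is schema.org structured data in JSON-LD format. The Python sketch below builds a minimal Article object with an author, datePublished and dateModified, and rejects a modification date earlier than the publication date, in keeping with the advice to keep dates consistent. The function names, headline, author name and dates are hypothetical placeholders of our own; this illustrates the markup itself, not how Google reads or weights it.

```python
import json
from datetime import date


def build_article_json_ld(headline: str, author_name: str,
                          published: date, modified: date) -> str:
    """Return a schema.org Article object serialized as JSON-LD.

    Illustrative sketch: headline, author, datePublished and dateModified
    are standard schema.org Article properties, but this helper is hypothetical.
    """
    # Keep dates consistent: a page cannot be modified before it was published.
    if modified < published:
        raise ValueError("dateModified must not be earlier than datePublished")

    article = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        # Explicit authorship, expressed as a schema.org Person.
        "author": {"@type": "Person", "name": author_name},
        # ISO 8601 dates (YYYY-MM-DD).
        "datePublished": published.isoformat(),
        "dateModified": modified.isoformat(),
    }
    return json.dumps(article, indent=2)


def wrap_in_script_tag(json_ld: str) -> str:
    """Wrap JSON-LD in the <script> element that goes into a page's HTML."""
    return f'<script type="application/ld+json">\n{json_ld}\n</script>'


if __name__ == "__main__":
    # Hypothetical placeholder values for illustration only.
    markup = build_article_json_ld(
        headline="Example article headline",
        author_name="Jane Doe",
        published=date(2024, 6, 4),
        modified=date(2024, 6, 11),
    )
    print(wrap_in_script_tag(markup))
```

Running the script prints a script element that could be placed in a page's HTML. Structured data like this can be checked with Google's Rich Results Test before publishing.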



Precision's Thoughts

Without doubt, this leak is one of the biggest moments in search history. The Google algorithm has been a fascination for many, with an entire industry born from it and hundreds of billions of dollars dependent on it. We are unlikely ever again to gain insight into the workings of Google Search at this magnitude.


Google has not denied the legitimacy of the leaked documentation, but has noted that the documents are out of context and based on incomplete information. That said, none of the takeaways are particularly surprising or even new. If anything, the leak has confirmed suspicions that SEO professionals already held, albeit with interesting feature names ("twiddlers", anyone?). For many, the biggest surprise is that Google has actively denied the existence of some of these features and factors, although who can blame a company for protecting its secret sauce?


The recommendations above are not completely new, but we now have confirmation that these factors exist. What recurs throughout, though, is Google's drive to define and measure content and user experience. The biggest takeaway of all is therefore unchanged: to rank high on Google, create quality content and deliver a positive user experience.


Sources:

Fishkin, R. (2024, May 30). An anonymous source shared thousands of leaked Google search API documents with me; everyone in SEO should see them. SparkToro. https://sparktoro.com/blog/an-anonymous-source-shared-thousands-of-leaked-google-search-api-documents-with-me-everyone-in-seo-should-see-them/

King, M. (n.d.). Secrets from the algorithm: Google Search’s internal engineering documentation has leaked. iPullRank. https://ipullrank.com/google-algo-leak
