Google internal docs on search engine algorithm have leaked

Google’s internal documentation on how its search engine works has been leaked online. The documents are causing quite a stir, as they contradict several statements Google has made in the past about how it “ranks” sites in the search results.

For example: subdomains are treated as separate websites by the crawler and do impact your site’s ranking. Also, newly registered domains are penalised and ranked lower than more established ones. Domains that mimic search queries are also penalised.

One particularly troubling reveal is that Google ranks results by user-centric click signals. In other words: heavily advertised websites and websites that generate a lot of traffic are ranked higher, contrary to what Google has stated numerous times in the past. This means that sites from big brands always show up higher in the list, no matter how hard a smaller brand tries to optimise its website or how much authority it has. As today’s episode of TechLinked puts it: “It’s like Mr. Beast showing up higher in the search results than your own doctor, just because Mr. Beast has more Twitter followers.”

Google has in the past also stated that data from Chrome (their browser) is in no way used to determine a page’s ranking. Guess what? They do use that data, including data from people who have opted out of allowing Google to use their browsing data.

The documents were accidentally leaked by a Google employee, who included them in his GitHub repo. He tried to delete the docs soon after, but they’d already been backed up by third-party document management services, which still have them available.

The leak is big news: SEO specialists and website owners should study the documents, as reading between the lines reveals exactly how to get traffic to your site.

I’m not linking to the docs themselves, as I’m not certain of the legal implications of doing so for both me and Realmac (as the owner of this platform), but you can (oh, I love irony) find them using Google Search just fine…

Google has confirmed the authenticity of the documents to The Verge, but followed that with a statement that one should not jump to conclusions as “the pages are out of context”. Just how 2,500 pages of text can lack context was not explained, though. Google also states that the documents describe an older version of their systems.




I have no doubt that Google is the real threat in IT and to the future of the internet, and many of my suspicions about them over the years now seem to be validated.


History shows it has always been very bad and wrong to let just one party make the lists… in fact, history shows it’s always very bad and wrong to let any single entity make a list at all.

The first hint this could happen was when they dropped “Don’t be evil.” as their credo.


If you have been doing real scientific A/B research on SEO for any length of time, absolutely NONE of this is news. This is why my original Rapidweaver-SEO dot com website closed a long time ago.

Trying to win the search engine race for riches and fame is a fool’s game. You must find a very narrow niche and carve out your market. There is one tool out there that does it, and it works with Rapidweaver products as well as all other website builders, if you take the time to learn the system.

It happened several years before that. They dropped it because of other mistakes and legal hearings they got called out on and lost.

What I think we all want is copies of these documents. This could help us get an edge up on managing Evil Google.

@Flash Hi, I’ve been thinking about this for quite a long time (sorry for the grammatical injuries :grin:). The result of my intense thoughts :exploding_head: is that SEO alone can’t help a tiny business compete with the mega-players; other sources of advertising are required (social media, YouTube, books, social and associative activities, teams, friends of a friend of a friend, newspapers if anyone still reads them…). Sad but realistic, it seems :pensive:


I doubt many people will go through 2,500 pages, but it seems like you could plug those documents into ChatGPT and ask it to summarize them: “Based on the provided docs, what is the best way to achieve high page rankings in Google’s search engine results pages?”

Just don’t use Google Gemini :rofl:

Google’s AI Overviews feature suggested a user put glue on pizza so the cheese would stay put.

It touted the health benefits of tobacco for kids.

Asked “How many rocks should I eat?”, the AI Overview suggested the user eat at least one rock per day for vitamins and minerals, citing UC Berkeley geologists.


This is true. The trick or magic is to compete in a sub market or a small niche. Then once you bring your website visitors in using the niche you can slowly and respectfully introduce them to the larger market you want to win.


I assume this was a joke. But if not, keep in mind it can only read about 6000 words at a time. That’s a lot of uploading that it cannot keep in context.

Nope, it wasn’t a joke. I wasn’t aware there were any limits on the documents it can scan through. Are you sure that 6,000 limit applies across the board? For example, do ChatGPT Plus members get higher limits, would using a different model give higher limits (like GPT-3.5 vs GPT-4o), or would using the API give higher limits?

I seem to recall reading about people plugging a lot of information into ChatGPT and it ingesting it fine. Maybe that was in regard to people making their own GPTs in the GPT store, though…

I also seem to recall reading about people plugging in legal docs, which can be extremely word-heavy, to get a summarization.

It can take in everything, but it will not be an accurate summary.

GPT-4 Turbo can read 128k tokens if you pay for it. That’s about $11US for input on just that amount alone. However, output is still limited to about 4k tokens. Will it be a more accurate summary? Yes. However, in this case, a technical document, details matter, and since this is more than likely a white paper, there probably is not a lot of fluff. Also, if you pay for and develop against the API, there is obviously a lot more that can be done. But I am assuming we are discussing consumer-level products here.
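For anyone who does want to try feeding the docs to a model with a smaller context window, the usual workaround is “map-reduce” summarization: split the text into chunks that fit the window, summarize each chunk, then summarize the summaries. Here’s a minimal Python sketch; the 6,000-word limit, the overlap size, and the `summarise()` placeholder are illustrative assumptions — in practice you’d replace the placeholder with a call to your LLM API of choice:

```python
def chunk_words(text: str, max_words: int = 6000, overlap: int = 200) -> list[str]:
    """Split text into word-based chunks with a small overlap so that
    sentences cut at a chunk boundary still appear whole in one chunk."""
    words = text.split()
    chunks = []
    step = max_words - overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break
    return chunks

def summarise(chunk: str) -> str:
    # Placeholder: swap in a real LLM call here, e.g. a chat request asking
    # "Summarise the ranking factors described in this excerpt."
    return chunk[:100]

def summarise_document(text: str) -> str:
    # Map step: summarise each chunk independently.
    partial = [summarise(c) for c in chunk_words(text)]
    # Reduce step: condense the per-chunk summaries into one answer.
    return summarise(" ".join(partial))
```

Note the trade-off the thread touches on: each reduce pass loses detail, so for a technical document like this one you’d probably want to ask pointed questions per chunk rather than rely on a single global summary.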

Honestly the players are too big to game for a consistent income anymore. They have been for years.

See my comment above about winning at niches.


Yes, you’re right. I have created my own private GPT for psychological research. I fed it thousands of pages (btw, many thanks to PDF) and then I ask it questions. The important thing is to be very careful about how you consistently phrase your questions, because it also trains itself on that. Of course, the statistics lead the game. An example: it’s impossible for ChatGPT to say that there still isn’t any proof of psychotherapeutic effectiveness; there are too many papers about psychotherapeutic effectiveness for it to say the contrary. That’s a drastic and very problematic limitation of these tools, I think.


I use jennieAI: I upload research papers and the like (normally PDFs) into a folder and ask questions based on those documents.