Infopost | 2024.08.09

Mass Effect galaxy map

Time for another brief installment in the series of posts on search, indexing, and Google Search Console. Last year I linked a post and some HN discussion about changes in indexing by Big Search. The discussion digressed into grievances about Google removing personal pages from its index. These were echoed in a more recent post that asserts that these observations indicate a new search strategy by Google.

Vincent Schmalbach: Google is no longer trying to index the entire web. In fact, it's become extremely selective, refusing to index most content. This isn't about content creators failing to meet some arbitrary standard of quality. Rather, it's a fundamental change in how Google approaches its role as a search engine.

The comments section pointed out the obvious:

crazygringo: This post seems to be based entirely on personal anecdotal experience.

There isn't a shred of hard data to support the headline claim that Google now "defaults to not indexing content".

Google never indexed everything, removing duplicates, blogspam, useless pages, etc. Maybe they've changed their thresholds or maybe not. But this post provides zero evidence of anything. It's pure speculation without any facts at all.

OP showed up to say, "you're not wrong, crazygringo":

Vincent Schmalbach: You're right, this post is based on personal anecdotal experience. I have access to Google Search Console data for over 100 websites, and most have many pages in the "Discovered - currently not indexed" and "Crawled - currently not indexed" categories, despite ranking well for some keywords and getting traffic. This wasn't the case 10 years ago.

Regarding "Google never indexed everything" - I'd say it came close. They did manual de-indexing for heavy spam sites and would even send an email when they did this. Apart from that, nearly everything was in the index, including duplicates. De-duplication happened at the ranking stage, not the indexing stage.

Whether or not Vincent has accurately described Google's new strategy, crazygringo is dismissing an actual trend simply because there's no public announcement or leaked memo. In a way, this only matters to small-time web publishers and SEOs. And yet it's also just weird to think that this fixture of the internet could be moving from "index everything" to "America Online was right all along".

aiauthoritydev: I think the problem is not Google specific rather the internet has grown far too large with too much of crap floating around. Google, in my opinion has done the best job of getting relevant information followed by Reddit.

While OpenAI etc. is pretty good (so does Google Gemini) what is OpenAI like interfaces prevent me from doing is to segue from a focused topic to related areas to discover knowledge on the periphery, which is the most important aspect of learning in my opinion which chatbots today are not able to do that well.

Moment of Zen

My web index pointed me to another Chris who also has a characters page.






Related / internal

Some posts from this site with similar content.

Post
2024.05.06

Wandering

Purposeful and aimless walks into the internet.
Post
2023.11.20

Subsurface

Links pages, webrings, and search.
Post
2023.07.23

A walk in the dark forest

All of the internet in one short stroll.

Related / external

Risky click advisory: these links are produced algorithmically from a crawl of the subsurface web (and some select mainstream web). I haven't personally looked at them or checked them for quality, decency, or sanity. None of these links are promoted, sponsored, or affiliated with this site. For more information, see this post.

www.vincentschmalbach.com

Google Now Defaults to Not Indexing Your Content - Vincent Schmalbach

Picture this: It's ten years ago, and you've just launched a new WordPress blog. Within hours, sometimes even minutes, your content is indexed by Google.
gehrcke.de

Google changes: recently I see more of "discovered - currently not indexed" - Jan-Philip Gehrcke, PhD

Has Google become more conservative with indexing content of personal websites? I think we might see less and less low-traffic quality contents in Google search results. I have carefully done basic...
www.theverge.com

How Google perfected the web

Google has dominated the search market for decades, leading to a web filled with SEO-driven content. With generative AI on the horizon, this could all come crashing down.

Created 2024.10 from an index of 419,839 pages.