Infopost | 2023.07.23

Stable diffusion generated dark forest ominous

There has been a flurry of interesting plotlines surrounding the past, present, and future of the internet; implosions at Reddit and Twitter, Facebook's Twitter clone, and the impact of AI on web search. Simultaneously I have read up on more parochial web matters - the indieweb, blogging, and SEO.

Here's a post about all of them, because they're all intertwined.

Let's start with esteemed sociological discussion since later we will wade into greentexts and "Fuck Spez" graffiti. Hacker News pointed me to an editorial from 2021 discussing the modern incarnations of analog-era countercultural movements. It was worth the read, though comparing Boyd Rice to Tekashi 6ix9ine was a bit outside my sociological wheelhouse. More relevantly (to me), the author discusses the impact of media platforms on modern movements, how even rebellious speech silently (or overtly) carries commercial branding. Her discussion of the oldweb and modern platforms featured a few familiar themes:

Caroline Busta As online activity began to centralize around search engines, such as Netscape, Explorer, and Google, in the late-'90s and early-'00s, the internet bifurcated into what became known as the "clearnet," which includes all publicly indexed sites (i.e., big social media, commercial platforms, and anything crawled by major search engines) and the "darknet" or "deep web," which is not publicly indexed.

There were also a number of sites that though officially clearnet, laid the groundwork for a sub-clearnet space that we might think of as a "dark forest" zone-particularly message board forums like Reddit and 4chan, where users can interact without revealing their IRL identity or have this activity impact their real-name SEO.

Hey, the dark forest again. I read something about that last year:

Ideaspace In response to the ads, the tracking, the trolling, the hype, and other predatory behaviors, we're retreating to our dark forests of the internet, and away from the mainstream.

Dark forests like newsletters and podcasts are growing areas of activity. As are other dark forests, like Slack channels, private Instagrams, invite-only message boards, text groups, Snapchat, WeChat, and on and on. This is where Facebook is pivoting with Groups (and trying to redefine what the word "privacy" means in the process).

Busta echoes those same sentiments and I'm now wondering where this analogy was first used.

Interestingly, LLMs may have flipped the script on the 'sub-clearnet' being, well, preferable to platforms that require PII. The influx of AIs and script-assisted astroturfers may soon drive users away from platforms where they could simply be interacting with a marketing bot. Suddenly Zuckerberg demanding your driver's license and a genetic sample is a small price to pay for authentic conversation. And that's why Elon wanted to fight him. Or something.

Caroline Busta Taken from the title of Chinese sci-fi writer Liu Cixin's 2008 book, "the dark forest" region of the web is becoming increasingly important as a space of online communication for users of all ages and political persuasions. In part, this is because it is less sociologically stressful than the clearnet zone, where one is subject to peer, employer, and state exposure... One forages for content or shares in what others in the community have retrieved rather than accepting whatever the platform algorithms happen to match to your data profile. Additionally, dark forest spaces are both minimally and straightforwardly commercial. There is typically a small charge for entry, but once you are in, you are free to act and speak without the platform nudging your behavior or extracting further value... [Using these platforms] is therefore not analogous to legacy countercultural notions of going off-grid or "dropping out."

I'm not sure it has to be either 4chan or TikTok. Recommendation engines and visibility algorithms can be really nice to have. But we seem to be moving toward an internet where the price of entry for content recommendations is all of your personal data and a high tolerance for embedded advertisements.

At least until Rob implements my (likely not original) idea to save the internet.

Caroline Busta To be sure, none of these spaces are pure, and users are just as vulnerable to echo chambers and radicalization in the dark forest as on pop-stack social media. But in terms of engendering more or less counter-hegemonic potential, the dark forest is more promising because of its relative autonomy from clearnet physics (the gravity, velocity, and traction of content when subject to x algorithm). Unlike influencers and "blue checks," who rely on clearnet recognition for income, status, and even self-worth, dark forest dwellers build their primary communities out of clearnet range-or offline in actual forests, parks, and gardens... The crux of Liu Cixin's book is the creed, when called by the clearnet: "Do not answer! Do not answer!! Do not answer!!! But if you do answer, the source will be located right away. Your planet will be invaded. Your world will be conquered."

Side note to any hostile aliens: my domain name is a pseudonym.
Fear and loathing in the sub-clearnet

Inside Higher Ed "EJMR is currently melting down with people convinced their careers are in danger, presumably because they've said some very nasty and/or stupid things in locations that will easily identify them," tweeted Ben Harrell, an assistant professor of economics at Trinity University, in Texas. "In the end, nothing of value will be lost."

The synopsis:
It's a fun intersection of cyber and the dark forest but also a cautionary tale about anonymity and doing counterculture (or just plain evil) things. I don't know why, when I read this article, I instantly thought of KO's tales of drama from neuroscience academia.
"No referring sitemaps detected"


Let me briefly pivot to a very specific technical issue I encountered with Google Search Console. This digression is pretty skippable but does fit in to the larger intent of this post. Google Search Console, by the way, is a tool Google provides to let you check if your site has been indexed (is available for search results). It also reports how often your pages or images show up in a search.

I've posted a few times about my mild fascination with SEO (trying to get your website to the top of search results). I have no compelling interest in modifying my site content to be search engine-friendly, but I'm amenable to data shaping that might help guide searchers to any worthwhile information on my site. Most of all, I'm occasionally entertained by the the perspectives of the SEO crowd. So when I saw the Search Console error message, "no referring site maps detected" attached to many of my pages, I indulged a little troubleshooting.

Google Support

There were a few Google Support requests matching my issue from February 2023. I should also note that around the same time I experienced bulk de-indexing.

Jovanovic I am trying to index the URLs from my website, While doing so I keep seeing this error in the Sitemap section "no referring sitemap detected" (as seen in the attached picture). Even though I have a correct sitemap submitted (as seen in the attachment), I keep getting this error. How can I fix this?
Please help
Thanks

I had been briefly worried that my code had somehow fatfingered a url when it generated my sitemap (software is traditionally not susceptible to this physical world issue, but you never know). Nope, Search Console said it parsed the sitemap just fine and ctrl-f confirmed the urls were correct. So what gives? The well-lit forest of Google Support offered a hilarious answer:

JWP Hi Vid

There is nothing to solve here.

The page has been discovered, so Google knows about it. In 99.9% of all case going forward, it'll then just swing by and re-crawl it once in a while (with no reference to the sitemap).

There is a common misconception, that sitemaps are really important [to indexing]. That's not actually the case (certainly for smaller sites with good internal structures). The Google bot is a very capable spider and once it's gained entry to your site and assuming that the pages are inter-linked in a sensible way, it is perfectly capable of indexing the entire site, without any sitemap.

This is exactly how I picture Apple community support. Kind of like Stack Overflow's traditional, "but why do you want to do this?" but with a very heavy dose of brand loyalty. "Ignore the erroneous information, it doesn't matter because Google is like really good at internet."

And it wasn't a one-off, another response to a different support request about the same issue:

Gupta Hello Skinly Aesthetics,

Googlebot found this page from another page and indexed it before crawling the sitemap file. But don't worry because this is not an issue; the purpose of sitemap files is also to help search engines in the discoverability of the pages.

That's probably not true for Skinly Aesthetics and it's certainly not true in my case - my sitemap and most of my urls have been around for years. The weird deference to the platform was pretty offputting, does no one care about inaccurate data? Whether or not a random walk found Skinly's page first, doing a map lookup for that url is not compute-intensive.

barryhunter In practice Google only notes what sitemap(s) the is in when it actually crawls.

So 'URL is unknown' doesn't ever show sitemap (nor referring page!) details.

Until a crawl is actually at least attempted, won't get any details.

Nope. Navigating to an indexed page via the sitemap listing still reports this issue.

/r/seo

The next search result was, some say, the place I should have started. While I found /r/seo's answer to be satisfactory, it was far from definitive. To paraphrase their answer, "Google Search Console is notoriously unreliable."

Oh, it's just unreliable. And notorious for it. He wouldn't say it was notorious unless there was some established consensus on the matter. There could be dozens of people who feel this way, hundreds even.

Anyway, it's a better answer than, "I'm sure everything is fine why do you even care anyway lol?".

The UI answer

The Search Console report item, in its entirely, looks like:

Discovery
   Sitemaps            No referring sitemaps detected
   Referring page      None detected
   URL might be known from other sources that are currently not reported

The fact that this seemingly erroenous information is listed in the 'discovery' section seems to agree with Gupta from above; the url was pulled from some database that doesn't reveal its origination information. That's reasonable enough, though listing each of the possible discovery methods when only one is valid fits neither traditional design norms nor Google's minimalist aesthetic. More importantly, the unused discovery result should be labeled, "this method lost the race to this url" rather than "we didn't detect anything using this method".

Finding answers entertainment on the internet

Google on google grave meme

Google's top results for my issue linked to several February 2023 threads on their support forum. That's theoretically good, except that the answers were all non-answers from "Diamond-Tier Lackey Contributors" and then the threads were locked. And I never did get a definitive answer, though I didn't look too far down the search results. And that's for three reasons:
  1. I didn't care all that much.
  2. I've learned not to look past result #3 with SEO questions because it's either autogenerated nonsense or paywalled.
  3. I got distracted by catching up with /r/seo.
DaveMcGDaveMcG I am amazed by the number on posts on r/SEO where the OP doesn't even use google to answer their own quesitons first.

I did a double take, sprayed my Mountain Dew Code Red everywhere, and had to check the url. Yep, it was a real question. The responses were good:

tmac_79
/u/tmac_79
Everyone knows you can't trust search engine results... people manipulate those.
mmmbopdoombop
/u/mmmbopdoombop
ironically, sometimes the best advice is doing a Google search for your question followed by 'reddit'. We SEOs killed the regular results.

SEO, blogs, and the topography of the realnet

DrJigsawDrJigsaw There is a TON of outdated info about link-building on the net.

Here's what DOESN'T work these days:
  • Forum link-building. Most forums no-follow all outgoing backlinks.

  • Web 2.0 links. People spamming their links on Reddit are 100% wasting their time. Google can tell a user-generated content site apart from all other sites. Hence, links from Reddit, Medium, etc. are devalued big-time.

  • Blog comment links. Most blogs no-follow blog comment links, so that's a waste of time too.

  • PBNs (ish). Well-built PBNs work just fine. The PBNs you bought from some sketchy forum, though, will crash your site big time.


So what DOES work?

Real links from real, topically related websites.

E.g. if you run a fitness site, you'd benefit from getting links from the following sites:
  • Authoritative fitness blog/media

  • Small-time yoga blog

  • Weight loss blog/media/site

I don't know how true this information is, but for the moment I'll proceed as if it's 90% accurate.

I was surprised to see blogs come up on a post about SEO. Sure there are some popular bloggers, but it's weird to me that something like "Britney's Yoga Blog" would elevate a page more than even the smallest of mentions on one of the popular 2.0 platforms. To be quite honest, I'm not sure links from Britney's Yoga Blog should be all that influential to search ranking.

The other way to look at it: if the internet is just social media, corp pages, and blogs/personal sites, blogs are the only places where backlink gains can be made. If you're doing SEO and the search engines have stopped ranking social media, you're unlikely to talk JPMorgan or Ars Technica into backlinking you, so you're left with the amateurweb.

This feels simultaneously cynical and hopeful for the indieweb.
Reddit, AI training data, and monetization

Reddit place fuck spez
Reddit decided on an out of cycle return of /r/place, their crowdsourced graffiti wall. It went as expected.

I read an opinion last month that Reddit's controversial API price hike was about repricing their data for the deep pockets of Big AI. I appreciated the new perspective but didn't personally find the argument very compelling. The simple explanation seemed far more likely and far more on-brand. There may yet be an AI twist, brought about by yet another uncreative attempt by Spez to game Reddit's valuation. From an admin announcement:

Substantial_Item_828
/u/Substantial_Item_828
> In the coming months, we'll be sharing more about a new direction for awarding that allows redditors to empower one another and create more meaningful ways to reward high-quality contributions on Reddit.

Sounds ominous.
Moggehh
/u/Moggehh
It already broke in the APK notes. They're adding tipping for US redditors.
Astro4545
/u/Astro4545
Ew

The APK notes, in part:

Fake internet points are finally worth something!
Now redditors can earn real money for their contributions to the Reddit community, based on the karma and gold they've been given.
How it works:
* Redditors give gold to posts, comments, or other contributions they think are really worth something.
* Eligible contributors that earn enough karma and gold can cash out their earnings for real money.
* Contributors apply to the program to see if they're eligible.
* Top contributors make top dollar. The more karma and gold contributors earn, the more money they can receive.

Incentivizing popular content creation sounds like a strategy to overcome the post-blackout brain drain while creating a new revenue source. On the other hand, creating an influencer class to replace the powermod class might accelerate the platform's downfall.

The horse and cart may be reversed here. Perhaps Reddit was long planning to introduce this tipping system but first needed to cut off API access so not everyone could create an LLM-trained money farmer. Scraping and data mirrors would still be viable, but not as easy as a pushshift query.

Reddit might also create a few bots of their own. Having Redditor-surrogate LLM bots would (in their mind) increase engagement and net them tips that don't have to be shared with flesh and blood users. It wouldn't be unprecedented. Something that came up in API controversy (but was largely eclipsed) was that Reddit had quietly been translating popular English-language content to other languages and regions. When asked by moderators, the admins said, "hey this is popular and we thought we'd see if it was of interest to your community". It's not strictly a bad idea but bore all the hallmarks of a revenue-motivated engagement experiment.
Speaking of in-house AI, X


NBC News On Sunday, Twitter CEO Linda Yaccarino said the branding change will introduce a major pivot for the microblogging platform, which she said will become a marketplace for "goods, services, and opportunities" powered by artificial intelligence.

"It's an exceptionally rare thing - in life or in business - that you get a second chance to make another big impression," the chief tweeted. "Twitter made one massive impression and changed the way we communicate. Now, X will go further, transforming the global town square."

Welp, he said he was creating a WeChat knockoff (er, "X, the everything app") back in October. It's just kind of funny that the bird logo was replaced on a Sunday afternoon but everything else still said Twitter/tweet/etc.

PaulHoule The next question is: "Is he really serious about the super app?". The horror is that he probably is, but what business wants to deal with a mercurial leader who might stop payments, pay people extra, or impound money in your account for no good reason. What business is going to want to put an "X" logo up by their cash register when it means they are going to have arguments with customers. (I bet it will be a hit for "go anti-woke and go broke" businesses though.)

I, for one, can't wait to contact Xpay customer service about fraudulent activity on my Xaccount only to receive a poop emoji reply.

heyjamesknight I've been mostly ambivalent about the Musk-era at Twitter-mostly because I just don't care enough to have an opinion.

This, though. This one makes me angry and disappointed.

Twitter has had such a solid brand for so long. It's accomplished things most marketers only dream of: getting a verb like "Tweet" into the standard lexicon is like the pinnacle of branding. Even with all of the issues, "Twitter" and its "Tweets" have been at the core of international discourse for a decade now.

Throwing all of that away so Elon can use a domain he's sat on since '99 seems exceedingly foolish.
Back to SEO

Self-licking ice cream cone

From a Hacker News link:

Izzy Miller Last week, I found a glitch in the matrix of SEO. For some reason, every month 2,400 people search for the exact string "a comprehensive ecosystem of open-source software for big data management".

And weirdly, there are ~1,000 results for the exact query "A comprehensive ecosystem of open-source software for big data management". This is at once a weirdly small and weirdly large number- small because most Google searches have tens of millions of results, but large because most Google searches for exact string matches of that length actually turn up few, or no results. So there's something to this phrase.

This means that thousands of students started searching for "a comprehensive ecosystem of open source software for big data management" every month as they studied for their final IoT exam. And the SEO analytics dashboards noticed.

The really interesting thing about this case though, is that the original source content driving this search interest is not publicly available or indexed. This query is copied and pasted verbatim from an exam, which are famously not something you want to be found on Google.

To summarize Miller's conclusion: web marketers noticed that the test question search term was unexploited territory so they quickly spun up autogenerated pages for this highly specific phrase.

It seems that in addition to a dark forest there is a well-illuminated forest full of plastic trees.

Izzy Miller This is only going to get weirder with LLMs, at least in the interim before everyone stops using Google. There's now a ton of tools that automatically find low competition keywords and generate hundreds of AI blog posts for you in just a few minutes.

The real problem is just the lack of alignment these articles have with search intent- if you want people to land on your site and remember you favorably, you should just answer their question. Everything else is extraneous. I don't have high hopes for AI accurately determining the intent of all the strange keyword combinations out there, and so I expect we'll see more and more of these glitches.

Perhaps, only 4chan can save us with their reputation for kindness and straight to the point questions and answers...
"Accurately determining the intent" and a Reddit bamboozle

Forbes As someone who writes about video games for a living, I am deeply annoyed/terrified about the prospect of AI-run websites not necessarily replacing me, but doing things like at the very least, crowding me out of Google, given that Google does not seem to care whatsoever whether content is AI-generated or not.

That's why it's refreshing to see a little bit of justice dished out in a very funny way from a gaming community. The World of Warcraft subreddit recently realized that a website, zleague.gg (I am not linking to it), which runs a blog attached to some of sort of gaming app which is its main business, has been scraping reddit threads, feeding them through an AI and summarizing them with "key takeaways" and regurgitated paragraphs that all follow the same format. It's gross, and yet it generates an article long enough with enough keywords to show up on Google.

Well, the redditors got annoyed and decided to mess with the bots. On r/WoW, they made a lengthy thread discussing the arrival of Glorbo in the game, a new feature that, as you may be able to guess from the name, is not real.

Never thought I'd lol side by side with a Blizzard fan.

The Portal Reddit user malsomnus hails it as the best change since the quest to depose Quackion, the Aspect of Ducks.
Moment of zen

Waynes World head bang Bohemian Rhapsody downsampled

Blogroulette sent me to this awesome post about a game I now want to play:

merritt k Explaining All of the Fake Games From Wayne's World on the SNES

Or maybe you instead decide to create an elaborate framing story where Wayne and Garth are talking about terrible games they've been playing at Noah's Arcade, the company that sponsors their show in the film. That's what Gray Matter did back in 1993, opening the game with some pixelated approximations of Mike Myers and Dana Carvey listing some titles with hacky joke names in a bit that somehow sets up the entire bizarre adventure.

It's kind of a bold move to open a terrible game with a list of fictional terrible games, but what exactly were these titles supposed to be? Here are my best guesses.




2023.07.16

Cascadia

A summer trip to Washington State.
2023.08.02

Red sky in morning

Oppenheimer and choice quotes about the recent web2 happenings.


Related / internal

Some posts from this site with similar content.

Post
2023.11.20

Subsurface

Links pages, webrings, and search.
Post
2023.12.30

Feature complete

My static site generator can now recommend external blog/smallweb posts with similar subject matter.
Post
2022.05.07

90s aesthetic

The indieweb and blogging with a couple of webring rabbit holes. RSS with source and a small Elden Ring gallery.

Related / external

Risky click advisory: these links are produced algorithmically from a crawl of the subsurface web (and some select mainstream web). I haven't personally looked at them or checked them for quality, decency, or sanity. None of these links are promoted, sponsored, or affiliated with this site. For more information, see this post.

Has a preview image link and yet 404 :/
www.vincentschmalbach.com

Google Now Defaults to Not Indexing Your Content - Vincent Schmalbach

Picture this: It's ten years ago, and you've just launched a new WordPress blog. Within hours, sometimes even minutes, your content is indexed by Google.
Has a preview image link and yet 404 :/
ahrefs.com

The Beginner's Guide to Technical SEO

Technical SEO is the process of optimizing your website to help search engines like Google find, crawl, understand, and index your pages.
Has a preview image link and yet 404 :/
www.theverge.com

How Google perfected the web

Google has dominated the search market for decades, leading to a web filled with SEO-driven content. With generative AI on the horizon, this could all come crashing down.

Created 2024.10 from an index of 421,329 pages.