Infopost | 2023.12.20
Marginalia |
This is a write-up about an experiment from a few months ago, in how to find websites that are similar to each other. Website similarity is useful for many things, including discovering new websites to crawl, as well as suggesting similar websites in the Marginalia Search random exploration mode. The approach chosen was to use the link graph to look for websites that are linked to from the same websites. This turned out to work remarkably well. |
Marginalia |
In plain English, this service looks at which websites link to a particular target website, and then it ranks websites that are popular among those linking websites using a method commonly used in recommendation algorithms. In technical jargon, it reinterprets the incident edges in the adjacency matrix as a sparse high-dimensional vector, and uses cosine similarity to find the nearest neighbor nodes within this feature-space. |
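The idea in that quote can be sketched in a few lines. This is a toy illustration, not Marginalia's actual code: the `LINKS` graph, the `.example` domains, and the function names are all made up for the example. Each site is represented by the set of sites that link to it (a binary vector stored sparsely), and two sites are compared with cosine similarity, which for binary vectors reduces to the size of the overlap divided by the square root of the product of the set sizes.

```python
from math import sqrt

# Hypothetical toy link graph: source site -> set of sites it links to.
LINKS = {
    "a.example": {"x.example", "y.example"},
    "b.example": {"x.example", "y.example", "z.example"},
    "c.example": {"z.example"},
    "d.example": {"x.example", "w.example"},
}


def inbound(links):
    """Invert the graph: target site -> set of sites linking to it."""
    incoming = {}
    for source, targets in links.items():
        for target in targets:
            incoming.setdefault(target, set()).add(source)
    return incoming


def cosine(a, b):
    """Cosine similarity of two binary vectors stored as sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


def similar_sites(target, links, top_n=5):
    """Rank other sites by how much their set of linkers overlaps the target's."""
    incoming = inbound(links)
    target_vec = incoming.get(target, set())
    scored = [
        (cosine(target_vec, vec), site)
        for site, vec in incoming.items()
        if site != target
    ]
    # Drop sites with no shared linkers; sort best first, ties by name.
    scored = [(score, site) for score, site in scored if score > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(site, round(score, 3)) for score, site in scored[:top_n]]


if __name__ == "__main__":
    for site, score in similar_sites("x.example", LINKS):
        print(site, score)
```

Note that a site can show up in the results without ever being crawled: it only needs to appear as a link target on pages that were.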
Marginalia |
As a whole the feature shares a lot of similarity with how you would construct a recommendation algorithm of the type "other shoppers also bought", and in doing so also exposes how creepy they can be. You can't build a recommendation engine without building a tool for profiling. It's largely the same thing. If you for example point the website explorer to the fringes of politics, it will map that web-space with terrifying accuracy. |
Marginalia |
[qanon.pub's neighbors] Note again how few of those websites are actually indexed by Marginalia. Only those websites with 'MS' links are! The rest are inferred from the data. On the one hand it's fascinating and cool, on the other it's deeply troubling: If I can create such a map on a PC in my living room, imagine what might be accomplished with a datacenter. You might think "Well what's the problem? QAnon deserves all the scrutiny, give them nowhere to hide!". Except this sort of tool could conceivably work just as well for mapping democracy advocates in Hong Kong, Putin-critics in Russia, gay people in Uganda, and so forth. |
bikobatanari |
The Personal Web, to many people, only exists in a select few places. It could be solely sites on Blogspot, or Neocities, or some other adjacent platform, and that to them is the "Personal Web". However, once you've exhausted these places and found the sites that you find interesting, it's extremely difficult to figure out where to go next, to go to some unknown territory that you don't even know exists. For myself, I've browsed Neocities for what seems like four years now as of writing this. I've seen many sites come and go, some plenty interesting, and others not at all. And even now, with plenty of sites that I don't recognize, I've become rather jaded. It's hard for me to find sites that pique my interest anymore, and if they do, it's hard to find them actually being updated or not be completely barren. All of this is what led me to going on excursions to places that not many people have gone to. This is difficult in and of itself since funnily enough, Neocities users tend to link to only Neocities users and no one else. Despite many of its users being against walled gardens, it ironically became one itself. |
bikobatanari |
From what I've seen, [doujin sites are] structured by various search engines whose sole purpose is to index personal Japanese sites and nothing else; by "index", what is really meant is people register their own personal websites onto the engine, sort of like a glorified link directory. Its scope is even narrower than that of Neocities and other hosting platforms because sites with more formal contexts (such as business sites) are not even allowed in these spaces. This got me curious then: how differently do people over in the East Asian sphere (primarily Japan) handle personal websites compared to the West? |
bikobatanari |
Something that I've noticed in general is that the personal sites over there tend to be very creations/product focused. That is, their sole purpose is to show off things that they've made, rather than embody some sort of persona. Even the site topic distribution makes this evident. The front page of a search engine that specializes in doujin sites called [Yorozulink when romanized] has sectioned off registered sites into categories, and the visual arts trumps practically every other category. An overwhelming majority of these sites' admins post illustrations, lots of them post their own mangas (original or derived from an existing series), write novels and stories, and indulge in a lot of other creative hobbies. Personal diaries and blogs do exist, but I don't think they're as ubiquitous there as they are in the West. |
bikobatanari | Usually the design is all coded by hand, and templates are, in a way, frowned upon. But relating to the creations-focused philosophy that a lot of these sites adhere to, the design of many of these sites is actually rather... tame. Minimalistic, even. Portfolio-like. Designs that showcase their work rather than ones that potentially take away attention from it. This is despite the fact that they're actually not portfolio sites. |
bikobatanari |
A term which I've encountered quite a bit on Japanese personal sites [translates to] simply "search avoidance". Essentially, there are plenty of personal sites that go out of their way to make sure their space doesn't get spotted or picked up by search engines; and not the search engines that index these types of sites mind you (like the ones I linked to in the beginning of this article). They're explicitly talking about search engines like Google, Bing, Yahoo!, etc. These sites will have a disclaimer saying that their site "avoids search", and more often than not they will also add an additional disclaimer saying that their site is not allowed to be linked on SNS (basically their shorthand for social media). |
Amy Hoy |
[In the 90s], we didn't have platforms or feeds or social networks or... blogs. We had homepages. |
Amy Hoy |
Homepages had a timeless quality, an index of interesting or useful or relevant things about a topic or about a person. You didn't reload a homepage every day in pursuit of novelty. (That's what Netscape's What's Cool was for!) Chronological content was in the minority. The Internet at the time was largely populated by academics, professionals, and college students. Not everyone had the desire to publish their angsty poetry, sexcapades, or surfing habits on a daily basis; the other limiter on chrono-content was the sheer time and energy it required. Diarying was a helluva lot of work. First you had to have something to say, then write, edit it, format it, add clip art, edit your index.html, edit any prev/next links, check those links, and lastly, upload the files. |
Amy Hoy |
And once you've had a taste of effortless updates, it's awfully hard to go back to manual everything. So they didn't. And neither did thousands of their peers. It just simply wasn't worth it. The inertia was too strong. The old web, the cool web, the weird web, the hand-organized web... died. And the damn reverse chronology bias - once called into creation, it hungers eternally - sought its next victim. Myspace. Facebook. Twitter. Instagram. Pinterest, of all things. Today these social publishing tools are beginning to buck reverse chronological sort; they're introducing algorithm sort, to surface content not by time posted but by popularity, or expected interactions, based on individual and group history. There is even less control than ever before. |
bikobatanari | Another issue which holds for platforms with this type of structure (especially IG, Twitter, and Tumblr) is that looking back through another person's archived works is an absolute chore. If you want to look for a particular piece of work in someone's account, have fun wading through years of work in reverse chronological order just to find it. Because of this, people just end up resigning themselves to having content spoon-fed to them through the feed, as opposed to searching for all of the hidden gems that have long since disappeared from the public eye. It's a real shame, because there are possibly plenty of great works that will not be seen ever again because it's such a nightmare digging through all of this stuff just to find something specific. |
bikobatanari |
In a way, this article has influenced my website's structure. If you were here to see my website a few months ago, you would find that my articles page used to be entirely in reverse chronological order. It wasn't until November that I started categorizing my articles into separate topics, and I think that small little change has done wonders for both myself and for those who want to read something more specific to their interests. The more I think about it, the more I see that the rise of chronologically ordered content for all of these platforms has impacted content creation in a way which I think is detrimental. Not only has it affected a piece of content's lifespan and long-term influence, but it has also normalized a structure which doesn't suit the majority of content in the first place. |
bikobatanari | I do want to note that the feed itself isn't bad. It has its uses. Blogs and journals work perfectly fine with a feed. The main gripe that I have with it is that with the normalization of using social media as a platform for content creation, the feed became the structure which everything was forced into, regardless of what type of content it is. |
Ludic |
One of the keynote speakers runs a major customer loyalty program, which as a non-specialist I believe is code for "we sell all your purchasing data in the hopes that people who can't do math don't realize our rewards are worth like $200 over your entire lifespan". If you are a specialist, I will accept corrections but also, I dunno, fuck you on principle I guess. You might not deserve that, but it's a Monday and I just had to go through standup. This person breathlessly took the stage and spoke happily about how they've had almost 10% year-on-year growth because of the crippling increases in rent and groceries driving the working class to seek savings wherever they could. Very cool and normal, and also fuck you on principle, even if it isn't a Monday. They then continued to talk about the thrill of seeing that family finally purchase that vacuum cleaner that was always aspirational. Again, fuck you, and also I hope you fall down a flight of stairs. I swear to God, I can't even imagine what kind of defective software you have to be running in your brain to be that tone-deaf, but I was deeply concerned to see this is what our bajillionaire class is doing. It's a super concerning blend of being a complete sellout and too goddamn stupid to even hide it well. How hard is it to get on a stage without sounding like a Disney villain? |
Ludic | I think the worst part was realizing that this didn't flag for some people in the audience, even techies. Some part of their brain just turned off and went "10% year-on-year growth? That's money. And look how important that person on the stage is! I wish I got attention!" |
Ludic | I am not sure how to describe this rationally so I'm not going to try, but the air felt like someone had been operating some grease-filled humidifier, and I think this hit me because I walked in and immediately saw the event was sponsored by some dipshit crypto application. The funny thing is that rather than having blind hatred, I read Mastering Ethereum for a bit because it would have been so convenient if I could actually just print money by finding some crypto use case that I'd be morally okay with, and I just couldn't. So rather than blind hatred, my hatred has intense visual acuity. |
Ludic |
We were then approached by a guy, who we will call Henry, that immediately blasted us with totally unsolicited advice on how to get our own business off the ground... he makes sure that we have his cards as we spend twenty minutes trying to extricate ourselves. He seems like he learned his social skills from Dale Carnegie, which is forgivable, but he thinks he's better than us, which is possibly true but not forgivable. |
Ludic | That's right, there's a whole genre of corporate fanfiction out there. Was it useful to read? Yes. Does it miss some of the real barriers to organizations improving? Yes, which I should talk about in another article. Was it cringe-inducing at points? Hell yes. |
Nicole Express |
The other thing that makes Tengen stand out is how they broke Nintendo's lockout chip. Modern consoles maintain their lockout using cryptography. But in the 1980s, that would get your console classified as a munition, and the NES' 1.7MHz CPU would struggle to implement anything regardless. Plus, cryptographic locks had no legal force at the time; this wouldn't be the case in the US until the 1998 Digital Millennium Copyright Act. So instead, Nintendo developed a small microcontroller, which implements a program of sending random numbers back and forth. The microcontroller in the console compares with one in the cartridge, and if their numbers don't match the expected pattern, it resets the console every second, preventing gameplay. The chip is configured so the program can't be dumped easily, and if you do dump it, the program is protected by copyright law. Here's Camerica's 1992 release Micro Machines. You might notice some circuitry in the corner; this is actually something we've covered on this blog before, a charge pump that produces a negative voltage from the console's 5V input. When the console turns on, a negative voltage spike is sent down the reset line of the lockout chip, frying it long enough to break its program and cause it not to reset the console. |
paulrobertlloyd.com
The web we want: A beginner's guide to the IndieWeb · Paul Robert Lloyd: How to build a place on the web where you can own your identity, control your content and create whatever the hell you like. |
mxb.dev
The IndieWeb for Everyone | Max Böck: Many people are looking for alternatives to Twitter. Can the IndieWeb step up? How can we build better social media for people without technical knowledge? |
indieweb.org
POSSE - IndieWeb: POSSE is an abbreviation for Publish (on your) Own Site, Syndicate Elsewhere, the practice of posting content on your own site first, then publishing copies or sharing links to third parties (like social media silos) with original post links to provide viewers a path to directly interacting with your content. |