From Votes to Vetoes
The early 21st century web championed human signals and "democratized" truth. This decade, Generative AI acts like an oracle and favors technocracy, not democracy.
Colligo is a reader-supported guide to artificial intelligence and digital technology, from the perspective of human culture and human values. If you want to support my work, the best way is with a paid subscription.
Greetings Colligo readers,
I’ve promised another post about the inner workings of LLMs, and I intend to get to that soon. Here I’d like to step back and talk about one of my favorite topics: the growth, evolution and devolution of the web since its early days in the 2000s. The broad sweep of web history has run from democratic hopes and dreams to more technocratic and anti-human technology and thinking. Yes, yes, it’s not all bad. I agree. But the trend is there to see, and it’s troubling. We should be aware of it.
I hope you enjoy.
What I Worried About in 2005-ish
In the mid-2000s I was in Austin, Texas, working on information extraction and sort of obsessed with improving search. Google, of course, had captured general web search. I had discovered Google relatively late, in 2000, while working at an early AI company called Cycorp. PageRank was, as everyone now knows, a kind of “super layer” on top of something called “term frequency-inverse document frequency” (tf-idf), a kind of information retrieval Holy Grail used by every early search engine on the web and everywhere else. Google used tf-idf too, but they added a recursive link computation that counted up the “votes” for a web page based on HTML links to it, and HTML links to the linking web page, on and on (hence, “recursive”). That was a brilliant idea. I still think the PageRank algorithm developed by Google founders Larry Page and Sergey Brin ranks (no pun intended) among the deepest and most important innovations on the web, right up there with HTML itself. Search engines like AltaVista didn’t really work before Google came along. They were content based: they looked at how often terms appeared in a document and gauged the relative importance of those terms across document sets. That left wiggle room for lots of false positives. It left lots of room to retrieve crap.
PageRank
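For readers who want to see the mechanics, here is a minimal Python sketch of the two ideas described above: a textbook tf-idf weighting and a power-iteration version of the recursive link "vote." This is an illustration under stated assumptions, not Google's actual system. The function names, the toy documents and link graph, the 0.85 damping factor, and the fixed iteration count are all choices made for this example.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document (textbook tf-idf)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            # Frequent in this document, rare across the set => high weight.
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a {page: [pages it links to]} graph.

    The damping value and iteration count are assumptions for this sketch.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its current rank among the pages it links to,
                # so a vote from a highly ranked page counts for more.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Toy data, invented for illustration.
docs = ["the cat sat", "the dog sat", "the cat ran"]
links = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}

weights = tf_idf(docs)
ranks = pagerank(links)
print(sorted(ranks, key=ranks.get, reverse=True))
```

In the toy graph, page "c" collects links from three pages and ends up on top, while "a" outranks "b" and "d" only because its single inbound link comes from the highly ranked "c". That is the recursive part: a vote is worth more when the voter is itself well voted for. The tf-idf half shows the content-based approach: a word like "the" that appears in every document gets a weight of zero, which says nothing about whether anyone finds the page worth linking to.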
Even earlier than Google, one of the first commercial web companies, Yahoo!, looked at the then more manageable World Wide Web and tried to curate it by top-down categorization. It was a useful site when the web had tens of thousands of scattered pages. It wasn’t so useful once the web had millions of pages and was growing exponentially. The first break from the old order, then, came with tf-idf. With PageRank the break was complete, and we were in a new world. Suddenly every library was, seemingly, at your fingertips. Yahoo! tried to buy a fledgling Google; the founders (wisely) passed. The rest is history.
YouTube launched in 2005 and was snatched up by the now-giant Google in 2006. Around that time, as I was futzing with ideas for a “Semantic Web” and working on faceted search and other curiosities (alas), it hit me (to borrow Queensryche lyrics) like a “two ton heavy thing” that the web had, slowly at first but then very quickly, embarked on a project of redefining relevance and authority not as expertise but as popularity. Everyone “voted” on the new web, dubbed “Web 2.0.” Pushing a “Like” button was a vote, whether on news aggregation sites like Digg or video platforms like YouTube. I remember I was obsessed at the time with this question: “But popularity isn’t truth. How do we determine what’s true? What’s real?” No one seemed to care.