Decentralizing the Web: Paths, Dangers, Strategies
Today's AI centralizes power and influence in the hands of a few. Federated approaches to training AIs, and other ideas, might be healthier.
Hi everyone,
I stumbled upon federated learning—technically, not a new idea, but not hugely discussed—when ruminating on how we might decentralize the web. To date, “decentralizing the web” has been hijacked by the cryptocurrency and blockchain crowd, and while there are some bright spots in that tangle of voices and incentives, fresh ideas about breaking Big Data and Big Tech’s chokehold on AI are few and far between. This is an open-ended post where I dig into some possibilities.
Friedrich Hayek
The economist Friedrich Hayek argued in the last century, in tomes like The Constitution of Liberty, that decentralized “signals” about goods and services and ideas maximized the chance of getting good information, and that this enabled markets to operate efficiently without centralized control. He was attacking the Soviet model, which relied on central agencies and plans like GOSPLAN and the bureau of statistics to apportion goods like potatoes to villagers across its vast geographic expanse. Jolly ideas, those: all that central planning left food rotting, or withheld it for reasons known only to someone else. And people got hungry. And people stood in bread lines. Decentralized information is at the heart of market-driven dynamics, and it’s really at the heart of the idea of democracy (How do I vote? Tell me. No. You decide.). How do we decide who to vote for? Don’t ask the Bureau of Statistics in Washington. Don’t ask Big Tech or OpenAI either.
Think about the world we have today. More and more, we’re receiving our information from a few chokepoints—companies like Google, Meta, OpenAI and Microsoft. We now get answers from virtual oracles like ChatGPT, in effect, trained outside our supervision and reinforced for some answers and not others without our knowledge. It’s not just that such technologies sometimes don’t “work” (though that’s a major problem too). It’s that they are far too powerful for us not to question. And better than questioning the specific technologies, we should question the entire centralized big data, we own your information, we decide what the web will do for you approach. We have to question Big Tech today, just as concerned citizens would have been wise to question Standard Oil over a century ago. That’s the job of a free society. Not acquiescence. Investigation.
Innovation—disruptive innovation—is still possible. GMC, Chevy, and the other Big Automakers made huge cars, literally with fins on them, that guzzled poisonous leaded gasoline; if we were standing there in the 1950s contemplating Teslas, it wouldn’t have seemed possible or real. But those cars are gone, and Teslas are here (not that Tesla is the be-all and end-all of innovation). Big Tech companies using Centralized Big Data to power the web is a phase. Those companies have no incentive to change anything, since they’re the Standard Oils, the monopolies of our day. They’re the ones making all the money and deciding what sort of political slant works best with “AI.” So, they’ll sit there. Innovation happens among the peeps.
Why is the CBD (“Centralized Big Data”) model dangerous? A short history of human mischief tells us. Flip a switch, and something we thought was true will become false, or just disappear from search results. A position or idea or candidate we like will be maligned or ignored. Misinformation will flow without our awareness (it already does). Big tech companies are—and this is hugely arguable—mostly on board with the democracy game and different political and other viewpoints today. What about tomorrow? We should rethink the web. Believe it or not, big ideas can and do make a difference.
Federated Learning: A Small Change in the Right Direction
In federated learning, edge devices like desktops, laptops, and even cell phones (as in “A” in the graphic) contribute data for local training on the device. The data used in training is contributed and controlled by its owner, and private or secure information stays on the device—it is not transferred to a central server. Only the results of the local training are transferred to a central server—never any of your data. It’s not a perfect solution: in addition to problems like latency and distributed compute expense, there are ways to interrogate centralized models to ferret out details of the training data. But it’s much better than our current fully centralized approach. Large models can be trained by users who decide to collaborate to train them.
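The shape of this exchange can be sketched in a few lines. Below is a minimal toy in the spirit of federated averaging (FedAvg); everything here—the one-parameter linear model, the made-up data, the function names—is illustrative, not a real protocol. Each “device” trains on its own private data and uploads only the learned weight, which the server averages.

```python
# Toy federated averaging sketch: each "device" fits a one-parameter
# linear model y = w * x on its own private data, then shares only
# the trained weight -- never the raw (x, y) pairs.

def local_train(data, rounds=100, lr=0.01):
    """Gradient descent on-device; the raw data never leaves."""
    w = 0.0
    for _ in range(rounds):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w  # only this scalar is uploaded

# Three devices, each holding private samples of the same trend y ~ 3x
devices = [
    [(1, 3.1), (2, 5.9)],
    [(3, 9.2), (4, 11.8)],
    [(5, 15.1), (6, 18.0)],
]

local_weights = [local_train(d) for d in devices]

# The central server sees only the weights, and averages them
global_w = sum(local_weights) / len(local_weights)
print(global_w)  # close to the true slope of 3
```

Real systems average full weight vectors rather than one scalar, weight each device by its data size, and add secure aggregation on top—but the shape of the exchange is the same: models travel, data stays home.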
I started looking into federated learning after listening to a podcast with author and Georgetown computer scientist Cal Newport, who has written and spoken extensively about the problem of centralization, in particular on “global conversational platforms” like Twitter/X, Facebook/Meta and others. Five hundred million users on one platform is, let’s say, anti-Hayek. Un-Hayek. The only conceivable way to “curate” content with hundreds of millions of souls is algorithmic curation. That’s what social media sites do today. And they’re good at it. But “algorithmic curation” comes from a central source on the platform, and it means that Big Brother AI will be deciding what we see. Trends on such platforms reflect global influence, so local information flows get discounted. It’s not the coordination of people that counts; it’s individual messages that get noticed by the “eye in the sky.” A (let’s say) tweet has to “break out” and become global for it to matter. All those well-meaning souls end up playing to the gods of AI, hoping to get in on the global popularity game.
What sort of content gets promoted? Shitty content. Highly politicized. Etc. Etc. So global platforms place downward pressure on local, intelligent human interaction. The owners of the platforms get rich off advertising, targeted at everyone, because every move you make is getting logged and analyzed. Everyone else plays “look at me, look at me!” games like monkeys. It’s a common problem now in the US and the West, and it creepily resembles the central control networks of autocratic regimes like the former Soviet Union—just replace “AI” with a Ph.D. in a suit, sweating out his decisions under the watchful gaze of party members in Moscow (there are many differences, but centralized data capture and decision-making is the key similarity, and it’s a big one). Newport points out, as do I and others, that the original vision of the web was diametrically opposite: we would have networks of networks, with loose ties between them, an approach that increases the availability of good information and empowers individuals, rather than Politburos.
Back to decentralizing AI.
The Semantic Web
The “Semantic Web” is Dead. Long Live the Semantic Web.
The “semantic web” was a hot topic in the 2000s, backed by none other than world wide web inventor (Sir) Tim Berners-Lee. In its original guise, the next phase of the web, the semantic web, would empower local, individually owned AIs—one for each of us—called “intelligent agents.” Your agent would know a lot about you: your personal data like food allergies, where you live, bus routes to your favorite locales, vacation hotspots, friends, social pursuits, lunch hangouts and wine preferences. All that would reside with you—the user—and wouldn’t be hoovered up by the likes of a-holes in Big Tech, intent on stealing all your private information in order to deliver its value back to you in the form of personalized services and recommendations, customer service support, and all the rest. There was a problem, of course. Houston, we do have a problem.
Your personal information wouldn’t be very useful to your own, personal intelligent agent (AI) if it couldn’t do much with it to improve and empower your experience on the new web. On Berners-Lee’s new web, there are no central repositories of data, like today. There are just “networks of networks” of friends, family and colleagues. Of whomever you want to interact with. There are just people doing what they please, free from Centralized Big Data.
Problem? Where does the “intelligence” to power your agent come from? Berners-Lee’s answer was, in effect, to re-write the web by encoding simple logical statements into web pages.
Statement (see graphic): “A Wolf is an animal that is a relative of the Coyote”
Statement: “If a person has a parent who is a doctor, then that person has at least one family member who is a member of the medical profession.”
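To see what “encoding a statement” amounts to, here is my own rendering of the second statement as a quantified logical rule (a hedged sketch—the actual semantic web stack expressed such rules in RDF and OWL vocabularies, not raw logic, and the predicate names here are invented):

```latex
\forall x \, \big( \, \mathrm{Person}(x) \wedge \exists y \, (\mathrm{Parent}(y, x) \wedge \mathrm{Doctor}(y))
  \;\rightarrow\; \exists z \, (\mathrm{FamilyMember}(z, x) \wedge \mathrm{MedicalProfessional}(z)) \, \big)
```

Machine-checkable, yes—but notice how brittle it is: every predicate name has to be agreed on in advance by everyone writing pages, which previews the conversion problem discussed below.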
A “semantically enhanced” web page might then announce itself in a logical language that your buddy agent could read: “oh, I see, the movie Oppenheimer doesn’t start until 11:50… what’s the price of two tickets? 26 bucks. Gotcha.” Your agent would read the computer-ese on the new web pages, do some local calculations, and deliver the same personalized services and recommendations that we now get from a handful of monopolistic companies that have, in effect, taken all our data and eliminated private and personal control. Problem? It never worked. It very likely never can work, because there’s really no way to get billions of web pages converted from (very messy) HTML to some other markup that includes explicit—even simple—computer-readable, which is to say logical, statements.
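As a toy illustration of the agent-side arithmetic in that movie scenario—with a page schema and field names I am inventing purely for illustration (real proposals used RDF/RDFa markup, not Python dictionaries)—the local calculation is trivial once the page is machine-readable:

```python
# Invented machine-readable listing an agent could consume directly.
page = {
    "type": "MovieShowing",
    "title": "Oppenheimer",
    "start_time": "23:50",       # 24-hour HH:MM
    "ticket_price_usd": 13.00,
}

# The user's preferences live locally, with the agent, not on a server.
preferences = {"party_size": 2, "latest_acceptable_start": "22:00"}

def plan(showing, prefs):
    """Purely local computation: no query ever leaves the device."""
    total = showing["ticket_price_usd"] * prefs["party_size"]
    # HH:MM strings compare correctly in lexicographic order
    too_late = showing["start_time"] > prefs["latest_acceptable_start"]
    return {"total_usd": total, "too_late": too_late}

result = plan(page, preferences)
print(result)  # two tickets cost 26 dollars; 23:50 is past the cutoff
```

The computation is easy; the hard part, as the paragraph above says, is getting billions of pages to publish data in any agreed-upon schema at all.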
(There’s another problem that I will mention here briefly, but I cover in more detail in my book. The statements are basically propositional logic statements, and so what they express isn’t context-sensitive. The information would either sometimes—often, I think—be irrelevant to a particular information request, or it would be relevant only by constantly updating it or somehow including a context. This is too deep in the pool for this discussion, however.)
So. This never worked, in part because the amount of work required to convert the web into computer-readable stuff outweighed the total value we humans would get back. Or more to the point: no one would be willing to write these statements into web pages to somehow cover an exponentially exploding web (might an LLM do it today? Hmmm.).
Again, we get all this “semantic web” functionality today because there’s a server farm flickering the lights out in a large city somewhere that contains all this information as data in database tables. It’s just that now, your little buddy agent with all your personal information isn’t residing with you. He’s not your buddy. He works for Sam Altman. Larry Page. He’s part of Big Tech. We now make requests to the Lords of Tech, and they deliver this same type of information, yes. But at the obvious price of our autonomy and independence and the health of our society and information space. And now my question is: are we really stuck like this, on the web, forever? I’d say no. But first we must start thinking about alternatives that might work. And, today, we have many more tools at our disposal than the Tim Berners-Lees and the other smart folks did when first proposing their alternative vision of the web.
A Sketch of a New Way
Let’s assume we can work out the details of keeping our data with ourselves and our own devices, and contributing what we wish for training “models.” Let’s assume, in other words, that we can work out the kinks in federated learning using decentralized data. What then? We still have a central model that combines the results of all the locally trained models. But let’s import Newport’s “network of networks” idea into the discussion of AI and LLMs and training models. Suppose we have a hierarchy of models, with some models “centralized” only relative to a community of interest’s “network” of users and data—not globally. Suppose those models service those communities, and reflect those communities’ choices and preferences. Suppose Sam Altman doesn’t own those “intermediate” models. Something like this—my guess is—could probably work, particularly if some big infrastructure and solutions company like IBM provided the software glue to make it all happen—without requiring that you upload anything to their cloud. Or startups could provide innovative end-to-end solutions. Or what have you.
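One hedged sketch of what that two-tier arrangement might look like. The community names and the per-device weights below are invented, and plain averaging stands in for whatever aggregation a real system would use; the point is only the structure: devices feed community models, and any global model sees community models alone.

```python
# Two-tier "network of networks" aggregation sketch. Assumes each
# device has already produced a locally trained weight (as in
# federated learning); all names and numbers are illustrative.

def aggregate(weights):
    """Plain averaging; a real system would weight by data size."""
    return sum(weights) / len(weights)

# Each community pools only its own members' locally trained weights
communities = {
    "book_club":    [2.9, 3.1, 3.0],
    "neighborhood": [3.2, 2.8],
    "research_net": [3.0, 3.0, 3.1, 2.9],
}

# Tier 1: community-level models, owned and governed locally
community_models = {name: aggregate(ws) for name, ws in communities.items()}

# Tier 2: an optional global model built only from community models --
# no raw data, and no per-device weight, ever leaves the community
global_model = aggregate(list(community_models.values()))
print(community_models)
print(global_model)
```

The governance point hides in the middle tier: each community decides what its model trains on and whether to contribute upward at all, which is exactly where an intermediate model’s owner matters.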
Now suppose we still have the big mega-models like GPT-4, and GPT-5. A community could query those models using an encrypted channel (we can do this today of course), but the calls to those models would be for information that augmented or was somehow useful given collaboration and interaction occurring at the local level. You’re not asking the Big Tech Model to solve every locally-defined problem for you. You’re asking it for a fact you don’t have but need, for your and your community’s purposes. Perhaps we can build, too, virtual models that are distributed over many users’ computers, in the spirit of blockchain, but without the downside of Sam Bankman-Fried and Effective Altruism orgies. Something “blockchain-adjacent,” that captures decentralization and chucks all the other stuff. Possible? Of course.
These are open-ended thoughts. More like intuition pumps. Who knows if these are the best ideas, or if there are still other ways of federating and distributing our activities on the web so that we can use myriad intelligent inputs, as hoary old economists like Hayek persuasively argued for in a different but quite similar context, or if we’re doomed to reproduce really crappy, controlling, centralized approaches to living our lives on the web.
Final thought. Does all this techy stuff matter, anyway? Yeah. Wait until the elections, and see what new shenanigans are in store for us. The slice of the pie that’s tech trouble keeps growing. We must demand and create a better information space if we expect more from our democracy and our future.
Erik J. Larson
What an interesting post! It set my subsidiarist sensibilities fluttering.
If I'm understanding you, you're arguing for an arrangement whereby the algorithms that determine what news or social media posts I see on my device are trained on what I and my fellow community members let them be trained on. And what we deem training-worthy — what we let the algorithms train on, what data we choose to share — will be determined by criteria shaped more by our local interests than, say, by what is in the interest of a monstrous profit-seeking entity. And the distinction is not simply between local and global: there are intermediary spheres governed by less and less locally trained algorithms.
Have I understood you? Assuming so, let me register a question.
One of Hayek's concerns about centralized control over distribution (of any good) was that such control was liable to unaccountable capture by private interests. Hayek wasn't simply concerned with efficient signaling and information flow. He was concerned that public and common interests would be subverted.
My question is this: if local interests are shaping the informational landscape, do you think there's any less risk of those local interests being private interests as opposed to public and common interests?
Very interesting - as an approach (so I understand it) to structure (semantically) and federate/distribute "knowledge" "democratically", i.e. without opaque, biased, self-serving moderators. But the question remains what the value of such "knowledge" is - is that value just our personal judgement - or majority vote? I am reminded of the age-old scientific method that works by tacit consensus among a peer group of accepted authority - but it remains confined to those with sufficient training, knowledge and research experience - in silos of specialization. Then, any one of those silos can overwhelm us - become a life-task to the exclusion of anything outside it - so missing almost everything that really is important in life. But who can judge what that is, even in an enumerative (so not deeply analytic) sense? That has always been shaped explicitly/implicitly by an exclusive elite of "cultural leaders", ideally well-versed in matters of the world BUT ALSO philosophy (which I understand as science outside the silos) - AND spirituality (which informs us about purpose and meaning) - but, in practice, those personifying the aspirations arising from our own inadequacy and frustration. A cryptic quote I noted today on X comes to mind: Does the slave dream of being free - or of becoming a slave-owner?
So, yes, federated and democratic learning has its value - but only if it has mechanisms that assure the quality of knowledge - as the scientific method does - and respects meaningful spiritual goals - and putting all that together could shape culture in a new way.