Sector thesis · March 2026

The Decentralised Inference Thesis

Why AI compute will unbundle, where value accrues in a decentralised inference stack, and why ~70% of the Arxonic portfolio is positioned there.

The Concentration Problem

The AI inference market is valued at roughly $118 billion in 2026 and is projected to more than double by the end of the decade. Inference workloads now account for approximately two-thirds of all AI compute, up from a third in 2023. This is the single largest computational workload humanity has ever produced, and it is overwhelmingly concentrated in the hands of a few providers.

Today, if you want to build an AI-powered product, you rent inference from OpenAI, Google, or Anthropic. You accept their pricing, their content policies, their data retention practices, and their terms of service. You build your business on their API and hope the rules don't change.

This is not a theoretical concern. Developers have been deplatformed overnight. Pricing has shifted without warning. Content policies have tightened in ways that broke production applications. Every builder on a centralised inference API is one policy update away from a broken product.

The parallel to early cloud computing is instructive. In the late 2000s and early 2010s, cloud compute was heavily concentrated: AWS held a commanding lead, and the conventional wisdom was that scale effects would make centralisation permanent — that nobody could compete with Amazon's infrastructure advantage. Instead, compute unbundled. Specialised providers emerged. Multi-cloud became the norm. The market grew so large that even small percentage shares became enormous businesses.

Inference is following the same arc, with one critical addition: privacy. When a user sends a prompt to a centralised provider, that provider sees the prompt, processes it on their hardware, and in many cases retains the data. For consumer use cases this is uncomfortable. For enterprise use cases involving proprietary data, legal documents, or medical records, it is increasingly unacceptable. For autonomous AI agents that need to transact and reason independently, relying on a centralised provider that can censor, throttle, or surveil their operations is a structural limitation.

The demand for inference that is private, permissionless, and not subject to a single entity's content policies is not a niche preference. It is a structural requirement for the next phase of AI adoption.


Open-Weight Models Changed Everything

Decentralised inference was technically possible before 2023, but it wasn't practical. Without access to competitive models, a decentralised network could only serve inferior outputs. No developer would accept worse results for the sake of decentralisation alone.

The release of Llama 2, followed by Llama 3, Mistral, DeepSeek, and dozens of other open-weight models, changed the calculus entirely. For the first time, models that approached frontier performance were available for anyone to host and serve. The quality gap between open-weight and closed-source models, while still real for some tasks, narrowed dramatically. For the majority of inference workloads — chatbots, code generation, image generation, summarisation, translation — open-weight models are now competitive or superior.

This is the prerequisite that makes everything else possible. Without open-weight models, decentralised inference is an ideology. With them, it is an architecture.

The implications compound over time. As open-weight models continue to improve, the performance ceiling available to decentralised networks rises. Meanwhile, the cost of hosting and serving these models falls as hardware improves and optimisation techniques advance. The economic case for decentralised inference strengthens with every generation of open-weight models released.


Mapping the Value Chain

To understand where value accrues in decentralised inference, it helps to decompose the stack into distinct layers. Each layer has different economics, different competitive dynamics, and different investment implications.

The Hardware Layer consists of GPU operators who contribute physical compute to the network. They provide the raw processing power that makes inference possible. This layer is necessary but structurally commoditised. GPUs are fungible. Any provider with the right hardware can serve the same models. Competition drives margins toward the cost of electricity and hardware depreciation. Individual node operators may earn attractive returns in early-network incentive phases, but as supply scales, margins compress. This mirrors the economics of Bitcoin mining: essential infrastructure, but not where outsized returns concentrate over time.

The Coordination Layer is the protocol that matches inference demand to supply. It handles scheduling, routing, quality assurance, uptime guarantees, and economic incentives. This is the layer that solves the hard problem: how do you create a reliable inference service from a permissionless network of independent GPU operators? The coordination layer sets the rules, enforces quality, and creates the economic flywheel that attracts both supply and demand. In decentralised systems, value has historically concentrated at this layer. Ethereum is the canonical example — validators commoditise, but the protocol captures value through fees and monetary premium. The coordination layer captures a toll on every unit of inference processed through the network.
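
To make the coordination problem concrete, the sketch below shows, in deliberately simplified and hypothetical terms, how a coordination layer might score and select an operator for an incoming request. The scoring weights, field names, and selection rule are illustrative assumptions, not a description of any live protocol.

    from dataclasses import dataclass
    import random

    @dataclass
    class Node:
        node_id: str
        models: set        # open-weight models this operator serves
        stake: float       # economic bond, slashable for bad or missing output
        reputation: float  # rolling quality score in [0, 1] from past verifications
        latency_ms: float  # recent average response latency

    def score(node: Node) -> float:
        # More stake and better reputation raise the score; latency lowers it.
        return (node.stake ** 0.5) * node.reputation / (1.0 + node.latency_ms / 100.0)

    def route(requested_model: str, nodes: list) -> Node:
        # Only operators that actually serve the requested model are eligible.
        eligible = [n for n in nodes if requested_model in n.models]
        if not eligible:
            raise RuntimeError(f"no capacity for {requested_model}")
        # Weighted random selection spreads demand across good operators
        # instead of piling every request onto a single "best" node.
        weights = [score(n) for n in eligible]
        return random.choices(eligible, weights=weights, k=1)[0]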

The Access Layer sits between the protocol and end users. It provides API gateways, credit systems, developer tooling, and abstractions that make the underlying network usable. A well-designed access layer makes decentralised inference feel like a conventional API — developers interact with familiar endpoints and don't need to understand node selection, routing, or token mechanics. The access layer creates sticky demand. Once a developer has integrated an API, switching costs are real. Credit systems that prepay for inference capacity create even stronger lock-in, as users have an economic stake in the ecosystem's continued operation.
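
From the developer's side, that abstraction can look like any hosted inference API. The sketch below is a hypothetical client call in the OpenAI-compatible style many gateways adopt; the endpoint, model name, and response shape are placeholders, not documented Venice API details.

    import requests  # assumes a plain HTTPS gateway; all names below are placeholders

    GATEWAY_URL = "https://gateway.example/v1/chat/completions"  # hypothetical endpoint

    def complete(prompt: str, api_key: str) -> str:
        # Node selection, routing, and settlement all happen behind the gateway;
        # the caller only sees a familiar request/response cycle.
        resp = requests.post(
            GATEWAY_URL,
            headers={"Authorization": f"Bearer {api_key}"},
            json={
                "model": "llama-3.3-70b",  # an open-weight model served by the network
                "messages": [{"role": "user", "content": prompt}],
            },
            timeout=60,
        )
        resp.raise_for_status()
        return resp.json()["choices"][0]["message"]["content"]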

The Application Layer is what end users actually interact with — AI assistants, code generators, image creators, agents. This layer's value depends on the defensibility of the application itself, not the inference layer beneath it. Applications built on decentralised inference have a structural advantage in privacy and censorship resistance, but they must still compete on user experience, model quality, and product design.

The investment thesis is not to bet on the application layer, which is fragmented and competitive, or the hardware layer, which commoditises. The thesis is to own the coordination and access layers — the toll infrastructure that benefits from every unit of inference flowing through the network, regardless of which applications are built on top or which hardware operators serve the requests.


Venice: A Case Study in Coordination and Access

Venice is a privacy-preserving inference platform built on Base that embodies the dual-layer thesis described above. Co-founded by Erik Voorhees (former CEO of ShapeShift) and Teana Baker-Taylor (former VP at Circle), it has grown to over 1.3 million registered users and more than 50,000 daily active users, processing over 45 billion tokens daily across text, image, and code generation models. It serves both consumers through its web interface and developers and AI agents through its API.

What makes Venice architecturally interesting is its dual-token system, which maps directly onto the coordination and access layers of the value chain.

VVV is the coordination layer token. Staking VVV grants a pro-rata share of the network's daily inference capacity. If you stake 1% of all VVV, you control 1% of the network's compute. You do not pay per request. You own an ongoing share of the resource. Stakers also earn emissions-based yield, currently around 19% APR, though this fluctuates with network utilisation. VVV launched with a 100 million genesis supply, no presale, and no VC funding — 50% was airdropped to users and AI community projects on Base. The economic alignment is straightforward: as demand for Venice's inference capacity grows, demand for VVV grows, because staking it is the only way to access the network at zero marginal cost.
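
A back-of-the-envelope illustration of the pro-rata model follows. Every figure is an assumption chosen for arithmetic clarity rather than live network data, apart from the 45 billion tokens per day quoted above.

    # Illustrative numbers only; none of these are live network figures.
    total_vvv        = 70_000_000       # assumed circulating VVV supply
    my_staked_vvv    = 700_000          # this wallet stakes 1% of all VVV
    daily_capacity   = 45_000_000_000   # inference tokens the network serves per day

    capacity_share = my_staked_vvv / total_vvv        # pro-rata share of daily compute
    daily_tokens   = capacity_share * daily_capacity  # tokens/day at zero marginal cost

    apr = 0.19                                        # emissions-based yield (fluctuates)
    yearly_emissions = my_staked_vvv * apr            # VVV earned per year at this APR

    print(f"{capacity_share:.2%} of capacity -> {daily_tokens:,.0f} inference tokens/day")
    print(f"~{yearly_emissions:,.0f} VVV/year in emissions at {apr:.0%} APR")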

The deflationary mechanics reinforce this alignment. In March 2025, approximately 32.6 million unclaimed airdrop tokens were permanently burned — roughly a third of the genesis supply removed in a single event. On top of this, Venice conducts monthly buyback-and-burns using protocol revenue, purchasing VVV on the open market and permanently removing it from circulation. Revenue-based burns began in December 2025 and have been executed every month since, with the cumulative total growing alongside platform revenue. Including the airdrop burn, over 33.7 million VVV — more than a third of the genesis supply — has been removed from circulation as of early March 2026.

Meanwhile, annual emissions have been reduced aggressively and repeatedly: from 14 million at launch (January 2025), to 10 million (August 2025), to 8 million (October 2025), to 6 million (February 2026). Further staged reductions are scheduled through mid-2026, bringing emissions down to 5 million in May, 4 million in June, and 3 million by July — a 50% cut from current levels in under four months. The trajectory points toward a crossover where annualised burns exceed annualised emissions, making VVV net deflationary. The coordination layer token becomes scarcer as the network it coordinates grows.
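
The crossover is simple arithmetic: supply shrinks once annualised burns exceed annualised emissions. The sketch below uses the published emission schedule alongside a purely hypothetical monthly buyback figure to show where the flip would occur.

    # Emission schedule (VVV per year) from the text; the burn rate is a placeholder assumption.
    emission_schedule = {
        "Feb 2026": 6_000_000,
        "May 2026": 5_000_000,
        "Jun 2026": 4_000_000,
        "Jul 2026": 3_000_000,
    }
    assumed_monthly_burn = 300_000             # hypothetical VVV bought back and burned per month
    annualised_burn = assumed_monthly_burn * 12

    for month, annual_emission in emission_schedule.items():
        net = annual_emission - annualised_burn          # positive -> supply still growing
        status = "net inflationary" if net > 0 else "net deflationary"
        print(f"{month}: {annual_emission:,} emitted vs {annualised_burn:,} burned per year -> {status}")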

DIEM is the access layer token. Each DIEM provides $1 per day in Venice API credits, in perpetuity. DIEM is minted by locking staked VVV into the protocol, and the locked VVV continues to earn 80% of normal staking yield. DIEM is a tradeable ERC-20 token on Aerodrome, but its primary function is utility: it converts variable inference costs into a fixed, on-chain asset. A developer or AI agent holding DIEM has guaranteed, ongoing API access at a predictable cost, regardless of what happens to compute pricing in the broader market.

This is a genuinely novel design. DIEM transforms inference from a rented service into an ownable asset. It creates a tokenised compute credit market where developers can hedge against rising inference costs, agents can self-fund their own compute needs, and DeFi protocols could eventually collateralise DIEM to offer loans against future AI usage. The access layer becomes financialised in ways that are impossible with centralised API providers.
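
As a rough model of the trade-off a staker faces when minting, consider the sketch below. The lock ratio, VVV price, and yields are illustrative assumptions; in practice the protocol sets the conversion rate.

    # Illustrative DIEM economics; the lock ratio and VVV price are assumptions.
    vvv_locked_per_diem = 100      # hypothetical VVV locked to mint 1 DIEM
    vvv_price_usd       = 5.00     # hypothetical market price of VVV
    base_staking_apr    = 0.19     # normal staking yield; locked VVV keeps 80% of it

    yearly_api_credits_usd = 1.00 * 365                          # 1 DIEM = $1/day of API credits
    capital_locked_usd     = vvv_locked_per_diem * vvv_price_usd
    forgone_yield_usd      = capital_locked_usd * base_staking_apr * 0.20  # the 20% given up

    net_annual_value = yearly_api_credits_usd - forgone_yield_usd
    print(f"lock ${capital_locked_usd:,.0f} of VVV -> ~${net_annual_value:,.0f}/year "
          f"in net API credits (illustrative)")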

The dual-token flywheel works as follows: usage growth increases protocol revenue, which funds larger buyback-and-burns, which reduces VVV supply, which makes DIEM more expensive to mint (because minting requires locked VVV), which increases the value of existing DIEM, which attracts more developers who want to lock in inference access, which drives more usage. Each component reinforces the others.


The Macro Bet

The thesis does not require Venice — or any single protocol — to displace OpenAI or Google. The global inference market will likely exceed $250 billion by 2030. If decentralised inference captures even 2–3% of that market, the total addressable value flowing through decentralised coordination layers would be measured in billions of dollars per year.

The structural drivers pushing inference toward decentralisation are durable and strengthening:

Privacy regulation is tightening globally. GDPR, the EU AI Act, and emerging frameworks in Asia and Latin America are creating compliance burdens that favour architectures where user data never touches centralised servers. Decentralised inference, where prompts are encrypted and processed without retention, is not just a philosophical preference — it is becoming a compliance advantage.

AI agents need permissionless infrastructure. The agentic web — autonomous AI systems that transact, reason, and operate independently — cannot function on infrastructure that requires human authentication, enforces content policies designed for human users, or can be unilaterally shut down. Agents need inference that is always available, uncensored, and accessible through programmatic interfaces. Decentralised inference is structurally suited to this use case in ways that centralised providers are not.

The cost curve favours decentralisation over time. As hardware improves and open-weight models become more efficient, the cost of serving inference on distributed GPU networks declines. Meanwhile, centralised providers face rising costs from real estate, power infrastructure, compliance, and the engineering overhead of maintaining proprietary model advantages. The economic gap between centralised and decentralised inference narrows with each hardware generation.

Censorship resistance has genuine demand. This is not a libertarian abstraction. Researchers working on sensitive topics, journalists in authoritarian countries, medical professionals seeking frank AI guidance, and developers building applications that push against platform content policies all have practical demand for inference that cannot be filtered or surveilled. This demand exists today and will grow as AI becomes more deeply integrated into professional workflows.


The Risks

Any honest thesis must confront what could go wrong. The risks in decentralised inference are real and should not be dismissed.

The quality gap could widen. If frontier closed-source models pull dramatically ahead of open-weight alternatives, the performance case for centralised inference strengthens. The thesis depends on open-weight models remaining competitive for the majority of use cases. This has been the trend, but it is not guaranteed to continue.

Reliability and latency matter. Enterprise adoption requires service-level agreements. Decentralised networks face inherent challenges in guaranteeing uptime, latency, and consistent output quality across a permissionless set of hardware operators. If centralised providers simply offer a more reliable service, developers will pay the premium and accept the trade-offs.

Regulatory risk cuts both ways. While privacy regulation favours decentralised architectures, regulation targeting uncensored AI content could create headwinds. If governments mandate content filtering for AI inference, permissionless networks face a compliance challenge that centralised providers can more easily navigate.

Centralisation vectors exist within decentralised networks. Dominant node operators, concentrated governance, and single points of failure in smart contract infrastructure can erode the decentralisation thesis from within. A network that is decentralised in name but concentrated in practice offers the worst of both worlds.

Demand may not materialise at scale. The agentic web is still nascent. Privacy-sensitive enterprise adoption of decentralised inference is early. If the macro adoption curve is slower than expected, the economic flywheel that drives token burns and supply reduction may not spin fast enough to justify current valuations.


The Arxonic Position

Approximately 70% of the Arxonic portfolio is allocated to decentralised AI infrastructure through VVV and DIEM. This is a concentrated bet, and it is intentional.

The thesis is structural, not speculative. Inference demand will grow dramatically. A meaningful percentage of that demand will flow through infrastructure that is private, permissionless, and not controlled by a single entity. The value of owning coordination and access layer exposure to that flow is not yet priced into the market.

VVV provides coordination layer exposure. As the network it coordinates processes more inference, the token's supply shrinks through burns while demand for staking increases. The emission reduction schedule — dropping from 6 million to 3 million per year across May to July 2026 — compresses the timeline to net deflation.

DIEM is held for utility, not speculation. The Arxonic DIEM position generates daily API credits that subsidise all development work across multiple projects at zero marginal cost. The decision to hold rather than sell reflects a conviction that guaranteed, permissionless API access at a fixed rate is more valuable than the current market price implies.

The position would be reconsidered if: the quality gap between open-weight and closed-source models widens materially; Venice loses significant node operator capacity or user traction; a regulatory crackdown specifically targets permissionless inference in ways that cannot be navigated; or a competing protocol emerges with a demonstrably superior coordination mechanism and economic design.

Until one of those conditions is met, the thesis holds: inference will decentralise, value will concentrate in the coordination and access layers, and owning those layers early is the asymmetric bet that defines this portfolio.

Arxonic Research · March 2026 · Independent research. Author holds positions in VVV and DIEM as described in the individual asset reports at arxonic.com. This is not financial advice. All investment involves risk. Do your own research.