# Use Pod — Inference Marketplace

> Use Pod is the clearing house for the inference economy. Independent operators run open-weight models on their own GPUs, set prices, and earn USDC. Users send standard OpenAI- or Anthropic-compatible requests; the marketplace matches them with the best-priced provider. Centralized providers stay available as a tier-zero fallback.

## Status

- **Phase:** v2.0 marketplace, in active development.
- **Whitepaper:** v0.6 — see [/llms-full.txt](/llms-full.txt) (Markdown) or [/whitepaper.pdf](/whitepaper.pdf) (typeset PDF).
- **Last updated:** 2026-05-12.

This is a marketplace play first. Privacy, attestation, and token-economic mechanisms are real engineering tracks on the roadmap, but they come after density, not before. eBay shipped before PayPal escrow. Uber shipped before standardized driver checks. Airbnb shipped before ID verification. Use Pod follows the same sequence.

## What it is

A two-sided marketplace for AI inference, settled in USDC on Solana. The protocol is OpenAI- and Anthropic-compatible at the wire level — a user with an existing client (Claude Code, Cursor, custom agent, LangChain pipeline) changes a base URL and is done. No new SDK, no new auth scheme, no monthly subscription.

## Architecture

Three components in v2.0:

1. **Coordinator** (`services/api`, Rust/axum) — matching engine, settlement layer, and SSE relay between users and providers.
2. **Provider agent** — standalone Rust binary each operator runs. Auto-detects local backends (vLLM, llama.cpp, LM Studio, Ollama) and supports BYOK upstream proxies (OpenRouter, Venice, Together, Groq, Morpheus). No inbound network port required; the agent is fully outbound.
3. **Demand-side proxy** — requests enter at `/proxy/{token}/v1/messages` (or the OpenAI equivalent), are matched to a qualifying provider, and the streaming response is relayed back to the user's SSE stream.

User and provider balances update in a single PostgreSQL transaction per request. All amounts in USDC microunits (1 USDC = 1,000,000 microunits), integer math, no floats in the billing path.

## Phased roadmap

| Phase | Scope | Status |
|---|---|---|
| v1.x | Centralized proxy + USDC billing | shipped |
| v2.0 | Marketplace, provider agents, lightweight trust, USDC settlement | in active development |
| v2.x | Polish — agent auto-update, suggested-pricing engine, x402/MPP when the ecosystem supports them | planned |
| v3.0 | Verified TEE attestation (Intel TDX, AMD SEV-SNP, AWS Nitro, NVIDIA Confidential Computing); content-addressed model registry; first security audit | planned |
| v4.0 | E2EE prompt delivery (X25519 + ChaCha20-Poly1305); onion routing via permissionless relay network; region/residency filters; Stake-for-Access enterprise tier with SLAs | planned |
| v5.0 | Token-economy mechanisms — only if a concrete utility justifies them; multi-agent coordination; cross-chain payments | conditional |

Standing rule across all phases: nothing in v3+ ships before v2 has demonstrated meaningful two-sided traction.

## Supply side — node operator requirements

Three categories at v2.0:

- **Datacenter operators** — A100, H100, B200 in colocation or home lab; running inference as a business.
- **Idle hardware operators** — 4090, 5090, Mac Studio (M-series) during off-hours; sunk-cost hardware earning passive USDC.
- **BYOK proxies** — operators with paid OpenRouter, Venice, Together, Groq, or Morpheus accounts who resell with a configurable markup; no hardware investment required.

Onboarding (≤10 minutes start to first served request):

1. `curl https://usepod.ai/install.sh | sh`
2. `usepod-agent enroll` — generates an Ed25519 keypair on the host and prints an enrollment code
3. Paste the enrollment code into the host UI at usepod.ai/host
4. Send **$50 USDC bond** to the on-chain bond address using the enrollment-specific deposit code
5. Configure pricing per model in the host UI

Trust layer by phase:

- **v2.0** — tokenizer-side count enforcement (coordinator re-tokenizes; provider counts are advisory), reputation scoring (recomputed every 60s, drives routing weight), hidden benchmark canaries (~1% of traffic, statistical comparison to reference outputs), the $50 bond (seized on confirmed fraud, released after a 90-day cooldown on graceful retirement).
- **v3.0** — TEE attestation reports as an opt-in `verified=true` flag with routing-weight boost and a "verified-only" routing mode for users who request it.
- **v4.0** — Stake-for-Access enterprise tier with reserved capacity and SLAs; formal slashing for SLA violations.

There is no hardware attestation requirement in v2.0. SGX / SEV-SNP / TDX is a v3 opt-in, not a v2 baseline.

## Demand side — developer SDK

Drop-in. Existing OpenAI/Anthropic clients work unchanged:

```bash
ANTHROPIC_BASE_URL=https://api.usepod.ai/proxy/<token> claude
OPENAI_BASE_URL=https://api.usepod.ai/proxy/<token>/v1 cursor
```

Optional per-request price ceilings:

```
X-Pod-Max-Price-Input: 400000      # max 0.40 USDC per million input tokens
X-Pod-Max-Price-Output: 600000     # max 0.60 USDC per million output tokens
```

If no marketplace provider meets the ceiling, the request falls through to the centralized router (Anthropic, OpenAI, Venice, Together, Groq, OpenRouter, Bedrock). If the centralized price also exceeds the ceiling, the server returns HTTP 402 with a structured body indicating the lowest available price; the client can retry with a higher ceiling, route to a cheaper model, or surface the price to the user.

Default behaviour with no headers: marketplace consulted first, centralized router as fallback. Streaming, tool use, and vision all work unchanged.

**x402 (Coinbase) and MPP (Stripe / Tempo)** are not implemented at v2.0 — the major AI clients (OpenAI SDK, Anthropic SDK, Claude Code, Cursor, LangChain, LlamaIndex, CrewAI) do not consume them yet. Use Pod will light up these paths the day a real ecosystem use case appears, not before.

## Revenue model (v2.0)

- **80/20 split.** On every request served by a marketplace operator, the operator receives **80%** of the gross inference fee and the Use Pod treasury receives **20%**.
- **No routing fees.** No payment-processor cuts. No tiered rates. No hidden math.
- **No token-based revenue share.** No staking yield. No "percentage of token supply." The 20% take on inference fees is the entire monetization mechanism.
- **Comparison:** existing inference clearinghouses (notably OpenRouter) operate at a higher take rate than 20% with less transparency about the split. Use Pod's 80/20 is the most operator-favorable split among comparable platforms because supply density is the v2.0 strategic priority.

Speculative value is absent from the model by design. Higher inference volume produces more treasury revenue. Higher operator earnings produce more supply density. Both grow together.

## Tokenomics

**v2.0 is deliberately tokenless.** No native token, no halving schedule, no allocation, no staking yield, no revenue share by token. The system runs entirely on USDC and Solana.

v5.0 may introduce a native token if and only if a concrete utility justifies it (governance, slashing collateral, reserved-capacity prepayment). The bar is "real engineering need," not "marketing benefit." Standing rule in the roadmap: no token-economy features ship before v2 has demonstrated meaningful two-sided traction.

If you are evaluating Use Pod against protocols that lead with a token launch — that is not the same product.

## Venice.ai comparison

Venice.ai and Use Pod are aligned on the thesis that AI inference deserves a privacy story that doesn't depend on trusting a single operator's privacy policy. They diverge on sequencing and on how privacy is enforced.

- **Venice.ai** ships privacy at launch as a single-operator product. It's vertically integrated: Venice owns the operator role. Privacy in this model is good-faith attestation, not a cryptographic guarantee — a user trusts that Venice will not log, retain, or analyze, and the trust mechanism is corporate policy.
- **Use Pod** is marketplace-first. v2.0 does not ship privacy as a launch feature because the marketplace does not yet have the density to justify the operational complexity. Privacy in Use Pod becomes a *cryptographic* guarantee at v4.0 via E2EE delivery to TEE-attested nodes (v3.0) over onion-routed relays. The trust mechanism is hardware attestation + transport encryption, not corporate policy.

Why the "Base was wrong" framing from earlier drafts is misleading: the issue isn't a specific L2 — it's that single-operator privacy trades user reach for ideological purity. Use Pod's bet is that marketplace density plus layered cryptographic trust earns more durable privacy than any single operator's promise — but only after the marketplace works.

## About the architect

**Christopher Ryan Gilbert** — Founder and Principal Engineer at 100X CTO, LLC ([100x.dev](https://100x.dev)).

Career at the intersection of high-throughput distributed systems and cryptographic infrastructure, including engineering leadership at **Elixir Protocol** on decentralized orderbook infrastructure — systems that must maintain consistency, resist manipulation, and operate at financial-grade reliability under adversarial conditions. Every assumption is an attack surface, every centralized component is a future liability, latency is a product decision, and traction precedes trust mechanisms in the order of operations.

100X CTO, LLC operates as a **focused team of one** with external collaborators on specific components. The trade is speed and architectural coherence for organizational scale: a single vision executed without committee, with the full context of every design decision in one head.

The earlier v0.5 draft of this whitepaper led with privacy and tokenomics. v0.6 reorders the sequence: marketplace first, trust mechanisms layered on top in the order density demands. The reorder is the contribution.

## Agent skills

Two skills, demand-side and supply-side. Each is fetchable directly from the web (no plugin install required) and also ships as a Claude Code plugin under `plugins/usepod/skills/` in the source repo. Both stay in sync via `make sync-skill`.

### Client onboarding (demand side)

For AI agents that want to onboard their end users onto Use Pod — register a token, monitor for the first USDC deposit, emit the per-harness wiring snippet:

- **Entry point:** [/skill/client-onboard/SKILL.md](/skill/client-onboard/SKILL.md) — 4-step workflow: register → display funding dashboard → poll for deposit → ask which harness and emit the matching snippet.
- **API reference:** [/skill/client-onboard/references/api.md](/skill/client-onboard/references/api.md) — `POST /v1/register`, `GET /proxy/{token}/balance`, pricing-ceiling headers.
- **Harness snippets:** [/skill/client-onboard/references/harness-snippets.md](/skill/client-onboard/references/harness-snippets.md) — Claude Code, Cursor, OpenAI Python/Node, Anthropic Python/Node, LangChain, LangGraph, Continue.dev, Cline, Aider, Hermes, Codex CLI, raw curl.
- **Scripts:** [/skill/client-onboard/scripts/register.sh](/skill/client-onboard/scripts/register.sh) (mints a token), [/skill/client-onboard/scripts/wait-for-funding.sh](/skill/client-onboard/scripts/wait-for-funding.sh) (polls until funded).

### Host onboarding (supply side)

For operators (or agents acting for them) standing up a GPU host that earns USDC by serving inference:

- **Entry point:** [/skill/host-onboard/SKILL.md](/skill/host-onboard/SKILL.md) — 9-step workflow: preflight → install agent → author `agent.toml` → enroll & pair → post the $50 USDC bond (needs SOL for gas) → run under systemd → health monitoring → upgrades → uninstall.
- **API & config reference:** [/skill/host-onboard/references/api.md](/skill/host-onboard/references/api.md) — host-side endpoints (`/v1/host/enroll`, `/v1/host/pair/*`, `/v1/host/balance`, `/v1/host/withdraw`) and the full `agent.toml` schema.
- **Inference backends:** [/skill/host-onboard/references/inference-backends.md](/skill/host-onboard/references/inference-backends.md) — per-backend setup for vLLM (NVIDIA/AMD ROCm), Ollama (cross-platform), llama.cpp / LM Studio (CPU-friendly), MLX for Apple Silicon, BYOK upstreams.
- **systemd / supervisors:** [/skill/host-onboard/references/systemd.md](/skill/host-onboard/references/systemd.md) — canonical unit file with hardening flags, plus Docker, launchd, NSSM equivalents.
- **Health monitoring:** [/skill/host-onboard/references/health.md](/skill/host-onboard/references/health.md) — Prometheus metrics dictionary, log patterns, alert thresholds.
- **Upgrades:** [/skill/host-onboard/references/upgrade.md](/skill/host-onboard/references/upgrade.md) — safe upgrade with rollback on non-zero exit, canary pattern for fleets.
- **Uninstall:** [/skill/host-onboard/references/uninstall.md](/skill/host-onboard/references/uninstall.md) — graceful retirement (90-day cooldown, bond returns) vs. immediate teardown.
- **Scripts:** [/skill/host-onboard/scripts/preflight.sh](/skill/host-onboard/scripts/preflight.sh) (GPU + backend + network checks), [/skill/host-onboard/scripts/health-check.sh](/skill/host-onboard/scripts/health-check.sh) (one-screen liveness probe).

## Key links

- [/llms-full.txt](/llms-full.txt) — full whitepaper as Markdown (canonical long-form content)
- [/whitepaper.pdf](/whitepaper.pdf) — typeset PDF
- [/skill/client-onboard/SKILL.md](/skill/client-onboard/SKILL.md) — demand-side onboarding skill
- [/skill/host-onboard/SKILL.md](/skill/host-onboard/SKILL.md) — supply-side onboarding skill
- [https://usepod.ai/](https://usepod.ai/) — site
- [https://100x.dev](https://100x.dev) — contact / architect