ApiaryActive
Try: pause · settings · learn · wipe
DocsFoundationsbyo-llm-explained

BYO-LLM, explained

Candidate
Last updated 2026-05-21 · source: claude-conversation-2026-05-21

BYO-LLM, explained

Status: Candidate — awaiting founder verification. Why this page exists: New users ask "where's the API key field?" — this is the long answer.

TL;DR

Apiary doesn't ship with a baked-in LLM key. You bring your own. That decision has three effects: no token surprises, no provider lock-in, and no centralized key that could leak. The tradeoff is one extra setup step. We think it's worth it.

The deeper read

Almost every AI tool you've used has a hidden cost: somebody is paying for the inference, and eventually that bill becomes a subscription, a rate limit, or a quality downgrade. Apiary refuses that pattern. The substrate is open, the orchestration is open, and the LLM is yours. You pick the brain, you pay (or don't pay) for the tokens, and you can swap providers without re-installing anything.

This is BYO-LLM: Bring Your Own Large Language Model.

The five paths

There are more options than just "pasting a Claude or OpenAI key." Apiary supports the full menu:

Tier 1 — Paid API keys (most common)

  • Anthropic (Claude)sk-ant-...
  • OpenAI (GPT)sk-...
  • Google (Gemini)AI...

User signs up, adds a card, gets a key, pastes it. Best quality, costs real money per token.

Tier 2 — Local LLM (no key, no cost, runs on your machine)

  • Ollama — download, start, done. No account.
  • LM Studio — Mac/Windows app with a GUI.
  • llama.cpp — command-line, more advanced.

Runs entirely on the user's hardware. No internet needed for inference. Free forever. Slower than cloud — your laptop versus a datacenter — but private.

Tier 3 — Aggregators (one key, access to many models)

  • OpenRouter — one key, access to Claude, GPT, Gemini, Llama, 100+ others. Pay-as-you-go.
  • Together.ai — open-source models hosted (Llama, Mistral, etc.).
  • Groq — ultra-fast inference, generous free tier.
  • Fireworks.ai, Replicate — similar aggregators.

All OpenAI-compatible endpoints. Apiary's "Custom (OpenAI-compatible)" path works with every one.

Tier 4 — In-browser models (zero key, zero install, limited quality)

  • WebLLM — runs small Llama / Phi / Mistral models right in the browser via WebGPU. No key, no server, no bill, ever. Limited to ~7B-parameter models that fit in browser memory.

The cleanest "zero-setup" path. Quality is below Claude or GPT, but legitimately free and private. Not yet enabled in Apiary — planned.

Tier 5 — Free trial credits

Most providers give starter credit ($1–$25). Useful to try without committing a card. You still need a key, just not yet a credit card.

The strategic picture

| Optimize for | Recommended path | |--------------|----------------------------------------| | Cheapest | Ollama or WebLLM (free) | | Easiest | OpenRouter ($1 + one key = everything) | | Best quality | Direct Anthropic (Claude Opus) | | Fastest | Groq (Llama variants) | | Most private | Local (Ollama / WebLLM) |

Why we won't bake one in

A baked-in key means:

1. Somebody pays the bill — either we hide it in a subscription, or we burn capital, or we throttle quality. 2. One leak is a catastrophe — a centralized key getting exfiltrated affects every user at once. 3. Provider lock-in — when the bill comes due, you can't switch without rewriting the product.

BYO-LLM dodges all three. The cost of "one extra setup step" is real, but the surface for cost surprises is zero.

Related

Source quotes

"Don't have an API key? You have options: run a free local model (Ollama, 5 min setup), or one key → access to all major models (OpenRouter, a dollar to start), or direct to Claude / GPT for best quality."
Candidate. This page was seeded from a building-session conversation and has not yet been founder-verified. The shape is right; the wording is a draft. Once Austin reads + stamps, the status flips to verified and the page becomes canonical.