Fine-tuning, explained

Candidate
Last updated 2026-05-21 · source: claude-conversation-2026-05-21

Status: Candidate, awaiting founder verification. Why this page exists: Fine-tuning sounds expensive and exotic. At Apiary's scale it is neither, but the terminology matters.

TL;DR

Fine-tuning is the process of adjusting a pre-trained LLM's weights using your own data so it speaks in your voice or specializes in your domain. For an indie developer with a few thousand high-quality examples, a fine-tune of an open model like Llama 3.1 8B costs roughly $30–$500 and takes from a few hours (LoRA) to about a day (full fine-tune) on a rented GPU.

Training vs inference — the foundational distinction

Most people conflate two very different things:

  • Training is creating or updating the model. Expensive (millions of dollars to train Claude from scratch; hundreds to fine-tune Llama). Done once or rarely. Pretraining from scratch needs large GPU clusters; a small fine-tune can run on a single rented GPU.
  • Inference is using the trained model to answer a prompt. Cheap (pennies per query). Done constantly. Can run on a laptop for small models.

When you "use ChatGPT," you're paying for inference. When OpenAI trained GPT-4, that was training. The two have completely different cost structures and hardware profiles.

What you rent: GPUs, not CPUs

CPUs are for general computing. GPUs are purpose-built for the matrix multiplication that AI training and inference do. Fine-tuning on CPU would take weeks; on GPU it takes hours.

Specifically: NVIDIA H100 or A100 cards, the same class of data-center GPU used to train open models like Llama in the first place.

Where you rent GPUs (the actual playground)

Indie-friendly (cheap, devs love them):
  🟢 RunPod         $0.50-3/hr per H100, easy UI, hourly billing
  🟢 Vast.ai        $0.30-2/hr (peer-to-peer, sometimes spottier)
  🟢 Modal          serverless, pay per second, great for short jobs
  🟢 Together.ai    fine-tuning as a service (zero infra setup)

Specialized (handle it for you):
  🟢 Replicate          managed fine-tuning, "drop in data + go"
  🟢 OpenPipe           specifically for OpenAI/Llama fine-tunes
  🟢 Predibase          enterprise-y, low-code interface

Corporate (overkill for indie scale):
  🟡 AWS / GCP / Azure  expensive, reliable, big-shop default
  🟡 Lambda Labs        H100 clusters, premium pricing

Real numbers for indie scale

Fine-tuning Llama 3.1 8B on ~50,000 example pairs:
  • LoRA fine-tune (cheap):     ~$30-100 on RunPod (4-8 hrs on H100)
  • Full fine-tune (better):    ~$200-600 (12-24 hrs on H100)

Fine-tuning Llama 3.1 70B (heavier, smarter result):
  • LoRA:                       ~$200-500
  • Full:                       ~$2,000-5,000 (multiple H100s)

For an Apiary-class project — an "apiary-llama" trained on the substrate's voice and decisions — $100-500 is realistic.
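The arithmetic behind these ranges can be sketched as hours × hourly rate × number of GPUs. The function names below are hypothetical, and the inputs are the page's own figures (4-8 hours for LoRA, RunPod's listed $0.50-3/hr). Raw rental alone comes out below the quoted dollar ranges, so those ranges are probably best read as all-in budgets that also cover setup time, repeated runs, and storage. That reading is an assumption, not something the source states.

```python
def rental_cost(hours: float, rate_per_hour: float, num_gpus: int = 1) -> float:
    """Raw GPU rental cost in dollars: hours x hourly rate x GPU count."""
    if hours < 0 or rate_per_hour < 0 or num_gpus < 1:
        raise ValueError("hours and rate must be >= 0, num_gpus must be >= 1")
    return hours * rate_per_hour * num_gpus


def cost_range(hours_range, rate_range, num_gpus=1):
    """Best case (shortest run, cheapest rate) and worst case (longest, priciest)."""
    low = rental_cost(hours_range[0], rate_range[0], num_gpus)
    high = rental_cost(hours_range[1], rate_range[1], num_gpus)
    return low, high


# LoRA on Llama 3.1 8B: 4-8 hours on one H100 at $0.50-3/hr (figures from this page).
lora_low, lora_high = cost_range((4, 8), (0.50, 3.00))
print(f"LoRA raw rental: ${lora_low:.2f} to ${lora_high:.2f}")
```

Budgeting from the top of the range, plus a margin for at least one failed or repeated run, is the safer habit.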

The seven steps, in order

1. PREPARE TRAINING DATA (the slowest part)
   Format logs as JSONL pairs, one JSON object per line (wrapped here for readability):
     {"prompt": "Committee debates feature X...",
      "completion": "After review, the vote was 4-1 to ship..."}
   Tools: pandas, datasets library, manual cleanup.
   Time: days-to-weeks if you want quality.

2. PICK A BASE MODEL
   Llama 3.1 8B is the indie default (good baseline, fits on one H100).
   Available on Hugging Face for free download.

3. SPIN UP GPU INSTANCE
   RunPod: pick "H100 80GB", $1.50/hr, sign in, SSH access.
   Or use Together.ai's fine-tune API (zero infra).

4. RUN FINE-TUNING SCRIPT
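   Tools: Unsloth (fast, low memory), Axolotl (powerful), MLX (Apple Silicon, runs locally instead of on a rented GPU).
   Watch the training and validation loss drop. Stop when validation loss plateaus or starts rising (a sign of overfitting).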

5. EVALUATE THE RESULT
   Run test prompts. Compare to vanilla Llama. Does it "speak Apiary"?
   If not — tune hyperparameters, try again.

6. QUANTIZE + PACKAGE
   Convert to .gguf format (the llama.cpp file format that Ollama loads).
   A 16-bit 8B model (~16GB) becomes roughly 4-5GB in 4-bit quantization.

7. DISTRIBUTE
   Upload to Hugging Face. Add to Ollama registry.
   Users run: `ollama pull your-name/apiary-llama`.
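
Three of the steps above reduce to small mechanical pieces, sketched here with the Python standard library only. All function names are hypothetical. The plateau rule (compare the mean of the last few loss values with the mean of the few before) is one simple heuristic, not what Unsloth or Axolotl do internally. The size estimate counts weights only.

```python
import json


def write_jsonl(pairs, path):
    """Step 1: write (prompt, completion) pairs as JSONL, one JSON object per line."""
    written = 0
    with open(path, "w", encoding="utf-8") as f:
        for prompt, completion in pairs:
            prompt, completion = prompt.strip(), completion.strip()
            if not prompt or not completion:
                continue  # drop empty examples instead of training on them
            record = {"prompt": prompt, "completion": completion}
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
            written += 1
    return written


def has_plateaued(losses, window=3, min_improvement=0.01):
    """Step 4: True when the last `window` losses barely improve on the previous window."""
    if len(losses) < 2 * window:
        return False  # not enough history to compare two windows
    recent = sum(losses[-window:]) / window
    previous = sum(losses[-2 * window:-window]) / window
    return previous - recent < min_improvement


def weights_size_gb(num_params, bits_per_weight):
    """Step 6: approximate size of the weights alone, in GB (10**9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


print(weights_size_gb(8e9, 16), "GB at 16-bit")
print(weights_size_gb(8e9, 4), "GB at 4-bit")
```

Real 4-bit files tend to come out somewhat larger than the weights-only estimate, because quantized formats store scaling data alongside the weights.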

What the result is called

  • The technique — "fine-tuning." Variants: full fine-tune, LoRA, QLoRA, instruction tuning.
  • The output model — "domain-adapted model" or just "your-name-llama."
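  • The lineage: derivative work. Llama 3.1 is released under the Llama 3.1 Community License, which permits commercial use subject to conditions (for example, attribution and model-naming requirements for distributed derivatives). You own your fine-tuned weights, subject to those terms; read the license before release.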
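
Why LoRA is the cheap variant can be shown with a parameter count. Instead of updating a full d_out × d_in weight matrix, LoRA trains two small matrices of rank r, which adds r × (d_in + d_out) trainable parameters per adapted matrix. The sketch below makes several assumptions that are not from this page: rank 16, adapters on the attention query and value projections only, and Llama 3.1 8B's published shape (hidden size 4096, key/value width 1024, 32 layers).

```python
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix."""
    return rank * (d_in + d_out)


# Assumed configuration (illustrative, not from the source page).
HIDDEN = 4096      # model hidden size
KV_WIDTH = 1024    # output width of the value projection (grouped-query attention)
NUM_LAYERS = 32
RANK = 16
TOTAL_PARAMS = 8e9  # rounded size of the base model

per_layer = lora_params(HIDDEN, HIDDEN, RANK) + lora_params(HIDDEN, KV_WIDTH, RANK)
trainable = per_layer * NUM_LAYERS
fraction = trainable / TOTAL_PARAMS

print(f"{trainable:,} trainable parameters ({fraction:.4%} of the base model)")
```

Under these assumptions only a fraction of a percent of the weights is trained, which is why LoRA needs far less GPU memory and time than a full fine-tune. QLoRA goes further by keeping the frozen base weights in 4-bit form.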

Open-source release vs. private

OPEN-SOURCE RELEASE
  • Publish weights publicly on Hugging Face.
  • Anyone can download, use, modify.
  • Marketing flywheel — devs talk about it, blog posts, etc.
  • Free distribution = visibility multiplier.
  • Llama's license permits this, subject to its conditions.

PRIVATE / PROPRIETARY
  • Keep weights private.
  • Charge users for access (cloud API or signed model files).
  • Smaller audience, higher per-user price.

HYBRID (recommended for Apiary)
  • Open-source the community model as marketing.
  • Per-user custom fine-tunes as the paid tier.
  • Best of both: visibility AND revenue.

Source quotes

"Fine-tuning Llama 3.1 8B on roughly 50,000 example pairs: LoRA on RunPod is hours and tens-to-low-hundreds of dollars. Full fine-tune is a day and a few hundred. For Apiary's likely scale, hundreds, not thousands."
Candidate. This page was seeded from a building-session conversation and has not yet been founder-verified. The shape is right; the wording is a draft. Once Austin reads + stamps, the status flips to verified and the page becomes canonical.