Fine-tuning, explained
Status: Candidate, awaiting founder verification.
Why this page exists: Fine-tuning sounds expensive and exotic. At Apiary's scale it is neither, but the terminology matters.
TL;DR
Fine-tuning is the process of adjusting a pre-trained LLM's weights using your own data so it speaks in your voice or specializes in your domain. For an indie developer with a few thousand to tens of thousands of high-quality examples, fine-tuning an open model like Llama 3.1 8B costs roughly $30–$500 and takes from a few hours to about a day on a rented GPU.
Training vs inference — the foundational distinction
Most people conflate two very different things:
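- Training is creating or updating the model. Expensive (many millions of dollars to train a frontier model like Claude from scratch; hundreds to fine-tune Llama). Done once or rarely. Training from scratch requires large GPU clusters; fine-tuning a small model can run on a single rented GPU.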
- Inference is using the trained model to answer a prompt. Cheap (pennies per query). Done constantly. Can run on a laptop for small models.
When you "use ChatGPT," you're paying for inference. When OpenAI trained GPT-4, that was training. The two have completely different cost structures and hardware profiles.
What you rent: GPUs, not CPUs
CPUs are for general computing. GPUs are purpose-built for the matrix multiplication that AI training and inference do. Fine-tuning on CPU would take weeks; on GPU it takes hours.
Specifically: NVIDIA H100 or A100 cards, the same class of data-center GPU used to train models like Llama in the first place.
Where you rent GPUs (the actual playground)
Indie-friendly (cheap, devs love them):
🟢 RunPod $0.50-3/hr per H100, easy UI, hourly billing
🟢 Vast.ai $0.30-2/hr (peer-to-peer, sometimes spottier)
🟢 Modal serverless, pay per second, great for short jobs
🟢 Together.ai fine-tuning as a service (zero infra setup)
Specialized (handle it for you):
🟢 Replicate managed fine-tuning, "drop in data + go"
🟢 OpenPipe specifically for OpenAI/Llama fine-tunes
🟢 Predibase enterprise-y, low-code interface
Corporate (overkill for indie scale):
🟡 AWS / GCP / Azure expensive, reliable, big-shop default
🟡 Lambda Labs H100 clusters, premium pricing
Real numbers for indie scale
Fine-tuning Llama 3.1 8B on ~50,000 example pairs:
• LoRA fine-tune (cheap): ~$30-100 on RunPod (4-8 hrs on H100)
• Full fine-tune (better): ~$200-600 (12-24 hrs on H100)
Fine-tuning Llama 3.1 70B (heavier, smarter result):
• LoRA: ~$200-500
• Full: ~$2,000-5,000 (multiple H100s)
For an Apiary-class project (an "apiary-llama" trained on the substrate's voice and decisions), $100-500 is realistic.
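A back-of-the-envelope check on the figures above: raw rental cost is hours times hourly rate times GPU count. The sketch below uses assumed inputs (a $3/hr H100 rate, the top of the RunPod range quoted earlier, and a hypothetical 4-GPU setup for the full fine-tune); the function name is illustrative. Raw GPU time for a single LoRA run comes out below the quoted $30-100, so it seems reasonable to read the quoted ranges as budgets that also cover reruns, evaluation, and storage, though the source does not break them down.

```python
def gpu_rental_cost(hours: float, rate_per_hour: float, num_gpus: int = 1) -> float:
    """Raw rental cost in dollars: hours x hourly rate x number of GPUs."""
    return hours * rate_per_hour * num_gpus


# Assumed inputs, not quotes from any provider.
# LoRA on Llama 3.1 8B: 8 hours on one H100 at $3/hr.
lora_8b = gpu_rental_cost(hours=8, rate_per_hour=3.0)

# Full fine-tune: 24 hours on a hypothetical 4 x H100 node at $3/hr per GPU.
full_8b = gpu_rental_cost(hours=24, rate_per_hour=3.0, num_gpus=4)

print(f"LoRA 8B, raw GPU time: ${lora_8b:.2f}")
print(f"Full 8B, raw GPU time: ${full_8b:.2f}")
```

Swapping in the actual hourly rate and measured run time from a first short job gives a better estimate than any published range.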
The seven steps, in order
1. PREPARE TRAINING DATA (the slowest part)
Format logs as JSONL pairs, one JSON object per line:
{"prompt": "Committee debates feature X...", "completion": "After review, the vote was 4-1 to ship..."}
Tools: pandas, datasets library, manual cleanup.
Time: days-to-weeks if you want quality.
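As a minimal sketch of the data-preparation step, the snippet below writes prompt/completion pairs as JSONL (one JSON object per line) and reads them back to confirm every line parses. The helper names, the sample pairs, and the output filename are all hypothetical; real cleanup would also involve deduplication and manual review, which this does not attempt.

```python
import json
import tempfile
from pathlib import Path


def write_jsonl(pairs, path):
    """Write one JSON object per line; skip pairs with an empty side."""
    kept = 0
    with open(path, "w", encoding="utf-8") as f:
        for pair in pairs:
            prompt = (pair.get("prompt") or "").strip()
            completion = (pair.get("completion") or "").strip()
            if not prompt or not completion:
                continue  # incomplete pairs add noise, so drop them
            record = {"prompt": prompt, "completion": completion}
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
            kept += 1
    return kept


def read_jsonl(path):
    """Parse the file back, one record per non-blank line."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Hypothetical sample data; the second pair is deliberately incomplete.
sample_pairs = [
    {"prompt": "Committee debates feature X...",
     "completion": "After review, the vote was 4-1 to ship..."},
    {"prompt": "   ", "completion": "orphaned completion"},
]

out_path = Path(tempfile.gettempdir()) / "apiary_train.jsonl"
n_written = write_jsonl(sample_pairs, out_path)
records = read_jsonl(out_path)
print(f"kept {n_written} of {len(sample_pairs)} pairs")
```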
2. PICK A BASE MODEL
Llama 3.1 8B is the indie default (good baseline; a LoRA fine-tune fits comfortably on one H100).
Available on Hugging Face for free download.
3. SPIN UP GPU INSTANCE
RunPod: sign in, pick "H100 80GB" (listed here at $1.50/hr; rates vary), then connect over SSH.
Or use Together.ai's fine-tune API (zero infra).
4. RUN FINE-TUNING SCRIPT
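Tools: Unsloth (fast, memory-efficient), Axolotl (powerful, config-driven), MLX (for local runs on Apple Silicon instead of a rented GPU).
Watch the training loss drop, and track loss on a held-out validation set. Stop when validation loss plateaus or starts rising, which signals overfitting.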
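The "stop when it plateaus" rule can be made mechanical. The sketch below is one simple way to do it, assuming you log a validation loss at each evaluation: it reports a plateau when the best loss in the last few evaluations is not meaningfully better than the best loss before them. The function name and the `patience` and `min_delta` defaults are illustrative choices, not values from any particular trainer; training frameworks usually ship their own early-stopping hooks, which are preferable in practice.

```python
def has_plateaued(losses, patience=3, min_delta=0.01):
    """True if the last `patience` evaluations improved the best loss
    by less than `min_delta` compared with everything before them."""
    if len(losses) <= patience:
        return False  # not enough history to judge
    best_before = min(losses[:-patience])
    best_recent = min(losses[-patience:])
    return best_before - best_recent < min_delta


# Hypothetical validation-loss histories.
still_improving = [2.0, 1.5, 1.2, 1.0, 0.9, 0.8]
flattened_out = [2.0, 1.5, 1.2, 1.1, 1.098, 1.097, 1.099]

print(has_plateaued(still_improving))
print(has_plateaued(flattened_out))
```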
5. EVALUATE THE RESULT
Run test prompts. Compare to vanilla Llama. Does it "speak Apiary"?
If not: adjust hyperparameters or the training data, then try again.
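One lightweight way to run this comparison is to feed the same held-out prompts to both models and read the answers side by side. The sketch below assumes each model is wrapped in a plain function that takes a prompt and returns text; `fake_base` and `fake_tuned` are stand-ins invented for illustration, and in a real run they would call vanilla Llama and the fine-tuned model.

```python
from typing import Callable, Dict, List


def compare_models(
    prompts: List[str],
    base: Callable[[str], str],
    tuned: Callable[[str], str],
) -> List[Dict[str, str]]:
    """Collect both models' answers to the same prompts for manual review."""
    return [{"prompt": p, "base": base(p), "tuned": tuned(p)} for p in prompts]


# Stand-in generators so the sketch runs without a GPU (hypothetical).
def fake_base(prompt: str) -> str:
    return "Generic answer to: " + prompt


def fake_tuned(prompt: str) -> str:
    return "The committee voted on: " + prompt


rows = compare_models(["Should we ship feature X?"], fake_base, fake_tuned)
for row in rows:
    print("PROMPT:", row["prompt"])
    print("  base :", row["base"])
    print("  tuned:", row["tuned"])
```

Keeping the test prompts out of the training data matters here; otherwise the fine-tuned model is being graded on examples it has already seen.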
6. QUANTIZE + PACKAGE
Convert to GGUF (.gguf), the llama.cpp file format that Ollama loads.
A 16-bit 8B model (~16GB) shrinks to roughly 4-5GB with 4-bit quantization.
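The size figures follow from simple arithmetic: parameters times bits per weight, divided by 8 to get bytes. The sketch below rounds the model to 8 billion parameters, which is an approximation, and the function name is illustrative. Real 4-bit GGUF files tend to come out somewhat larger than the ideal figure because quantization schemes store extra scale data and may keep some tensors at higher precision.

```python
def model_size_gb(num_params: float, bits_per_weight: float) -> float:
    """Ideal weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


PARAMS_8B = 8e9  # approximation of Llama 3.1 8B's parameter count

size_16bit = model_size_gb(PARAMS_8B, 16)
size_4bit = model_size_gb(PARAMS_8B, 4)

print(f"16-bit: {size_16bit:.1f} GB")
print(f" 4-bit: {size_4bit:.1f} GB")
```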
7. DISTRIBUTE
Upload to Hugging Face. Add to Ollama registry.
Users run: `ollama pull your-name/apiary-llama`.
What the result is called
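- The technique: "fine-tuning." Variants: full fine-tune, LoRA, QLoRA, instruction tuning.
- The output model: "domain-adapted model" or just "your-name-llama."
- The lineage: derivative work. Llama 3.1 is licensed for commercial use under the Llama 3.1 Community License, with conditions: distributed derivatives need "Built with Llama" attribution and a model name that begins with "Llama", and services above 700 million monthly active users need a separate license from Meta. Within those terms, your fine-tuned weights are yours. Check the current license text before release.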
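To make the difference between a full fine-tune and LoRA concrete: a full fine-tune updates every entry of a weight matrix, while LoRA freezes the matrix and trains two thin matrices of rank r alongside it, adding r × (d_in + d_out) trainable parameters. The sketch below counts both for a single 4096 × 4096 projection (4096 is Llama 3.1 8B's hidden size); the rank of 16 is an assumed, commonly used setting, and the function names are illustrative. This covers one matrix, not the whole model.

```python
def full_finetune_params(d_in: int, d_out: int) -> int:
    """Trainable parameters when the whole weight matrix is updated."""
    return d_in * d_out


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters for LoRA: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


d_in = d_out = 4096  # hidden size of Llama 3.1 8B
rank = 16            # assumed LoRA rank

full = full_finetune_params(d_in, d_out)
lora = lora_params(d_in, d_out, rank)

print(f"full: {full:,} trainable parameters")
print(f"LoRA: {lora:,} trainable parameters ({lora / full:.2%} of full)")
```

That gap in trainable parameters is the main reason the LoRA rows in the cost figures are so much cheaper: far less optimizer state and gradient memory is needed per step.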
Open-source release vs. private
OPEN-SOURCE RELEASE
• Publish weights publicly on Hugging Face.
• Anyone can download, use, modify.
• Marketing flywheel — devs talk about it, blog posts, etc.
• Free distribution = visibility multiplier.
• Llama's license permits this.
PRIVATE / PROPRIETARY
• Keep weights private.
• Charge users for access (cloud API or signed model files).
• Smaller audience, higher per-user price.
HYBRID (recommended for Apiary)
• Open-source the community model as marketing.
• Per-user custom fine-tunes as the paid tier.
• Best of both: visibility AND revenue.
Related
Source quotes
"Fine-tuning Llama 3.1 8B on roughly 50,000 example pairs: LoRA on RunPod is hours and tens-to-low-hundreds of dollars. Full fine-tune is a day and a few hundred. For Apiary's likely scale, hundreds, not thousands."