People in AI · 8 min read

Karpathy and the Decade of Agents

The clearest voice in AI just told everyone to slow down. Here is why that's bullish.

The Decade, Not the Year

Every cycle of artificial intelligence produces a phrase that everyone repeats until it loses all meaning, and in 2025 that phrase was "the year of agents." Andrej Karpathy, a founding member of OpenAI and the former Director of AI at Tesla, spent a roughly two-and-a-half-hour conversation with Dwarkesh Patel late in the year pushing back on exactly that framing. His correction was surgical and, to my ear, deeply bullish if you read it correctly: 2025 is not the year of agents; it is the start of the decade of agents. He also argued that AGI is still something like a decade away. That is not a man talking down the technology. That is a man who has built it telling you how much road is left.

I find this worth a long look because Karpathy is not a hype merchant and he is not a doomer. He now runs Eureka Labs, an AI education venture, and he spends his time actually writing code and teaching how these systems work from first principles. When someone with that resume says the flashes of intelligence are real but the surrounding infrastructure is not there yet, investors should listen differently than they listen to a keynote stage. The gap he describes is not a marketing gap. It is an engineering gap, and engineering gaps get closed by money, compute, and time, on the order of years.

2025 is not the year of agents. It is the first year of the decade of agents.

Before going further I want to clear up a mixup I see constantly, because getting the facts right is the whole point of this publication. Karpathy did not create Ollama. The tool that lets people run language models locally on their own machines was built by Jeffrey Morgan and Michael Chiang. Karpathy's influence on the local, hackable, run-it-yourself model culture is real and large, and his teaching has pushed countless engineers toward building these systems from scratch. But he did not author that specific project, and pretending otherwise is the kind of sloppy attribution that erodes trust. State it plainly, then move on.

Brilliant Interns With Amnesia

The most useful image from the Dwarkesh interview is Karpathy's description of today's agents as brilliant interns with amnesia. They can produce dazzling work in a flash, then forget everything the moment the session ends. There is no durable memory, almost no continual learning, the multimodal perception is shaky, and the ability to actually operate a computer and act in the world is weak. The intelligence is genuine and intermittent; the surrounding machinery that would let it accumulate experience, persist what it learns, and reliably act on its own is simply not built yet. Closing that gap, he argues, is a decade of grinding work rather than a product launch.

This connects directly to one of the six paradigm shifts he laid out in his 2025 "LLM Year in Review," the idea he calls jagged intelligence. His framing is that animals are shaped by the hard constraint of survival, while large language models are summoned ghosts, optimized for imitation, for rewards, for benchmarks. Because of that, their capability is jagged: superhuman in narrow pockets and embarrassingly broken on things a child handles. A system can write a working program and then fail a question about how many letters are in a word. That jaggedness is not a bug that ships away in the next release; it is a property of how these things are made.

A ghost can dazzle you in one sentence and faceplant in the next. That is the technology, not a glitch.

Another of the shifts he named is the rise of RLVR, Reinforcement Learning from Verifiable Rewards, which became a major new training stage in 2025. By rewarding models for answers that can be automatically verified as correct, RLVR pushes them to develop reasoning-like strategies rather than just pattern-matching. This is real progress and it is why the reasoning models feel sharper this year. But notice the word verifiable. The technique works best precisely where correctness can be checked mechanically, which is another way of saying the jaggedness has a shape: strong where the world is checkable, weak where it is open and ambiguous. That shape is not a detail. It is the map of where this technology can be trusted today and where it cannot.
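To make the word verifiable concrete, here is a minimal sketch of what a mechanically checkable reward can look like. It is an illustration under stated assumptions, not Karpathy's or any lab's training code: the function names and the "Answer:" convention are hypothetical, and the reinforcement learning update that would consume this reward is not shown.

```python
import re
from fractions import Fraction
from typing import Optional


def extract_final_answer(completion: str) -> Optional[str]:
    """Return the token after the last 'Answer:' marker (a hypothetical convention)."""
    matches = re.findall(r"Answer:\s*(\S+)", completion)
    return matches[-1] if matches else None


def verifiable_reward(completion: str, reference: str) -> float:
    """Return 1.0 if the final answer equals the reference numerically, else 0.0."""
    answer = extract_final_answer(completion)
    if answer is None:
        return 0.0  # no parseable answer, so nothing to verify
    try:
        # Fraction parses "3", "0.5", and "1/2", so equivalent forms compare equal.
        return 1.0 if Fraction(answer) == Fraction(reference) else 0.0
    except (ValueError, ZeroDivisionError):
        return 0.0  # non-numeric or malformed answers earn no reward
```

A checker like this is easy to write for arithmetic or unit tests. Nothing comparable exists for "is this memo persuasive," which is the checkable-versus-open divide described above.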

Three Eras of Software

To understand where the durable value is going, it helps to use Karpathy's own map of how software itself is changing, which he laid out in his YC "AI Startup School" keynote titled "Software Is Changing (Again)." In his telling there are three eras. Software 1.0 is hand-written code, the explicit instructions humans have authored for seventy years. Software 2.0 is neural network weights, where behavior is learned from data rather than written by hand. And Software 3.0 is English prompts steering a large language model, where natural language becomes the programming interface.
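One way to see the three eras side by side is to solve the same toy task, labeling a review as positive or negative, three ways. This is my own illustrative sketch, not code from the keynote: the word list and weights are hand-picked stand-ins, and `llm` is a hypothetical callable (prompt in, text out), not a real library API.

```python
from typing import Callable, Dict


# Software 1.0: a human writes the rule explicitly.
def sentiment_v1(text: str) -> str:
    negative_words = {"bad", "awful", "terrible"}
    words = set(text.lower().split())
    return "negative" if words & negative_words else "positive"


# Software 2.0: behavior lives in numeric weights. These values are
# hand-picked stand-ins; in practice they would be learned from data.
WEIGHTS: Dict[str, float] = {"great": 1.2, "good": 0.8, "bad": -0.9, "awful": -1.5}


def sentiment_v2(text: str, weights: Dict[str, float] = WEIGHTS) -> str:
    score = sum(weights.get(word, 0.0) for word in text.lower().split())
    return "positive" if score >= 0 else "negative"


# Software 3.0: the "program" is an English prompt handed to a model.
PROMPT = "Classify the sentiment of this review as positive or negative:\n{review}"


def sentiment_v3(text: str, llm: Callable[[str], str]) -> str:
    # `llm` is a hypothetical stand-in for whatever model client is in use.
    return llm(PROMPT.format(review=text)).strip().lower()
```

In the first version the logic is in the code, in the second it is in the numbers, and in the third it is in the sentence stored in `PROMPT`.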

The third era is where 2025's most concrete shifts live, and the rest of his six shifts cluster here. Karpathy points to Anthropic's Claude Code as the first genuinely functional LLM agent, a ghost on your computer that runs on localhost with real access to the development environment. That is a meaningful milestone, and it pairs with what he calls vibe coding: in 2025, anyone can build software by describing what they want in plain language. He also notes the LLM interface itself is maturing, generating text and images together instead of dumping walls of text at the user. These are not small things. They are the application layer growing up around the raw model.

On that application layer Karpathy draws a distinction I think is badly underrated by investors. The foundation models, he says, are maturing into generally capable college students, broadly competent across many subjects but not deeply expert in any single one. The real value accrues when specialized applications turn that raw capability into deployed professionals inside specific verticals, the way a graduate becomes an actual practitioner in a real job with real accountability. The college student is impressive; the deployed professional is what a business actually pays for. That gap between general capability and deployed professional is exactly the gap that takes a decade and a great deal of focused engineering to cross, and it is where I think most of the durable software value of this cycle will be built.

What nanochat Taught Him

Here is the part of the year that I keep coming back to, because it is the most honest data point of all. Karpathy built nanochat, a from-scratch minimal ChatGPT clone, the kind of clean novel codebase that does not look like the ten thousand tutorials scraped off the internet. And in building it he found the coding agents were, in his words, not net useful. They kept misreading his non-standard code because they have too much memory from all the typical ways of doing things on the internet, and they were asymmetrically worse at genuinely novel work, padding the project with defensive boilerplate it did not need.

The agents were worst exactly where the work was most original. That is the tell.

Sit with that, because it is the whole thesis in miniature. The agents shine when the task resembles the average of everything they have seen, and they degrade precisely when the work is original. They are, in effect, drawn toward the most common pattern on the internet even when the job in front of them is uncommon. For an investor that is a flashing sign about where autonomous agents create durable economic value today, and it is not in the frontier, novel, high-judgment work that commands the highest prices. It is in the well-trodden middle. The frontier still needs a human holding the wheel, and it will for a while yet.

This is exactly why, at the desk, we have made the architectural choices we have, and I will be blunt about them because they follow directly from Karpathy's evidence. We use focused tool-models for judgment rather than handing the keys to autonomous agents, because the jagged-intelligence problem means an unsupervised agent will eventually faceplant in a way that costs real money. We pull real prices from a data feed, never from a model's guess, because a ghost with amnesia and shaky perception is the last thing you want generating a number a human will trade on. The intelligence is a tool inside a guarded process, not the process itself. That is not caution for its own sake; it is the only architecture his own evidence actually supports.
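As an illustration only, and not the desk's actual code, the pattern of "intelligence as a tool inside a guarded process" can be sketched in a few lines. Every name here (`build_note`, `price_feed`, `judge`, `ALLOWED_VIEWS`) is hypothetical, and the fallback to "neutral" is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import Callable

ALLOWED_VIEWS = {"bullish", "neutral", "bearish"}


@dataclass(frozen=True)
class Note:
    ticker: str
    price: float  # always taken from the data feed
    view: str     # the model's judgment, constrained to ALLOWED_VIEWS


def build_note(
    ticker: str,
    price_feed: Callable[[str], float],  # hypothetical market-data lookup
    judge: Callable[[str], str],         # hypothetical tool-model call
) -> Note:
    # The number comes from the feed only; the model is never asked for it.
    price = price_feed(ticker)
    if price <= 0:
        raise ValueError(f"feed returned an implausible price for {ticker}")

    # The model supplies a label, and free-form output is not trusted.
    view = judge(ticker).strip().lower()
    if view not in ALLOWED_VIEWS:
        view = "neutral"  # assumed fallback when the output is off-script

    return Note(ticker=ticker, price=price, view=view)
```

If the model answers with a made-up price or a paragraph of prose, the process discards it and the published number is still the feed's number.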

The Compute-and-Power Trade

Now the part you actually came for: if Karpathy is right that real agents are a decade out, what is the durable investment read? My answer is that the surest winners are not the ones promising the agent revolution next quarter. They are the companies supplying the picks and shovels for a ten-year grind: the compute and the power that the long slog runs on. A decade of training runs, RLVR stages, larger models, and continual-learning research is a decade of relentless, growing demand for AI infrastructure and the electricity to feed it. The demand does not pause for any single disappointing launch.

That points me toward the unglamorous physical layer: AI infrastructure broadly, and underneath it, power generation, where I keep my eye on nuclear, including small modular reactors (SMRs), and on the electrical grid itself. You cannot run a decade of compute on slogans. You run it on watts. The thesis does not depend on any single company winning the model race; it depends on the race being long, expensive, and energy-hungry, which is precisely what Karpathy is telling us it will be.

The beauty of the decade-of-agents framing for a patient investor is that it converts hype risk into something more boring and more durable. If you bet on the agent that will replace all knowledge workers by next summer, you are exposed to every missed milestone, every jagged failure, every demo that does not survive contact with reality. If you bet instead on the compute and the power that every serious participant must rent and burn regardless of who wins, you are insulated from the specific winner and exposed instead to the aggregate effort. Karpathy is describing a long, capital-intensive grind, and long capital-intensive grinds reward the suppliers of the inputs more reliably than they reward the contestants.

So the takeaway is not that AI is overhyped or underhyped; it is that the timeline matters more than the temperature. The flashes are real. The interns are brilliant. But they have amnesia, the perception is shaky, and the road to genuine agents is a decade of grinding work, not a launch event. Position for the road, not the ribbon-cutting.

Bet on the watts and the silicon the whole decade must rent, not on the one ghost that wins.

This article is opinion and content from the author. It is not financial, investment, or any other professional advice, and the author and publication accept no liability for decisions made based on it. Do your own research.

From the Apiary Reading Room. Opinion & editorial — not financial advice. We don't overclaim.