What is an agent harness, really?

Strip away the marketing and an agent harness is one thing: a loop that turns a single-shot model into a system that can act.

May 19, 2026 · 6 min read

The model is not the agent

People often say "the agent" when they mean "the model." They are not the same thing. A large language model, taken on its own, is a function: tokens in, tokens out, one call. It cannot remember what it said yesterday, cannot click a link, cannot try again when it gets something wrong. An agent is what you get when you wrap that function in code that lets it do those things.

That wrapper is the harness. It is the part nobody screenshots. It is also the part that turned LLMs from clever autocomplete into systems that book flights, write code, and run multi-step research. The model is the engine; the harness is the rest of the car.

The four moving parts

Most harnesses, regardless of branding, are some combination of four ideas. First, structured reasoning: prompting the model to think step by step before answering. Wei et al. 2022 showed that chain-of-thought prompting, in which the prompt includes a few worked examples that spell out their intermediate reasoning steps, improves accuracy on multi-step arithmetic, commonsense, and symbolic reasoning problems compared to prompting for a direct answer, with the gains appearing mainly in sufficiently large models. The model is the same; the scaffold around it changes the result.
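To make the scaffold concrete, here is a minimal sketch that builds a direct-answer prompt and a chain-of-thought prompt for the same question. The function names and the worked exemplar are illustrative assumptions, not taken from Wei et al. or any library; the point is only that the model call stays the same and the text around the question changes.

```python
# Hypothetical helpers for illustration; not from the paper or any library.

# One worked exemplar whose answer spells out its intermediate steps.
EXEMPLAR = (
    "Q: A shelf holds 3 boxes with 4 books each. 2 books are removed. "
    "How many books remain?\n"
    "A: There are 3 * 4 = 12 books. Removing 2 leaves 12 - 2 = 10. "
    "The answer is 10.\n"
)


def direct_prompt(question: str) -> str:
    """Ask for the answer only, with no room for intermediate steps."""
    return f"Q: {question}\nA: The answer is"


def cot_prompt(question: str) -> str:
    """Prepend the worked exemplar so the model imitates step-by-step reasoning."""
    return f"{EXEMPLAR}\nQ: {question}\nA:"


if __name__ == "__main__":
    q = "A train has 5 cars with 20 seats each. 35 seats are taken. How many are free?"
    print(cot_prompt(q))
```

Either string would be sent to the same model; only the prompt differs.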

Second, tool use: giving the model a list of actions it can take and a syntax for taking them. Search the web, run a calculator, query a database, write a file. The model emits a tool call, the harness executes it, the result goes back to the model.
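The harness side of that exchange can be sketched in a few lines. This assumes the model emits tool calls as JSON with a "name" and an "args" field; that format, the TOOLS registry, and the two toy tools are assumptions made for illustration, not any particular framework's API.

```python
import json
from typing import Callable, Dict

# Hypothetical tool registry: names and call format are illustrative assumptions.
TOOLS: Dict[str, Callable[..., str]] = {
    "add": lambda a, b: str(a + b),
    "upper": lambda text: text.upper(),
}


def execute_tool_call(raw: str) -> str:
    """Parse a JSON tool call emitted by the model and run the matching tool.

    Failures are returned as text rather than raised, so the result can be
    fed back to the model and it can adjust its next call.
    """
    try:
        call = json.loads(raw)
        name, args = call["name"], call.get("args", {})
    except (json.JSONDecodeError, KeyError, TypeError, AttributeError) as exc:
        return f"error: malformed tool call ({exc})"
    tool = TOOLS.get(name)
    if tool is None:
        return f"error: unknown tool {name!r}"
    try:
        return tool(**args)
    except Exception as exc:  # bad arguments, tool failure, etc.
        return f"error: {exc}"


if __name__ == "__main__":
    print(execute_tool_call('{"name": "add", "args": {"a": 2, "b": 3}}'))
```

Returning errors as strings is a design choice in this sketch: the model sees the failure as just another result and gets a chance to correct the call.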

Third, memory: a place to put what was said, decided, or observed earlier, so future turns can refer to it without being limited by the context window. This can be as simple as appending to a list or as elaborate as a vector index.
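As a rough sketch of the simple end of that spectrum, the class below appends entries to a list and retrieves them by word overlap with the query. The Memory class and its methods are hypothetical, and word overlap is only a crude stand-in for the similarity search a vector index would perform.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Memory:
    """Append-only store with naive retrieval (hypothetical, for illustration)."""

    entries: List[str] = field(default_factory=list)

    def add(self, text: str) -> None:
        """Record something that was said, decided, or observed."""
        self.entries.append(text)

    def recall(self, query: str, k: int = 2) -> List[str]:
        """Return up to k entries that share the most words with the query.

        Entries with no shared words are dropped; ties keep insertion order.
        """
        words = set(query.lower().split())
        scored = [(len(words & set(e.lower().split())), e) for e in self.entries]
        scored = [(score, e) for score, e in scored if score > 0]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [e for _, e in scored[:k]]


if __name__ == "__main__":
    memory = Memory()
    memory.add("the invoice total is 42 euros")
    memory.add("user prefers dark mode")
    print(memory.recall("what is the invoice total"))
```

The harness would paste the recalled entries into the next prompt, so the model only sees the few past facts that look relevant instead of the whole history.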

Fourth, self-reflection: letting the model look at its own output, criticise it, and try again. This is often where the biggest gains live, and also the biggest token bills, because every critique and every retry is another model call.
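A minimal sketch of such a reflection loop follows. It assumes two callables supplied by the caller: generate, which drafts an answer given optional feedback, and critique, which returns None when satisfied or a feedback string otherwise. Both signatures are hypothetical; in practice each would wrap a model call.

```python
from typing import Callable, Optional, Tuple

# Hypothetical signatures, chosen for this sketch only.
Generate = Callable[[str, Optional[str]], str]   # (task, feedback) -> draft
Critique = Callable[[str, str], Optional[str]]   # (task, draft) -> feedback or None


def reflect(
    task: str,
    generate: Generate,
    critique: Critique,
    max_rounds: int = 3,
) -> Tuple[str, int]:
    """Draft, critique, and redraft until the critic passes or the budget runs out.

    Returns the final draft and the number of generate calls spent.
    """
    feedback: Optional[str] = None
    draft = ""
    for round_no in range(1, max_rounds + 1):
        draft = generate(task, feedback)
        feedback = critique(task, draft)
        if feedback is None:  # critic is satisfied
            return draft, round_no
    return draft, max_rounds  # budget exhausted; return the last attempt
```

The max_rounds cap is what keeps the token bill bounded: each round costs one generate call and one critique call, so the worst case is known in advance.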

ReAct: reasoning and acting in the same loop

The cleanest formulation of a modern harness pattern is ReAct, from Yao et al. 2023. The idea is small but consequential: instead of separating "think" from "do," interleave them. The model writes a thought, then an action, then observes the result, then writes another thought, and so on.

What sounds like a stylistic choice turns out to matter. In the tasks studied in the paper, interleaving reasoning traces with environment interactions tended to outperform either reasoning alone (chain-of-thought) or acting alone. The model could correct its plan after seeing real results, instead of charging through a plan it had committed to before any evidence arrived.
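The loop itself is short. The sketch below assumes a text format of "Thought: ..." followed by "Action: name[argument]", loosely modelled on the traces in the paper, with a special finish[...] action to end the episode. The function name, the regular expression, and the step budget are assumptions for illustration, not the authors' implementation.

```python
import re
from typing import Callable, Dict

# Assumed action syntax: "Action: tool_name[argument]".
ACTION_RE = re.compile(r"Action:\s*(\w+)\[(.*?)\]", re.DOTALL)


def react_loop(
    question: str,
    model: Callable[[str], str],
    tools: Dict[str, Callable[[str], str]],
    max_steps: int = 5,
) -> str:
    """Alternate model output and tool observations until the model calls finish[...]."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = model(transcript)  # expected to contain a Thought and an Action
        transcript += step + "\n"
        match = ACTION_RE.search(step)
        if match is None:
            transcript += "Observation: no valid action found\n"
            continue
        name, arg = match.group(1), match.group(2)
        if name == "finish":
            return arg
        tool = tools.get(name)
        observation = tool(arg) if tool else f"unknown tool {name}"
        # The observation is appended so the next thought can react to it.
        transcript += f"Observation: {observation}\n"
    return "no answer within step budget"


if __name__ == "__main__":
    # Scripted stand-in for a real model, so the sketch runs without any API.
    def scripted_model(transcript: str) -> str:
        if "Observation:" not in transcript:
            return "Thought: I should look this up.\nAction: lookup[capital of France]"
        return "Thought: The observation answers it.\nAction: finish[Paris]"

    tools = {"lookup": lambda query: "Paris is the capital of France."}
    print(react_loop("What is the capital of France?", scripted_model, tools))
```

The key detail is that the transcript grows with every observation, so each new thought is conditioned on what actually happened rather than on the original plan.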

Most production agent frameworks today, whether or not they use the name, are variations on this loop.

Why the loop is doing most of the work

It is tempting to think bigger models simply absorb these capabilities. They do not, at least not entirely. A model with no harness cannot execute a tool at all; it can only write text that describes the call, and it cannot even do that for a tool it has never been shown. It cannot remember a number you told it ten minutes ago if the context window has rolled over. It cannot try twice.

The Wang et al. 2023 survey on LLM-based autonomous agents catalogues how different systems decompose this problem into profiling, memory, planning, and action modules, and it makes the same point in another form: agent behaviour is an architecture, not a model property. Swap the model and the architecture still has to do its job.

This is why two products built on the same base model can feel completely different. The model sets the ceiling. The harness decides how close you get to it.

What a harness is not

A harness is not a prompt. A prompt is a single instruction; a harness is the machinery that runs the model many times, decides what to feed it next, and stitches the outputs into a coherent trajectory. A harness is not a "personality" either — that is just prompt content. And a harness is not a fine-tune. Fine-tuning changes the weights; a harness leaves the weights alone and changes what is around them.

The useful mental model: the model is a phone call to a very smart but forgetful expert. The harness is your notebook, your calendar, your filing cabinet, and the colleague who reads back what the expert said and asks "are you sure?"

Why this matters for on-device AI

If most of the work lives in the loop, model size matters less than people assume. A modest local model wrapped in a thoughtful harness can do work that a much larger model with no harness cannot. The model is small. The harness is yours. The data stays on the device.

References

Wei, J., et al. 2022. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. arXiv:2201.11903. https://arxiv.org/abs/2201.11903
Yao, S., et al. 2023. ReAct: Synergizing Reasoning and Acting in Language Models. arXiv:2210.03629. https://arxiv.org/abs/2210.03629
Wang, L., et al. 2023. A Survey on Large Language Model based Autonomous Agents. arXiv:2308.11432. https://arxiv.org/abs/2308.11432